-
Notifications
You must be signed in to change notification settings - Fork 22
/
Copy pathtwo_armed_bandit.Rmd
304 lines (237 loc) · 12.8 KB
/
two_armed_bandit.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
---
title: "Bayes' Rule and the Two Armed Bandit"
runtime: shiny
output: statsr:::statswithr_lab
---
```{r echo=FALSE, message=FALSE}
library(tidyverse)
library(statsr)
```
<div class="instructions">
Complete all **Exercises**, and submit answers to **Questions** in the **Quiz: Week 1 Lab** on Coursera.
</div>
## Background
Some people refer to slot machines as "One-armed Bandits" due to the older style
of machine requiring the player to pull a mechanical handle to play. Statisticians
and mathematicians often develop theories / models based on games of chance which
turn out to be more generally useful. One general class of probability / optimization
problems is known as the multi-armed bandit problem which is based on the following
analogy: A player walks into a casino and sees a wall of slot machines. All of the
machines pay out at different rates - some pay out more often than others, some
pay out less often. Since the player does not know which machines are "good" and
which are "bad", how should he / she play so as to make as much money (or at least
lose as little) as possible?
## Simulated Slots
Today we will examine a simplified case where there are only two machines (one "Good"
and one "Bad"). We will also assume that we know the probability of winning on
the "Good" machine and the probability of winning on the "Bad" machine - what we
don't know is which machine is which.
The Shiny App below will let you simulate playing slot machines when the
probability of winning on the "good" machine is 1/2 and the probability of
winning on the "bad" machine is 1/3. Each time you play, the App will "flip a
coin" and randomly select either Machine 1 or Machine 2 to be the "good" machine,
and the other to be the "bad" machine.
```{r bandit, echo=FALSE, error=TRUE}
bandit_sim()
```
<div class="task exercise">
Use the Shiny App above to play the slot machines a total of *10* times. You can
divide your plays up however you like between the two machines. Once you have
played 10 times, use the results of your plays to decide which machine you think
has the better payout (i.e. the good machine) - click the button on the right
that corresponds to your guess, the App will tell you if you are right. If you
were right or wrong, press the reset button and play again and guess which
machine you think is the good machine. As you are playing, think about what it
is about your results that enabled you make the correct guess.
</div>
<div class="task exercise">
Press the Reset button again, now play *30* times and use those results to guess
which machine is the good one. Do you think it was easier or harder to make a
decision with the additional plays? Why do you think that is?
</div>
Hopefully what you have observed is that as you played the slot machine, initially
it was difficult to determine which machine was which. But as you played more, it
became more and more clear. In particular, each time you played you naturally
reassessed which machine you thought was good. With the initial handful of plays,
your beliefs stayed close to 50-50, potentially with a small bias towards the
machine you had won more on. By the time you got to 30 plays you should have had
a very strong belief about which machine was the "good" one.
This is the way in which we usually interact with the world - we try something
and modify our mental model based on the outcome we have received. This idea of
updating beliefs based on observed data is one of the core tenets of Bayesian
statistics - in the following sections we will work through the probability
calculations and see how they correspond with our intuitive understanding.
## Posterior Probabilities
We will start by examining the result of playing just once. Imagine that you
play Machine 1 and you win, what do we now know about the probability of the two
machines being "good" or "bad"? It is reasonable to believe that each machine is
equally likely to be the "good" machine, we can express our belief as follow:
$$P(M_1 \text{ is Good})=P(M_2 \text{ is Bad})=1/2$$
$$P(M_1 \text{ is Bad})=P(M_2 \text{ is Good})=1/2.$$
We have also been told that the probabilities of winning for each type of
machine are:
$$P(\text{Win on }M_1 ~|~ M_1 \text{ is Good}) = 1/2 \qquad P(\text{Win on }M_1 ~|~ M_1 \text{ is Bad}) = 1/3.$$
We can use these probabilities to calculate the posterior probabilities of
losing for each type of machine:
$$P(\text{Lose on }M_1 ~|~ M_1 \text{ is Good}) = 1/2 \qquad P(\text{Lose on }M_1 ~|~ M_1 \text{ is Bad}) = 2/3.$$
Note that while these probabilities are all for Machine 1, they are exactly the
same as the probabilities for Machine 2. We have seen how we can use Bayes' rule
to calculate $P(M_1 \text{ is Good} ~|~ \text{Win on } M_1)$
$$
\begin{aligned}
P(M_1 \text{ is Good} ~|~ \text{Win on } M_1)
&= \frac{P(\text{Win on } M_1 ~|~ M_1 \text{ is Good})~P(M_1 \text{ is Good})}{P(\text{Win on } M_1)} \\
& = \\
&= \frac{P(\text{Win on } M_1 ~|~ M_1 \text{ is Good})~P(M_1 \text{ is Good})}{P(\text{Win on } M_1 ~|~ M_1 \text{ is Good})~P(M_1 \text{ is Good})+P(\text{Win on } M_1 ~|~ M_1 \text{ is Bad})~P(M_1 \text{ is Bad})} \\
& = \\
&= \frac{1/2 \times 1/2}{1/2 \times 1/2+1/3 \times 1/2} = 0.6
\end{aligned}
$$
<div class="question">
Based on the preceding result, what is the probability that Machine 1 is "Bad"
given you won a game playing on Machine 1?
* 0.3
* 0.4
* 0.5
* 0.6
* 0.7
</div>
<div class="question">
Based on the preceding result, what is the probability that Machine 2 is "Good"
given you won a game playing on Machine 1?
* 0.3
* 0.4
* 0.5
* 0.6
* 0.7
</div>
<div class="question">
Under the Bayesian paradigm, which of the following correctly matches the
probabilities with their names?
* Posterior - $P(M_1 \text{ is Good} ~|~ \text{Win on } M_1)$ <br/> Prior - $P(M_1 \text{ is Good})$ <br/> Likelihood - $P(\text{Win on } M_1 ~|~ M_1 \text{ is Good})$
* Posterior - $P(M_1 \text{ is Good} ~|~ \text{Win on } M_1)$ <br/> Prior - $P(\text{Win on } M_1 ~|~ M_1 \text{ is Good})$ <br/> Likelihood - $P(M_1 \text{ is Good})$
* Posterior - P(\text{Win on } M_1 ~|~ M_1 \text{ is Good})$ <br/> Prior - P(M_1 \text{ is Good} ~|~ \text{Win on } M_1)$ <br/> Likelihood - $P(M_1 \text{ is Good})$
* Posterior - $P(\text{Win on } M_1 ~|~ M_1 \text{ is Good})$ <br/> Prior - $P(M_1 \text{ is Good})$ <br/> Likelihood - $P(M_1 \text{ is Good} ~|~ \text{Win on } M_1)$
</div>
## Bayesian Updating
We have implemented a function for calculating the posterior probability of
Machine 1 and Machine 2 being the "good" machine after one or more plays of
either machine. The function `bandit_posterior` expects a data frame representing
your play history that contains two columns, `machine` which records which machine
was played (e.g. either a 1 or 2) and `outcome` which records whether you won
(`"W"`) or lost (`"L"`). An optional parameter to `bandit_posterior` is `prior`,
a vector of length two that specifies the prior probability of each machine
being "good". If left unspecified, equal prior probabilities (0.5, 0.5) are
assumed. We can repeat the calculation from the previous section using the
following code in the RStudio Console:
```{r}
bandit_posterior(data = data.frame(machine = 1, outcome = "W"))
```
We can also use this function to calculate the posterior probabilities for
additional plays, for example playing Machine 1 twice, first winning and then
losing.
```{r}
bandit_posterior(data = data.frame(machine = c(1,1), outcome = c("W","L")))
```
We have discussed how the Bayesian approach allows for updating procedures where
for each new data observation we are able to use the previous posterior
probabilities as our new prior probabilities and thereby simplify the calculation
(e.g. multiple simple updates can be used instead of one single large calculation).
We can explore this process by **chaining** multiple calls to `bandit_posterior`
together using `%>%`. With `prior = .` in the next call of `bandit_posterior`,
we use the returned posterior values as the prior.
```{r}
# Initiate the two plays on Machine 1
data1 <- data.frame(machine = c(1), outcome = c("W"))
data2 <- data.frame(machine = c(1), outcome = c("L"))
# Use piping and bandit_posterior to calculuate the posterior probabilities after each play
bandit_posterior(data1) %>%
bandit_posterior(data2, prior = .)
```
Note that this exactly matches the probabilities we calculated when we provided
the outcome of two plays all at once.
<div class="question">
Use the `bandit_posterior` function to calculate the posterior probabilities of
Machine 1 and 2 being "good" after playing Machine 1 twice and winning both
times, and then playing Machine 2 three times with 2 wins then 1 loss.
* $P(M_1\text{ is good}~|~\text{data}) = 0.250$, <br/> $P(M_2\text{ is good}~|~\text{data}) = 0.750$
* $P(M_1\text{ is good}~|~\text{data}) = 0.429$, <br/> $P(M_2\text{ is good}~|~\text{data}) = 0.571$
* $P(M_1\text{ is good}~|~\text{data}) = 0.571$, <br/> $P(M_2\text{ is good}~|~\text{data}) = 0.429$
* $P(M_1\text{ is good}~|~\text{data}) = 0.750$, <br/>$P(M_2\text{ is good}~|~\text{data}) = 0.250$
</div>
```{r Q4}
# Type your code for Question 4 here
```
<div class="question">
What would the posterior probabilities be if we had instead played Machine 2
first, playing three times with 2 wins and 1 loss, and then playing Machine 1
twice and winning both times?
* $P(M_1\text{ is good}~|~\text{data}) = 0.250$,<br/> $P(M_2\text{ is good}~|~\text{data}) = 0.750$
* $P(M_1\text{ is good}~|~\text{data}) = 0.429$,<br/> $P(M_2\text{ is good}~|~\text{data}) = 0.571$
* $P(M_1\text{ is good}~|~\text{data}) = 0.571$,<br/> $P(M_2\text{ is good}~|~\text{data}) = 0.429$
* $P(M_1\text{ is good}~|~\text{data}) = 0.750$,<br/> $P(M_2\text{ is good}~|~\text{data}) = 0.250$
</div>
```{r Q5}
# Type your code for Question 5 here
```
<div class="exercise">
Confirm the updating property we discussed previously by connecting two calls of
`bandit_posterior` in Question 4. The first call calculates the posterior
probability for the first two plays on Machine 1. The second call should use
these values as its prior and then calculate a new posterior using the data from
the subsequent three plays on Machine 2.
</div>
```{r E3}
# Type your code for Exercise 3 here
```
## Back to the Bandits
You may have notice that if you click on the Data tab in the middle of the App
above you are given code for a data frame that represents the results of your
plays within the machine.
<div class="exercise">
**Exercise 4**: Use this data frame with the `bandit_posterior` function to
calculate the exact posterior probability of each machine being "good". Do these
probabilities match with your intuition about which machine was good?
</div>
```{r E4}
# Type your code for Exercise 4 here
```
<div class="exercise">
Reset the simulation and then play at least *50* times. Suppose you pause after
every 10 plays and record the results of that 10 plays, that is, subset the data
into `data[1:10]`, `data[11:20]` and so on to represent the results of each 10
plays. Then use the chaining method we discussed earlier to **update** the
posterior probability **sequentially** after we have finished the next 10 plays.
Observe how the posterior probability changes as more plays are made.
</div>
```{r E5}
# Type your code for Exercise 5 here
```
We can visualize how these posterior probabilities update using the `plot_bandit_posterior` function. This function calculates and plots the
posterior probability after each play. Here is an example to visualize the
posterior probability of the two machines.
```{r}
# Store play results for each Machine into `data`
data <- data.frame(machine = c(rep(1, 20), rep(2, 20)),
outcome = c("L", "W", "W", "W", "L", "L", "L", "W", "W", "L",
"L", "W", "W", "W", "W", "L", "W", "L", "L", "L",
"W", "L", "L", "W", "L", "L", "L", "W", "L", "W",
"L", "L", "W", "L", "L", "L", "W", "W", "L", "W"))
# Calculate posterior probabilities after each play and plot the result
plot_bandit_posterior(data)
```
<div class="exercise">
Plot the result of your last 50 plays using the code `plot_bandit_posterior(data)` as shown in the example above, describe the pattern you see for the two posterior probabilities.
</div>
```{r E6}
# Type your code for Exercise 6 here
```
<div class="question">
From the plot generated in the example above, we can see that the posterior probabilities for Machine 1 and Machine 2 mirror each other. Why will this happen?
* $P(M_1~|~\text{data})$ and $P(M_2~|~\text{data})$ are complementary
* Machine 1 and Machine 2 being "good" are mutually exclusive events
* Both of the above
</div>
<div class="license">
This work is licensed under [GNU General Public License v3.0](https://www.gnu.org/licenses/quick-guide-gplv3.html).
</div>