Bayesian A/B Testing
========================================================
author: Alex Yakubovich
date: July 9, 2014
transition: none
```{r, echo=F}
source('~/playfair/shiny/ab_key_metrics_builder/beta_binomial_ab_test.R')
```
Frequentist A/B Testing
========================================================
- **P-value**: Probability of observing the result of a test (or a more extreme one) if there is no effect
- Probability of data given model
----

Problems
========================================================
- **No peeking**: unrealistic sample size requirements
- Makes crude (but valid) approximations.
- Not all mistakes are created equal
- **Inflexible** and hard to interpret

The Bayesian approach
========================================================
- Compute the **probability of the model given the data**
```{r, figs.only=TRUE, echo=FALSE,results='hide'}
alpha <- 4
beta <- 10
conf.level <- .1
curve(dbeta(x, alpha, beta), from=0, to=1, n=1e5, col='darkblue', lwd=5, xlab='True Day 1 Retention', ylab='density', xaxs="i", yaxs="i")
ci <- qbeta(c(conf.level/2, 1-conf.level/2), alpha, beta)
n <- 25
dx <- seq(ci[1], ci[2], length.out=n)
polygon(c(ci[1], dx, ci[2]), c(0, dbeta(dx, alpha, beta), 0), col='lightgrey', border=NA)
text(.3, 1, labels='90%')
```
Prior Distribution
========================================================
Beliefs about a random variable before seeing the data.
```{r, figs.only=TRUE, echo=FALSE,results='hide'}
curve(dbeta(x, 22, 29), from=0, to=1, n=1e5, col='darkblue', lwd=2, xlab='True Day 1 Retention', ylab='density')
```
Beta Distribution
========================================================

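The Beta distribution is the standard prior for a rate like retention: its two shape parameters act as pseudo-counts of prior "successes" and "failures", so its mean is $\alpha / (\alpha + \beta)$. A minimal sketch with the same prior used on the following slides:

```{r}
# Beta(alpha, beta) prior: alpha ~ prior "retained" users, beta ~ prior "churned" users
alpha <- 22
beta <- 29
prior.mean <- alpha / (alpha + beta)  # mean retention implied by the prior
curve(dbeta(x, alpha, beta), from=0, to=1, n=1e3, col='darkblue', lwd=2,
      xlab='True Day 1 Retention', ylab='density')
```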
Likelihood Function
========================================================
Measures how likely it is that the data $D$ was generated by the model.

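For a retention test a natural model is binomial: $y$ retained users out of $n$. A sketch of the resulting likelihood (the counts `y=3`, `n=10` are illustrative, matching the `p=.3` used on later slides):

```{r}
# Binomial likelihood: how plausible each retention rate theta makes the data
likelihood <- function(theta, y, n) dbinom(y, size=n, prob=theta)
curve(likelihood(x, y=3, n=10), from=0, to=1, col='darkblue', lwd=2,
      xlab='theta (true Day 1 retention)', ylab='likelihood')
# The likelihood peaks at the observed proportion y/n = 0.3
```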
Bayes Rule
========================================================
Tells us how to update our beliefs after seeing the data.
$$ P(W | D) \propto P(D|W) \times P (W)$$
* $W =$ world state
* $D =$ observed data
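With a Beta prior and a binomial likelihood, Bayes rule has a closed form (conjugacy): the posterior is simply $\text{Beta}(\alpha_0 + y,\; \beta_0 + n - y)$. A minimal sketch with the prior from earlier and illustrative counts:

```{r}
# Conjugate beta-binomial update: add observed successes/failures to the prior pseudo-counts
alpha0 <- 22; beta0 <- 29   # prior: Beta(22, 29)
y <- 3; n <- 10             # observed: 3 of 10 users retained
alpha.post <- alpha0 + y
beta.post  <- beta0 + n - y
posterior.mean <- alpha.post / (alpha.post + beta.post)  # shrunk toward the prior mean
```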
Updating our beliefs (n=0)
========================================================
```{r, figs.only=TRUE, echo=FALSE,results='hide'}
alpha0 <- 22
beta0 <- 29
curve(dbeta(x, alpha0, beta0), from=0, to=1, n=1e5, col='darkblue', lwd=2, xlab='day 1 retention', ylab='density')
```
Changing our beliefs (n=10)
========================================================
```{r, figs.only=TRUE, echo=FALSE,results='hide'}
p <- .3
n <- 10
y <- round(n * p)
curve(dbeta(x, alpha0 + y, beta0+n-y), from=0, to=1, n=1e5, col='darkblue', lwd=2, xlab='day 1 retention', ylab='density')
```
Changing our beliefs (n=20)
========================================================
```{r, figs.only=TRUE, echo=FALSE,results='hide'}
n <- 20
y <- round(n * p)
curve(dbeta(x, alpha0 + y, beta0+n-y), from=0, to=1, n=1e5, col='darkblue', lwd=2, xlab='day 1 retention', ylab='density')
```
Changing our beliefs (n=30)
========================================================
```{r, figs.only=TRUE, echo=FALSE,results='hide'}
n <- 30
y <- round(n * p)
curve(dbeta(x, alpha0 + y, beta0+n-y), from=0, to=1, n=1e5, col='darkblue', lwd=2, xlab='day 1 retention', ylab='density')
```
Changing our beliefs (n=40)
========================================================
```{r, figs.only=TRUE, echo=FALSE,results='hide'}
n <- 40
y <- round(n * p)
curve(dbeta(x, alpha0 + y, beta0+n-y), from=0, to=1, n=1e5, col='darkblue', lwd=2, xlab='day 1 retention', ylab='density')
```
Changing our beliefs (n=50)
========================================================
```{r, figs.only=TRUE, echo=FALSE,results='hide'}
n <- 50
y <- round(n * p)
curve(dbeta(x, alpha0 + y, beta0+n-y), from=0, to=1, n=1e5, col='darkblue', lwd=2, xlab='day 1 retention', ylab='density')
```
Changing our beliefs (n=100)
========================================================
```{r, figs.only=TRUE, echo=FALSE,results='hide'}
n <- 100
y <- round(n * p)
curve(dbeta(x, alpha0 + y, beta0+n-y), from=0, to=1, n=1e5, col='darkblue', lwd=2, xlab='day 1 retention', ylab='density')
```
Changing our beliefs (n=1000)
========================================================
```{r, figs.only=TRUE, echo=FALSE,results='hide'}
n <- 1000
y <- round(n * p)
curve(dbeta(x, alpha0 + y, beta0+n-y), from=0, to=1, n=1e5, col='darkblue', lwd=2, xlab='day 1 retention', ylab='density')
```
Comparing two groups
========================================================
$$ \begin{align} \displaystyle P(\theta_B > \theta_A) &= \iint_{\theta_B > \theta_A} P(\theta_A, \theta_B \mid D) \, d\theta_A \, d\theta_B \\
&= \iint_{\theta_B > \theta_A} P(\theta_A \mid D) \times P(\theta_B \mid D) \, d\theta_A \, d\theta_B
\end{align}$$
```{r, figs.only=TRUE, echo=FALSE,results='hide'}
y <- c(5,20)
n <- c(100,100)
grid.resolution <- 512
pgrid <- seq(0, .8, length=grid.resolution+2)[-c(1,grid.resolution+2)] #(0,1) grid
dposterior <- function(pa, pb, alpha0,beta0, y, n) dbeta(pa,alpha0 + y[1],n[1]-y[1]+beta0) * dbeta(pb,alpha0 + y[2],n[2]-y[2]+beta0)
pdf_arr <- outer(pgrid, pgrid, dposterior, alpha0,beta0, y,n)
image(pdf_arr, xlab='True Day 1 retention (group A)', ylab='True Day 1 retention (group B)', xlim=c(0,.6), ylim=c(0,.6))
points(seq(0,1,length=10000), seq(0,1,length=10000), pch='.')
```
```{r, figs.only=TRUE, echo=FALSE,results='hide'}
y <- c(5,20)
n <- c(100,100)
alpha0 <- 22
beta0 <- 29
for (g in 1:2)
curve(dbeta(x,alpha0 + y[g], n[g]-y[g]+beta0), from=0, to=1, n=1e3, col='darkblue', lwd=2, xlab='retention', ylab='density',add=g>1)
#beta.binomial.ab.test(y,n)
```
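Because the two posteriors are independent Beta distributions, this integral is easy to estimate by Monte Carlo: draw from each posterior and count how often B beats A. A sketch under the same prior and data as the plots above:

```{r}
# Monte Carlo estimate of P(theta_B > theta_A) under independent Beta posteriors
set.seed(1)
alpha0 <- 22; beta0 <- 29
y <- c(5, 20); n <- c(100, 100)
m <- 1e5
theta.a <- rbeta(m, alpha0 + y[1], beta0 + n[1] - y[1])
theta.b <- rbeta(m, alpha0 + y[2], beta0 + n[2] - y[2])
prob.b.beats.a <- mean(theta.b > theta.a)  # fraction of draws where B wins
```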
When do we end a test?
========================================================
* **Risk:** How many users/conversions, or how much money, do we expect to lose if we are wrong?
* The test ends as soon as the risk falls below a threshold
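The risk can be made concrete as the expected loss from shipping B when A might in fact be better: $E[\max(\theta_A - \theta_B, 0)]$, estimated from posterior draws as on the previous slide. The threshold value below is an illustrative assumption, not a recommendation:

```{r}
# Expected loss (in retention-rate points) from choosing B over A
set.seed(1)
alpha0 <- 22; beta0 <- 29
y <- c(5, 20); n <- c(100, 100)
m <- 1e5
theta.a <- rbeta(m, alpha0 + y[1], beta0 + n[1] - y[1])
theta.b <- rbeta(m, alpha0 + y[2], beta0 + n[2] - y[2])
risk.of.choosing.b <- mean(pmax(theta.a - theta.b, 0))
threshold <- 0.001             # assumed business tolerance
stop.test <- risk.of.choosing.b < threshold  # TRUE means we can end the test
```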
References
========================================================
- [Bayesian witch - Agile A/B Testing with Bayesian Statistics and Python](http://www.bayesianwitch.com/blog/2014/bayesian_ab_test.html)
- [Richrelevance - Bayesian A/B tests](http://engineering.richrelevance.com/bayesian-ab-tests/)
- [Richrelevance - Bayesian analysis of Normal distributions with Python](http://engineering.richrelevance.com/bayesian-analysis-of-normal-distributions-with-python/)
- [Richrelevance - Bayesian A/B testing with a Lognormal model](http://engineering.richrelevance.com/bayesian-ab-testing-with-a-log-normal-model/)
- [Swrve - A/B testing for game design iteration: a Bayesian approach](http://www.gdcvault.com/play/1020201/A-B-Testing-for-Game)
- [Probabilistic Programming and Bayesian Methods for Hackers](http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/)