Merge pull request #55 from e10v/dev
Multiple minor improvements
e10v authored Apr 21, 2024
2 parents e9687db + 796799c commit 2e9551f
Showing 12 changed files with 333 additions and 266 deletions.
70 changes: 51 additions & 19 deletions README.md
@@ -14,23 +14,21 @@
- [Delta method](https://alexdeng.github.io/public/files/kdd2018-dm.pdf) for ratio metrics.
- Variance reduction with [CUPED](https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf)/[CUPAC](https://doordash.engineering/2020/06/08/improving-experimental-power-through-control-using-predictions-as-covariate-cupac/) (also in combination with delta method for ratio metrics).
- Confidence interval for both absolute and percent change.
- Sample ratio mismatch check.

**tea-tasting** calculates statistics within data backends such as BigQuery, ClickHouse, PostgreSQL, Snowflake, Spark, and others among the 20+ backends supported by [Ibis](https://ibis-project.org/). This approach eliminates the need to import granular data into a Python environment, though Pandas DataFrames are also supported.

**tea-tasting** is still in alpha, but already includes all the features listed above. The following features are coming soon:

- Sample ratio mismatch check.
- More statistical tests:
- Asymptotic and exact tests for frequency data.
- Bootstrap.
- Quantile test (using Bootstrap).
- Asymptotic and exact tests for frequency data.
- Mann–Whitney U test.
- Power analysis.
- A/A tests and simulations.
- Pretty output for experiment results (round etc.).
- Documentation on how to define metrics with custom statistical tests.
- Documentation with MkDocs and Material for MkDocs.
- More examples.
- More documentation and examples.

## Installation

@@ -66,10 +64,10 @@ In the following sections, each step of this process will be explained in detail
The `make_users_data` function creates synthetic data for demonstration purposes. This data mimics what you might encounter in an A/B test for an online store. Each row represents an individual user, with the following columns:

- `user`: The unique identifier for each user.
- `variant`: The specific variant (e.g., 0 or 1) assigned to the user in the A/B test.
- `sessions`: The total number of sessions by the user.
- `orders`: The total number of purchases made by the user.
- `revenue`: The total revenue generated from the user's purchases.
- `variant`: The specific variant (e.g., 0 or 1) assigned to each user in the A/B test.
- `sessions`: The total number of user's sessions.
- `orders`: The total number of user's orders.
- `revenue`: The total revenue generated by the user.

**tea-tasting** accepts data as either a Pandas DataFrame or an Ibis Table. [Ibis](https://ibis-project.org/) is a Python package that serves as a DataFrame API to various data backends. It supports 20+ backends including BigQuery, ClickHouse, DuckDB, Polars, PostgreSQL, Snowflake, Spark, etc. You can write an SQL query, [wrap](https://ibis-project.org/how-to/extending/sql#backend.sql) it as an Ibis Table and pass it to **tea-tasting**.

@@ -108,19 +106,19 @@ experiment = tt.Experiment(

Metrics are instances of metric classes which define how metrics are calculated. These calculations cover the effect size, confidence interval, p-value, and other statistics.

Use the `Mean` class to compare metric averages between variants of an A/B test. For example, average number of orders per user, where user is a randomization unit of an experiment. Specify the column containing the metric values using the first parameter `value`.
Use the `Mean` class to compare averages between variants of an A/B test. For example, average number of orders per user, where user is a randomization unit of an experiment. Specify the column containing the metric values using the first parameter `value`.

Use the `RatioOfMeans` class to compare ratios of metrics averages between variants of an A/B test. For example, average number of orders per average number of sessions. Specify the columns containing the numerator and denominator values using the parameters `numer` and `denom`.
Use the `RatioOfMeans` class to compare ratios of averages between variants of an A/B test. For example, average number of orders per average number of sessions. Specify the columns containing the numerator and denominator values using the parameters `numer` and `denom`.
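
To illustrate what a ratio-of-means comparison involves, here is a minimal sketch of the delta method estimate of the standard error of a ratio of two sample means within one variant. It is not taken from the **tea-tasting** source; the function name `ratio_of_means_se` and the toy data are assumptions made for this example.

```python
import math
from statistics import fmean


def ratio_of_means_se(numer, denom):
    """Return (ratio of means, delta-method standard error).

    Hypothetical helper for illustration only, not part of tea-tasting.
    Uses the first-order approximation
    Var(mean_x / mean_y) ~= (var_x - 2*R*cov_xy + R**2 * var_y) / (n * mean_y**2),
    where R = mean_x / mean_y.
    """
    n = len(numer)
    mean_x = fmean(numer)
    mean_y = fmean(denom)
    # Unbiased sample variances and covariance.
    var_x = sum((x - mean_x) ** 2 for x in numer) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in denom) / (n - 1)
    cov_xy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(numer, denom)
    ) / (n - 1)
    ratio = mean_x / mean_y
    var_ratio = (var_x - 2 * ratio * cov_xy + ratio**2 * var_y) / (n * mean_y**2)
    # Guard against tiny negative values caused by rounding.
    return ratio, math.sqrt(max(var_ratio, 0.0))


# Toy per-user data: orders and sessions.
orders = [1, 0, 2, 1, 3]
sessions = [2, 1, 4, 2, 5]
ratio, se = ratio_of_means_se(orders, sessions)
print(ratio, se)
```

When the denominator is constant, the formula reduces to the usual standard error of a mean, which is a quick sanity check on the approximation.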

Use the following parameters of `Mean` and `RatioOfMeans` to customize the analysis:

- `alternative`: Alternative hypothesis. The following options are available:
- `two-sided` (default): the means are unequal.
- `greater`: the mean in the treatment variant is greater than the mean in the control variant.
- `less`: the mean in the treatment variant is less than the mean in the control variant.
- `"two-sided"` (default): the means are unequal.
- `"greater"`: the mean in the treatment variant is greater than the mean in the control variant.
- `"less"`: the mean in the treatment variant is less than the mean in the control variant.
- `confidence_level`: Confidence level of the confidence interval. Default is `0.95`.
- `equal_var`: If `False` (default), assume unequal population variances in calculation of the standard deviation and the number of degrees of freedom. Otherwise, assume equal population variance and calculate pooled standard deviation.
- `use_t`: If `True` (default), use Student's t-distribution in p-value and confidence interval calculations. Otherwise use Normal distribution.
- `equal_var`: Defines whether equal variance is assumed. If `True`, pooled variance is used for the calculation of the standard error of the difference between two means. Default is `False`.
- `use_t`: Defines whether to use the Student's t-distribution (`True`) or the Normal distribution (`False`). Default is `True`.
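
The effect of `equal_var` can be sketched with the textbook formulas for the standard error of a difference between two means. The helper below is a hypothetical illustration, not the library's code: with `equal_var=True` it pools the variances, otherwise it uses the Welch standard error with the Welch-Satterthwaite degrees of freedom.

```python
import math


def se_and_dof(var_c, n_c, var_t, n_t, equal_var=False):
    """Standard error and degrees of freedom for a difference of two means.

    Hypothetical helper for illustration only.
    var_c, var_t: sample variances in control and treatment.
    n_c, n_t: sample sizes in control and treatment.
    """
    if equal_var:
        # Pooled variance, classic Student's t-test.
        dof = n_c + n_t - 2
        pooled_var = ((n_c - 1) * var_c + (n_t - 1) * var_t) / dof
        se = math.sqrt(pooled_var * (1 / n_c + 1 / n_t))
    else:
        # Welch's t-test with Welch-Satterthwaite degrees of freedom.
        a = var_c / n_c
        b = var_t / n_t
        se = math.sqrt(a + b)
        dof = (a + b) ** 2 / (a**2 / (n_c - 1) + b**2 / (n_t - 1))
    return se, dof


print(se_and_dof(4.0, 100, 9.0, 50, equal_var=False))
print(se_and_dof(4.0, 100, 9.0, 50, equal_var=True))
```

With equal sample sizes and equal sample variances the two branches agree; they diverge as the variances or group sizes become unbalanced.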

Example usage:

@@ -176,7 +174,7 @@ The fields in the result depend on metrics. For `Mean` and `RatioOfMeans`, the f
- `rel_effect_size_ci_lower`: Lower bound of the relative effect size confidence interval.
- `rel_effect_size_ci_upper`: Upper bound of the relative effect size confidence interval.
- `pvalue`: P-value.
- `statistic`: Statistic.
- `statistic`: Statistic (standardized effect size).

## More features

@@ -216,14 +214,48 @@ Define the metrics' covariates:
- In `Mean`, specify the covariate using the `covariate` parameter.
- In `RatioOfMeans`, specify the covariates for the numerator and denominator using the `numer_covariate` and `denom_covariate` parameters, respectively.
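
As background for what a covariate does, the following is a minimal sketch of the basic CUPED adjustment for a single metric and a single covariate. It is an illustration under stated assumptions rather than the library's implementation: `cuped_adjust` is a hypothetical name, and in an A/B test the coefficient `theta` is commonly estimated on data pooled across variants.

```python
from statistics import fmean


def cuped_adjust(values, covariate):
    """Return CUPED-adjusted values: y - theta * (x - mean(x)).

    Hypothetical helper for illustration only.
    theta = cov(y, x) / var(x) minimizes the variance of the adjusted metric.
    """
    mean_y = fmean(values)
    mean_x = fmean(covariate)
    # The (n - 1) denominators of cov and var cancel in the ratio.
    cov_xy = sum((y - mean_y) * (x - mean_x) for y, x in zip(values, covariate))
    var_x = sum((x - mean_x) ** 2 for x in covariate)
    theta = cov_xy / var_x
    return [y - theta * (x - mean_x) for y, x in zip(values, covariate)]


# Toy data: revenue during the experiment and before it.
revenue = [10.0, 12.5, 7.0, 15.0, 9.5]
revenue_before = [9.0, 13.0, 6.5, 14.0, 10.0]
adjusted = cuped_adjust(revenue, revenue_before)
print(fmean(revenue), fmean(adjusted))
```

The adjustment keeps the sample mean unchanged while reducing the variance whenever the covariate is correlated with the metric.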

### Sample ratio mismatch check

The `SampleRatio` class in **tea-tasting** detects mismatches in the sample ratios of different variants of an A/B test.

Example usage:

```python
experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions"),
    orders_per_user=tt.Mean("orders"),
    revenue_per_user=tt.Mean("revenue"),
    sample_ratio=tt.SampleRatio(),
)
```

By default, `SampleRatio` expects an equal number of observations across all variants. To specify a different ratio, use the `ratio` parameter. It accepts two types of values:

- Ratio of the number of observations in treatment relative to control, as a positive number. Example: `SampleRatio(0.5)`.
- A dictionary with variants as keys and expected ratios as values. Example: `SampleRatio({"A": 2, "B": 1})`.

The `method` parameter determines the statistical test to apply:

- `"auto"`: Apply exact binomial test if the total number of observations is less than 1000, or normal approximation otherwise.
- `"binom"`: Apply exact binomial test.
- `"norm"`: Apply normal approximation of the binomial distribution.
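
The three methods can be sketched for the two-variant case as follows. This is an independent illustration and not the library's code: the function `sample_ratio_pvalue`, the two-sided rule used for the exact test, and the absence of a continuity correction in the normal approximation are all assumptions made for this example. Only the threshold of 1000 observations for `"auto"` comes from the description above.

```python
import math


def _binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space for stability."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def sample_ratio_pvalue(control, treatment, ratio=1.0, method="auto"):
    """Two-sided p-value for a sample ratio mismatch (hypothetical helper).

    ratio: expected number of treatment observations relative to control.
    """
    n = control + treatment
    p = ratio / (1 + ratio)  # Expected share of treatment observations.
    if method == "auto":
        method = "binom" if n < 1000 else "norm"
    if method == "binom":
        # Sum the probabilities of outcomes at most as likely as the observed one.
        observed = _binom_pmf(treatment, n, p)
        total = sum(
            pmf
            for k in range(n + 1)
            if (pmf := _binom_pmf(k, n, p)) <= observed * (1 + 1e-7)
        )
        return min(1.0, total)
    if method == "norm":
        # Normal approximation of the binomial, no continuity correction.
        z = (treatment - n * p) / math.sqrt(n * p * (1 - p))
        return math.erfc(abs(z) / math.sqrt(2))
    raise ValueError(f"Unknown method: {method!r}")


print(sample_ratio_pvalue(480, 510))    # Exact test, fewer than 1000 observations.
print(sample_ratio_pvalue(4800, 5200))  # Normal approximation.
```

A small p-value suggests that the observed split is unlikely under the expected ratio, which usually points to a problem in randomization or data collection rather than to a treatment effect.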

The result of the sample ratio mismatch check includes the following attributes:

- `metric`: Metric name.
- `control`: Number of observations in control.
- `treatment`: Number of observations in treatment.
- `pvalue`: P-value.

### Global settings

In **tea-tasting**, you can change defaults for the following parameters:

- `alternative`: Alternative hypothesis.
- `confidence_level`: Confidence level of the confidence interval.
- `equal_var`: If False, assume unequal population variances in calculation of the standard deviation and the number of degrees of freedom. Otherwise, assume equal population variance and calculate pooled standard deviation.
- `use_t`: If True, use Student's t-distribution in p-value and confidence interval calculations. Otherwise use Normal distribution.
- `equal_var`: If `False`, assume unequal population variances in calculation of the standard deviation and the number of degrees of freedom. Otherwise, assume equal population variance and calculate pooled standard deviation.
- `use_t`: If `True`, use Student's t-distribution in p-value and confidence interval calculations. Otherwise use Normal distribution.

Use `set_config` to set a global option value:

2 changes: 1 addition & 1 deletion src/tea_tasting/__init__.py
@@ -4,5 +4,5 @@
from tea_tasting.config import config_context, get_config, set_config
from tea_tasting.datasets import make_sessions_data, make_users_data
from tea_tasting.experiment import Experiment
from tea_tasting.metrics import Mean, RatioOfMeans
from tea_tasting.metrics import Mean, RatioOfMeans, SampleRatio
from tea_tasting.version import __version__
51 changes: 29 additions & 22 deletions src/tea_tasting/aggr.py
@@ -37,66 +37,72 @@ def __init__(
var_: dict[str, float | int] = {}, # noqa: B006
cov_: dict[tuple[str, str], float | int] = {}, # noqa: B006
) -> None:
"""Create an object with aggregated statistics.
"""Aggregated statistics.
Args:
count_: Sample size.
mean_: Variables sample means.
var_: Variables sample variances.
cov_: Pairs of variables sample covariances.
count_: Sample size (number of observations).
mean_: Dictionary of sample means with variable names as keys.
var_: Dictionary of sample variances with variable names as keys.
cov_: Dictionary of sample covariances with pairs of variable names as keys.
"""
self.count_ = count_
self.mean_ = mean_
self.var_ = var_
self.cov_ = {_sorted_tuple(*k): v for k, v in cov_.items()}

def with_zero_div(self) -> Aggregates:
"""Return aggregates with values which can be divided by zero without error."""
"""Return aggregates, which don't raise an error on division by zero.
Division by zero returns:
nan if numerator == 0,
inf if numerator > 0,
-inf if numerator < 0.
"""
return Aggregates(
count_=None if self.count_ is None else tea_tasting.utils.Int(self.count_),
mean_={k: tea_tasting.utils.Float(v) for k, v in self.mean_.items()},
var_={k: tea_tasting.utils.Float(v) for k, v in self.var_.items()},
cov_={k: tea_tasting.utils.Float(v) for k, v in self.cov_.items()},
mean_={k: tea_tasting.utils.numeric(v) for k, v in self.mean_.items()},
var_={k: tea_tasting.utils.numeric(v) for k, v in self.var_.items()},
cov_={k: tea_tasting.utils.numeric(v) for k, v in self.cov_.items()},
)

def count(self) -> int:
"""Sample size.
"""Sample size (number of observations).
Raises:
RuntimeError: Count is None (it wasn't defined at init).
Returns:
Number of observations.
Sample size (number of observations).
"""
if self.count_ is None:
raise RuntimeError("Count is None.")
return self.count_

def mean(self, key: str | None) -> float | int:
def mean(self, name: str | None) -> float | int:
"""Sample mean.
Args:
key: Variable name.
name: Variable name.
Returns:
Sample mean.
"""
if key is None:
if name is None:
return 1
return self.mean_[key]
return self.mean_[name]

def var(self, key: str | None) -> float | int:
def var(self, name: str | None) -> float | int:
"""Sample variance.
Args:
key: Variable name.
name: Variable name.
Returns:
Sample variance.
"""
if key is None:
if name is None:
return 0
return self.var_[key]
return self.var_[name]

def cov(self, left: str | None, right: str | None) -> float | int:
"""Sample covariance.
@@ -248,18 +254,19 @@ def read_aggregates(
var_cols: Sequence[str],
cov_cols: Sequence[tuple[str, str]],
) -> dict[Any, Aggregates] | Aggregates:
"""Read aggregated statistics from an Ibis Table.
"""Read aggregated statistics from an Ibis Table or a Pandas DataFrame.
Args:
data: Ibis Table.
data: Granular data.
group_col: Column name to group by before aggregation.
If None, total aggregates are calculated.
has_count: If True, calculate the sample size.
mean_cols: Column names for calculation of sample means.
var_cols: Column names for calculation of sample variances.
cov_cols: Pairs of column names for calculation of sample covariances.
Returns:
Aggregated statistics from the Ibis Table.
Aggregated statistics.
"""
if isinstance(data, pd.DataFrame):
con = ibis.pandas.connect()
4 changes: 2 additions & 2 deletions src/tea_tasting/config.py
@@ -1,4 +1,4 @@
"""Global config."""
"""Global configuration."""

from __future__ import annotations

@@ -28,7 +28,7 @@ def get_config(option: str | None = None) -> Any:
option: The option name.
Returns:
The value of the option if it's not None,
The option value if its name is not None,
or a dictionary with all options otherwise.
"""
if option is not None:
62 changes: 33 additions & 29 deletions src/tea_tasting/datasets.py
@@ -1,4 +1,4 @@
"""Generates a sample of data for examples."""
"""Example datasets."""
# ruff: noqa: PLR0913

from __future__ import annotations
@@ -75,9 +75,9 @@ def make_users_data(
- user identifier,
- variant of the test,
- number of sessions by the user,
- number of orders made by the user,
- revenue generated from user's orders.
- number of user's sessions,
- number of user's orders,
- revenue generated by the user.
Optionally, pre-experimental data can be generated as well.
@@ -86,26 +86,28 @@ def make_users_data(
in addition to default columns.
seed: Random seed.
n_users: Number of users.
ratio: Ratio of treatment observations to control observations.
sessions_uplift: Relative sessions uplift in the treatment variant.
orders_uplift: Relative orders uplift in the treatment variant.
revenue_uplift: Relative revenue uplift in the treatment variant.
ratio: Ratio of the number of observations in treatment relative to control.
sessions_uplift: Sessions uplift in the treatment variant, relative to control.
orders_uplift: Orders uplift in the treatment variant, relative to control.
revenue_uplift: Revenue uplift in the treatment variant, relative to control.
avg_sessions: Average number of sessions per user.
avg_orders_per_session: Average number of orders per session.
Should be less than 1.
avg_revenue_per_order: Average revenue per order.
to_ibis: If True, return Ibis Table instead if Pandas DataFrame.
to_ibis: If True, return an Ibis Table instead of a Pandas DataFrame.
Returns:
An Ibis Table or a Pandas DataFrame with the following columns:
user: User identifier.
variant: Variant of the test. 0 is control, 1 is treatment.
sessions: Number of sessions.
orders: Number of orders.
revenue: Revenue.
sessions_covariate (optional): Number of sessions before the experiment.
orders_covariate (optional): Number of orders before the experiment.
revenue_covariate (optional): Revenue before the experiment.
sessions: Number of user's sessions.
orders: Number of user's orders.
revenue: Revenue generated by the user.
sessions_covariate (optional): Number of user's sessions
before the experiment.
orders_covariate (optional): Number of user's orders before the experiment.
revenue_covariate (optional): Revenue generated by the user
before the experiment.
"""
return _make_data(
covariates=covariates,
@@ -179,9 +181,9 @@ def make_sessions_data(
- user identifier,
- variant of the test,
- number of sessions by the user,
- number of orders made by the user,
- revenue generated from user's orders.
- number of user's sessions,
- number of user's orders,
- revenue generated by the user.
Optionally, pre-experimental data can be generated as well.
@@ -190,26 +192,28 @@ def make_sessions_data(
in addition to default columns.
seed: Random seed.
n_users: Number of users.
ratio: Ratio of treatment observations to control observations.
sessions_uplift: Relative sessions uplift in the treatment variant.
orders_uplift: Relative orders uplift in the treatment variant.
revenue_uplift: Relative revenue uplift in the treatment variant.
ratio: Ratio of the number of observations in treatment relative to control.
sessions_uplift: Sessions uplift in the treatment variant, relative to control.
orders_uplift: Orders uplift in the treatment variant, relative to control.
revenue_uplift: Revenue uplift in the treatment variant, relative to control.
avg_sessions: Average number of sessions per user.
avg_orders_per_session: Average number of orders per session.
Should be less than 1.
avg_revenue_per_order: Average revenue per order.
to_ibis: If True, return Ibis Table instead if Pandas DataFrame.
to_ibis: If True, return an Ibis Table instead of a Pandas DataFrame.
Returns:
An Ibis Table or a Pandas DataFrame with the following columns:
user: User identifier.
variant: Variant of the test. 0 is control, 1 is treatment.
sessions: Number of sessions.
orders: Number of orders.
revenue: Revenue.
sessions_covariate (optional): Number of sessions before the experiment.
orders_covariate (optional): Number of orders before the experiment.
revenue_covariate (optional): Revenue before the experiment.
sessions: Number of user's sessions.
orders: Number of user's orders.
revenue: Revenue generated by the user.
sessions_covariate (optional): Number of user's sessions
before the experiment.
orders_covariate (optional): Number of user's orders before the experiment.
revenue_covariate (optional): Revenue generated by the user
before the experiment.
"""
return _make_data(
covariates=covariates,