Merge branch 'master' into dev/639_liudong
ras44 authored Jan 23, 2024
2 parents 2fc3c45 + 0040ac6 commit 5fd6523
Showing 12 changed files with 832 additions and 297 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-test.yaml
@@ -10,7 +10,7 @@ jobs:
# You can use PyPy versions in python-version.
# For example, pypy3.10
matrix:
python-version: ["3.7", "3.8", "3.9", "3.10", "3.11", "3.12"]
python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]

steps:
- uses: actions/checkout@v4
221 changes: 13 additions & 208 deletions README.md
@@ -14,6 +14,7 @@
# Disclaimer
This project is stable and being incubated for long-term support. It may contain new experimental code, for which APIs are subject to change.


# Causal ML: A Python Package for Uplift Modeling and Causal Inference with ML

**Causal ML** is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent
@@ -25,230 +26,34 @@ research [[1]](#Literature). It provides a standard interface that allows user t

* **Personalized engagement**: A company has multiple options to interact with its customers such as different product choices in up-sell or messaging channels for communications. One can use CATE to estimate the heterogeneous treatment effect for each customer and treatment option combination for an optimal personalized recommendation system.

The package currently supports the following methods

* **Tree-based algorithms**
* Uplift tree/random forests on KL divergence, Euclidean Distance, and Chi-Square [[2]](#Literature)
* Uplift tree/random forests on Contextual Treatment Selection [[3]](#Literature)
* Uplift tree/random forests on DDP [[4]](#Literature)
* Uplift tree/random forests on IDDP [[5]](#Literature)
* Interaction Tree [[6]](#Literature)
* Conditional Interaction Tree [[7]](#Literature)
* Causal Tree [[8]](#Literature) - Work-in-progress
* **Meta-learner algorithms**
* S-learner [[9]](#Literature)
* T-learner [[9]](#Literature)
* X-learner [[9]](#Literature)
* R-learner [[10]](#Literature)
* Doubly Robust (DR) learner [[11]](#Literature)
* TMLE learner [[12]](#Literature)
* **Instrumental variables algorithms**
* 2-Stage Least Squares (2SLS)
* Doubly Robust (DR) IV [[13]](#Literature)
* **Neural-network-based algorithms**
* CEVAE [[14]](#Literature)
* DragonNet [[15]](#Literature) - with `causalml[tf]` installation (see [Installation](#installation))


# Installation

Installation with `conda` is recommended.

`conda` environment files for Python 3.7, 3.8 and 3.9 are available in the repository. To use models under the `inference.tf` module (e.g. `DragonNet`), additional dependency of `tensorflow` is required. For detailed instructions, see below.

## Install using `conda`:

Install `conda` with:

```
wget https://repo.anaconda.com/miniconda/Miniconda3-py38_23.5.0-3-Linux-x86_64.sh
bash Miniconda3-py38_23.5.0-3-Linux-x86_64.sh -b
source miniconda3/bin/activate
conda init
source ~/.bashrc
```

### Install from `conda-forge`
Directly install from the conda-forge channel using conda.

```sh
conda install -c conda-forge causalml
```

### Install with the `conda` virtual environment
This will create a new `conda` virtual environment named `causalml-[tf-]py3x`, where `x` is in `[6, 7, 8, 9]`. e.g. `causalml-py37` or `causalml-tf-py38`. If you want to change the name of the environment, update the relevant YAML file in `envs/`

```bash
git clone https://github.com/uber/causalml.git
cd causalml/envs/
conda env create -f environment-py38.yml # for the virtual environment with Python 3.8 and CausalML
conda activate causalml-py38
(causalml-py38)
```

### Install `causalml` with `tensorflow`
```bash
git clone https://github.com/uber/causalml.git
cd causalml/envs/
conda env create -f environment-tf-py38.yml # for the virtual environment with Python 3.8 and CausalML
conda activate causalml-tf-py38
(causalml-tf-py38) pip install -U numpy # this step is necessary to fix [#338](https://github.com/uber/causalml/issues/338)
```

## Install from `PyPI`:

```bash
pip install causalml
```

### Install `causalml` with `tensorflow`
```bash
pip install causalml[tf]
pip install -U numpy # this step is necessary to fix [#338](https://github.com/uber/causalml/issues/338)
```

## Install from source:

### Create a clean conda environment

```
conda create -n causalml-py38 -y python=3.8
conda activate causalml-py38
conda install -c conda-forge cxx-compiler
conda install python-graphviz
conda install -c conda-forge xorg-libxrender
```

Then:

```bash
git clone https://github.com/uber/causalml.git
cd causalml
pip install .
python setup.py build_ext --inplace
```

with `tensorflow`:

```bash
pip install .[tf]
```
# Documentation

Documentation is available at:

# Quick Start
https://causalml.readthedocs.io/en/latest/about.html

## Average Treatment Effect Estimation with S, T, X, and R Learners

```python
from causalml.inference.meta import LRSRegressor
from causalml.inference.meta import XGBTRegressor, MLPTRegressor
from causalml.inference.meta import BaseXRegressor
from causalml.inference.meta import BaseRRegressor
from xgboost import XGBRegressor
from causalml.dataset import synthetic_data

y, X, treatment, _, _, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)

lr = LRSRegressor()
te, lb, ub = lr.estimate_ate(X, treatment, y)
print('Average Treatment Effect (Linear Regression): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

xg = XGBTRegressor(random_state=42)
te, lb, ub = xg.estimate_ate(X, treatment, y)
print('Average Treatment Effect (XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

nn = MLPTRegressor(hidden_layer_sizes=(10, 10),
learning_rate_init=.1,
early_stopping=True,
random_state=42)
te, lb, ub = nn.estimate_ate(X, treatment, y)
print('Average Treatment Effect (Neural Network (MLP)): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

xl = BaseXRegressor(learner=XGBRegressor(random_state=42))
te, lb, ub = xl.estimate_ate(X, treatment, y, e)
print('Average Treatment Effect (BaseXRegressor using XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

rl = BaseRRegressor(learner=XGBRegressor(random_state=42))
te, lb, ub = rl.estimate_ate(X=X, p=e, treatment=treatment, y=y)
print('Average Treatment Effect (BaseRRegressor using XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
```

See the [Meta-learner example notebook](https://github.com/uber/causalml/blob/master/docs/examples/meta_learners_with_synthetic_data.ipynb) for details.


## Interpretable Causal ML

Causal ML provides methods to interpret the treatment effect models trained as follows:

### Meta Learner Feature Importances

```python
from causalml.inference.meta import BaseSRegressor, BaseTRegressor, BaseXRegressor, BaseRRegressor
from causalml.dataset.regression import synthetic_data

# Load synthetic data
y, X, treatment, tau, b, e = synthetic_data(mode=1, n=10000, p=25, sigma=0.5)
w_multi = np.array(['treatment_A' if x==1 else 'control' for x in treatment]) # customize treatment/control names

slearner = BaseSRegressor(LGBMRegressor(), control_name='control')
slearner.estimate_ate(X, w_multi, y)
slearner_tau = slearner.fit_predict(X, w_multi, y)

model_tau_feature = RandomForestRegressor() # specify model for model_tau_feature

slearner.get_importance(X=X, tau=slearner_tau, model_tau_feature=model_tau_feature,
normalize=True, method='auto', features=feature_names)

# Using the feature_importances_ method in the base learner (LGBMRegressor() in this example)
slearner.plot_importance(X=X, tau=slearner_tau, normalize=True, method='auto')

# Using eli5's PermutationImportance
slearner.plot_importance(X=X, tau=slearner_tau, normalize=True, method='permutation')
# Installation

# Using SHAP
shap_slearner = slearner.get_shap_values(X=X, tau=slearner_tau)
Installation instructions are available at:

# Plot shap values without specifying shap_dict
slearner.plot_shap_values(X=X, tau=slearner_tau)
https://causalml.readthedocs.io/en/latest/installation.html

# Plot shap values WITH specifying shap_dict
slearner.plot_shap_values(X=X, shap_dict=shap_slearner)

# interaction_idx set to 'auto' (searches for feature with greatest approximate interaction)
slearner.plot_shap_dependence(treatment_group='treatment_A',
feature_idx=1,
X=X,
tau=slearner_tau,
interaction_idx='auto')
```
<div align="center">
<img width="629px" height="618px" src="https://raw.githubusercontent.com/uber/causalml/master/docs/_static/img/shap_vis.png">
</div>
# Quickstart

See the [feature interpretations example notebook](https://github.com/uber/causalml/blob/master/docs/examples/feature_interpretations_example.ipynb) for details.
Quickstarts with code-snippets are available at:

### Uplift Tree Visualization
https://causalml.readthedocs.io/en/latest/quickstart.html

```python
from IPython.display import Image
from causalml.inference.tree import UpliftTreeClassifier, UpliftRandomForestClassifier
from causalml.inference.tree import uplift_tree_string, uplift_tree_plot

uplift_model = UpliftTreeClassifier(max_depth=5, min_samples_leaf=200, min_samples_treatment=50,
n_reg=100, evaluationFunction='KL', control_name='control')
# Example Notebooks

uplift_model.fit(df[features].values,
treatment=df['treatment_group_key'].values,
y=df['conversion'].values)
Example notebooks are available at:

graph = uplift_tree_plot(uplift_model.fitted_uplift_tree, features)
Image(graph.create_png())
```
<div align="center">
<img width="800px" height="479px" src="https://raw.githubusercontent.com/uber/causalml/master/docs/_static/img/uplift_tree_vis.png">
</div>
https://causalml.readthedocs.io/en/latest/examples.html

See the [Uplift Tree visualization example notebook](https://github.com/uber/causalml/blob/master/docs/examples/uplift_tree_visualization.ipynb) for details.

# Contributing

1 change: 1 addition & 0 deletions causalml/metrics/__init__.py
@@ -32,3 +32,4 @@
SensitivitySubsetData,
SensitivitySelectionBias,
) # noqa
from maq import MAQ, get_ipw_scores # noqa
42 changes: 22 additions & 20 deletions causalml/metrics/visualize.py
@@ -848,7 +848,7 @@ def qini_score(
return (qini.sum(axis=0) - qini[RANDOM_COL].sum()) / qini.shape[0]


def plot_ps_diagnostics(df, covariate_col, treatment_col="w", p_col="p"):
def plot_ps_diagnostics(df, covariate_col, treatment_col="w", p_col="p", bal_tol=0.1):
"""Plot covariate balances (standardized differences between the treatment and the control)
before and after weighting the sample using the inverse probability of treatment weights.
@@ -865,40 +865,42 @@ def plot_ps_diagnostics(df, covariate_col, treatment_col="w", p_col="p"):
IPTW = get_simple_iptw(W, PS)

diffs_pre = get_std_diffs(X, W, weighted=False)
num_unbal_pre = (np.abs(diffs_pre) > 0.1).sum()[0]
num_unbal_pre = (np.abs(diffs_pre) > bal_tol).sum()[0]

diffs_post = get_std_diffs(X, W, IPTW, weighted=True)
num_unbal_post = (np.abs(diffs_post) > 0.1).sum()[0]
num_unbal_post = (np.abs(diffs_post) > bal_tol).sum()[0]

diff_plot = _plot_std_diffs(diffs_pre, num_unbal_pre, diffs_post, num_unbal_post)
diff_plot = _plot_std_diffs(
diffs_pre, num_unbal_pre, diffs_post, num_unbal_post, bal_tol=bal_tol
)

return diff_plot


def _plot_std_diffs(diffs_pre, num_unbal_pre, diffs_post, num_unbal_post):
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 10), sharex=True, sharey=True)
def _plot_std_diffs(diffs_pre, num_unbal_pre, diffs_post, num_unbal_post, bal_tol=0.1):
fig, ax1 = plt.subplots()

color = "#EA2566"

sns.stripplot(diffs_pre.iloc[:, 0], diffs_pre.index, ax=ax1)
ax1.set_xlabel(
"Before. Number of unbalanced covariates: {num_unbal}".format(
num_unbal=num_unbal_pre
),
fontsize=14,
sds_pre = pd.DataFrame(
{"std_diff": diffs_pre[0], "covariate": diffs_pre.index, "prepost": "pre"}
)
ax1.axvline(x=-0.1, ymin=0, ymax=1, color=color, linestyle="--")
ax1.axvline(x=0.1, ymin=0, ymax=1, color=color, linestyle="--")
sds_post = pd.DataFrame(
{"std_diff": diffs_post[0], "covariate": diffs_post.index, "prepost": "post"}
)

sds = pd.concat([sds_pre, sds_post], ignore_index=True)

sns.stripplot(diffs_post.iloc[:, 0], diffs_post.index, ax=ax2)
ax2.set_xlabel(
"After. Number of unbalanced covariates: {num_unbal}".format(
num_unbal=num_unbal_post
sns.stripplot(data=sds, x="std_diff", y="covariate", hue="prepost", ax=ax1)

ax1.set_xlabel(
"Pre/Post Number of unbalanced covariates: {num_unbal_pre}/{num_unbal_post}".format(
num_unbal_pre=num_unbal_pre, num_unbal_post=num_unbal_post
),
fontsize=14,
)
ax2.axvline(x=-0.1, ymin=0, ymax=1, color=color, linestyle="--")
ax2.axvline(x=0.1, ymin=0, ymax=1, color=color, linestyle="--")
ax1.axvline(x=-bal_tol, ymin=0, ymax=1, color=color, linestyle="--", lw=2)
ax1.axvline(x=bal_tol, ymin=0, ymax=1, color=color, linestyle="--", lw=2)

fig.suptitle("Standardized differences in means", fontsize=16)
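
Note on the `bal_tol` change above: the role of the new parameter can be sketched in isolation. The block below is a minimal, stdlib-only illustration and not the code in `causalml/metrics/visualize.py`. The helpers `iptw`, `std_diff`, and `count_unbalanced` are hypothetical names, the standardized difference uses a common pooled-variance definition that may differ from what `get_std_diffs` computes, and the weights follow the usual inverse-probability-of-treatment formula, assumed here to match `get_simple_iptw`.

```python
import math


def iptw(w, ps):
    """Inverse probability of treatment weights: 1/ps if treated, 1/(1-ps) if control."""
    return [wi / pi + (1 - wi) / (1 - pi) for wi, pi in zip(w, ps)]


def _weighted_mean_var(values, weights):
    # Weighted mean and (population) variance of a single covariate.
    total = sum(weights)
    mean = sum(v * wt for v, wt in zip(values, weights)) / total
    var = sum(wt * (v - mean) ** 2 for v, wt in zip(values, weights)) / total
    return mean, var


def std_diff(x, w, weights=None):
    """Standardized difference in means of covariate x (treated minus control).

    Assumed definition: mean difference divided by the square root of the
    average of the two group variances.
    """
    if weights is None:
        weights = [1.0] * len(x)
    treated = [(xi, wt) for xi, wi, wt in zip(x, w, weights) if wi == 1]
    control = [(xi, wt) for xi, wi, wt in zip(x, w, weights) if wi == 0]
    mean_t, var_t = _weighted_mean_var(*zip(*treated))
    mean_c, var_c = _weighted_mean_var(*zip(*control))
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)


def count_unbalanced(diffs, bal_tol=0.1):
    """Count covariates whose absolute standardized difference exceeds bal_tol."""
    return sum(abs(d) > bal_tol for d in diffs.values())


# Toy data, made up for illustration: one covariate, treatment flags, propensity scores.
x = [1.0, 2.0, 3.0, 4.0]
w = [1, 1, 0, 0]
ps = [0.5, 0.5, 0.5, 0.5]

diffs_pre = {"x": std_diff(x, w)}
diffs_post = {"x": std_diff(x, w, iptw(w, ps))}
print(count_unbalanced(diffs_pre, bal_tol=0.1), count_unbalanced(diffs_post, bal_tol=0.1))
```

Under these assumptions, raising `bal_tol` loosens the balance criterion, so fewer covariates are counted as unbalanced, and the same value would set where the dashed reference lines are drawn in the plot.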
