Merge branch 'master' into dev/639_liudong

uber · Jan 23, 2024 · 5fd6523
2 parents 2fc3c45 + 0040ac6

Showing 12 changed files with 832 additions and 297 deletions.
diff --git a/.github/workflows/python-test.yaml b/.github/workflows/python-test.yaml
@@ -10,7 +10,7 @@ jobs:
       # You can use PyPy versions in python-version.
       # For example, pypy3.10
       matrix:
-        python-version: ["3.7", "3.8", "3.9", "3.10", "3.11", "3.12"]
+        python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
 
     steps:
       - uses: actions/checkout@v4

diff --git a/README.md b/README.md
@@ -14,6 +14,7 @@
 # Disclaimer
 This project is stable and being incubated for long-term support. It may contain new experimental code, for which APIs are subject to change.
 
+
 # Causal ML: A Python Package for Uplift Modeling and Causal Inference with ML
 
 **Causal ML** is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent
@@ -25,230 +26,34 @@ research [[1]](#Literature). It provides a standard interface that allows user t
 
 * **Personalized engagement**: A company has multiple options to interact with its customers such as different product choices in up-sell or messaging channels for communications. One can use CATE to estimate the heterogeneous treatment effect for each customer and treatment option combination for an optimal personalized recommendation system.
 
-The package currently supports the following methods
-
-* **Tree-based algorithms**
-    * Uplift tree/random forests on KL divergence, Euclidean Distance, and Chi-Square [[2]](#Literature)
-    * Uplift tree/random forests on Contextual Treatment Selection [[3]](#Literature)
-    * Uplift tree/random forests on DDP [[4]](#Literature)
-    * Uplift tree/random forests on IDDP [[5]](#Literature)
-    * Interaction Tree [[6]](#Literature)
-    * Conditional Interaction Tree [[7]](#Literature)
-    * Causal Tree [[8]](#Literature) - Work-in-progress
-* **Meta-learner algorithms**
-    * S-learner [[9]](#Literature)
-    * T-learner [[9]](#Literature)
-    * X-learner [[9]](#Literature)
-    * R-learner [[10]](#Literature)
-    * Doubly Robust (DR) learner [[11]](#Literature)
-    * TMLE learner [[12]](#Literature)
-* **Instrumental variables algorithms**
-    * 2-Stage Least Squares (2SLS)
-    * Doubly Robust (DR) IV [[13]](#Literature)
-* **Neural-network-based algorithms**
-    * CEVAE [[14]](#Literature)
-    * DragonNet [[15]](#Literature) - with `causalml[tf]` installation (see [Installation](#installation))
-
-
-# Installation
-
-Installation with `conda` is recommended.
-
-`conda` environment files for Python 3.7, 3.8 and 3.9 are available in the repository. To use models under the `inference.tf` module (e.g. `DragonNet`), additional dependency of `tensorflow` is required. For detailed instructions, see below.
-
-## Install using `conda`:
-
-Install `conda` with:
-
-```
-wget https://repo.anaconda.com/miniconda/Miniconda3-py38_23.5.0-3-Linux-x86_64.sh
-bash Miniconda3-py38_23.5.0-3-Linux-x86_64.sh -b
-source miniconda3/bin/activate 
-conda init
-source ~/.bashrc 
-```
-
-### Install from `conda-forge`
-Directly install from the conda-forge channel using conda.
-
-```sh
-conda install -c conda-forge causalml
-```
-
-### Install with the `conda` virtual environment
-This will create a new `conda` virtual environment named `causalml-[tf-]py3x`, where `x` is in `[6, 7, 8, 9]`. e.g. `causalml-py37` or `causalml-tf-py38`. If you want to change the name of the environment, update the relevant YAML file in `envs/`
-
-```bash
-git clone https://github.com/uber/causalml.git
-cd causalml/envs/
-conda env create -f environment-py38.yml	# for the virtual environment with Python 3.8 and CausalML
-conda activate causalml-py38
-(causalml-py38)
-```
-
-### Install `causalml` with `tensorflow`
-```bash
-git clone https://github.com/uber/causalml.git
-cd causalml/envs/
-conda env create -f environment-tf-py38.yml	# for the virtual environment with Python 3.8 and CausalML
-conda activate causalml-tf-py38
-(causalml-tf-py38) pip install -U numpy			# this step is necessary to fix [#338](https://github.com/uber/causalml/issues/338)
-```
-
-## Install from `PyPI`:
-
-```bash
-pip install causalml
-```
-
-### Install `causalml` with `tensorflow`
-```bash
-pip install causalml[tf]
-pip install -U numpy							# this step is necessary to fix [#338](https://github.com/uber/causalml/issues/338)
-```
-
-## Install from source:
-
-### Create a clean conda environment
-
-```
-conda create -n causalml-py38 -y python=3.8
-conda activate causalml-py38
-conda install -c conda-forge cxx-compiler
-conda install python-graphviz
-conda install -c conda-forge xorg-libxrender
-```
-
-Then:
-
-```bash
-git clone https://github.com/uber/causalml.git
-cd causalml
-pip install .
-python setup.py build_ext --inplace
-```
-
-with `tensorflow`:
 
-```bash
-pip install .[tf]
-```
+# Documentation
 
+Documentation is available at:
 
-# Quick Start
+https://causalml.readthedocs.io/en/latest/about.html
 
-## Average Treatment Effect Estimation with S, T, X, and R Learners
 
-```python
-from causalml.inference.meta import LRSRegressor
-from causalml.inference.meta import XGBTRegressor, MLPTRegressor
-from causalml.inference.meta import BaseXRegressor
-from causalml.inference.meta import BaseRRegressor
-from xgboost import XGBRegressor
-from causalml.dataset import synthetic_data
-
-y, X, treatment, _, _, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)
-
-lr = LRSRegressor()
-te, lb, ub = lr.estimate_ate(X, treatment, y)
-print('Average Treatment Effect (Linear Regression): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
-
-xg = XGBTRegressor(random_state=42)
-te, lb, ub = xg.estimate_ate(X, treatment, y)
-print('Average Treatment Effect (XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
-
-nn = MLPTRegressor(hidden_layer_sizes=(10, 10),
-                 learning_rate_init=.1,
-                 early_stopping=True,
-                 random_state=42)
-te, lb, ub = nn.estimate_ate(X, treatment, y)
-print('Average Treatment Effect (Neural Network (MLP)): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
-
-xl = BaseXRegressor(learner=XGBRegressor(random_state=42))
-te, lb, ub = xl.estimate_ate(X, treatment, y, e)
-print('Average Treatment Effect (BaseXRegressor using XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
-
-rl = BaseRRegressor(learner=XGBRegressor(random_state=42))
-te, lb, ub =  rl.estimate_ate(X=X, p=e, treatment=treatment, y=y)
-print('Average Treatment Effect (BaseRRegressor using XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
-```
-
-See the [Meta-learner example notebook](https://github.com/uber/causalml/blob/master/docs/examples/meta_learners_with_synthetic_data.ipynb) for details.
-
-
-## Interpretable Causal ML
-
-Causal ML provides methods to interpret the treatment effect models trained as follows:
-
-### Meta Learner Feature Importances
-
-```python
-from causalml.inference.meta import BaseSRegressor, BaseTRegressor, BaseXRegressor, BaseRRegressor
-from causalml.dataset.regression import synthetic_data
-
-# Load synthetic data
-y, X, treatment, tau, b, e = synthetic_data(mode=1, n=10000, p=25, sigma=0.5)
-w_multi = np.array(['treatment_A' if x==1 else 'control' for x in treatment]) # customize treatment/control names
-
-slearner = BaseSRegressor(LGBMRegressor(), control_name='control')
-slearner.estimate_ate(X, w_multi, y)
-slearner_tau = slearner.fit_predict(X, w_multi, y)
-
-model_tau_feature = RandomForestRegressor()  # specify model for model_tau_feature
-
-slearner.get_importance(X=X, tau=slearner_tau, model_tau_feature=model_tau_feature,
-                        normalize=True, method='auto', features=feature_names)
-
-# Using the feature_importances_ method in the base learner (LGBMRegressor() in this example)
-slearner.plot_importance(X=X, tau=slearner_tau, normalize=True, method='auto')
-
-# Using eli5's PermutationImportance
-slearner.plot_importance(X=X, tau=slearner_tau, normalize=True, method='permutation')
+# Installation
 
-# Using SHAP
-shap_slearner = slearner.get_shap_values(X=X, tau=slearner_tau)
+Installation instructions are available at: 
 
-# Plot shap values without specifying shap_dict
-slearner.plot_shap_values(X=X, tau=slearner_tau)
+https://causalml.readthedocs.io/en/latest/installation.html
 
-# Plot shap values WITH specifying shap_dict
-slearner.plot_shap_values(X=X, shap_dict=shap_slearner)
 
-# interaction_idx set to 'auto' (searches for feature with greatest approximate interaction)
-slearner.plot_shap_dependence(treatment_group='treatment_A',
-                              feature_idx=1,
-                              X=X,
-                              tau=slearner_tau,
-                              interaction_idx='auto')
-```
-<div align="center">
-  <img width="629px" height="618px" src="https://raw.githubusercontent.com/uber/causalml/master/docs/_static/img/shap_vis.png">
-</div>
+# Quickstart
 
-See the [feature interpretations example notebook](https://github.com/uber/causalml/blob/master/docs/examples/feature_interpretations_example.ipynb) for details.
+Quickstarts with code-snippets are available at: 
 
-### Uplift Tree Visualization
+https://causalml.readthedocs.io/en/latest/quickstart.html
 
-```python
-from IPython.display import Image
-from causalml.inference.tree import UpliftTreeClassifier, UpliftRandomForestClassifier
-from causalml.inference.tree import uplift_tree_string, uplift_tree_plot
 
-uplift_model = UpliftTreeClassifier(max_depth=5, min_samples_leaf=200, min_samples_treatment=50,
-                                    n_reg=100, evaluationFunction='KL', control_name='control')
+# Example Notebooks
 
-uplift_model.fit(df[features].values,
-                 treatment=df['treatment_group_key'].values,
-                 y=df['conversion'].values)
+Example notebooks are available at:
 
-graph = uplift_tree_plot(uplift_model.fitted_uplift_tree, features)
-Image(graph.create_png())
-```
-<div align="center">
-  <img width="800px" height="479px" src="https://raw.githubusercontent.com/uber/causalml/master/docs/_static/img/uplift_tree_vis.png">
-</div>
+https://causalml.readthedocs.io/en/latest/examples.html
 
-See the [Uplift Tree visualization example notebook](https://github.com/uber/causalml/blob/master/docs/examples/uplift_tree_visualization.ipynb) for details.
 
 # Contributing
 

diff --git a/causalml/metrics/__init__.py b/causalml/metrics/__init__.py
@@ -32,3 +32,4 @@
     SensitivitySubsetData,
     SensitivitySelectionBias,
 )  # noqa
+from maq import MAQ, get_ipw_scores  # noqa
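
Note on the `causalml/metrics/visualize.py` change in this commit: the hard-coded balance threshold of 0.1 becomes a `bal_tol` parameter that is used both for counting unbalanced covariates and for drawing the dashed reference lines. The sketch below illustrates the idea of counting covariates whose absolute standardized difference exceeds a tolerance, before and after inverse-probability-of-treatment weighting. It is a standalone approximation under stated assumptions, not the package's implementation: `std_diffs` and `count_unbalanced` are hypothetical helpers written for this note, and the pooled-variance formula is one common convention that may differ from what `get_std_diffs` computes.

```python
import numpy as np
import pandas as pd


def std_diffs(X, w, weights=None):
    """Standardized mean difference per covariate (treated minus control).

    Hypothetical helper: divides the (weighted) mean difference by the
    square root of the average of the two (weighted) group variances.
    """
    if weights is None:
        weights = np.ones(len(X))
    treated = w == 1
    control = ~treated
    values = X.to_numpy(dtype=float)

    def wmean(a, wt):
        return np.average(a, axis=0, weights=wt)

    def wvar(a, wt):
        return np.average((a - wmean(a, wt)) ** 2, axis=0, weights=wt)

    num = wmean(values[treated], weights[treated]) - wmean(
        values[control], weights[control]
    )
    den = np.sqrt(
        (wvar(values[treated], weights[treated])
         + wvar(values[control], weights[control])) / 2
    )
    return pd.Series(num / den, index=X.columns)


def count_unbalanced(diffs, bal_tol=0.1):
    """Number of covariates whose absolute standardized difference exceeds bal_tol."""
    return int((diffs.abs() > bal_tol).sum())


# Synthetic data: treatment assignment depends on x0 only.
rng = np.random.default_rng(0)
n = 5000
X = pd.DataFrame(rng.normal(size=(n, 3)), columns=["x0", "x1", "x2"])
p = 1 / (1 + np.exp(-X["x0"].to_numpy()))  # true propensity score
w = rng.binomial(1, p)
iptw = w / p + (1 - w) / (1 - p)  # simple inverse-probability weights

pre = std_diffs(X, w)
post = std_diffs(X, w, iptw)
print("unbalanced before:", count_unbalanced(pre, bal_tol=0.1))
print("unbalanced after: ", count_unbalanced(post, bal_tol=0.1))
```

A stricter tolerance such as `bal_tol=0.05` flags more covariates, and a looser one flags fewer. Threading a single value through the count and the plot keeps the reported number consistent with the reference lines.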
diff --git a/causalml/metrics/visualize.py b/causalml/metrics/visualize.py
@@ -848,7 +848,7 @@ def qini_score(
     return (qini.sum(axis=0) - qini[RANDOM_COL].sum()) / qini.shape[0]
 
 
-def plot_ps_diagnostics(df, covariate_col, treatment_col="w", p_col="p"):
+def plot_ps_diagnostics(df, covariate_col, treatment_col="w", p_col="p", bal_tol=0.1):
     """Plot covariate balances (standardized differences between the treatment and the control)
     before and after weighting the sample using the inverse probability of treatment weights.
 
@@ -865,40 +865,42 @@ def plot_ps_diagnostics(df, covariate_col, treatment_col="w", p_col="p"):
     IPTW = get_simple_iptw(W, PS)
 
     diffs_pre = get_std_diffs(X, W, weighted=False)
-    num_unbal_pre = (np.abs(diffs_pre) > 0.1).sum()[0]
+    num_unbal_pre = (np.abs(diffs_pre) > bal_tol).sum()[0]
 
     diffs_post = get_std_diffs(X, W, IPTW, weighted=True)
-    num_unbal_post = (np.abs(diffs_post) > 0.1).sum()[0]
+    num_unbal_post = (np.abs(diffs_post) > bal_tol).sum()[0]
 
-    diff_plot = _plot_std_diffs(diffs_pre, num_unbal_pre, diffs_post, num_unbal_post)
+    diff_plot = _plot_std_diffs(
+        diffs_pre, num_unbal_pre, diffs_post, num_unbal_post, bal_tol=bal_tol
+    )
 
     return diff_plot
 
 
-def _plot_std_diffs(diffs_pre, num_unbal_pre, diffs_post, num_unbal_post):
-    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 10), sharex=True, sharey=True)
+def _plot_std_diffs(diffs_pre, num_unbal_pre, diffs_post, num_unbal_post, bal_tol=0.1):
+    fig, ax1 = plt.subplots()
 
     color = "#EA2566"
 
-    sns.stripplot(diffs_pre.iloc[:, 0], diffs_pre.index, ax=ax1)
-    ax1.set_xlabel(
-        "Before. Number of unbalanced covariates: {num_unbal}".format(
-            num_unbal=num_unbal_pre
-        ),
-        fontsize=14,
+    sds_pre = pd.DataFrame(
+        {"std_diff": diffs_pre[0], "covariate": diffs_pre.index, "prepost": "pre"}
     )
-    ax1.axvline(x=-0.1, ymin=0, ymax=1, color=color, linestyle="--")
-    ax1.axvline(x=0.1, ymin=0, ymax=1, color=color, linestyle="--")
+    sds_post = pd.DataFrame(
+        {"std_diff": diffs_post[0], "covariate": diffs_post.index, "prepost": "post"}
+    )
+
+    sds = pd.concat([sds_pre, sds_post], ignore_index=True)
 
-    sns.stripplot(diffs_post.iloc[:, 0], diffs_post.index, ax=ax2)
-    ax2.set_xlabel(
-        "After. Number of unbalanced covariates: {num_unbal}".format(
-            num_unbal=num_unbal_post
+    sns.stripplot(data=sds, x="std_diff", y="covariate", hue="prepost", ax=ax1)
+
+    ax1.set_xlabel(
+        "Pre/Post Number of unbalanced covariates: {num_unbal_pre}/{num_unbal_post}".format(
+            num_unbal_pre=num_unbal_pre, num_unbal_post=num_unbal_post
         ),
         fontsize=14,
     )
-    ax2.axvline(x=-0.1, ymin=0, ymax=1, color=color, linestyle="--")
-    ax2.axvline(x=0.1, ymin=0, ymax=1, color=color, linestyle="--")
+    ax1.axvline(x=-bal_tol, ymin=0, ymax=1, color=color, linestyle="--", lw=2)
+    ax1.axvline(x=bal_tol, ymin=0, ymax=1, color=color, linestyle="--", lw=2)
 
     fig.suptitle("Standardized differences in means", fontsize=16)