Commit de75b4b

Initial example
1 parent 7cde989 commit de75b4b

2 files changed

+206
-0
lines changed

doc/examples/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,7 @@ General Examples
3333
py_double_ml_plm_irm_hetfx.ipynb
3434
py_double_ml_meets_flaml.ipynb
3535
py_double_ml_rdflex.ipynb
36+
py_double_ml_lplr.ipynb
3637

3738

3839
Effect Heterogeneity
Lines changed: 205 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,205 @@
1+
{
2+
"cells": [
3+
{
4+
"attachments": {},
5+
"cell_type": "markdown",
6+
"metadata": {
7+
"collapsed": false
8+
},
9+
"source": [
10+
"# Python: Log-Odds Effects for Logistic PLR models\n",
11+
"\n",
12+
"In this simple example, we illustrate how the [DoubleML](https://docs.doubleml.org/stable/index.html) package can be used to estimate the change in log-odds due to treatment in a logistic partial linear regression [DoubleMLLPLR](https://docs.doubleml.org/stable/guide/models.html#logistic-partial-linear-regression-lplr) model."
13+
]
14+
},
15+
{
16+
"cell_type": "code",
17+
"metadata": {
18+
"ExecuteTime": {
19+
"end_time": "2025-11-12T23:42:30.920222Z",
20+
"start_time": "2025-11-12T23:42:30.915753Z"
21+
}
22+
},
23+
"source": [
24+
"import numpy as np\n",
25+
"import pandas as pd\n",
26+
"import doubleml as dml\n",
27+
"\n",
28+
"from doubleml.plm.datasets import make_lplr_LZZ2020"
29+
],
30+
"outputs": [],
31+
"execution_count": 3
32+
},
33+
{
34+
"cell_type": "markdown",
35+
"metadata": {},
36+
"source": [
37+
"## Data\n",
38+
"\n",
39+
"We generate synthetic data from a known data generating process so that the estimates can be compared to the true effect. The data generating process is adapted and extended from [Liu et al. (2020)](https://academic.oup.com/ectj/article-abstract/24/3/559/6296639).\n",
40+
"\n",
41+
"The documentation of the data generating process can be found [here](https://docs.doubleml.org/stable/api/datasets.html).\n",
42+
"\n",
43+
"The data generating process supports both binary and continuous treatments. In this example, we consider a continuous treatment. The balance of the treatment assignment (if binary) and of the outcome variable can both be adjusted."
44+
]
45+
},
46+
{
47+
"cell_type": "code",
48+
"metadata": {
49+
"ExecuteTime": {
50+
"end_time": "2025-11-13T00:05:27.845205Z",
51+
"start_time": "2025-11-13T00:05:27.835022Z"
52+
}
53+
},
54+
"source": [
55+
"np.random.seed(42)\n",
56+
"data = make_lplr_LZZ2020(n_obs=1000, dim_x=20, alpha=0.5, treatment=\"continuous\")\n",
57+
"print(data)"
58+
],
59+
"outputs": [
60+
{
61+
"name": "stdout",
62+
"output_type": "stream",
63+
"text": [
64+
"================== DoubleMLData Object ==================\n",
65+
"\n",
66+
"------------------ Data summary ------------------\n",
67+
"Outcome variable: y\n",
68+
"Treatment variable(s): ['d']\n",
69+
"Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20']\n",
70+
"Instrument variable(s): None\n",
71+
"No. Observations: 1000\n",
72+
"\n",
73+
"------------------ DataFrame info ------------------\n",
74+
"<class 'pandas.core.frame.DataFrame'>\n",
75+
"RangeIndex: 1000 entries, 0 to 999\n",
76+
"Columns: 23 entries, X1 to p\n",
77+
"dtypes: float64(23)\n",
78+
"memory usage: 179.8 KB\n",
79+
"\n"
80+
]
81+
}
82+
],
83+
"execution_count": 32
84+
},
85+
{
86+
"metadata": {},
87+
"cell_type": "markdown",
88+
"source": [
89+
"## Model\n",
90+
"\n",
91+
"The logistic partial linear regression (LPLR) model is specified as follows:\n",
92+
"\n",
93+
"$$\\mathbb{E} [Y | D, X] = \\mathbb{P} (Y=1 | D, X) = \\text{expit} \\{\\beta_0 D + r_0 (X) \\}$$\n",
94+
"\n",
95+
"where $Y$ is the binary outcome variable and $D$ is the policy variable of interest.\n",
96+
"The high-dimensional vector $X = (X_1, \\ldots, X_p)$ consists of other confounding covariates.\n",
97+
"$\\text{expit}$ is the logistic function, the inverse of the logit link:\n",
98+
"\n",
99+
"$$\\text{expit} ( x ) = \\frac{1}{1 + e^{-x}}$$\n",
100+
"\n",
101+
"Taking the logit of both sides gives $\\text{logit}\\, \\mathbb{P} (Y=1 | D, X) = \\beta_0 D + r_0 (X)$, so the log-odds of the outcome follow a partially linear model. The coefficient $\\beta_0$ can be interpreted as the change in log-odds due to a one-unit increase in the treatment variable $D$, holding the covariates $X$ constant."
102+
]
103+
},
104+
{
105+
"metadata": {},
106+
"cell_type": "markdown",
107+
"source": [
108+
"Next, we define the learners for the nuisance functions and fit the [LPLR Model](https://docs.doubleml.org/stable/guide/models.html#logistic-partial-linear-regression-lplr).\n",
109+
"The correct type of learner (regressor or classifier) must be used for each nuisance function.\n",
110+
"\n",
111+
"- ml_M is a model of the outcome. Here, since the outcome is binary, we use a classifier.\n",
112+
"- ml_t is a model of the log-odds. This must always be a regressor.\n",
113+
"- ml_m is a model of the treatment. Here, since the treatment is continuous, we use a regressor. In the case of a binary treatment, a classifier must be used."
114+
]
115+
},
116+
{
117+
"metadata": {
118+
"ExecuteTime": {
119+
"end_time": "2025-11-13T00:05:47.340376Z",
120+
"start_time": "2025-11-13T00:05:31.657594Z"
121+
}
122+
},
123+
"cell_type": "code",
124+
"source": [
125+
"# First stage estimation\n",
126+
"from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor\n",
127+
"randomForest_reg = RandomForestRegressor()\n",
128+
"randomForest_class = RandomForestClassifier()\n",
129+
"\n",
130+
"np.random.seed(4242)\n",
131+
"\n",
132+
"dml_lplr = dml.DoubleMLLPLR(data,\n",
133+
" ml_M=randomForest_class,\n",
134+
" ml_t=randomForest_reg,\n",
135+
" ml_m=randomForest_reg,\n",
136+
" n_folds=5)\n",
137+
"print(\"Training LPLR Model\")\n",
138+
"dml_lplr.fit()\n",
139+
"\n",
140+
"print(dml_lplr.summary)"
141+
],
142+
"outputs": [
143+
{
144+
"name": "stdout",
145+
"output_type": "stream",
146+
"text": [
147+
"Training LPLR Model\n"
148+
]
149+
},
150+
{
151+
"name": "stderr",
152+
"output_type": "stream",
153+
"text": [
154+
"/Users/julius/Projects/DoubleMLLogit/.venv/lib/python3.13/site-packages/sklearn/utils/deprecation.py:132: FutureWarning: 'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.\n",
155+
" warnings.warn(\n",
156+
"/Users/julius/Projects/DoubleMLLogit/.venv/lib/python3.13/site-packages/sklearn/utils/deprecation.py:132: FutureWarning: 'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.\n",
157+
" warnings.warn(\n"
158+
]
159+
},
160+
{
161+
"name": "stdout",
162+
"output_type": "stream",
163+
"text": [
164+
" coef std err t P>|t| 2.5 % 97.5 %\n",
165+
"d 0.35212 0.100429 3.506179 0.000455 0.155284 0.548957\n"
166+
]
167+
}
168+
],
169+
"execution_count": 33
170+
},
171+
{
172+
"metadata": {},
173+
"cell_type": "code",
174+
"outputs": [],
175+
"execution_count": null,
176+
"source": ""
177+
}
178+
],
179+
"metadata": {
180+
"kernelspec": {
181+
"display_name": "Python 3.10.6 64-bit",
182+
"language": "python",
183+
"name": "python3"
184+
},
185+
"language_info": {
186+
"codemirror_mode": {
187+
"name": "ipython",
188+
"version": 3
189+
},
190+
"file_extension": ".py",
191+
"mimetype": "text/x-python",
192+
"name": "python",
193+
"nbconvert_exporter": "python",
194+
"pygments_lexer": "ipython3",
195+
"version": "3.12.3"
196+
},
197+
"vscode": {
198+
"interpreter": {
199+
"hash": "ac5e9af40c2048901fb5e070f7bbe2ca12417b0669992742e66f016e0e17b88e"
200+
}
201+
}
202+
},
203+
"nbformat": 4,
204+
"nbformat_minor": 0
205+
}
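
The log-odds interpretation of $\beta_0$ stated in the notebook's Model cell can be checked numerically. The following is a minimal sketch using only the standard library, not part of the committed notebook. The helper names (`expit`, `logit`, `log_odds_shift`) and the values of `beta_0`, `d`, and `r_x` are hypothetical and chosen purely for illustration; `r_x` stands in for the nuisance term $r_0(X)$ evaluated at a fixed covariate vector.

```python
import math


def expit(x: float) -> float:
    """Logistic function, the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def log_odds_shift(beta_0: float, d: float, r_x: float) -> float:
    """Change in log-odds when d increases by one unit at a fixed r_0(X)."""
    p_before = expit(beta_0 * d + r_x)
    p_after = expit(beta_0 * (d + 1.0) + r_x)
    return logit(p_after) - logit(p_before)


# Hypothetical treatment levels and nuisance values, for illustration only.
for d, r_x in [(0.0, -1.0), (0.5, 0.3), (2.0, 1.5)]:
    shift = log_odds_shift(beta_0=0.5, d=d, r_x=r_x)
    print(f"d={d}, r_x={r_x}: shift in log-odds = {shift:.6f}")
```

Whatever starting treatment level and nuisance value are used, the shift equals `beta_0` up to floating-point error, which is what makes the coefficient interpretable without reference to $X$. The corresponding change in the probability itself does depend on `d` and `r_x`.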
