493 add mondrian cp #504
Merged
Changes from 74 commits
86 commits
45a65c1
ADD: initia Mondrian class
vincentblot28 c7c209a
ENH: add docstring to class
vincentblot28 0ead65c
ADD: typing docstring and linting
vincentblot28 3a6fa2d
TST: first test for mondrian
vincentblot28 103ace5
FIX: define not allowed method insteand of allowed
vincentblot28 258b2d1
TST: test for bad cv and mapie estimator
vincentblot28 ecd452b
FIX: use model predict instead of mapie prediciton in predict
vincentblot28 5e06b31
TST: bad groups, predict_proba, alpha none
vincentblot28 f9687cf
TST: check groups can be lists
vincentblot28 7763f5b
FIX: linting
vincentblot28 b764605
TST: same reuslts as classical if only one group
vincentblot28 d5015ad
FIX: typing
vincentblot28 7577ffc
ADD: docstring to tests
vincentblot28 ec47e49
FIX: linting
vincentblot28 2dbb7c0
FIX: checks for NCS were not working
vincentblot28 06fb35e
FIX: topk name anddistinction between task for valid estimators
vincentblot28 9c41479
FIX: replace isinstance by type to avoid confusion with child class
vincentblot28 986e2c1
FIX: indent in test in docstring
vincentblot28 d39af29
FIX: typing
vincentblot28 098230e
UPD: update history.rst
vincentblot28 70c351b
Merge branch 'master' into 493-add-mondrian-cp
vincentblot28 ca74087
FIX: typing
vincentblot28 32eb959
Merge branch '493-add-mondrian-cp' of github.com:scikit-learn-contrib…
vincentblot28 dd1fe50
FIX: typing
vincentblot28 44f7476
ADD: documentation
vincentblot28 aaa7f32
DOC: fix latex and add figure to mondrian
vincentblot28 56ea922
FIX: change image name
vincentblot28 1e2ccb5
ENH: rewrite quantile in italic
vincentblot28 c0532e4
FIX: typo in docstring
vincentblot28 53fb8b2
ENH: put public emthods at the begining of the file
vincentblot28 791d750
ENH: add in docstring that groups must be integers
vincentblot28 325c2a9
ENH remove MapieCalibrator
vincentblot28 ad8faab
ENH: remove MapieMultilabelClassifier
vincentblot28 f9b79e2
UPD: test with calibration and multilabel as wrong methods
vincentblot28 0651841
NEH: change kwargs to predcit_params and fit_params
vincentblot28 3b26142
ENH: rename Mondrian to MondrianCP
vincentblot28 48ebe09
UPD: class docstring with constraints
vincentblot28 dc5a371
FIX: Call MondrianCP in docstring test
vincentblot28 6646a07
ENH: add single method for cehck group length
vincentblot28 7ecd6a8
ENH: define output shape outside of the loop
vincentblot28 884c341
FIX: typing for n classes
vincentblot28 e32c8a0
ENH rename _check_mapie_classifier in _check_cv
vincentblot28 05d74a6
ENH: move check_alpha at begninning of predict
vincentblot28 518f78b
FIX: definiiton of n_classes
vincentblot28 cc48cb1
ENH remove old tests
vincentblot28 96e3358
FIX: coveage with frong fit_params in fit_params in tests
vincentblot28 d0842bb
Update mapie/tests/test_mondrian.py
vincentblot28 85fe875
Update mapie/tests/test_mondrian.py
vincentblot28 6f4b06c
Update mapie/tests/test_mondrian.py
vincentblot28 8f44c33
Update mapie/mondrian.py
vincentblot28 f4a0a45
Update doc/theoretical_description_mondrian.rst
vincentblot28 2ac857e
Update doc/theoretical_description_mondrian.rst
vincentblot28 94415c1
Update HISTORY.rst
vincentblot28 381a8ec
Update mapie/mondrian.py
vincentblot28 2aa9728
Update mapie/mondrian.py
vincentblot28 e58300b
Update mapie/mondrian.py
vincentblot28 791abba
Update mapie/mondrian.py
vincentblot28 999eb25
Update mapie/mondrian.py
vincentblot28 9c85ecb
Update mapie/mondrian.py
vincentblot28 c0646db
Update mapie/mondrian.py
vincentblot28 b4d5dd8
Update mapie/mondrian.py
vincentblot28 b9a9ca7
Update mapie/mondrian.py
vincentblot28 6330872
FIX: linting and docstring
vincentblot28 b4b9934
STY: skip lines in fit definition
vincentblot28 0e65abc
STY: docstring style
vincentblot28 5844fbe
ENH: test test_same_results_if_only_one_group for multiple values of …
vincentblot28 ccc1e2d
FIX: minor typo
vincentblot28 4b51a0a
ADD: mondrian to API.rst
vincentblot28 70f6f34
DOC: add tutorial notebook
vincentblot28 147142d
ADD: mondrian tutorial to index.rst
vincentblot28 8773dfc
UPD: odc
vincentblot28 aa47dae
UPD: doc
vincentblot28 cc6e39c
ADD: readme file for mondrian
vincentblot28 2acb98d
ADD: readme
vincentblot28 5ba97d6
UPD: use copy model to prefit
d7e88c5
FIX: lint problem with group at None
5979dcb
ENH: check group lenght in check fit params
vincentblot28 d6aa546
ENH: rename groups into partition
vincentblot28 2fa047d
ENH: rename groups into partition in tests
vincentblot28 9b85022
FIX: test in Mondrian docstring
vincentblot28 36b03c9
FIX: alpha value in docstring test
vincentblot28 ed374e6
DEL: delete unused notebook
vincentblot28 7672b2e
TST: test that estimator don't fail if given many alphas
vincentblot28 39c5c06
FIX: legend inside plot in tuto + rename group into partition in tuto
vincentblot28 f937bc8
ENH: increase figure size for tuto
vincentblot28 12e71f3
FIX: sections titles in tutorial
vincentblot28
@@ -0,0 +1,46 @@
.. title:: Theoretical Description Mondrian : contents

.. _theoretical_description_mondrian:

#######################
Theoretical Description
#######################

Mondrian conformal prediction (MCP) [1] is a method for building prediction sets with a group-conditional
coverage guarantee. The coverage guarantee is given by:

.. math::
    P \{Y_{n+1} \in \hat{C}_{n, \alpha}(X_{n+1}) \mid G_{n+1} = g\} \geq 1 - \alpha

where :math:`G_{n+1}` is the group of the new test point :math:`X_{n+1}` and :math:`g`
is a group in the set of groups :math:`\mathcal{G}`.

MCP can be used with any split conformal predictor and is particularly useful when one has prior
knowledge of existing groups, whether or not this information is directly included in the features
of the data.
In a classification setting, the groups can be defined as the predicted classes of the data. Doing so
ensures that the coverage guarantee is satisfied for each predicted class.

To achieve the group-conditional coverage guarantee, MCP simply partitions the data
according to the groups and then applies the split conformal predictor to each group separately.

The quantile of each group is defined as:

.. math::
    \widehat{q}^{(g)} = \mathrm{Quantile}\left(s_1, \ldots, s_{n^{(g)}}, \frac{\lceil (n^{(g)} + 1)(1-\alpha)\rceil}{n^{(g)}} \right)

where :math:`s_1, \ldots, s_{n^{(g)}}` are the conformity scores of the calibration points in group :math:`g` and :math:`n^{(g)}`
is the number of calibration points in group :math:`g`.
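
As an illustration, the group-wise quantile can be sketched in plain Python. This is a minimal sketch, not MAPIE's implementation: the helper names ``conformal_quantile`` and ``mondrian_quantiles`` are hypothetical, the scores are toy values, and the empirical quantile is taken under the convention that level :math:`k/n` selects the :math:`k`-th smallest score. When a group is too small for the requested level, the sketch returns infinity, which corresponds to an uninformative prediction set.

```python
import math
from collections import defaultdict


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score.

    Returns math.inf when the group holds too few scores for this level.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def mondrian_quantiles(scores, groups, alpha):
    """Compute one conformal quantile per group label."""
    scores_by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        scores_by_group[group].append(score)
    return {
        group: conformal_quantile(group_scores, alpha)
        for group, group_scores in scores_by_group.items()
    }


# Toy conformity scores: group 1 is ten times noisier than group 0.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
          1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
groups = [0] * 10 + [1] * 10
quantiles = mondrian_quantiles(scores, groups, alpha=0.2)
```

With 10 scores per group and :math:`\alpha = 0.2`, the rank is :math:`\lceil 11 \times 0.8 \rceil = 9`, so each group uses its own ninth-smallest score as threshold, and the noisier group ends up with the larger quantile.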

The following figure (from [1]) illustrates the process of Mondrian conformal prediction:

.. image:: images/mondrian.png
    :width: 600
    :align: center

References
----------

[1] Vladimir Vovk, David Lindsay, Ilia Nouretdinov, and Alex Gammerman.
Mondrian confidence machine.
Technical report, Royal Holloway University of London, 2003.
Binary file added: doc/tutorial_mondrian_regression_files/tutorial_mondrian_regression_13_0.png (+78.6 KB)
Binary file added: doc/tutorial_mondrian_regression_files/tutorial_mondrian_regression_15_2.png (+18.8 KB)
Binary file added: doc/tutorial_mondrian_regression_files/tutorial_mondrian_regression_2_0.png (+36.3 KB)
Binary file added: doc/tutorial_mondrian_regression_files/tutorial_mondrian_regression_5_0.png (+111 KB)
Binary file added: doc/tutorial_mondrian_regression_files/tutorial_mondrian_regression_8_1.png (+21 KB)
@@ -0,0 +1,6 @@
.. _mondrian_examples_1:

1. Quickstart examples
----------------------

The following examples present the main functionalities of MAPIE through basic quickstart regression problems.
181 changes: 181 additions & 0 deletions
examples/mondrian/1-quickstart/plot_main-tutorial-mondrian-regression.py
@@ -0,0 +1,181 @@
r"""
=============================================
Tutorial for tabular regression with Mondrian
=============================================

In this tutorial, we use MAPIE to estimate prediction intervals on a simple,
one-dimensional, ground truth function, and we compare classical conformal
prediction intervals with Mondrian conformal prediction intervals.
The function is a sinusoidal function with added noise, and the data is
split into 10 groups. The goal is to estimate the prediction intervals
for new data points, and to compare the coverage of the prediction intervals
by group.
Throughout this tutorial, we will answer the following questions:


- How to use MAPIE to estimate prediction intervals for a regression problem?
- How to use Mondrian conformal prediction intervals for regression?
- How to compare the coverage of the prediction intervals by group?
"""

import os
import warnings

import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

from mapie.metrics import regression_coverage_score_v2
from mapie.mondrian import MondrianCP
from mapie.regression import MapieRegressor

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
warnings.filterwarnings("ignore")


##############################################################################
# 1. Create the noisy dataset with 10 groups, each of those groups having
# a different level of noise.
# -------------------------------------------------------------------


n_points = 100000
np.random.seed(0)
X = np.linspace(0, 10, n_points).reshape(-1, 1)
group_size = n_points // 10
groups_list = []
for i in range(10):
    groups_list.append(np.array([i] * group_size))
groups = np.concatenate(groups_list)

noise_0_1 = np.random.normal(0, 0.1, group_size)
noise_1_2 = np.random.normal(0, 0.5, group_size)
noise_2_3 = np.random.normal(0, 1, group_size)
noise_3_4 = np.random.normal(0, .4, group_size)
noise_4_5 = np.random.normal(0, .2, group_size)
noise_5_6 = np.random.normal(0, .3, group_size)
noise_6_7 = np.random.normal(0, .6, group_size)
noise_7_8 = np.random.normal(0, .7, group_size)
noise_8_9 = np.random.normal(0, .8, group_size)
noise_9_10 = np.random.normal(0, .9, group_size)

y = np.concatenate(
    [
        np.sin(X[groups == 0, 0] * 2) + noise_0_1,
        np.sin(X[groups == 1, 0] * 2) + noise_1_2,
        np.sin(X[groups == 2, 0] * 2) + noise_2_3,
        np.sin(X[groups == 3, 0] * 2) + noise_3_4,
        np.sin(X[groups == 4, 0] * 2) + noise_4_5,
        np.sin(X[groups == 5, 0] * 2) + noise_5_6,
        np.sin(X[groups == 6, 0] * 2) + noise_6_7,
        np.sin(X[groups == 7, 0] * 2) + noise_7_8,
        np.sin(X[groups == 8, 0] * 2) + noise_8_9,
        np.sin(X[groups == 9, 0] * 2) + noise_9_10,
    ], axis=0
)


##############################################################################
# We plot the dataset with the groups as colors.


plt.scatter(X, y, c=groups)
plt.show()


##############################################################################
# 2. Split the dataset into a training set, a calibration set, and a test set.
# Reusing the same ``random_state`` in each pair of calls keeps the group
# labels aligned with the corresponding rows of X and y.


X_train_temp, X_test, y_train_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
groups_train_temp, groups_test, _, _ = train_test_split(
    groups, y, test_size=0.2, random_state=0
)
X_cal, X_train, y_cal, y_train = train_test_split(
    X_train_temp, y_train_temp, test_size=0.5, random_state=0
)
groups_cal, groups_train, _, _ = train_test_split(
    groups_train_temp, y_train_temp, test_size=0.5, random_state=0
)


##############################################################################
# We plot the training set, the calibration set, and the test set.


f, ax = plt.subplots(1, 3, figsize=(15, 5))
ax[0].scatter(X_train, y_train, c=groups_train)
ax[0].set_title("Train set")
ax[1].scatter(X_cal, y_cal, c=groups_cal)
ax[1].set_title("Calibration set")
ax[2].scatter(X_test, y_test, c=groups_test)
ax[2].set_title("Test set")
plt.show()


##############################################################################
# 3. Fit a random forest regressor on the training set.


rf = RandomForestRegressor(n_estimators=100)
rf.fit(X_train, y_train)


##############################################################################
# 4. Fit a MapieRegressor and a MondrianCP on the calibration set.


mapie_regressor = MapieRegressor(rf, cv="prefit")
mondrian_regressor = MondrianCP(MapieRegressor(rf, cv="prefit"))
mapie_regressor.fit(X_cal, y_cal)
mondrian_regressor.fit(X_cal, y_cal, groups=groups_cal)


##############################################################################
# 5. Predict the prediction intervals on the test set with both methods.


_, y_pss_split = mapie_regressor.predict(X_test, alpha=.1)
_, y_pss_mondrian = mondrian_regressor.predict(
    X_test, groups=groups_test, alpha=.1
)


##############################################################################
# 6. Compare the coverage by groups, plot both methods side by side.


coverages = {}
for group in np.unique(groups_test):
    coverages[group] = {}
    coverages[group]["split"] = regression_coverage_score_v2(
        y_test[groups_test == group], y_pss_split[groups_test == group]
    )
    coverages[group]["mondrian"] = regression_coverage_score_v2(
        y_test[groups_test == group], y_pss_mondrian[groups_test == group]
    )


# Plot the coverage by groups, plot both methods side by side
plt.bar(
    np.arange(len(coverages)) * 2,
    [float(coverages[group]["split"]) for group in coverages],
    label="Split"
)
plt.bar(
    np.arange(len(coverages)) * 2 + 1,
    [float(coverages[group]["mondrian"]) for group in coverages],
    label="Mondrian"
)
plt.xticks(
    np.arange(len(coverages)) * 2 + .5,
    [f"Group {group}" for group in coverages],
    rotation=45
)
plt.hlines(0.9, -1, 21, label="90% coverage", color="black", linestyle="--")
plt.ylabel("Coverage")
plt.legend(loc='upper left', bbox_to_anchor=(1, 1))
plt.show()
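
The per-group comparison in step 6 comes down to counting, within each group, how often the target falls inside its interval. Below is a minimal, dependency-free sketch of that computation, assuming the intervals are given as separate lower and upper bounds. The function name `coverage_by_group` and the toy data are hypothetical; the tutorial itself relies on `regression_coverage_score_v2`.

```python
from collections import defaultdict


def coverage_by_group(y_true, lower, upper, groups):
    """Return, for each group, the fraction of targets inside [lower, upper]."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for y, low, high, group in zip(y_true, lower, upper, groups):
        counts[group] += 1
        hits[group] += int(low <= y <= high)
    return {group: hits[group] / counts[group] for group in counts}


# Toy example: two groups of two points each.
y_true = [0.0, 1.0, 2.0, 3.0]
lower = [-1.0, 2.0, 1.0, 2.0]
upper = [1.0, 3.0, 3.0, 4.0]
groups = ["a", "a", "b", "b"]
coverage = coverage_by_group(y_true, lower, upper, groups)
```

Comparing these per-group fractions with the target level 1 - alpha shows which groups a single global quantile under-covers or over-covers, which is what the bar plot in the tutorial visualises.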
@@ -0,0 +1,4 @@
.. _mondrian_examples:

Mondrian examples
=======================
Not urgent, but if you have the time: the titles are not displayed properly. Could you check what's wrong, or whether it's intentional?
done