Showing 24 changed files with 4,142 additions and 153 deletions.

@@ -1,12 +1,20 @@
# `crepes`: Conformal Regressors and Predictive Systems
# `crepes`

`crepes` is a Python package for generating *conformal regressors*, which transform point predictions of any underlying regression model into prediction intervals for specified levels of confidence. The package also implements *conformal predictive systems*, which transform the point predictions into cumulative distribution functions.

The `crepes` package implements standard, normalized and Mondrian conformal regressors and predictive systems. While the package allows you to use your own difficulty estimates and Mondrian categories, there is also a separate module, called `crepes.fillings`, which provides some standard options for these.

## Installation

Install with: `pip install crepes`
Install with:

```bash
pip install crepes
```

## Documentation

For complete documentation of the `crepes` package, see [here](https://crepes.readthedocs.io/en/latest/).

## Quickstart

@@ -19,7 +27,7 @@ from crepes import ConformalRegressor
from crepes.fillings import sigma_knn, binning
```

We will illustrate the above using a dataset from www.openml.org and a `RandomForestRegressor` from `sklearn`:
We will illustrate the above using a dataset from [www.openml.org](https://www.openml.org) and a `RandomForestRegressor` from [sklearn](https://scikit-learn.org):

```python
from sklearn.datasets import fetch_openml
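
# The remainder of this block is not shown in the hunk above. What follows is
# a rough sketch of the setup that the later steps appear to assume; the
# dataset name, the split sizes and the forest settings are assumptions, while
# the variable names (X_prop_train, y_prop_train, X_cal, residuals_cal,
# y_hat_test, cr_std) are taken from the steps shown further down.
import numpy as np

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Fetch a regression dataset from OpenML; any dataset with a numeric target
# would do for the purposes of this sketch.
dataset = fetch_openml(name="house_sales", version=3)
X = np.asarray(dataset.data, dtype=float)
y = np.asarray(dataset.target, dtype=float)

# Split into training and test sets, and split the training set further into
# a proper training set and a calibration set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
X_prop_train, X_cal, y_prop_train, y_cal = train_test_split(
    X_train, y_train, test_size=0.25)

# Train the underlying model on the proper training set only.
learner = RandomForestRegressor()
learner.fit(X_prop_train, y_prop_train)

# Residuals on the calibration set and point predictions for the test set.
residuals_cal = y_cal - learner.predict(X_cal)
y_hat_test = learner.predict(X_test)

# Fit a standard conformal regressor on the calibration residuals.
cr_std = ConformalRegressor()
cr_std.fit(residuals=residuals_cal)
```
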
@@ -60,16 +68,16 @@ We can now apply the conformal regressor to get prediction intervals for the test set, here using a confidence level of 99%:
std_intervals = cr_std.predict(y_hat=y_hat_test, confidence=0.99)
```

The output is a NumPy array, specifying the lower and upper bound of each interval:
The output is a [NumPy](https://numpy.org) array, specifying the lower and upper bound of each interval:

```numpy
array([[-353379. , 939231. ],
       [-251874.3 , 1040735.7 ],
       [-138329.5 , 1154280.5 ],
array([[-171902.2 , 953866.2 ],
       [-276818.01, 848950.39],
       [ 22679.37, 1148447.77],
       ...,
       [-389128.68, 903481.32],
       [-313003. , 979607. ],
       [ -90551.53, 1202058.47]])
       [ 242954.02, 1368722.42],
       [-308093.73, 817674.67],
       [-227057.4 , 898711. ]])
```

We may request that the intervals are cut to exclude impossible values, in this case below 0, and if we also rely on the default confidence level (0.95), the output intervals will be a bit tighter:

@@ -79,21 +87,21 @@ intervals_std = cr_std.predict(y_hat=y_hat_test, y_min=0)
```

```numpy
array([[ 7576.18, 578275.82],
       [109080.88, 679780.52],
       [222625.68, 793325.32],
array([[ 152258.55, 629705.45],
       [ 47342.74, 524789.64],
       [ 346840.12, 824287.02],
       ...,
       [ 0. , 542526.14],
       [ 47952.18, 618651.82],
       [270403.65, 841103.29]])
       [ 567114.77, 1044561.67],
       [ 16067.02, 493513.92],
       [ 97103.35, 574550.25]])
```

The above intervals are not normalized, i.e., they are all of the same size (at least before they are cut). We could make the intervals more informative through normalization using difficulty estimates; more difficult instances will be assigned wider intervals.

We will here use the helper function `sigma_knn` for this purpose. It estimates the difficulty by the mean absolute errors of the k (default `k=5`) nearest neighbors to each instance in the calibration set. A small value (beta) is added to the estimates, which may be given through an argument to the function; below we just use the default, i.e., `beta=0.01`.
We will use the helper function `sigma_knn` for this purpose. Here it estimates the difficulty by the standard deviation of the target of the k (default `k=25`) nearest neighbors in the proper training set to each instance in the calibration set. A small value (beta) is added to the estimates, which may be given through an argument to the function; below we just use the default, i.e., `beta=0.01`.

```python
sigmas_cal = sigma_knn(X=X_cal, residuals=residuals_cal)
sigmas_cal = sigma_knn(X=X_cal, X_ref=X_prop_train, y_ref=y_prop_train)
```

The difficulty estimates and residuals of the calibration examples can now be used to form a normalized conformal regressor:

@@ -103,10 +111,10 @@ cr_norm = ConformalRegressor()
cr_norm.fit(residuals=residuals_cal, sigmas=sigmas_cal)
```

To generate prediction intervals for the test set using the normalized conformal regressor, we need difficulty estimates for the test set too, which we get using the calibration objects and residuals.
To generate prediction intervals for the test set using the normalized conformal regressor, we need difficulty estimates for the test set too, which we get using the same helper function.

```python
sigmas_test = sigma_knn(X=X_cal, residuals=residuals_cal, X_test=X_test)
sigmas_test = sigma_knn(X=X_test, X_ref=X_prop_train, y_ref=y_prop_train)
```

Now we can obtain the prediction intervals, using the point predictions and difficulty estimates for the test set:

@@ -117,13 +125,13 @@ intervals_norm = cr_norm.predict(y_hat=y_hat_test, sigmas=sigmas_test,
```

```numpy
array([[ 0. , 645527.3140099 ],
       [100552.5573358 , 688308.8426642 ],
       [206605.7263972 , 809345.2736028 ],
array([[205959.07517616, 576004.92482384],
       [133206.86035366, 438925.51964634],
       [291925.81345507, 879201.32654493],
       ...,
       [ 55388.60029434, 458964.03970566],
       [252094.62400964, 414509.37599036],
       [305546.225071 , 805960.714929 ]])
       [622212.95112744, 989463.48887256],
       [ 98805.77755066, 410775.16244934],
       [197248.38670265, 474405.21329735]])
```

Depending on the employed difficulty estimator, the normalized intervals may sometimes be unreasonably large, in the sense that they may be several times larger than any previously observed error. Moreover, if the difficulty estimator is not very informative, e.g., completely random, the varying interval sizes may give a false impression that we can expect lower prediction errors for instances with tighter intervals. Ideally, a difficulty estimator providing little or no information on the expected error should instead lead to more uniformly distributed interval sizes.
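
One way to address this is a Mondrian conformal regressor, which calibrates separately within non-overlapping categories, here formed by binning the difficulty estimates. The prediction call in the next hunk uses `cr_mond` and `bins_test`; a minimal sketch of how these might be set up, using the `binning` helper imported above (the exact `binning` signature and the number of bins are assumptions), is:

```python
# Form Mondrian categories by binning the calibration difficulty estimates;
# binning is assumed to return both the category labels and the bin thresholds
# when given an integer number of bins.
bins_cal, bin_thresholds = binning(values=sigmas_cal, bins=20)

# Fit a Mondrian conformal regressor on the calibration residuals and categories.
cr_mond = ConformalRegressor()
cr_mond.fit(residuals=residuals_cal, bins=bins_cal)

# Assign the test instances to the same categories using the stored thresholds.
bins_test = binning(values=sigmas_test, bins=bin_thresholds)
```
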
@@ -156,13 +164,13 @@ intervals_mond = cr_mond.predict(y_hat=y_hat_test, bins=bins_test, y_min=0)
```

```numpy
array([[ 0. , 592782.5 ],
       [146648.15, 642213.25],
       [260192.95, 755758.05],
array([[ 206379.7 , 575584.3 ],
       [ 144014.65, 428117.73],
       [ 17965.57, 1153161.57],
       ...,
       [ 38332.66, 476019.98],
       [198148.5 , 468455.5 ],
       [329931.17, 781575.77]])
       [ 653865.22, 957811.22],
       [ 174264.87, 335316.07],
       [ 140587.46, 531066.14]])
```

### Conformal predictive systems
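
The remaining steps use a Mondrian, normalized conformal predictive system named `cps_mond_norm`, whose predictions are cumulative distribution functions rather than single intervals. Its construction is not shown in the hunks below; a minimal sketch of how such an object might be formed, reusing the difficulty estimates and categories from above (the `fit` arguments are assumed to mirror those of `ConformalRegressor`), is:

```python
from crepes import ConformalPredictiveSystem

# Fit a conformal predictive system that is both normalized (sigmas) and
# Mondrian (bins), using the calibration residuals.
cps_mond_norm = ConformalPredictiveSystem()
cps_mond_norm.fit(residuals=residuals_cal, sigmas=sigmas_cal, bins=bins_cal)
```
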
@@ -201,13 +209,13 @@ intervals = cps_mond_norm.predict(y_hat=y_hat_test,
```

```numpy
array([[ 0. , 537757.93618585],
       [177348.62535049, 655015.98985999],
       [253618.31669927, 783707.98804461],
array([[ 226536.76784152, 519404.56955659],
       [ 170043.51497485, 376524.37491457],
       [ 192376.08061079, 994115.461665 ],
       ...,
       [ 73466.09003216, 397289.46238233],
       [273315.68901744, 405309.55870912],
       [274035.55188125, 789701.43635318]])
       [ 594183.11971763, 1010273.54816378],
       [ 186478.52365968, 308050.53035102],
       [ 167498.01540504, 485813.1329371 ]])
```

We can also get the p-values for the true target values; these should be uniformly distributed if the test objects are drawn from the same underlying distribution as the calibration examples.

@@ -220,13 +228,13 @@ p_values = cps_mond_norm.predict(y_hat=y_hat_test,
```

```numpy
array([[0.3262945 ],
       [0.12184386],
       [0.82948135],
array([[0.98298087],
       [0.90125379],
       [0.41770673],
       ...,
       [0.75042278],
       [0.61815831],
       [0.70252814]])
       [0.04659288],
       [0.07914733],
       [0.31090332]])
```
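
The uniformity claim above can be checked informally, for instance by plotting a histogram of the p-values; a small sketch (assuming the `p_values` array shown above and that matplotlib is available):

```python
import matplotlib.pyplot as plt

# For uniformly distributed p-values, the bars should be roughly equally high.
plt.hist(p_values[:, 0], bins=20, range=(0, 1))
plt.xlabel("p-value")
plt.ylabel("count")
plt.show()
```
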
We may request that the predict method return the full conformal predictive distribution (CPD) for each test instance, as defined by the threshold values, by setting `return_cpds=True`. The format of the distributions varies with the type of conformal predictive system; for a standard and normalized CPS, the output is an array with a row for each test instance and a column for each calibration instance (residual), while for a Mondrian CPS, the default output is a vector containing one CPD per test instance, since the number of values may vary between categories.

@@ -240,17 +248,11 @@ cpds = cps_mond_norm.predict(y_hat=y_hat_test,

The resulting vector of arrays is not displayed here, but we instead provide a plot for the CPD of a random test instance:


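
A plot like the one above could be produced along the following lines; this is a sketch that assumes `cpds` holds the per-instance CPD arrays returned with `return_cpds=True`, each containing the threshold values of the distribution:

```python
import numpy as np
import matplotlib.pyplot as plt

# Pick a random test instance and plot its conformal predictive distribution
# as an empirical CDF over the threshold values.
i = np.random.randint(len(cpds))
cpd = np.sort(cpds[i])
plt.step(cpd, np.arange(1, len(cpd) + 1) / len(cpd), where="post")
plt.xlabel("y")
plt.ylabel("cumulative probability")
plt.show()
```
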
## Examples

For additional examples of how to use the package and module, including how to use out-of-bag predictions rather than having to rely on dividing the training set into a proper training and calibration set, see [this Jupyter notebook](https://github.com/henrikbostrom/crepes/blob/main/crepes.ipynb).

## Documentation

For documentation of the `crepes` package, see [here](http://htmlpreview.github.io/?https://github.com/henrikbostrom/crepes/blob/main/docs/crepes.html).

For documentation of the `crepes.fillings` module, see [here](http://htmlpreview.github.io/?https://github.com/henrikbostrom/crepes/blob/main/docs/crepes.fillings.html).
For additional examples of how to use the package and module, including how to use out-of-bag predictions rather than having to rely on dividing the training set into a proper training and calibration set, see [the documentation](https://crepes.readthedocs.io/en/latest/) and [this Jupyter notebook](https://github.com/henrikbostrom/crepes/blob/main/docs/crepes_nb.ipynb).

## Citing crepes

@@ -289,8 +291,11 @@ Bibtex entry:

<a id="7">[7]</a> Boström, H., Johansson, U. and Löfström, T., 2021. Mondrian conformal predictive distributions. In Conformal and Probabilistic Prediction and Applications. PMLR, 152, pp. 24-38. [Link](https://proceedings.mlr.press/v152/bostrom21a.html)

<a id="8">[8]</a> Vovk, V., 2022. Universal predictive systems. Pattern Recognition, 126, p. 108536. [Link](https://dl.acm.org/doi/abs/10.1016/j.patcog.2022.108536)

- - -

Author: Henrik Boström ([email protected])
Copyright 2022 Henrik Boström
Copyright 2023 Henrik Boström
License: BSD 3 clause

@@ -0,0 +1,20 @@
# Citing crepes

If you use `crepes` for a scientific publication, you are welcome to cite the following paper:

Boström, H., 2022. crepes: a Python Package for Generating Conformal Regressors and Predictive Systems. In Conformal and Probabilistic Prediction and Applications. PMLR, 179. [Link](https://copa-conference.com/papers/COPA2022_paper_11.pdf)

Bibtex entry:

```bibtex
@InProceedings{crepes,
  title = {crepes: a Python Package for Generating Conformal Regressors and Predictive Systems},
  author = {Bostr\"om, Henrik},
  booktitle = {Proceedings of the Eleventh Symposium on Conformal and Probabilistic Prediction and Applications},
  year = {2022},
  editor = {Johansson, Ulf and Boström, Henrik and An Nguyen, Khuong and Luo, Zhiyuan and Carlsson, Lars},
  volume = {179},
  series = {Proceedings of Machine Learning Research},
  publisher = {PMLR}
}
```

@@ -0,0 +1,106 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath('../src'))

# -- Project information -----------------------------------------------------

project = 'crepes'
copyright = '2023, Henrik Boström'
author = 'Henrik Boström'

# The short X.Y version
version = '0.2.0'

# The full version, including alpha/beta/rc tags
release = '0.2.0'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
    "sphinx.ext.intersphinx",
    "sphinx.ext.todo",
    "numpydoc",
    "nbsphinx",
    "myst_parser",
    # 'jupyter_sphinx'
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = 'en'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']


pygments_style = "sphinx"

# -- Options for HTML output -------------------------------------------------

html_theme = "pydata_sphinx_theme"

html_theme_options = {
    "navbar_end": ["navbar-icon-links"],
    "sidebarwidth": 270,
    "collapse_navigation": False,
    "navigation_depth": 4,
    "show_toc_level": 2,
    "github_url": "https://github.com/henrikbostrom/crepes"
}


html_sidebars = {}

html_context = {
    "default_mode": "light",
}

html_title = f"{project} v. {version}"
html_last_updated_fmt = "%b %d, %Y"


# -- Extension configuration -------------------------------------------------

source_suffix = {
    '.rst': 'restructuredtext',
    '.md': 'markdown',
}

autoclass_content = "both"

autodoc_member_order = "bysource"
autoclass_member_order = 'bysource'

#autosummary_generate = True
#autosummary_imported_members = True

# -- Options for todo extension ----------------------------------------------

# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True

@@ -0,0 +1,15 @@
The crepes.fillings module
==========================

.. module:: crepes.fillings

.. autofunction:: sigma_knn

.. autofunction:: sigma_knn_oob

.. autofunction:: sigma_variance

.. autofunction:: sigma_variance_oob

.. autofunction:: binning

@@ -0,0 +1,12 @@
The crepes package
==================

.. module:: crepes

.. autoclass:: ConformalRegressor
   :members:

.. autoclass:: ConformalPredictiveSystem
   :members: