The **`crepes`** package provides a flexible framework for conformal prediction, including the ability to wrap an existing classifier and calibrate its predictions. Given the challenges with previous approaches (e.g., MAPIE), **`crepes`** looks like a promising alternative for producing calibrated probabilities and valid prediction sets.
Let’s break down how to integrate **`crepes`** into your existing pipeline while possibly retaining aspects like **beta calibration**:
---
### **Goals to Achieve:**
1. Use the best pre-trained **CatBoost model** (with optimal hyperparameters and selected features) as the base model.
2. Integrate **beta calibration** for probability calibration if applicable.
3. Leverage **`crepes.WrapClassifier`** for conformal prediction, producing **calibrated probabilities** and **prediction sets**.
4. Evaluate the results with metrics like:
- Coverage (proportion of true labels within prediction sets).
- Efficiency (average prediction set size, the classification analogue of interval width).
---
### **Steps to Implement Conformal Prediction with `crepes`**
#### **1. Wrap the Pre-Trained CatBoost Model**
The **`WrapClassifier`** in `crepes` can wrap any scikit-learn-compatible classifier, including your pre-trained CatBoost model. Since the model is already fitted, there is no need to call `fit` on the wrapper; you can go straight to calibration:
```python
from crepes import WrapClassifier
# Assuming `model` is your pre-trained CatBoostClassifier
wrapped_model = WrapClassifier(model)
# Display the wrapped model
display(wrapped_model)
```
#### **2. Calibrate the Model**
Use the calibration dataset (`X_calib_selected` and `y_calib`) to calibrate the wrapped model.
```python
# Calibrate the model using the calibration dataset
wrapped_model.calibrate(X_calib_selected, y_calib)
# Check calibration status
display(wrapped_model)
```
#### **3. Generate Prediction Sets**
Once calibrated, you can generate prediction sets for the test dataset (`X_test_selected`) using `predict_set`. For example, a 95% confidence level:
```python
# Generate prediction sets for 95% confidence level
prediction_sets = wrapped_model.predict_set(X_test_selected, confidence=0.95)
# Display the prediction sets (binary arrays for each class)
print(prediction_sets)
```
#### **4. Retrieve Probabilities and Conformal p-Values**
Note that `predict_proba` on the wrapped model returns the underlying CatBoost probabilities; the conformal layer adds p-values and prediction sets rather than recalibrated probabilities (which is why retaining beta calibration, discussed below, may still be useful):
```python
# Probabilities from the underlying CatBoost model, passed through the wrapper
probs = wrapped_model.predict_proba(X_test_selected)
# Display the probabilities
print(probs)
```
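If you also want the conformal p-values that underlie the prediction sets, the calibrated wrapper exposes `predict_p`; a minimal sketch using the same test split:
```python
# Conformal p-values, one column per class (requires a calibrated wrapper)
p_values = wrapped_model.predict_p(X_test_selected)
print(p_values)
```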
#### **5. Evaluate the Model**
Use the `evaluate` method to assess performance, such as coverage and error rates:
```python
# Evaluate performance at a 95% confidence level
evaluation_results = wrapped_model.evaluate(X_test_selected, y_test, confidence=0.95)
# Display evaluation results
print(evaluation_results)
```
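The result is a dictionary of metrics. As a rough sketch (the key names `error` and `avg_c` are assumptions based on `crepes`' documented defaults and may differ between versions), coverage and average set size can be read off like this:
```python
# "error": fraction of test labels missing from their prediction sets (assumed key name)
# "avg_c": average prediction set size (assumed key name)
print(f"Coverage: {1 - evaluation_results['error']:.3f}")
print(f"Average set size: {evaluation_results['avg_c']:.3f}")
```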
---
### **Considerations for Beta Calibration**
If beta calibration was highly effective for your use case, you might want to **pre-calibrate probabilities** using beta calibration and then pass these probabilities to `crepes` for conformal prediction. Here’s how:
1. **Apply Beta Calibration:**
Calibrate the model’s predicted probabilities with beta calibration before wrapping it with `crepes`.
```python
# Beta-calibrate the positive-class probabilities on the calibration split
# (betacal's BetaCalibration exposes `predict`; adjust if your calibrator differs)
raw_probs = model.predict_proba(X_calib_selected)[:, 1].reshape(-1, 1)
beta_calibrated_probs = beta_calibrator.predict(raw_probs)
```
2. **Feed the Pre-Calibrated Probabilities into a Conformal Classifier:**
Rather than modifying `WrapClassifier`, use the lower-level `crepes.ConformalClassifier`, which works directly on non-conformity scores computed from any probability estimates (see the sketch after this list).
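A minimal sketch of this route, reusing `beta_calibrated_probs` from the previous step and using `hinge` from `crepes.extras` to turn probabilities into non-conformity scores (exact call signatures may vary slightly between `crepes` versions; the helper below is illustrative, not part of your pipeline):
```python
import numpy as np
from crepes import ConformalClassifier
from crepes.extras import hinge

def to_two_columns(pos_probs):
    """Stack P(class 0) and P(class 1) from positive-class probabilities."""
    pos_probs = np.asarray(pos_probs).ravel()
    return np.column_stack([1 - pos_probs, pos_probs])

# Beta-calibrated probabilities for the calibration and test splits
probs_cal = to_two_columns(beta_calibrated_probs)
probs_test = to_two_columns(
    beta_calibrator.predict(model.predict_proba(X_test_selected)[:, 1].reshape(-1, 1))
)

# Non-conformity scores: one per calibration example (based on the true class),
# and one per class and test example
alphas_cal = hinge(probs_cal, classes=np.array([0, 1]), y=y_calib)
alphas_test = hinge(probs_test)

# Fit the conformal classifier on the calibration scores and build 95% prediction sets
cc = ConformalClassifier()
cc.fit(alphas_cal)
prediction_sets = cc.predict_set(alphas_test, confidence=0.95)
```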
---
### **Visualization of Results**
Once you have the calibrated probabilities and prediction sets, you can visualize the results:
1. **Plot Prediction Sets:**
```python
import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(10, 6))
for i, pred_set in enumerate(prediction_sets[:100]):  # first 100 samples
    # Plot the class labels included in each prediction set
    included_classes = np.flatnonzero(pred_set)
    plt.scatter([i] * len(included_classes), included_classes, color="blue")
plt.title("Prediction Sets for First 100 Samples")
plt.xlabel("Sample Index")
plt.ylabel("Classes in Prediction Set")
plt.yticks([0, 1])
plt.show()
```
2. **Compare Coverage and Set Size:**
Check how often the true label falls within the prediction sets and how large the sets are on average (a minimal sketch follows this list).
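A minimal sketch for computing these two metrics directly from the prediction sets, assuming `prediction_sets` is the (n_samples, n_classes) 0/1 array produced above and `y_test` contains integer class labels (0/1):
```python
import numpy as np

y_test_arr = np.asarray(y_test)

# Empirical coverage: fraction of test samples whose true class is in the set
coverage = np.mean(prediction_sets[np.arange(len(y_test_arr)), y_test_arr] == 1)

# Efficiency: average number of classes per prediction set
avg_set_size = prediction_sets.sum(axis=1).mean()

print(f"Empirical coverage: {coverage:.3f}")
print(f"Average prediction set size: {avg_set_size:.3f}")
```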
---
### **Next Steps**
1. **Implement the steps above with `crepes`.**
2. **Decide whether to retain beta calibration or switch entirely to `crepes`.**
- If retaining beta calibration, we’ll need to pass pre-calibrated probabilities to `crepes`.
3. **Visualize results and evaluate metrics.**
Would you like me to assist further with a specific implementation step or troubleshoot any part of this process?