Solution:
Note: sometimes your answer doesn't match one of the options exactly. That's fine. Select the option that's closest to your solution.
In this homework, we will use the Credit Card Data from the book "Econometric Analysis".
Here's a wget-able link:

```bash
wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/AER_credit_card_data.csv
```
The goal of this homework is to inspect the output of different evaluation metrics by creating a classification model (target column `card`).
- Create the target variable by mapping `yes` to 1 and `no` to 0.
- Split the dataset into 3 parts: train/validation/test with 60%/20%/20% distribution. Use the `train_test_split` function for that with `random_state=1`.
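A minimal sketch of these two steps, assuming the CSV has been downloaded as shown above (the variable names `df_full_train`, `df_train`, `df_val`, `df_test` are illustrative choices, not prescribed by the homework):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('AER_credit_card_data.csv')

# Map the target: 'yes' -> 1, 'no' -> 0
df['card'] = (df['card'] == 'yes').astype(int)

# 60%/20%/20%: first hold out 20% for test, then split the
# remaining 80% into train/validation (0.25 * 0.8 = 0.2)
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=1)
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=1)
```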
ROC AUC could also be used to evaluate the feature importance of numerical variables. Let's do that:

- For each numerical variable, use it as the score and compute the AUC with the `card` variable.
- Use the training dataset for that.
If your AUC is < 0.5, invert this variable by putting "-" in front (e.g. `-df_train['expenditure']`).

AUC can go below 0.5 if the variable is negatively correlated with the target variable. You can change the direction of the correlation by negating this variable; the negative correlation then becomes positive.
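A sketch of this check for the four candidate columns from the question below, reusing `df_train` from the split above:

```python
from sklearn.metrics import roc_auc_score

for col in ['reports', 'dependents', 'active', 'share']:
    auc = roc_auc_score(df_train['card'], df_train[col])
    # A score below 0.5 means the variable is negatively correlated
    # with the target: negate it to flip the direction
    if auc < 0.5:
        auc = roc_auc_score(df_train['card'], -df_train[col])
    print(f'{col}: {auc:.3f}')
```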
Which numerical variable (among the following 4) has the highest AUC?
- `reports`
- `dependents`
- `active`
- `share`
From now on, use these columns only:

```python
["reports", "age", "income", "share", "expenditure", "dependents", "months", "majorcards", "active", "owner", "selfemp"]
```
Apply one-hot-encoding using `DictVectorizer` and train the logistic regression with these parameters:

```python
LogisticRegression(solver='liblinear', C=1.0, max_iter=1000)
```
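One possible way to wire this together, assuming the splits from the sketch above; `DictVectorizer` passes numerical columns through as-is and one-hot-encodes the string columns:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

columns = ["reports", "age", "income", "share", "expenditure", "dependents",
           "months", "majorcards", "active", "owner", "selfemp"]

# One-hot-encoding via DictVectorizer
dv = DictVectorizer(sparse=False)
X_train = dv.fit_transform(df_train[columns].to_dict(orient='records'))
X_val = dv.transform(df_val[columns].to_dict(orient='records'))

model = LogisticRegression(solver='liblinear', C=1.0, max_iter=1000)
model.fit(X_train, df_train['card'])

# Predicted probabilities for the positive class
y_pred = model.predict_proba(X_val)[:, 1]
print(round(roc_auc_score(df_val['card'], y_pred), 3))
```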
What's the AUC of this model on the validation dataset? (round to 3 digits)
- 0.615
- 0.515
- 0.715
- 0.995
Now let's compute precision and recall for our model.
- Evaluate the model on the validation dataset at all thresholds from 0.0 to 1.0 with step 0.01
- For each threshold, compute precision and recall
- Plot them
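A sketch of the threshold sweep, continuing from `y_pred` and `df_val` above (counting TP/FP/FN explicitly is one way to do it; `sklearn.metrics.precision_recall_curve` is another):

```python
import numpy as np
import matplotlib.pyplot as plt

thresholds = np.arange(0.0, 1.01, 0.01)
precisions, recalls = [], []

actual = (df_val['card'] == 1).values

for t in thresholds:
    predicted = (y_pred >= t)

    tp = (predicted & actual).sum()
    fp = (predicted & ~actual).sum()
    fn = (~predicted & actual).sum()

    # Guard against division by zero at extreme thresholds
    precisions.append(tp / (tp + fp) if (tp + fp) > 0 else 1.0)
    recalls.append(tp / (tp + fn) if (tp + fn) > 0 else 0.0)

plt.plot(thresholds, precisions, label='precision')
plt.plot(thresholds, recalls, label='recall')
plt.xlabel('threshold')
plt.legend()
plt.show()
```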
At which threshold do the precision and recall curves intersect?
- 0.1
- 0.3
- 0.6
- 0.8
Precision and recall are conflicting - when one grows, the other goes down. That's why they are often combined into the F1 score - a metric that takes both into account.
This is the formula for computing it:

$$F_1 = 2 \cdot \frac{P \cdot R}{P + R}$$

where $P$ is precision and $R$ is recall.
Let's compute F1 for all thresholds from 0.0 to 1.0 with increment 0.01 using the validation set.
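Reusing the `precisions` and `recalls` lists from the previous sketch:

```python
f1_scores = [2 * p * r / (p + r) if (p + r) > 0 else 0.0
             for p, r in zip(precisions, recalls)]

# Threshold at which F1 peaks
print(thresholds[int(np.argmax(f1_scores))])
```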
At which threshold is F1 maximal?
- 0.1
- 0.4
- 0.6
- 0.7
Use the `KFold` class from Scikit-Learn to evaluate our model on 5 different folds:

```python
KFold(n_splits=5, shuffle=True, random_state=1)
```
- Iterate over different folds of `df_full_train`
- Split the data into train and validation
- Train the model on train with these parameters: `LogisticRegression(solver='liblinear', C=1.0, max_iter=1000)`
- Use AUC to evaluate the model on validation
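A sketch of this cross-validation loop, reusing `df_full_train`, `columns`, and the imports from the sketches above:

```python
from sklearn.model_selection import KFold

kfold = KFold(n_splits=5, shuffle=True, random_state=1)
scores = []

for train_idx, val_idx in kfold.split(df_full_train):
    df_tr = df_full_train.iloc[train_idx]
    df_va = df_full_train.iloc[val_idx]

    dv = DictVectorizer(sparse=False)
    X_tr = dv.fit_transform(df_tr[columns].to_dict(orient='records'))
    X_va = dv.transform(df_va[columns].to_dict(orient='records'))

    model = LogisticRegression(solver='liblinear', C=1.0, max_iter=1000)
    model.fit(X_tr, df_tr['card'])

    y_pred = model.predict_proba(X_va)[:, 1]
    scores.append(roc_auc_score(df_va['card'], y_pred))

print(f'mean={np.mean(scores):.3f} std={np.std(scores):.3f}')
```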
How large is the standard deviation of the AUC scores across the different folds?
- 0.003
- 0.014
- 0.09
- 0.24
Now let's use 5-fold cross-validation to find the best parameter `C`.
- Iterate over the following `C` values: `[0.01, 0.1, 1, 10]`
- Initialize `KFold` with the same parameters as previously
- Use these parameters for the model: `LogisticRegression(solver='liblinear', C=C, max_iter=1000)`
- Compute the mean score as well as the std (round the mean and std to 3 decimal digits)
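The same loop as in the previous sketch, wrapped in an outer iteration over `C`:

```python
for C in [0.01, 0.1, 1, 10]:
    kfold = KFold(n_splits=5, shuffle=True, random_state=1)
    scores = []

    for train_idx, val_idx in kfold.split(df_full_train):
        df_tr = df_full_train.iloc[train_idx]
        df_va = df_full_train.iloc[val_idx]

        dv = DictVectorizer(sparse=False)
        X_tr = dv.fit_transform(df_tr[columns].to_dict(orient='records'))
        X_va = dv.transform(df_va[columns].to_dict(orient='records'))

        model = LogisticRegression(solver='liblinear', C=C, max_iter=1000)
        model.fit(X_tr, df_tr['card'])

        y_pred = model.predict_proba(X_va)[:, 1]
        scores.append(roc_auc_score(df_va['card'], y_pred))

    print(f'C={C}: mean={np.mean(scores):.3f} std={np.std(scores):.3f}')
```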
Which C leads to the best mean score?
- 0.01
- 0.1
- 1
- 10
If you have ties, select the score with the lowest std. If you still have ties, select the smallest `C`.
- Submit your results here: https://forms.gle/8TfKNRd5Jq7sGK5M9
- You can submit your solution multiple times. In this case, only the last submission will be used
- If your answer doesn't match options exactly, select the closest one
The deadline for submitting is October 3 (Monday), 23:00 CEST.
After that, the form will be closed.