Source code accompanying the paper "Implications of Model Indeterminacy for Explanations of Automated Decisions".
Clone the repository, then run:
conda env update -n indeterminacy -f environment_xxx.yml # choose env.yml file for your system
# This creates a conda environment and installs the dependencies.
# Note: the paper results came from a Linux machine using environment_x86_64.yml.
conda activate indeterminacy # activate the environment
conda develop src # makes the source code importable
The datasets (GiveMeSomeCredit, UCI_Credit_Card, and porto-seguro-safe-driver-prediction) are available on Kaggle. Please read and follow all the applicable rules, terms, and conditions.
Unzip them and place all of the contents in the same folder. The contents of that folder should look like:
GiveMeSomeCredit/
- Data Dictionary.xls
- cs-test.csv
- cs-training.csv
- sampleEntry.csv
UCI_Credit_Card/
- UCI_Credit_Card.csv
porto-seguro-safe-driver-prediction/
- sample_submission.csv
- test.csv
- train.csv
You may need to create the UCI_Credit_Card folder, as the unzipped contents might just be the csv file.
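Before moving on, it can be handy to confirm that the raw data folder matches the listing above. The following is a minimal sketch and not part of the repository; the function name `missing_raw_files` and the `EXPECTED_FILES` mapping are made up for illustration, with the file names copied from the listing.

```python
from pathlib import Path

# Expected layout, copied from the folder listing above.
EXPECTED_FILES = {
    "GiveMeSomeCredit": [
        "Data Dictionary.xls",
        "cs-test.csv",
        "cs-training.csv",
        "sampleEntry.csv",
    ],
    "UCI_Credit_Card": ["UCI_Credit_Card.csv"],
    "porto-seguro-safe-driver-prediction": [
        "sample_submission.csv",
        "test.csv",
        "train.csv",
    ],
}


def missing_raw_files(raw_dir):
    """Return the expected files (as 'folder/name') that are absent from raw_dir."""
    root = Path(raw_dir)
    missing = []
    for folder, names in EXPECTED_FILES.items():
        for name in names:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing
```

Calling `missing_raw_files("/path/to/raw/data")` returns an empty list when everything is in place, and otherwise lists what still needs to be downloaded or moved.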
Edit the yaml file in config/compute/local.yaml.
You'll need to specify where this raw data folder can be found,
as well as where to put the processed datasets, results, etc.
The project uses Hydra for configuration, so if you are familiar with it you can set the project up to run on other compute environments.
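Since the entries in this file are filesystem locations, it may help to sanity-check them before running anything. The sketch below is not part of the repository and does not read the yaml file. The field names `raw_data_dir`, `processed_data_dir`, and `results_dir` are hypothetical stand-ins, so check config/compute/local.yaml for the real keys.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ComputePaths:
    # Hypothetical field names; the real keys live in config/compute/local.yaml.
    raw_data_dir: str
    processed_data_dir: str
    results_dir: str


def prepare_paths(paths):
    """Check that the raw data folder exists and create the output folders if needed."""
    raw = Path(paths.raw_data_dir).expanduser()
    if not raw.is_dir():
        raise FileNotFoundError(f"Raw data folder not found: {raw}")
    processed = Path(paths.processed_data_dir).expanduser()
    results = Path(paths.results_dir).expanduser()
    for out in (processed, results):
        # Create missing parent folders too; do nothing if the folder already exists.
        out.mkdir(parents=True, exist_ok=True)
    return raw, processed, results
```

Copy the values you put in the yaml file into a `ComputePaths` instance and pass it to `prepare_paths`; a `FileNotFoundError` means the raw data location is wrong.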
Run the 3 preprocessing scripts in src/indeterminacy/data/preprocess/.
These can also be run interactively via code cells in an editor that supports them,
or as scripts with ./scripts/preprocess_data.sh.
Exploratory data analysis reports will be output in the configured results directory.
There's a quick check to make sure the data is loading properly:
pytest test/test_data.py
The code for this will eventually be posted here. If you would like to use it sooner, please reach out. I'm happy to make it available to individuals upon request.