Moved codebase from the previous repo

DataResponsibly · Feb 1, 2023 · 4af12ef
1 parent cc27e73
Showing 108 changed files with 117,454 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,204 @@
+*_venv
+.DS_Store
+.ipynb_checkpoints
+
+# Created by https://www.gitignore.io/api/python,pycharm+all
+# Edit at https://www.gitignore.io/?templates=python,pycharm+all
+
+### PyCharm+all ###
+# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and WebStorm
+# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
+
+doc/build/*
+doc/_build/*
+
+.idea
+
+.idea/*.xml
+.idea/*.iml
+
+# User-specific stuff
+.idea/**/workspace.xml
+.idea/**/tasks.xml
+.idea/**/usage.statistics.xml
+.idea/**/dictionaries
+.idea/**/shelf
+
+# Generated files
+.idea/**/contentModel.xml
+
+# Sensitive or high-churn files
+.idea/**/dataSources/
+.idea/**/dataSources.ids
+.idea/**/dataSources.local.xml
+.idea/**/sqlDataSources.xml
+.idea/**/dynamic.xml
+.idea/**/uiDesigner.xml
+.idea/**/dbnavigator.xml
+
+# Gradle
+.idea/**/gradle.xml
+.idea/**/libraries
+
+# Gradle and Maven with auto-import
+# When using Gradle or Maven with auto-import, you should exclude module files,
+# since they will be recreated, and may cause churn.  Uncomment if using
+# auto-import.
+# .idea/modules.xml
+# .idea/*.iml
+# .idea/modules
+# *.iml
+# *.ipr
+
+# CMake
+cmake-build-*/
+
+# Mongo Explorer plugin
+.idea/**/mongoSettings.xml
+
+# File-based project format
+*.iws
+
+# mpeltonen/sbt-idea plugin
+.idea_modules/
+
+# JIRA plugin
+atlassian-ide-plugin.xml
+
+# Cursive Clojure plugin
+.idea/replstate.xml
+
+# Crashlytics plugin (for Android Studio and IntelliJ)
+com_crashlytics_export_strings.xml
+crashlytics.properties
+crashlytics-build.properties
+fabric.properties
+
+# Editor-based Rest Client
+.idea/httpRequests
+
+# Android studio 3.1+ serialized cache file
+.idea/caches/build_file_checksums.ser
+
+### PyCharm+all Patch ###
+# Ignores the whole .idea folder and all .iml files
+# See https://github.com/joeblau/gitignore.io/issues/186 and https://github.com/joeblau/gitignore.io/issues/360
+
+../.idea/
+
+# Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-249601023
+
+*.iml
+modules.xml
+.idea/misc.xml
+*.ipr
+
+# Sonarlint plugin
+.idea/sonarlint
+
+### Python ###
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# pyenv
+.python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# Mr Developer
+.mr.developer.cfg
+.project
+.pydevproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+venv/
+# End of https://www.gitignore.io/api/python,pycharm+all
diff --git a/Makefile b/Makefile
@@ -0,0 +1,14 @@
+COMMIT_HASH := $(shell eval git rev-parse HEAD)
+
+convert-notebooks:
+	jupyter nbconvert --to markdown docs/examples/**.ipynb
+
+doc:
+	yamp source --out docs/api --verbose
+	mkdocs build
+
+livedoc: doc
+	mkdocs serve
+
+develop:
+	python ./setup.py develop
diff --git a/docs/.pages b/docs/.pages
@@ -0,0 +1,4 @@
+nav:
+  - api
+  - examples
+  - release_notes
diff --git a/docs/api/.pages b/docs/api/.pages
@@ -0,0 +1,4 @@
+title: API Reference 📚
+arrange:
+  - overview.md
+  - ...
diff --git a/docs/api/analyzers/.pages b/docs/api/analyzers/.pages
@@ -0,0 +1 @@
+title: analyzers
diff --git a/docs/api/analyzers/AbstractOverallVarianceAnalyzer.md b/docs/api/analyzers/AbstractOverallVarianceAnalyzer.md
@@ -0,0 +1,75 @@
+# AbstractOverallVarianceAnalyzer
+
+Abstract class for an analyzer that computes overall variance metrics for subgroups.
+
+
+
+## Parameters
+
+- **base_model**
+
+    Base model whose stability is measured
+
+- **base_model_name** (*str*)
+
+    Model name like 'HoeffdingTreeClassifier' or 'LogisticRegression'
+
+- **bootstrap_fraction** (*float*)
+
+    Fraction in [0, 1] of train_pd_dataset used for fitting an ensemble of base models
+
+- **X_train** (*pandas.core.frame.DataFrame*)
+
+    Processed features train set
+
+- **y_train** (*pandas.core.frame.DataFrame*)
+
+    Targets train set
+
+- **X_test** (*pandas.core.frame.DataFrame*)
+
+    Processed features test set
+
+- **y_test** (*pandas.core.frame.DataFrame*)
+
+    Targets test set
+
+- **dataset_name** (*str*)
+
+    Name of dataset, used for correct results naming
+
+- **n_estimators** (*int*)
+
+    Number of estimators in ensemble to measure base_model stability
+
+
+
+
+## Methods
+
+???- note "UQ_by_boostrap"
+
+    Quantify the uncertainty of the base model by constructing an ensemble from bootstrapped samples.
+
+    Return a dictionary where keys are model indexes and values are lists of the corresponding model's predictions for the X_test set.
+
+    **Parameters**
+
+    - **boostrap_size**     (*int*)    
+    - **with_replacement**     (*bool*)    
+
+???- note "compute_metrics"
+
+    Measure metrics for the base model. Display plots for analysis if needed. Save results to a .pkl file.
+
+    **Parameters**
+
+    - **make_plots**     (*bool*)     – defaults to `False`    
+    - **save_results**     (*bool*)     – defaults to `True`    
+
+???- note "get_metrics_dict"
+
+???- note "print_metrics"
+
+???- note "save_metrics_to_file"
+
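Note on `UQ_by_boostrap`: the bootstrap-ensemble procedure described above can be sketched roughly as follows. This is a minimal illustration, not the code added in this commit. The function name `bootstrap_predictions`, the toy `MajorityClassifier`, the `seed` argument, and the use of plain Python lists instead of pandas DataFrames are all assumptions made for the example. The only thing taken from the docs is the return shape: a dictionary mapping model indexes to prediction lists for the test set.

```python
import random
from copy import deepcopy


class MajorityClassifier:
    """Hypothetical toy base model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def bootstrap_predictions(base_model, X_train, y_train, X_test,
                          n_estimators, bootstrap_size,
                          with_replacement=True, seed=0):
    """Fit n_estimators copies of base_model on bootstrapped samples.

    Returns {model index: list of predictions for X_test}.
    """
    rng = random.Random(seed)
    row_ids = range(len(X_train))
    predictions = {}
    for idx in range(n_estimators):
        # Draw the row indexes for this ensemble member.
        if with_replacement:
            sample = rng.choices(row_ids, k=bootstrap_size)
        else:
            sample = rng.sample(row_ids, k=bootstrap_size)
        # Fit an independent copy so ensemble members do not share state.
        model = deepcopy(base_model)
        model.fit([X_train[i] for i in sample], [y_train[i] for i in sample])
        predictions[idx] = model.predict(X_test)
    return predictions
```

The spread of the per-model predictions for each test row is what a variance analyzer would then summarize. How the actual class derives the bootstrap size from `bootstrap_fraction` is not shown in this diff.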
diff --git a/docs/api/analyzers/AbstractSubgroupAnalyzer.md b/docs/api/analyzers/AbstractSubgroupAnalyzer.md
@@ -0,0 +1,51 @@
+# AbstractSubgroupAnalyzer
+
+Abstract class for a subgroup analyzer to compute metrics for subgroups.
+
+
+
+## Parameters
+
+- **X_test** (*pandas.core.frame.DataFrame*)
+
+    Processed features test set
+
+- **y_test** (*pandas.core.frame.DataFrame*)
+
+    Targets test set
+
+- **sensitive_attributes_dct** (*dict*)
+
+    A dictionary where keys are sensitive attribute names (including attribute intersections), and values are the privileged values for these attributes
+
+- **test_protected_groups** (*dict*)
+
+    A dictionary where keys are sensitive attributes, and values are the input dataset rows that correspond to these sensitive attributes
+
+
+
+
+## Methods
+
+???- note "compute_subgroup_metrics"
+
+    Compute metrics for each subgroup in self.test_protected_groups using the _compute_metrics method.
+
+    Return a dictionary where keys are subgroup names, and values are subgroup metrics.
+
+    **Parameters**
+
+    - **y_preds**    
+    - **save_results**     (*bool*)    
+    - **result_filename**     (*str*)     – defaults to `None`    
+    - **save_dir_path**     (*str*)     – defaults to `None`    
+
+???- note "save_metrics_to_file"
+
+
+
+    **Parameters**
+
+    - **result_filename**     (*str*)    
+    - **save_dir_path**     (*str*)    
+
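Note on `compute_subgroup_metrics`: the per-subgroup computation described above might look roughly like the sketch below. It is illustrative only. The function name `compute_subgroup_accuracy`, the choice of accuracy as the metric, and the representation of each protected group as a list of row positions are assumptions. The documented class stores dataset rows per group and delegates to `_compute_metrics`, whose body is not part of this diff.

```python
def compute_subgroup_accuracy(y_test, y_preds, test_protected_groups):
    """Return {subgroup name: accuracy on that subgroup's rows}.

    test_protected_groups maps a subgroup name to the positions of its rows
    in y_test / y_preds (an assumption made for this sketch).
    """
    metrics = {}
    for group_name, row_ids in test_protected_groups.items():
        if not row_ids:
            # Accuracy is undefined for an empty subgroup.
            metrics[group_name] = None
            continue
        correct = sum(y_test[i] == y_preds[i] for i in row_ids)
        metrics[group_name] = correct / len(row_ids)
    return metrics
```

Returning one entry per subgroup mirrors the documented return value, a dictionary keyed by subgroup name.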