The Proxy Wars Tool helps data scientists identify and mitigate bias-inducing proxy variables within datasets. Proxy variables are attributes that correlate with sensitive variables (e.g., gender, race, age) and may unintentionally introduce bias into machine learning models.
- CSV Dataset Upload: Analyze numerical data from uploaded files.
- Algorithm Selection: Choose from Correlation Analysis, FACET, or Association Rule Mining (ARM) to identify proxy variables.
- Dataset Filtering: Refine datasets using random sampling or SQL-based filters.
- Dark Mode Support: Toggle between light and dark themes for a better user experience.
- Results Visualization: Display outputs in dynamic tables with sorting capabilities.
- Operating System: Windows, macOS, or Linux
- Node.js: v18.16.0 or higher
- Python: v3.11 or higher
- Flask: v2.3.3
- React: v18.2.0
- Navigate to the /frontend folder and run: `npm install`
- Navigate to the /backend folder and run: `pip install -r requirements.txt`
- If Docker is not already installed, download Docker Desktop from the official Docker website.
- Verify the Docker installation: `docker --version`
- Navigate to your project folder: `cd path/to/project`
- Build the Docker container: `docker-compose build`
- Start the Docker container: `docker-compose up`
- Access the tool in Chrome: open http://localhost:3000.
Open two terminals: one for the frontend and one for the backend.
Backend Setup:
- Navigate to the backend folder: `cd backend`
- Install Python dependencies: `pip install -r requirements.txt`
- Run the Flask app: `python src/controllers/app.py`
Frontend Setup:
- Navigate to the frontend folder: `cd frontend`
- Install frontend dependencies: `npm install`
- Start the development server: `npm start`
- Click the Upload Dataset button.
- Select a valid `.csv` file (only numerical columns).
- Click Upload.
- A confirmation message will appear, and the dataset columns will be listed.
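Because the upload step expects only numerical columns, it can help to check a file before uploading it. The sketch below is not part of the tool; it is a minimal standard-library example, and the function name `non_numeric_columns` and the sample data are hypothetical.

```python
import csv
import io


def non_numeric_columns(csv_text: str) -> list[str]:
    """Return the names of columns that contain at least one non-numeric value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad: list[str] = []
    for row in reader:
        for name, value in row.items():
            if name in bad:
                continue  # already flagged, no need to re-check
            try:
                float(value)
            except (TypeError, ValueError):
                # Empty cells, missing cells, and text all end up here.
                bad.append(name)
    return bad


# Toy data for illustration only: "city" holds text, so it is flagged.
sample = "age,income,city\n34,52000,Oslo\n41,61000,Lima\n"
print(non_numeric_columns(sample))  # ['city']
```

An empty result means every cell parsed as a number. Note that this check treats blank cells as non-numeric, which may be stricter than the tool itself.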
- Locate the Dark Mode Toggle in the top-right corner of the interface.
- Click to switch between Light Mode (default) and Dark Mode.
- From the dropdown menu, choose an algorithm:
  - Correlation Analysis: Calculates relationships between variables.
  - FACET: Detects redundancy using feature selection.
  - Association Rule Mining (ARM): Generates association rules.
- If selecting FACET, specify a Target Variable.
- Complete Dataset: Use the full dataset.
- Random Sampling:
  - Specify a percentage (e.g., 50%).
  - Enter a Random Seed for reproducibility.
- SQL Filter: Enter a filter condition (e.g., `age > 30 AND income < 50000`).
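The two filtering options above can be illustrated with a short standard-library sketch. This is an assumption about how such filtering could work, not the tool's actual backend code; the helper names `random_sample` and `sql_filter`, the table name `data`, and the rows are all hypothetical.

```python
import random
import sqlite3

# Toy rows for illustration only.
rows = [
    {"age": 25, "income": 42000},
    {"age": 35, "income": 48000},
    {"age": 45, "income": 75000},
    {"age": 52, "income": 39000},
]


def random_sample(rows, percent, seed):
    """Pick `percent` percent of the rows; the same seed gives the same sample."""
    rng = random.Random(seed)
    k = round(len(rows) * percent / 100)
    return rng.sample(rows, k)


def sql_filter(rows, condition):
    """Keep the rows matching an SQL WHERE condition, using in-memory SQLite."""
    columns = list(rows[0])
    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f"CREATE TABLE data ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO data VALUES ({placeholders})",
        [tuple(r[c] for c in columns) for r in rows],
    )
    # The condition is interpolated directly, which is fine for a local
    # sketch but unsafe for untrusted input.
    cursor = conn.execute(f"SELECT * FROM data WHERE {condition}")
    result = [dict(zip(columns, record)) for record in cursor]
    conn.close()
    return result


half = random_sample(rows, percent=50, seed=42)
filtered = sql_filter(rows, "age > 30 AND income < 50000")
print(len(half), filtered)
```

Using a dedicated `random.Random(seed)` instance keeps the sample reproducible without touching the global random state.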
- Click Results to run the analysis.
- The results are displayed in a table:
  - Correlation Analysis: Displays Pearson, Kendall, and Spearman coefficients.
  - FACET: Displays redundancy metrics.
  - ARM: Displays support, confidence, and lift values.
- Sort results by clicking the column headers to organize by any metric.
- All tables and UI elements will adapt to Dark Mode when enabled.
- Use dropdowns to:
  - Select a column for sorting.
  - Choose a metric (e.g., Pearson coefficient for Correlation Analysis).
- Click the Sort button to organize results.
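To make the correlation results and the sort-by-metric step more concrete, here is a minimal sketch that computes Pearson and Spearman coefficients for each column against a sensitive variable and then sorts by absolute Pearson value. It is not the tool's implementation: Kendall is omitted for brevity, and the column names and values are hypothetical toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: a binary sensitive variable and two candidate columns.
sensitive = [0, 0, 0, 1, 1, 1]
columns = {
    "feature_a": [1, 2, 3, 4, 5, 6],  # tracks the sensitive variable closely
    "feature_b": [3, 1, 2, 3, 1, 2],  # same distribution in both groups
}

results = [
    {"column": name,
     "pearson": pearson(values, sensitive),
     "spearman": spearman(values, sensitive)}
    for name, values in columns.items()
]
# Sort by the chosen metric; a larger absolute value suggests a stronger proxy.
results.sort(key=lambda row: abs(row["pearson"]), reverse=True)
for row in results:
    print(row)
```

Sorting by absolute value treats strong negative correlations as equally suspicious as strong positive ones.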
- No File Uploaded: Ensure you have selected a valid `.csv` file.
- Target Variable Missing: Choose a target variable if using FACET.
- Sensitive Variables Not Set: Ensure you have selected sensitive variables for analysis.
- Correlation Analysis: Calculates relationships between variables using Pearson, Kendall, and Spearman coefficients.
- FACET: Detects redundancy using Random Forest feature selection.
- Association Rule Mining (ARM): Identifies patterns and associations with metrics like support, confidence, and lift.
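As an illustration of the three ARM metrics named above, the following sketch computes support, confidence, and lift for a single rule using their standard definitions. It does not reproduce the tool's rule-mining code; the function name `rule_metrics` and the transactions are hypothetical toy data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes both sides of the rule occur at least once in the transactions.
    """
    n = len(transactions)
    a = frozenset(antecedent)
    c = frozenset(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_both = sum(1 for t in transactions if (a | c) <= t)

    support = count_both / n             # share of rows containing both sides
    confidence = count_both / count_a    # estimate of P(consequent | antecedent)
    lift = confidence / (count_c / n)    # above 1 means a positive association
    return support, confidence, lift


# Toy transactions: each row is a set of "column=value" items.
transactions = [
    {"occupation=manager", "income=high"},
    {"occupation=manager", "income=high"},
    {"occupation=clerk", "income=low"},
    {"occupation=manager", "income=low"},
]

support, confidence, lift = rule_metrics(
    transactions, {"occupation=manager"}, {"income=high"}
)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A rule whose consequent is a sensitive attribute and whose lift is well above 1 points to a column that may act as a proxy.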
- Titanic Dataset: Analyze survival likelihood based on various features.
- Census Data: Explore income-related proxies, such as education and occupation.