Feature/55-numerical-sensitive-representativity #120

vigneshgr · 2025-08-24T13:14:13Z

Fixes Issue #55

Added K-means clustering for numerical feature analysis
Enhanced sensitive_representativity to handle numerical features
Added comprehensive test suite
Updated package dependencies to pinned versions, because the compute_associations function in the older dython release is deprecated
Updated Python version requirement to >=3.9

Why Package Updates Were Needed

During development, we encountered several compatibility issues that required updating the package versions:

The older version of dython (0.6.7) exposed a compute_associations function that is now deprecated. The new version (0.7.9) replaces it with the associations function, which has an improved API.
The numerical clustering functionality relies on newer versions of numpy and scikit-learn for better performance and stability.
The loose version constraints (using >=) were replaced with pinned versions to ensure reproducible builds and to prevent unexpected breakage from dependency updates.
The Python requirement was updated to >=3.9 because:

The newer versions of numpy (2.3.2) and pandas (2.3.2) require Python 3.9+
This ensures all dependencies work together consistently
Helps prevent potential compatibility issues during installation

These updates make the package more reliable and maintainable while ensuring all contributors work with the same tested dependency versions.
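As a rough illustration of what pinning buys, the sketch below compares installed package versions against a pin table and reports mismatches. It is not part of this PR: the PINS dictionary and the check_pins helper are hypothetical names, and only the three versions quoted above are listed (the scikit-learn pin is not stated in this description, so it is left out).

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical pin table; only the versions quoted in this PR description.
PINS = {"dython": "0.7.9", "numpy": "2.3.2", "pandas": "2.3.2"}


def check_pins(pins, get_version=version):
    """Return {package: installed_version_or_None} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = found
    return mismatches
```

Calling check_pins(PINS) in a contributor's environment returns an empty dictionary when every pinned package is installed at exactly the listed version.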

API Changes

No breaking changes to existing API
Added optional parameters to sensitive_representativity:
- n_clusters: Number of clusters for numerical analysis (default: 5)
- num_threshold: Threshold for disproportionate representation warning (default: 0.2)
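To make the two parameters concrete, here is a minimal dependency-free sketch of the idea: cluster a numerical column, compute each cluster's share of the rows, and flag clusters whose share falls below the threshold. Several things here are assumptions rather than the PR's code: the PR presumably uses scikit-learn's KMeans, whereas this uses a tiny 1-D Lloyd iteration with a deterministic start; the helper names kmeans_1d and underrepresented_clusters are hypothetical; and reading num_threshold as a minimum cluster share is only one plausible interpretation.

```python
from collections import Counter


def kmeans_1d(values, n_clusters, n_iter=50):
    """Tiny 1-D Lloyd's algorithm; returns one cluster label per value."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_clusters
    # Deterministic start: centers spread evenly across the value range.
    centers = [lo + (i + 0.5) * step for i in range(n_clusters)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(n_clusters), key=lambda c: abs(v - centers[c]))
                  for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(n_clusters):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def underrepresented_clusters(values, n_clusters=5, num_threshold=0.2):
    """Assumed semantics: flag non-empty clusters whose row share is below num_threshold."""
    counts = Counter(kmeans_1d(values, n_clusters))
    shares = {c: n / len(values) for c, n in counts.items()}
    return {c: share for c, share in shares.items() if share < num_threshold}


values = [1, 1.1, 5, 5.1, 10, 10.1, 10.2, 10.3, 10.4, 10.5]
print(underrepresented_clusters(values, n_clusters=3, num_threshold=0.25))
# prints {0: 0.2, 1: 0.2}
```

With three clusters the ten values split into groups of 2, 2, and 6 rows, so the two small groups (20% each) are flagged at a threshold of 0.25 but not at the strict default of 0.2.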

Test Results

All tests passing:

================================= test session starts =================================
collected 3 items

tests/engines/test_bias_fairness.py::test_numerical_representativity_analysis PASSED [ 33%]
tests/engines/test_bias_fairness.py::test_sensitive_representativity PASSED [ 66%]
tests/engines/test_bias_fairness.py::test_sensitive_representativity_balanced PASSED [100%]

============================ 3 passed, 2 warnings in 9.02s ============================

Usage Example

from ydata_quality import DataQuality
import pandas as pd

# Load data
df = pd.DataFrame({
    'numerical_sensitive': [1, 1.1, 5, 5.1, 10, 10.1, 10.2, 10.3, 10.4, 10.5],
    'categorical_sensitive': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C'],
    'label': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
})

# Create DataQuality object with sensitive features
dq = DataQuality(
    df=df,
    sensitive_features=['numerical_sensitive', 'categorical_sensitive']
)

# Run analysis
results = dq.evaluate()

# Get warnings about representativity issues
# (named quality_warnings to avoid shadowing the standard-library warnings module)
quality_warnings = dq.get_warnings()

Checklist
 Added new feature
 Added tests
 Updated dependencies
 All tests passing
 Documentation update


vigneshgr · 2025-09-03T00:48:12Z

@portellaa @gmartinsribeiro Could you please review this and let me know your feedback? Thanks!

vigneshgr added 6 commits August 24, 2025 08:50

Add numerical feature support to sensitive representativity test

fa04877

fix: update README and requirements for Python compatibility; refactor BiasFairness engine and tests

a9462f7

fixing static codacy issues for bias checks

8d14084

remove whitespaces to fix codacy errors

8b55fd8

Fixing codacy error to finish with a new line

491c441
