Add Fabrication mode to ACD #87

Merged
merged 2 commits into from
Jan 15, 2025

2 changes: 2 additions & 0 deletions app/workflows/anonymize_case_data/README.md
@@ -142,6 +142,8 @@ The only user input required for synthesis is the `Epsilon` value, set to `12.00

 We recommend using the lowest `Epsilon` value that results in synthetic data with sufficient accuracy to support downstream analysis. Start with the default value and reduce it in small increments. If the accuracy values with an `Epsilon` value of `12.00` are themselves too low, go back to the `Prepare sensitive data` tab and continue refining the sensitive dataset in ways that reduce the `Excess combinations ratio`.

+Optionally, you can set the desired `Fabrication mode` for the anonymization process and compare the resulting data quality metrics across settings.
+
 After pressing `Anonymize data`, you will see two differential privacy parameters: the `Epsilon` value you set, and a `Delta` value that is derived from the data (and indicates the very small theoretical chance that the `Epsilon` privacy guarantee does not hold). Publish both values alongside any DP dataset so that readers can correctly interpret the privacy protection provided.

 Once generated, you will see the `Aggregate data` and `Synthetic data` appear on the right-hand side:
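The README's advice to start at the default `Epsilon` of `12.00` and step down until accuracy becomes insufficient amounts to a simple linear search. The sketch below is hypothetical and not part of this PR: `accuracy_at` stands in for re-running `Anonymize data` at a given epsilon and reducing the reported accuracy values to a single number, and the 0.5 step and 0.5 floor are arbitrary illustration choices, not app defaults. It also assumes accuracy roughly falls as epsilon falls; since synthesis is randomized, one run per value is a simplification.

```python
from typing import Callable, Optional


def lowest_sufficient_epsilon(
    accuracy_at: Callable[[float], float],
    target: float,
    start: float = 12.0,
    step: float = 0.5,
    minimum: float = 0.5,
) -> Optional[float]:
    """Walk epsilon down from `start` in fixed steps and return the lowest
    value whose accuracy still meets `target`.

    Returns None when even `start` is not accurate enough; the README's
    advice in that case is to refine the sensitive data instead.
    """
    if accuracy_at(start) < target:
        return None
    best = start
    k = 1
    while True:
        # Compute each candidate from `start` to avoid accumulating float error.
        candidate = round(start - k * step, 6)
        if candidate < minimum or accuracy_at(candidate) < target:
            return best
        best = candidate
        k += 1


def toy_accuracy(eps: float) -> float:
    # Toy accuracy curve for illustration only: accuracy grows with epsilon.
    return eps / 12.0


if __name__ == "__main__":
    print(lowest_sufficient_epsilon(toy_accuracy, target=0.5))
```

Because each candidate requires a full anonymization run, a coarse step keeps the search short; a finer step can then be tried between the last passing value and the first failing one.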
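The help text on the `Epsilon` input in `workflow.py` spells out how `Delta` is derived: delta = 1/(protected_records * ln(protected_records)), where protected_records is a count of sensitive records computed with 0.5% of the privacy budget. A minimal sketch of that arithmetic follows. The function and constant names are hypothetical, and the (noisy) count is taken as an input because the counting mechanism itself is not shown in this PR.

```python
import math

# 0.5% of epsilon goes to the protected-record count, per the Epsilon help text.
COUNT_BUDGET_SHARE = 0.005


def count_epsilon(epsilon: float) -> float:
    """Share of the privacy budget spent on counting protected records."""
    return epsilon * COUNT_BUDGET_SHARE


def delta_from_count(protected_records: float) -> float:
    """delta = 1 / (n * ln(n)); the (noisy) count n is supplied by the caller."""
    if protected_records <= 1:
        # ln(n) <= 0 here, so the formula is undefined or negative.
        raise ValueError("protected_records must be greater than 1")
    return 1.0 / (protected_records * math.log(protected_records))


if __name__ == "__main__":
    eps = 12.0
    print(f"count epsilon: {count_epsilon(eps):.3f}")
    print(f"delta for 10,000 records: {delta_from_count(10_000):.3e}")
```

For any count of 3 or more, ln(n) > 1, so this delta stays below 1/n.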
15 changes: 14 additions & 1 deletion app/workflows/anonymize_case_data/workflow.py
@@ -110,7 +110,7 @@ def create(sv: ds_variables.SessionVariables, workflow: None):
 c1, c2, c3 = st.columns([1, 1, 1])
 with c1:
     st.markdown("#### Anonymize data")
-    b1, b2 = st.columns([1, 1])
+    b1, b2, b3 = st.columns([1, 1, 1])

     with b1:
         epsilon = st.number_input(
@@ -119,13 +119,26 @@ def create(sv: ds_variables.SessionVariables, workflow: None):
             help="The privacy budget, under differential privacy, to use when synthesizing the aggregate dataset.\n\nLower values of epsilon correspond to greater privacy protection but lower data quality.\n\nThe delta parameter is set automatically as 1/(protected_records*ln(protected_records)), where protected_records is the count of sensitive records protected using 0.5% of the privacy budget.\n\n**Rule of thumb**: Aim to keep epsilon at **12** or below.",
         )
     with b2:
+        fab_mode = st.selectbox(
+            "Fabrication mode",
+            options=["Balanced", "Progressive", "Minimized", "Uncontrolled"],
+            help="Options for controlling the fabrication of attribute combinations in the anonymized data. Experiment with different settings and compare the resulting data quality."
+        )
+    with b3:
         if st.button("Anonymize data"):
             sv.anonymize_epsilon.value = epsilon
             df = sv.anonymize_sensitive_df.value
             with st.spinner("Anonymizing data..."):
+                fab_option = AnonymizeCaseData.FabricationStrategy.BALANCED if fab_mode == "Balanced" \
+                    else AnonymizeCaseData.FabricationStrategy.PROGRESSIVE if fab_mode == "Progressive" \
+                    else AnonymizeCaseData.FabricationStrategy.MINIMIZED if fab_mode == "Minimized" \
+                    else AnonymizeCaseData.FabricationStrategy.UNCONTROLLED
+
+
                 acd.anonymize_case_data(
                     df=df,
                     epsilon=epsilon,
+                    fabrication_mode=fab_option
                 )
                 sv.anonymize_synthetic_df.value = acd.synthetic_df
                 sv.anonymize_aggregate_df.value = acd.aggregate_df
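The chained conditional expression that maps the selectbox label to a fabrication strategy treats any label other than the first three as `UNCONTROLLED`. A lookup table is a common alternative that keeps labels and strategies in one place and fails loudly on an unexpected label. In this sketch, `FabricationStrategy` is a hypothetical stand-in enum: only its member names come from the diff, and the values of the real `AnonymizeCaseData.FabricationStrategy` may differ. `FABRICATION_MODES` and `to_strategy` are likewise illustrative names.

```python
from enum import Enum


class FabricationStrategy(Enum):
    # Hypothetical stand-in for AnonymizeCaseData.FabricationStrategy:
    # member names are copied from the diff, values are placeholders.
    BALANCED = "balanced"
    PROGRESSIVE = "progressive"
    MINIMIZED = "minimized"
    UNCONTROLLED = "uncontrolled"


# Selectbox label -> strategy; dict order matches the PR's `options` list.
FABRICATION_MODES = {
    "Balanced": FabricationStrategy.BALANCED,
    "Progressive": FabricationStrategy.PROGRESSIVE,
    "Minimized": FabricationStrategy.MINIMIZED,
    "Uncontrolled": FabricationStrategy.UNCONTROLLED,
}


def to_strategy(label: str) -> FabricationStrategy:
    """Translate a UI label into a strategy, rejecting unknown labels."""
    try:
        return FABRICATION_MODES[label]
    except KeyError:
        raise ValueError(f"Unknown fabrication mode: {label!r}") from None


if __name__ == "__main__":
    print(list(FABRICATION_MODES))  # labels to pass as selectbox options
    print(to_strategy("Minimized"))
```

In the workflow, the same dict could supply `options=list(FABRICATION_MODES)` to `st.selectbox` and `fabrication_mode=to_strategy(fab_mode)` to the anonymization call, so adding a mode would mean editing a single table.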