Add Fabrication mode to ACD (#87)
Co-authored-by: Darren Edge <[email protected]>
Co-authored-by: Dayenne Souza <[email protected]>
3 people authored Jan 15, 2025
1 parent 6d22ac0 commit 276436f
Showing 4 changed files with 42 additions and 3,971 deletions.
2 changes: 2 additions & 0 deletions app/workflows/anonymize_case_data/README.md
@@ -142,6 +142,8 @@ The only user input required for synthesis is the `Epsilon` value, set to `12.00

 We recommend using the lowest `Epsilon` value that results in synthetic data with sufficient accuracy to support downstream analysis. Start with the default value, and reduce it in small increments. If the accuracy values with an `Epsilon` value of `12.00` are themselves too low, go back to the `Prepare sensitive data` tab and continue refining the sensitive dataset in ways that reduce the `Excess combinations ratio`.

+Optionally, you can set the desired `Fabrication mode` for the anonymization process and compare the data quality metrics from the various settings.
+
 After pressing `Anonymize data`, you will see two differential privacy parameters: the `Epsilon` value you set, and a `Delta` value that is generated based on the data (and indicates the very small theoretical chance that the `Epsilon` privacy guarantee does not hold). It is important to publish both values alongside any DP dataset for correct interpretation of the privacy protection provided.

 Once generated, you will see the `Aggregate data` and `Synthetic data` appear on the right-hand side:
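The `Epsilon` help text in `workflow.py` states that delta is set automatically as `1/(protected_records*ln(protected_records))`. As a rough illustration of how small that value gets, the formula can be sketched as below. The function name `delta_for` is hypothetical and not part of this repository, and the sketch takes `protected_records` as a plain integer input rather than deriving it from the privacy budget as the library does.

```python
import math


def delta_for(protected_records: int) -> float:
    """Evaluate 1 / (n * ln(n)), the delta formula quoted in the help text.

    Hypothetical helper for illustration only; the real value is computed
    inside the anonymization library from its own protected record count.
    """
    if protected_records < 2:
        # ln(1) == 0 would divide by zero, and smaller counts are undefined.
        raise ValueError("protected_records must be at least 2")
    return 1.0 / (protected_records * math.log(protected_records))


# Delta shrinks as the number of protected records grows.
for n in (100, 1_000, 10_000):
    print(n, delta_for(n))
```

For 1,000 protected records this gives a delta of roughly 1.4e-4, which is why the README describes it as a very small chance that the `Epsilon` guarantee does not hold.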
15 changes: 14 additions & 1 deletion app/workflows/anonymize_case_data/workflow.py
@@ -110,7 +110,7 @@ def create(sv: ds_variables.SessionVariables, workflow: None):
     c1, c2, c3 = st.columns([1, 1, 1])
     with c1:
         st.markdown("#### Anonymize data")
-        b1, b2 = st.columns([1, 1])
+        b1, b2, b3 = st.columns([1, 1, 1])

         with b1:
             epsilon = st.number_input(
@@ -119,13 +119,26 @@ def create(sv: ds_variables.SessionVariables, workflow: None):
                 help="The privacy budget, under differential privacy, to use when synthesizing the aggregate dataset.\n\nLower values of epsilon correspond to greater privacy protection but lower data quality.\n\nThe delta parameter is set automatically as 1/(protected_records*ln(protected_records)), where protected_records is the count of sensitive records protected using 0.5% of the privacy budget.\n\n**Rule of thumb**: Aim to keep epsilon at **12** or below.",
             )
         with b2:
+            fab_mode = st.selectbox(
+                "Fabrication mode",
+                options=["Balanced", "Progressive", "Minimized", "Uncontrolled"],
+                help="Options for controlling the fabrication of attribute combinations in the anonymized data. Experiment with different settings and compare the resulting data quality."
+            )
+        with b3:
             if st.button("Anonymize data"):
                 sv.anonymize_epsilon.value = epsilon
                 df = sv.anonymize_sensitive_df.value
                 with st.spinner("Anonymizing data..."):
+                    fab_option = AnonymizeCaseData.FabricationStrategy.BALANCED if fab_mode == "Balanced" \
+                        else AnonymizeCaseData.FabricationStrategy.PROGRESSIVE if fab_mode == "Progressive" \
+                        else AnonymizeCaseData.FabricationStrategy.MINIMIZED if fab_mode == "Minimized" \
+                        else AnonymizeCaseData.FabricationStrategy.UNCONTROLLED
+
+
                     acd.anonymize_case_data(
                         df=df,
                         epsilon=epsilon,
+                        fabrication_mode=fab_option
                     )
                     sv.anonymize_synthetic_df.value = acd.synthetic_df
                     sv.anonymize_aggregate_df.value = acd.aggregate_df
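
A possible follow-up, not part of this commit: the chained conditional that turns the selected label into a `FabricationStrategy` member could be written as a dictionary lookup, which keeps the labels and enum members side by side. The sketch below is self-contained, so it uses a hypothetical stand-in enum with made-up member values; in the workflow the members would come from `AnonymizeCaseData.FabricationStrategy`, whose definition is not shown in this diff. The helper name `resolve_fabrication_strategy` is likewise an assumption.

```python
from enum import Enum


class FabricationStrategy(Enum):
    """Hypothetical stand-in for AnonymizeCaseData.FabricationStrategy."""

    BALANCED = "balanced"
    PROGRESSIVE = "progressive"
    MINIMIZED = "minimized"
    UNCONTROLLED = "uncontrolled"


# UI label -> strategy. The labels are copied from the selectbox options.
FAB_MODE_TO_STRATEGY = {
    "Balanced": FabricationStrategy.BALANCED,
    "Progressive": FabricationStrategy.PROGRESSIVE,
    "Minimized": FabricationStrategy.MINIMIZED,
    "Uncontrolled": FabricationStrategy.UNCONTROLLED,
}


def resolve_fabrication_strategy(fab_mode: str) -> FabricationStrategy:
    """Map a selectbox label to a strategy.

    The conditional in the diff falls through to UNCONTROLLED for any label
    it does not recognise; the default passed to .get() keeps that behaviour.
    """
    return FAB_MODE_TO_STRATEGY.get(fab_mode, FabricationStrategy.UNCONTROLLED)


print(resolve_fabrication_strategy("Minimized"))
```

The same dictionary's keys could also feed the `options=` argument of the selectbox, so that a new mode only needs to be added in one place.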