This is an example from MonteroSerrano, Javier, that demonstrates the oversuppression problem in practice.
Overprotection in 6x2 example
# Example with 6x2 data frame where kAnon (k = 3) makes 3 suppressions,
# while 1 suppression would have been enough.
# (Note: 3 suppressions would be needed with alpha = 0, but not with alpha = 1).
# Load sdcMicro (provides createSdcObj, kAnon and extractManipData)
library(sdcMicro)
# Create data
data_3 <- data.frame(
gender = c("male", "male", "male", "male", "male", "male"),
education = c("no education", "primary", "primary", "primary", "secondary", "secondary"))
# Create sdc object
sdc_data_3 <- createSdcObj(data_3, keyVars = c("gender", "education"), alpha = 1)
# kAnon with k = 3 makes 3 suppressions, but 1 suppression would have been enough.
sdc_data_kAnon <- kAnon(sdc_data_3, k = 3)
extractManipData(sdc_data_kAnon)
print(sdc_data_kAnon, "kAnon")
# Manually forcing 1 suppression generates data that already comply with 3-anonymity:
data_3_edited <- data_3
data_3_edited[1,2] <- NA_character_
sdc_data_kAnon_manual <- createSdcObj(data_3_edited, keyVars = c("gender", "education"), alpha = 1)
print(data_3_edited)
print(sdc_data_kAnon_manual, "kAnon")
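The claim that one suppression is enough can be checked by hand without sdcMicro. The sketch below is base R only and rests on an assumption: with alpha = 1, a missing value acts as a wildcard that fully matches any category. The helper `wildcard_counts` is a hypothetical name used for illustration; it is not part of sdcMicro and may not reproduce the package's internal frequency calculation exactly.

```r
# Same 6x2 data as in the example above.
data_3 <- data.frame(
  gender = c("male", "male", "male", "male", "male", "male"),
  education = c("no education", "primary", "primary", "primary", "secondary", "secondary"))

# Hypothetical helper (not an sdcMicro function): for each record, count the
# records that agree with it on every key variable where both are non-missing.
# Assumption: NA is a wildcard that matches any category (alpha = 1).
wildcard_counts <- function(df) {
  m <- as.matrix(df)
  n <- nrow(m)
  vapply(seq_len(n), function(i) {
    agree <- vapply(seq_len(n), function(j) {
      both <- !is.na(m[i, ]) & !is.na(m[j, ])
      all(m[i, both] == m[j, both])
    }, logical(1))
    sum(agree)
  }, integer(1))
}

# Original data: "no education" is unique and "secondary" occurs only twice.
counts_original <- wildcard_counts(data_3)

# One suppression in row 1: the NA matches every record, so each group gains one.
data_3_edited <- data_3
data_3_edited[1, 2] <- NA_character_
counts_edited <- wildcard_counts(data_3_edited)

print(counts_original)
print(counts_edited)
print(all(counts_edited >= 3))
```

Under this wildcard assumption, every record in the edited data has a count of at least 3, which is why a single suppression is sufficient for k = 3.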
The reason is that kAnon is a heuristic algorithm, which can lead to oversuppression.
Idea for an extension: implement a mixed-integer linear programming solution for small problems to find an optimal suppression pattern. Guidance is given in Ton de Waal's book, Handbook of Statistical Data Editing and Imputation (Wiley).
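To show what an optimal suppression pattern means for a problem this small, here is a minimal sketch that uses exhaustive search rather than mixed-integer programming. It tries suppression sets of increasing size and stops at the first one that reaches k-anonymity, so the result is minimal in the number of suppressed cells. The functions `is_k_anonymous` and `min_suppression` are hypothetical names, not sdcMicro functions, and the check assumes that NA acts as a full wildcard (alpha = 1). The search is exponential in the number of cells, so it only serves as a reference for tiny data sets.

```r
data_3 <- data.frame(
  gender = c("male", "male", "male", "male", "male", "male"),
  education = c("no education", "primary", "primary", "primary", "secondary", "secondary"))

# Hypothetical check: TRUE if every record matches at least k records,
# where NA matches any category (assumed alpha = 1 behaviour).
is_k_anonymous <- function(m, k) {
  n <- nrow(m)
  for (i in seq_len(n)) {
    count <- 0L
    for (j in seq_len(n)) {
      both <- !is.na(m[i, ]) & !is.na(m[j, ])
      if (all(m[i, both] == m[j, both])) count <- count + 1L
    }
    if (count < k) return(FALSE)
  }
  TRUE
}

# Hypothetical exhaustive search: smallest set of cells to set to NA.
min_suppression <- function(df, k) {
  m <- as.matrix(df)
  cells <- which(!is.na(m))  # linear indices of cells that could be suppressed
  for (size in 0:length(cells)) {
    # Enumerate index sets into `cells`; size 0 means "suppress nothing".
    index_sets <- if (size == 0) list(integer(0)) else
      combn(length(cells), size, simplify = FALSE)
    for (idx in index_sets) {
      trial <- m
      trial[cells[idx]] <- NA
      if (is_k_anonymous(trial, k)) return(list(n = size, data = trial))
    }
  }
  NULL
}

res <- min_suppression(data_3, k = 3)
print(res$n)
print(res$data)
```

For the 6x2 example this search finds a pattern with a single suppressed cell (the education value in row 1), which gives a baseline to compare the heuristic result against. A mixed-integer formulation would aim for the same optimum while scaling to somewhat larger problems.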