Maartje Boer suggested speeding up kAnon through parallelization.
The short script below compares the run time of kAnon with and without stratification and suggests that parallelizing over strata would be beneficial.
library(ggplot2)
library(sdcMicro)
data(testdata)
testdata$ageG <- cut(testdata$age, 5, labels=paste0("AG",1:5))
kv <- c("urbrur", "roof", "walls", "water", "electcon", "relat", "sex")
## data.frame method (no stratification)
system.time(res <- kAnon(testdata, keyVars = kv))
## data.frame method (stratified by age group)
system.time(res2 <- kAnon(testdata, keyVars = kv, strataVars = "ageG"))
plot(res)
plot(res2)
## draw a bootstrap sample of n rows (with replacement) from df
bs <- function(df, n = nrow(df)){
  sample_indices <- sample(seq_len(nrow(df)), size = n, replace = TRUE)
  bootstrap_sample <- df[sample_indices, , drop = FALSE]
  return(bootstrap_sample)
}
## elapsed time of kAnon on a bootstrap sample of the given size
f <- function(x = testdata, kv, size, svar = NULL){
  ctime <- system.time(res <- kAnon(bs(x, size), keyVars = kv, strataVars = svar))["elapsed"]
  return(ctime)
}
N <- seq(100000, 5000000, 500000)
mytime_strat <- mytime <- numeric(length(N))
for(i in seq_along(N)){
  mytime[i] <- f(testdata, kv, N[i])
  mytime_strat[i] <- f(testdata, kv, N[i], svar = "ageG")
}
mytimes <- data.frame("time" = c(mytime, mytime_strat),
                      "N" = rep(N, 2),
                      "method" = rep(c("no strat", "strat"), each = length(N)))
options(scipen = 999)
ggplot(mytimes, aes(x = N, y = time, colour = method)) +
  geom_line() +
  geom_point()
The strata could be processed on different cores, which might bring the computation time with stratification close to that of the non-stratified case.
See line 471 of localSuppression.R to see where parallelization might come into play.
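The idea of processing strata on different cores can be sketched with base R's parallel package. This is only an illustration and not how sdcMicro is implemented: the helper by_strata_parallel and the toy function centre are hypothetical names, and centre stands in for the per-stratum suppression step. It assumes the per-stratum function returns a data.frame with the same rows in the same order.

```r
library(parallel)

# Hypothetical helper (not part of sdcMicro): apply `fun` to each stratum of
# `df` in a separate worker process and stack the results in original row order.
by_strata_parallel <- function(df, strata_var, fun, cores = 2L) {
  # row indices of each stratum
  idx <- split(seq_len(nrow(df)), df[[strata_var]], drop = TRUE)
  # mclapply forks, which is not available on Windows; use one core there
  if (.Platform$OS.type == "windows") cores <- 1L
  parts <- mclapply(idx, function(i) fun(df[i, , drop = FALSE]),
                    mc.cores = cores)
  out <- do.call(rbind, parts)
  # rows are currently grouped by stratum; restore the original order
  out <- out[order(unlist(idx, use.names = FALSE)), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy data and a stand-in for the per-stratum work (centres x within a stratum)
set.seed(1)
toy <- data.frame(id = 1:20,
                  g  = rep(c("a", "b", "c", "d"), 5),
                  x  = rnorm(20))
centre <- function(d) {
  d$x <- d$x - mean(d$x)
  d
}
res_par <- by_strata_parallel(toy, "g", centre, cores = 2L)
```

With this layout the wall-clock time is roughly bounded by the largest stratum plus the overhead of starting workers and copying data, so the gain depends on how balanced the strata are.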
Note that further parameters could be varied (alpha and the number of keys), and the benchmarking could be extended (e.g. with microbenchmark).
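A possible starting point for such an extended benchmark is sketched below. It assumes the microbenchmark package is installed, reuses the kAnon calls from the script above on the unmodified testdata, and varies only the number of key variables; the choice of kv[1:4] and times = 3 is arbitrary.

```r
library(sdcMicro)
library(microbenchmark)

data(testdata)
testdata$ageG <- cut(testdata$age, 5, labels = paste0("AG", 1:5))
kv <- c("urbrur", "roof", "walls", "water", "electcon", "relat", "sex")

# Each expression is evaluated `times` times; timings are reported per expression
mb <- microbenchmark(
  keys4_strat = kAnon(testdata, keyVars = kv[1:4], strataVars = "ageG"),
  keys7_strat = kAnon(testdata, keyVars = kv, strataVars = "ageG"),
  times = 3
)
print(mb)
```

Repeating each call gives a distribution of run times rather than a single system.time measurement, which makes differences between settings easier to judge.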