It seems that ROMM within addNoise is implemented in a way that does not preserve sample means. Below I suggest how to fix this and speed up the calculations considerably by using the methodology from a recent paper. See https://github.com/olangsrud/RegSDC (hopefully soon on CRAN). Below, I refer to the functions in that package.
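library(sdcMicro)  # provides addNoise and testdata
library(RegSDC)    # provides RegSDCromm and RegSDCgen
data(testdata)
y <- testdata[sample(NROW(testdata), 100), c("expend", "income", "savings")]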
addNoise(y, method = "ROMM")$xm
# An almost identical method (see the paper's discussion of the sequentially phenomenon for the minor differences) is
RegSDCromm(y, lambda = 0.001, ensureIntercept = FALSE)
# This can be viewed as a high-speed version of the current implementation in addNoise.
# Sample means are preserved by the default method, where ensureIntercept = TRUE.
# Other values of lambda may be used.
RegSDCromm(y, lambda = 0.001)
# This is equivalent to calling a more general function
RegSDCgen(y, lambda = 0.001, makeunique = TRUE)
# The parameter makeunique is of minor importance, but it must be TRUE if exact distributional behaviour
# matters (e.g. when sampling from RegSDCromm several times). Otherwise, setting makeunique to FALSE can be OK.
# Feel free to import/wrap functions from RegSDC within sdcMicro.
# However, this line
RegSDCgen(y, lambda = 0.001, makeunique = FALSE)
# can be implemented without using RegSDC by
lambda <- 0.001
y <- as.matrix(y)
Mean <- function(x) t(matrix(colMeans(x), ncol(x), nrow(x)))
qr1 <- qr(y - Mean(y))
qr1Q <- qr.Q(qr1)
z <- qr1Q + lambda * matrix(rnorm(length(qr1Q)), nrow(y))
qr2 <- qr(z - Mean(z))
Mean(y) + qr.Q(qr2) %*% qr.R(qr1)
# Here Mean can be replaced in several ways. The result differs from that of RegSDCgen only at the
# level of numerical precision (call set.seed with the same seed before each to check).
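
As a rough check of the base-R steps above, the sketch below wraps them in a function and compares sample means and covariances before and after perturbation. The function name `romm_sketch` is hypothetical (it is not part of sdcMicro or RegSDC), the data are simulated rather than taken from `testdata`, and `Mean` is written with `byrow = TRUE` as one of the possible replacements mentioned above.

```r
# Hypothetical wrapper around the base-R steps; not an sdcMicro or RegSDC function.
romm_sketch <- function(y, lambda = 0.001) {
  y <- as.matrix(y)
  # Matrix with the same shape as x in which every row holds the column means
  Mean <- function(x) matrix(colMeans(x), nrow(x), ncol(x), byrow = TRUE)
  qr1 <- qr(y - Mean(y))            # QR of the centered data
  qr1Q <- qr.Q(qr1)
  # Perturb the orthonormal basis with scaled Gaussian noise
  z <- qr1Q + lambda * matrix(rnorm(length(qr1Q)), nrow(y))
  qr2 <- qr(z - Mean(z))            # re-center and re-orthonormalize
  Mean(y) + qr.Q(qr2) %*% qr.R(qr1)
}

set.seed(1)
# Simulated stand-in for the three numeric columns (assumption: a full-rank numeric matrix)
y <- matrix(rnorm(300, mean = 50, sd = 10), 100, 3)
yp <- romm_sketch(y)

# Both differences should be at the level of numerical precision
max(abs(colMeans(yp) - colMeans(y)))
max(abs(cov(yp) - cov(y)))
```

The means and covariances match because the columns of `qr.Q(qr2)` are orthonormal and orthogonal to the constant vector. The perturbed centered data therefore have zero column means and the same cross-product, `t(R) %*% R`, as the original centered data.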