From testing of risk parameter

Hello,
I was testing funkcionality of risk and risk_threshold parameter.
In `recordSwap()` was parameter `swaprate` set to `0`, because I wanted to test swapping only of risky values.
But when checking which households were swapped, there was one non-risky household that was swapped.
**This issue is for somebody with enough time to dig up why it happened.**

Data used:
> seed <- 2021
> set.seed(seed)
> nhid <- 10000
> dat <- sdcMicro::createDat( nhid )
> 
> k_anonymity <- 0
> swaprate <- .05
> similar <- list(c("hsize"))
> hier <- c("nuts1","nuts2","nuts3")
> risk_variables <- c("ageGroup","national")
> hid <- "hid"
> expect_swaps <- swaprate * nhid
> risk <- matrix(data = rep(0,3*dim(dat)[1]), ncol = 3) 
> set.seed(seed)
> index <- sample(1:(length(hier)*nrow(dat)), size = expect_swaps)
> risk[index] <- 99 # risk value
> risk_threshold <- 0.9

Then run  `recordSwap()` function:
> dat_s <- recordSwap(
>                     data = dat, hid = hid, hierarchy = hier,
>                     similar,
>                     risk = risk, # TESTING
>                     risk_threshold = risk_threshold, # TESTING
>                     carry_along = NULL,
>                     return_swapped_id = TRUE,
>                     seed=seed,
>                     swaprate = 0  # to test swaping based only on risk values
>                     )
> risk does not contain column names; the first column in risk will be used for the first hierarchy level, e.g nuts1 and so on.
> risk was adjusted in order to give each household member the maximum household risk value
> Donor household was not found in 16 case(s).
> See TRS_logfile.txt for a detailed list

Following calculation of households that should be swapped
> risk_check <- copy(data.table(risk))
> risk_variables_names <- copy(colnames(risk_check))
> risk_check[,hid_help:=dat[[5]]] # 5 = hid
> risk_check[,c(risk_variables_names):=lapply(.SD,max), .SDcols=c(risk_variables_names), by=.(hid_help)] # assign to each household its max value
> risk_check_uniq <- unique(risk_check) # calculate max value in household
> risk_check_uniq[, Sum := rowSums(.SD), .SDcols = 1:3]
> n_of_hd_to_swap <- sum(risk_check_uniq$Sum > 1) # number of households that should be swapped
> n_of_swapped_hd <- dat_s[hid != hid_swapped, uniqueN(hid)] # number of swapped households

> n_of_hd_to_swap
[1] 489
> n_of_swapped_hd
[1] 474

So what is difference in households
> hd_to_swap <- risk_check_uniq$hid_help[risk_check_uniq$Sum > 1
                                       ] # number of households that was should be swapped
> swapped_hd <- dat_s[hid != hid_swapped, unique(hid)
] # number of swapped households

Here are those 16 hd that could not find the donor household
> hd_to_swap[which(!(hd_to_swap %in% swapped_hd))] # hd to swapp not in swapped hd
 [1] 1540 2823 3212 3588 3876 3996 4841 5717 5962 6885 7061 8621 9101 9168 9206 9257

Here is one non-risky household that was swapped. **And the question is why?**
> swapped_hd[which(!(swapped_hd %in% hd_to_swap))] # swapped hd not in hd to swap
[1] 10000

Here are data for this household
> dat_s[which(dat_s$hid==10000),]

| nuts1 | nuts2 | nuts3 | lau2  | hid   | hsize | ageGroup | gender | national | htype | hincome | hid_swapped |
|-------|-------|-------|-------|-------|-------|----------|--------|----------|-------|---------|-------------|
| 3     | 34    | 3405  | 24075 | 10000 | 3     | 6        | 2      | 4        | 10    | 2       | 5064        |
| 3     | 34    | 3405  | 24075 | 10000 | 3     | 1        | 2      | 1        | 10    | 2       | 5064        |
| 3     | 34    | 3405  | 24075 | 10000 | 3     | 4        | 2      | 3        | 10    | 2       | 5064        |

Here are risk values for this household
> risk_check[which(dat_s$hid==10000),]

| V1 | V2 | V3 | hid_help |
|----|----|----|----------|
| 0  | 0  | 0  | 10000    |
| 0  | 0  | 0  | 10000    |
| 0  | 0  | 0  | 10000    |

On recomendation of @JohannesGuss I created this issue.
Whole script can be found in attachment.
[script_issue.docx](https://github.com/sdcTools/sdcMicro/files/10075727/script_issue.docx)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

From testing of risk parameter #333

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

nuts1	nuts2	nuts3	lau2	hid	hsize	ageGroup	gender	national	htype	hincome	hid_swapped
3	34	3405	24075	10000	3	6	2	4	10	2	5064
3	34	3405	24075	10000	3	1	2	1	10	2	5064
3	34	3405	24075	10000	3	4	2	3	10	2	5064

From testing of risk parameter #333

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions