Prevalence estimates returned by mwana functions are slightly different than those returned by ENA #87
@tomaszaba, can you please share the z-scores calculated by ENA software for the same dataset that you are using here?

@ernestguevarra, please find attached.
@tomaszaba, see my Rscript looking at the zscore calculations for The Rscript is here: https://github.com/nutriverse/mwana/blob/fix/multiple-issues/data-raw/check_zscores.R

Here are the results comparing SAM counts for each method:

      sam n_ena n_zscorer n_anthro
    1   0   999      1002     1002
    2   1    25        23       23
    3  NA    62        61       61

Here are the results comparing MAM counts for each method:

      mam n_ena n_zscorer n_anthro
    1   0   975       973      973
    2   1    49        51       51
    3  NA    62        62       62

I have reviewed the I will let you decide what to do with this but I think some people will question this (just like the open issues in
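To see which children drive count differences like those above, one option is to cross-tabulate the case definitions from two sources record by record. The sketch below uses made-up z-scores, not the dataset discussed in this issue. The column names `wfhz_ena` and `wfhz_zscorer` echo those in the script, while `classify_sam()` is a hypothetical helper that codes SAM as WFHZ < -3 and ignores oedema.

```r
# Toy data: illustrative values only, not the survey data from this issue
df <- data.frame(
  id           = 1:6,
  wfhz_ena     = c(-3.12, -2.99, -1.50, -3.01, 0.40, NA),
  wfhz_zscorer = c(-3.10, -3.02, -1.48, -2.98, 0.41, NA)
)

# Hypothetical helper: 1 = SAM (WFHZ < -3), 0 = not SAM, NA = missing z-score
classify_sam <- function(z) ifelse(z < -3, 1L, 0L)

df$sam_ena     <- classify_sam(df$wfhz_ena)
df$sam_zscorer <- classify_sam(df$wfhz_zscorer)

# Cross-tabulate the two classifications, keeping NA as its own level
xtab <- table(ena = df$sam_ena, zscorer = df$sam_zscorer, useNA = "ifany")
print(xtab)

# Records that the two sources classify differently (rows with NA are skipped)
discordant <- df[which(df$sam_ena != df$sam_zscorer), ]
print(discordant)
```

Listing the discordant rows makes it possible to inspect how far each z-score sits from the cut-off, which is usually more informative than the marginal counts alone.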
We probably should write a vignette/article regarding this discrepancy.
@tomaszaba, I don't understand what this means. What do you mean by error here? Can you give me an actual example that I can use to reproduce the error? I am totally confused, and you are not explaining yourself very well.
Based on your explanation of the differences, why not engineer your functions so that you have the option to match what ENA does but also do the things that are not in ENA yet? This is important because the changes to ENA that you describe, which are expected in the near future, may have an impact on estimates. A function that can replicate the old-style ENA and also do the new-style ENA will be helpful for analyses that determine the impact of the change on the old results produced by ENA.

Same comment as the earlier one. Why not make it possible to replicate what the current ENA does and, at the same time, do the new things, so that comparisons can be made using the same set of functions?
@ernestguevarra, sorry this was not clear. Let me try to clarify in this message. For that, I will paste the following table (snippet from the first message of this issue):

From my analysis, this issue comes from this code:

    p <- srvy |>
      group_by({{ .summary_by }}) |>
      filter(.data$flag_wfhz == 0) |>
      summarise(
        across(
          c(.data$gam:.data$mam),
          list(
            n = \(.)sum(., na.rm = TRUE),
            p = \(.)survey_mean(.,
              vartype = "ci",
              level = 0.95,
              deff = TRUE,
              na.rm = TRUE
            )
          )
        ),
        wt_pop = sum(srvyr::cur_svy_wts())
      )
    p
    }

particularly this part:

    summarise(
      across(
        c(.data$gam:.data$mam),
        list(
          n = \(.)sum(., na.rm = TRUE),

What I don't understand is what could be wrong here but not in the other prevalence estimators, since I use the same approach to get the total of positive cases of acute malnutrition, and there it works just fine. I hope this is clear now.
@ernestguevarra, thank you for finding time to review this issue. I think this is something that I will have to raise with the SMART team when we share the package with them, and see what they say. This clearly demonstrates that the issue is not in Thank you for looking into this issue.

I think this is a good idea, but I will only be able to work on this after December 20. I am currently occupied with an analysis until that date.
@ernestguevarra, I've just had a new perspective on where this issue might be stemming from. I think the discrepancy might be due to the fact that ENA applies a "first-order" flagging on the raw weight and height before the z-scores are computed, and only then applies a "second-order" flagging criterion. See the snippet below from the ENA data entry tab. I think we would need to refactor the WFHZ wrangler to set NA in cases where weight and height fall outside those ranges. I had clearly missed this one. What do you think?
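A minimal sketch of what such a "first-order" step could look like in base R is given below. Everything in it is an assumption made for illustration: `flag_raw_measurements()` is a hypothetical helper (not part of mwana), the column names `weight` and `height` are assumed, and the ranges passed in the example are placeholders, not the limits shown in the ENA data entry tab. The `raw_flags` argument is one way to keep both behaviours available, in line with the earlier suggestion to be able to replicate ENA and the current approach with the same functions.

```r
# Hypothetical helper: set raw measurements outside an allowed range to NA
# before any z-scores are computed. With raw_flags = FALSE the data are
# returned untouched, so both behaviours can be compared side by side.
flag_raw_measurements <- function(df,
                                  weight_range,
                                  height_range,
                                  raw_flags = TRUE) {
  if (!raw_flags) {
    return(df)
  }
  bad_weight <- !is.na(df$weight) &
    (df$weight < weight_range[1] | df$weight > weight_range[2])
  bad_height <- !is.na(df$height) &
    (df$height < height_range[1] | df$height > height_range[2])
  df$weight[bad_weight] <- NA
  df$height[bad_height] <- NA
  df
}

# Toy data: weight in kg, height in cm (illustrative values only)
toy <- data.frame(
  weight = c(12.4, 45.0, 9.8, NA),
  height = c(88.0, 95.0, 30.0, 101.5)
)

# Placeholder ranges (assumed, NOT the ENA limits)
flagged <- flag_raw_measurements(
  toy,
  weight_range = c(3, 31),
  height_range = c(54, 124)
)
unflagged <- flag_raw_measurements(
  toy,
  weight_range = c(3, 31),
  height_range = c(54, 124),
  raw_flags = FALSE
)
print(flagged)
```

Because a z-score computed from an NA weight or height is itself NA, records removed at this stage would drop out of the later z-score-based flagging and of the prevalence counts.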
Is this documented in the SMART methodology guidance, or just in ENA? You see how not being open source is a cumbersome thing in general.
Hi Ernest, I discussed this issue with Douglas. I mentioned and demonstrated the discrepancies that arise from the ENA generated z-scores compared to

    ## Calculate difference in reference to WHO zscores -----
    douglas <- addWGSR(
      df, sex = "sex", firstPart = "wt", secondPart = "ht",
      index = "wfh", output = "wfhz_zscorer", digits = 3
    ) |>
      dplyr::mutate(
        wfhz_ena_2 = round(wfhz_ena, digits = 2),
        wfhz_zscorer_2 = round(wfhz_zscorer, digits = 2),
        wfhz_anthro = anthro_zscores(sex = sex, weight = wt, lenhei = ht)$zwfl,
        who_ena = wfhz_anthro - wfhz_ena_2,
        who_zscorer_mwana = wfhz_anthro - wfhz_zscorer_2
      )

The presentation of
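As a side note on why small z-score differences matter for counts: a value that sits within a few thousandths of a cut-off can change class when it is rounded to two decimals. The sketch below illustrates this general mechanism with made-up values. It is not a claim about what causes the discrepancy in this issue. `classify()` is a hypothetical helper that uses the usual WFHZ cut-offs (SAM below -3, MAM from -3 to below -2) and ignores oedema.

```r
# Illustrative z-scores chosen to sit close to the -3 and -2 cut-offs
z <- c(-3.004, -2.996, -3.010, -2.004, -1.996)

# Hypothetical helper: classify by WFHZ cut-offs, ignoring oedema
classify <- function(z) {
  ifelse(z < -3, "sam", ifelse(z < -2, "mam", "normal"))
}

full    <- classify(z)                    # classification on unrounded values
rounded <- classify(round(z, digits = 2)) # classification after rounding

comparison <- data.frame(
  z,
  z_rounded = round(z, digits = 2),
  full,
  rounded,
  changed = full != rounded
)
print(comparison)
```

Only values that round exactly onto a cut-off change class here, which is why such effects tend to move a handful of records rather than shift the whole distribution.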
As mwana could be seen as a neat and tidy implementation of SMART, we might want to assess how large the observed differences are between the prevalence estimates returned by mwana utilities and those returned by ENA for SMART software. In general, I think they are marginal differences that could be explained by the differences between the two software packages, and hence nothing to worry about; however, I want to make sure we are both aware of this. Below I share a list of summary tables of what both software packages return for different functions:

mw_estimate_prevalence_wfhz():

[Tables: mwana results; (mwana - ENA)]

Overall the difference is quite marginal; however, there is an error in the sum of positive cases. This does not affect the actual prevalence estimate, as the sums are calculated outside the `srvyr::survey_mean()` function. The error stems from a line of code where I ask for the sum of positive cases, see below. Curiously, this is the same approach I am using in the MUAC and combined-based prevalence functions.

Can you please check what I might have done wrong here?

mw_estimate_prevalence_muac():

[Tables: mwana results; (mwana - ENA); mwana results; (mwana - ENA)]

In this example, the observed differences are expected. This is simply because:

- mwana uses flags based on z-scores.

mw_estimate_prevalence_combined():

[Tables: mwana results; (mwana - ENA)]

Similar to mw_estimate_prevalence_muac(), combined estimates returned by mw_estimate_prevalence_combined() are not comparable with those from ENA. This is explained by the following factors:

- mwana uses flags based on sample mean. This leads to different values being included or excluded.
- mwana uses combined flagging criteria so that all flags detected in both WFHZ and MUAC (based on MFAZ) are removed. This is not available in ENA yet.