Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevalence estimates returned by mwana functions are slightly different than those returned by ENA #87

Open
tomaszaba opened this issue Nov 8, 2024 · 13 comments
Assignees
Labels
analysis question Further information is requested testing code testing
Milestone

Comments

@tomaszaba
Copy link
Collaborator

tomaszaba commented Nov 8, 2024

As mwana could be seen as a neat and tidy implementation of SMART, we might want to assess how much big the observed differences in the prevalences estimates returned by mwana utilities versus by ENA for SMART software are. In general, I think they are marginal differences that could be explained by the differences in softwares, hence not to worry about, however I want to make sure we are both aware of this. Below I share a list of summary tables of what both softwares return for different functions:

mw_estimate_prevalence_wfhz():

  • GAM by WFHZ and/or edema
ENA results mwana results Difference (mwana - ENA)
All(967): (55) 5.7% (4.4 - 7.4) All(970): (53), 5.5% (4.0 - 7.0) - 0.2%
All(967): (11) 1.1% (0.6 - 2.2) All(970): (10), 1.0% (0.2 - 1.8) - 0.1 %
All(967): (44) 4.6% (3.4 - 6.1) All(970): (43), 4.4% (3.1 - 5.7) - 0.2 %

Overall the difference is quite marginal, however there is an error in the sum of positive cases. This does not affect the actual prevalence estimate as the sums are calculated outside the `srvyr::survey_mean() function. The error stems from a line of code where I ask for the sum of positive cases, see below. Curious is that this is the sample approach I am using in muac and combined-based prevalence functions.

 p <- srvy |>
    group_by({{ .summary_by }}) |>
    filter(.data$flag_wfhz == 0) |>
    summarise(
      across(
        c(.data$gam:.data$mam),
        list(
          n = \(.)sum(., na.rm = TRUE),
          p = \(.)survey_mean(.,
                  vartype = "ci",
                  level = 0.95,
                  deff = TRUE,
                  na.rm = TRUE
          )
        )
      ),
      wt_pop = sum(srvyr::cur_svy_wts())
    )
  p
}

Can you please check what I might have done wrong here?

mw_estimate_prevalence_muac():

  • GAM by MUAC and/or edema
ENA results mwana results Difference (mwana - ENA)
All(1044): (70) 6.7% (5.0 - 8.9) All(1034): (61), 5.9% (4.1 - 7.7) - 0.8%
All(1044): (28) 2.7% (1.8 - 4.0) All(1034): (19), 1.8% (0.9 - 2.7) - 0.9 %
All(1044): (42) 4.0% (2.7 - 5.9) All(1034): (42), 4.1% (2.5 - 5.6) 0.1 %
  • Weighted prevalence
ENA results mwana results Difference (mwana - ENA)
Not available All(927,934): (61), 5.7% (3.7 - 7.7) Not applicable
Not available All(927,934): (19), 2.0% (0.7 - 3.2) Not applicable
Not available All(927,934): (42), 3.7% (2.3 - 5.2) Not applicable

In this example, the observed differences are expected. This is simply because:

  • The current version of ENA still uses the exclusion criteria of MUAC of: < 70 or >230 mm while mwana uses flags based on z-scores.
  • In the weighted analysis, ENA does not calculate weighted prevalence estimates for MUAC, hence not possible to compare.

mw_estimate_prevalence_combined():

  • Combined prevalence including edema
ENA results mwana results Difference (mwana - ENA)
All(1048): (99) 9.4% (7.5 - 11.9) All(959): (79), 8.2% (6.1 - 10.3) 0.9%
All(1048): (28) 2.7% (1.8 - 4.0) All(959): (18), 1.9% (0.9 - 2.9) - 0.8 %
Not reported All(959): (65), 6.8% (5.0 - 8.7) Not applicable

Similar to mw_estimate_prevalence_muac(), combined estimates returned by mw_estimate_prevalence_combined() are not comparable with those from ENA. This is explained by the following factors:

  • As aforementioned regarding MUAC, the current version of ENA uses a fixed exclusion criteria, while mwana uses flags based on sample mean. This leads to different values being included or excluded.
  • Moreover, mwana uses combined flagging criteria so that all flags detected in both WFHZ and MUAC (based on MFAZ) are removed. This is not available in ENA yet.
  • Regarding the weighted analysis, the current version of ENA does not provide this.
@tomaszaba tomaszaba added this to the Miscellaneous milestone Nov 8, 2024
@tomaszaba tomaszaba mentioned this issue Nov 14, 2024
@tomaszaba tomaszaba changed the title Prevalence estimates returned by mwana functions is slightly different than those returned by ENA Prevalence estimates returned by mwana functions are slightly different than those returned by ENA Nov 16, 2024
@ernestguevarra
Copy link
Member

@tomaszaba can you please share the zscores calculated by ENA software for the same dataset that you are using here?

@ernestguevarra ernestguevarra added the testing code testing label Dec 2, 2024
@tomaszaba
Copy link
Collaborator Author

@tomaszaba can you please share the zscores calculated by ENA software for the same dataset that you are using here?

ENA_generated_zscores.csv

@ernestguevarra , please find attached.

@ernestguevarra
Copy link
Member

@tomaszaba, see my Rscript looking at the zscore calculations for zscorer package and the WHO anthro package for WFHZ using the data you share. I put the script in the data-raw folder but I gitignored your dataset. You can replicate my observations by putting your dataset in the data-raw directory and running the script. Use the fix/multiple-issues branch to see the script.

The Rscript is here: https://github.com/nutriverse/mwana/blob/fix/multiple-issues/data-raw/check_zscores.R

Here are the results comparing SAM counts for each method:

  sam n_ena n_zscorer n_anthro
1   0   999      1002     1002
2   1    25        23       23
3  NA    62        61       61

Here are the results comparing MAM counts for each method:

  mam n_ena n_zscorer n_anthro
1   0   975       973      973
2   1    49        51       51
3  NA    62        62       62

I have reviewed the anthro package and its internals are very much the same as zscorer. Both packages use the same rules (WHO) for zscore calculation. So, here, with ENA outputing something different, the question is to know what rules SMART ENA are using and to see how this is implemented. I don't think the code for ENA is open source. Too bad.

I will let you decide what to do with this but I think some people will question this (just like the open issues in zscorer repository questioning the results of the zscore calculations.

@ernestguevarra ernestguevarra added the question Further information is requested label Dec 3, 2024
@ernestguevarra
Copy link
Member

We probably should write a vignette/article regarding this discrepancy.

@ernestguevarra
Copy link
Member

Overall the difference is quite marginal, however there is an error in the sum of positive cases. This does not affect the actual prevalence estimate as the sums are calculated outside the `srvyr::survey_mean() function. The error stems from a line of code where I ask for the sum of positive cases, see below. Curious is that this is the sample approach I am using in muac and combined-based prevalence functions.

@tomaszaba, I don't understand what this means. What do you mean by error here? Can you give me an actual example that I can reproduce the errro? Totally confused and you are not explaining yourself very well.

@ernestguevarra
Copy link
Member

  • The current version of ENA still uses the exclusion criteria of MUAC of: < 70 or >230 mm while mwana uses flags based on z-scores.
  • In the weighted analysis, ENA does not calculate weighted prevalence estimates for MUAC, hence not possible to compare.

Based on your explanation of the differences, why not engineer your functions such that you have the option to match what ENA does but also do the things that are not there in ENA yet? This is important because these changes to ENA that you describe will happen in the near future will/may have an impact on estimates so having a function that can replicate the old style ENA and also do the new style ENA will be helpful for analysis that determines the impact of the change to the old results produced by ENA.

@ernestguevarra
Copy link
Member

  • As aforementioned regarding MUAC, the current version of ENA uses a fixed exclusion criteria, while mwana uses flags based on sample mean. This leads to different values being included or excluded.
  • Moreover, mwana uses combined flagging criteria so that all flags detected in both WFHZ and MUAC (based on MFAZ) are removed. This is not available in ENA yet.
  • Regarding the weighted analysis, the current version of ENA does not provide this.

same comment as earlier comment. Why not make it possible to replicate what the current ENA does but at the same time do the new things so that comparisons can be made using the same set of functions...

@tomaszaba
Copy link
Collaborator Author

Overall the difference is quite marginal, however there is an error in the sum of positive cases. This does not affect the actual prevalence estimate as the sums are calculated outside the `srvyr::survey_mean() function. The error stems from a line of code where I ask for the sum of positive cases, see below. Curious is that this is the sample approach I am using in muac and combined-based prevalence functions.

@tomaszaba, I don't understand what this means. What do you mean by error here? Can you give me an actual example that I can reproduce the errro? Totally confused and you are not explaining yourself very well.

@ernestguevarra, sorry this was not clear. Let me try to clarify in this message. For that, I will paste the following table (snippet from the first message of this issue):
image
Please look at the number of cases in brackets between "All(#)" and the percentage. In this table, the total number of positive cases of GAM (WHZ < -2/or edema) returned is 70 and 33 for SAM (WHZ < -3/or edema). Those numbers are incorrect, but the % is correct. The number should be close to that displayed in 'ena' column, cognisant of your findings above. That awkward difference only happens for GAM and SAM in the mw_estimate_prevalence_wfhz(). In the other functions all work well.

From my analysis, this issues comes from this code:

p <- srvy |>
    group_by({{ .summary_by }}) |>
    filter(.data$flag_wfhz == 0) |>
    summarise(
      across(
        c(.data$gam:.data$mam),
        list(
          n = \(.)sum(., na.rm = TRUE),
          p = \(.)survey_mean(.,
                  vartype = "ci",
                  level = 0.95,
                  deff = TRUE,
                  na.rm = TRUE
          )
        )
      ),
      wt_pop = sum(srvyr::cur_svy_wts())
    )
  p
}

particularly this part:

summarise(
      across(
        c(.data$gam:.data$mam),
        list(
          n = \(.)sum(., na.rm = TRUE),

What I don't understand is what could be wrong here, but is not wrong in the other prevalence estimators, since I use the same approach to get the total of positive cases of acute malnutrition and there they work just fine.

I hope this is clear now.

@tomaszaba
Copy link
Collaborator Author

@tomaszaba, see my Rscript looking at the zscore calculations for zscorer package and the WHO anthro package for WFHZ using the data you share. I put the script in the data-raw folder but I gitignored your dataset. You can replicate my observations by putting your dataset in the data-raw directory and running the script. Use the fix/multiple-issues branch to see the script.

The Rscript is here: https://github.com/nutriverse/mwana/blob/fix/multiple-issues/data-raw/check_zscores.R

Here are the results comparing SAM counts for each method:

  sam n_ena n_zscorer n_anthro
1   0   999      1002     1002
2   1    25        23       23
3  NA    62        61       61

Here are the results comparing MAM counts for each method:

  mam n_ena n_zscorer n_anthro
1   0   975       973      973
2   1    49        51       51
3  NA    62        62       62

I have reviewed the anthro package and its internals are very much the same as zscorer. Both packages use the same rules (WHO) for zscore calculation. So, here, with ENA outputing something different, the question is to know what rules SMART ENA are using and to see how this is implemented. I don't think the code for ENA is open source. Too bad.

I will let you decide what to do with this but I think some people will question this (just like the open issues in zscorer repository questioning the results of the zscore calculations.

@ernestguevarra, thank you for finding time to review this issue. I think this is something that I will have to raise with the SMART team when we share the package with them, and see what they would say. This clearly demonstrates that the issue is not in {mwana} (or {zscorer}).

Thank you for looking in to this issue.

@tomaszaba
Copy link
Collaborator Author

  • The current version of ENA still uses the exclusion criteria of MUAC of: < 70 or >230 mm while mwana uses flags based on z-scores.
  • In the weighted analysis, ENA does not calculate weighted prevalence estimates for MUAC, hence not possible to compare.

Based on your explanation of the differences, why not engineer your functions such that you have the option to match what ENA does but also do the things that are not there in ENA yet? This is important because these changes to ENA that you describe will happen in the near future will/may have an impact on estimates so having a function that can replicate the old style ENA and also do the new style ENA will be helpful for analysis that determines the impact of the change to the old results produced by ENA.

I think this is a good idea, but I will only be able to work on this after December 20. I am currently occupied with an analysis until that date.

@tomaszaba
Copy link
Collaborator Author

@tomaszaba, see my Rscript looking at the zscore calculations for zscorer package and the WHO anthro package for WFHZ using the data you share. I put the script in the data-raw folder but I gitignored your dataset. You can replicate my observations by putting your dataset in the data-raw directory and running the script. Use the fix/multiple-issues branch to see the script.

The Rscript is here: https://github.com/nutriverse/mwana/blob/fix/multiple-issues/data-raw/check_zscores.R

Here are the results comparing SAM counts for each method:

sam n_ena n_zscorer n_anthro

1 0 999 1002 1002

2 1 25 23 23

3 NA 62 61 61

Here are the results comparing MAM counts for each method:

mam n_ena n_zscorer n_anthro

1 0 975 973 973

2 1 49 51 51

3 NA 62 62 62

I have reviewed the anthro package and its internals are very much the same as zscorer. Both packages use the same rules (WHO) for zscore calculation. So, here, with ENA outputing something different, the question is to know what rules SMART ENA are using and to see how this is implemented. I don't think the code for ENA is open source. Too bad.

I will let you decide what to do with this but I think some people will question this (just like the open issues in zscorer repository questioning the results of the zscore calculations.

@ernestguevarra, thank you for finding time to review this issue. I think this is something that I will have to raise with the SMART team when we share the package with them, and see what they would say. This clearly demonstrates that the issue is not in {mwana} (or {zscorer}).

Thank you for looking in to this issue.

@ernestguevarra, I've just had a new perspective on where this issue might be stemming from. I think the discrepancy might be due the fact that ENA applies a "first-order" flagging on the raw weight and height, before the zscores get computed to then apply a "second-order" flagging criteria. See the snippet below from the ENA data entry tab.

image

I think we would need to re-factor the WFHZ wrangler to set NA in cases where weight and height would be out of those ranges. I had clearly missed this one.

What do you think?

@ernestguevarra
Copy link
Member

I think we would need to re-factor the WFHZ wrangler to set NA in cases where weight and height would be out of those ranges. I had clearly missed this one.

Is this documented in the SMART methodology guidance? or just in ENA? You see how not being open source is a cumbersome thing in general.

ernestguevarra added a commit that referenced this issue Dec 9, 2024
@tomaszaba
Copy link
Collaborator Author

@tomaszaba, see my Rscript looking at the zscore calculations for zscorer package and the WHO anthro package for WFHZ using the data you share. I put the script in the data-raw folder but I gitignored your dataset. You can replicate my observations by putting your dataset in the data-raw directory and running the script. Use the fix/multiple-issues branch to see the script.

The Rscript is here: https://github.com/nutriverse/mwana/blob/fix/multiple-issues/data-raw/check_zscores.R

Here are the results comparing SAM counts for each method:

  sam n_ena n_zscorer n_anthro
1   0   999      1002     1002
2   1    25        23       23
3  NA    62        61       61

Here are the results comparing MAM counts for each method:

  mam n_ena n_zscorer n_anthro
1   0   975       973      973
2   1    49        51       51
3  NA    62        62       62

I have reviewed the anthro package and its internals are very much the same as zscorer. Both packages use the same rules (WHO) for zscore calculation. So, here, with ENA outputing something different, the question is to know what rules SMART ENA are using and to see how this is implemented. I don't think the code for ENA is open source. Too bad.

I will let you decide what to do with this but I think some people will question this (just like the open issues in zscorer repository questioning the results of the zscore calculations.

Hi Ernest, I discussed about this issue with Douglas. I mentioned and demonstrated the discrepancies that arise from the ENA generated z-scores compared to {anthro} and {zscorer}. Douglas suggested the following actions, before going to SMART initiative:

  1. Douglas will write to the person who developed the ENA software to understand where do the discrepancies stem from. For this, I sent to Douglas an excel file returned by the following output:
## Calculate difference in reference to WHO zscores -----
douglas <- addWGSR(
  df, sex = "sex", firstPart = "wt", secondPart = "ht",
  index = "wfh", output = "wfhz_zscorer", digits = 3
) |>
  dplyr::mutate(
    wfhz_ena_2 = round(wfhz_ena, digits = 2),
    wfhz_zscorer_2 = round(wfhz_zscorer, digits = 2),
    wfhz_anthro = anthro_zscores(sex = sex, weight = wt, lenhei = ht)$zwfl,
    who_ena = wfhz_anthro - wfhz_ena_2,
    who_zscorer_mwana = wfhz_anthro - wfhz_zscorer_2
  )

The presentation of {mwana} to the SMART Initiative will follow when this action is completed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analysis question Further information is requested testing code testing
Projects
None yet
Development

No branches or pull requests

2 participants