Hi,
I used your package to bias correct some positive definite data (obs, mod and sce all have no negative values, using `basic_quantile` mode), and was surprised to find some negative values in the bias corrected data. I dug into this, and I think the following error is the reason.
To find the percentile of a given value within the mod sample, you use `statsmodels.distributions.empirical_distribution.ECDF`, which by default (`side="right"`) uses the interval [a,b), with a the minimum value of the sample, and b the maximum value of the sample. This means that if the value passed equals a, the ECDF returns its lowest nonzero step (1/n for a sample of size n) rather than 0.
However, to then find the value of a given percentile, you use `numpy.nanpercentile`, which uses (a,b]. This means that to get back a, you need to pass percentile = 0, not the lowest nonzero step.
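To make the mismatch concrete, here is a minimal, dependency-free sketch of the two step conventions. The `ecdf` helper is a hypothetical stand-in written for this illustration (it is not the statsmodels class), and the sample values beyond the first two are made up.

```python
from bisect import bisect_left, bisect_right

def ecdf(sample, value, side="right"):
    """Step ECDF evaluated at `value`.

    side="right" counts samples <= value (steps of the form [a, b)).
    side="left" counts samples < value (steps of the form (a, b]).
    """
    ordered = sorted(sample)
    search = bisect_right if side == "right" else bisect_left
    return search(ordered, value) / len(ordered)

# Hypothetical mod sample: only 0.092 and 0.13 come from the report above.
mod = [0.092, 0.13, 0.21, 0.35, 0.50]

p_right = ecdf(mod, min(mod), side="right")  # the minimum is counted: 1/n
p_left = ecdf(mod, min(mod), side="left")    # the minimum is not counted: 0
print(p_right, p_left)
```

With the "right" convention the sample minimum never maps to percentile 0, which is what a percentile lookup needs in order to return the minimum again.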
Here is an example of how it went wrong for me:
mod sample: sample_min = 0.092, sample_2nd_smallest = 0.13
obs sample: sample_min = 0.018, sample_2nd_smallest = 0.019
p(0.092) = 0.014 and not 0, so percentile(0.014) = sample_2nd_smallest and not sample_min
So if I correct the value 0.092, I get:
X_bc = 0.092 + 0.019 - 0.13 = -0.02, instead of 0.018, which is what it should be.
Luckily, the fix is easy: by passing `side="left"` to ECDF, you use (a,b], which corresponds to what numpy is using.
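The effect of the fix can be sketched end to end without the package. Everything here is an assumption for illustration: `ecdf` and `percentile` are hypothetical helpers (the latter uses linear interpolation between order statistics, which is numpy's default method), the additive correction is taken from the arithmetic in this report rather than from the package source, and the sample values beyond the two smallest are made up.

```python
from bisect import bisect_left, bisect_right

def ecdf(sample, value, side="right"):
    """Fraction of samples <= value (side="right") or < value (side="left")."""
    ordered = sorted(sample)
    search = bisect_right if side == "right" else bisect_left
    return search(ordered, value) / len(ordered)

def percentile(sample, fraction):
    """Quantile at `fraction` in [0, 1], linearly interpolated between order statistics."""
    ordered = sorted(sample)
    pos = fraction * (len(ordered) - 1)
    lo = int(pos)                        # floor, since pos >= 0
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def bias_correct(x, obs, mod, side):
    """Additive quantile correction: x + obs quantile - mod quantile at p = ecdf(mod, x)."""
    p = ecdf(mod, x, side=side)
    return x + percentile(obs, p) - percentile(mod, p)

# Hypothetical samples: only the two smallest values of each come from the report.
mod = [0.092, 0.13, 0.21, 0.35, 0.50]
obs = [0.018, 0.019, 0.05, 0.12, 0.30]

bc_right = bias_correct(min(mod), obs, mod, side="right")  # default convention
bc_left = bias_correct(min(mod), obs, mod, side="left")    # proposed fix
print(bc_right, bc_left)
```

In this sketch the default convention produces a negative corrected value for the mod minimum, while `side="left"` maps it to the obs minimum, 0.018.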
This error only appears when you try to bias correct a value equal to the minimum of the mod sample, so it doesn't affect the results too much, but in my case, it does make a difference to the overall results.