Error in interval used when calculating percentiles ( [a,b) vs. (a,b] ) #21

Open
luna-bloin opened this issue Apr 5, 2024 · 0 comments

Hi,

I used your package to bias-correct some strictly non-negative data (obs, mod and sce all have no negative values), using the basic_quantile method, and was surprised to find negative values in the bias-corrected output. I dug into this, and I think the following error is the reason.

To find the percentile of a given value within the mod sample, you use statsmodels.distributions.empirical_distribution.ECDF, which by default (side="right") treats each step as a half-open interval [a, b). As a result, if the value passed equals the sample minimum, the ECDF returns the lowest percentile increment (1/n) rather than 0.

However, to then find the value of a given percentile, you use numpy.nanpercentile, which effectively treats intervals as (a, b]: to get back the sample minimum, you need to pass percentile = 0, not the lowest percentile increment.
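Here is a minimal sketch of the round-trip mismatch, using a hand-rolled stand-in for statsmodels' default ECDF so it only needs numpy (the 5-point sample is made up for illustration, not the data from my run):

```python
import numpy as np

# Hypothetical mod sample, just to show the mismatch
mod = np.array([0.092, 0.13, 0.5, 1.0, 2.0])

def ecdf_right(sample, x):
    # What ECDF(sample) evaluates to by default (side="right"):
    # the fraction of sample points <= x
    return np.sum(sample <= x) / sample.size

p = ecdf_right(mod, mod.min())      # 1/5 = 0.2, not 0
x = np.nanpercentile(mod, p * 100)  # 0.1224, interpolated past the minimum 0.092
```

Passing the minimum through the ECDF and back through nanpercentile does not recover the minimum: 0.2 maps to the 20th percentile, which linear interpolation places between the two smallest sample values.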

Here is an example of how it went wrong for me:
mod sample: sample_min = 0.092, sample_2nd_smallest = 0.13
obs sample: sample_min = 0.018, sample_2nd_smallest = 0.019

p(0.092) = 0.014 and not 0, so percentile(0.014) gives sample_2nd_smallest rather than sample_min

So if I correct the value 0.092, I get:
X_bc = 0.092 + 0.019 - 0.13 = -0.019, instead of 0.018, which is what it should be.

Luckily, the fix is easy: by passing side="left" to ECDF, you get (a, b], which matches what numpy is using.
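The same sketch with the side="left" convention (again a hand-rolled stand-in, evaluating the fraction of points strictly below x, which is what I understand ECDF(sample, side="left") returns at sample points) recovers the minimum correctly:

```python
import numpy as np

# Same hypothetical mod sample as above
mod = np.array([0.092, 0.13, 0.5, 1.0, 2.0])

def ecdf_left(sample, x):
    # What ECDF(sample, side="left") evaluates to at sample points:
    # the fraction of sample points strictly below x
    return np.sum(sample < x) / sample.size

p = ecdf_left(mod, mod.min())       # 0.0: the minimum now maps to percentile 0
x = np.nanpercentile(mod, p * 100)  # 0.092: the round trip recovers the minimum
```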

This error only appears when you try to bias correct a value equal to the minimum of the mod sample, so it doesn't affect the results too much, but in my case, it does make a difference to the overall results.
