Error in interval used when calculating percentiles ( [a,b) vs. (a,b] ) #21

Open
luna-bloin opened this issue Apr 5, 2024 · 0 comments

Hi,

I used your package to bias-correct some strictly non-negative data (obs, mod and sce all have no negative values), using the basic_quantile method, and was surprised to find negative values in the bias-corrected output. I dug into this, and I think the following error is the reason.

To find the percentile of a given value within the mod sample, you use statsmodels.distributions.empirical_distribution.ECDF, which by default (side="right") treats each step as a half-open interval [a, b). As a result, if the value passed equals the sample minimum, the ECDF returns the lowest percentile increment (1/n) rather than 0.

However, to then find the value of a given percentile, you use numpy.nanpercentile, which effectively treats intervals as (a, b]: to get back the sample minimum, you need to pass percentile = 0, not the lowest percentile increment.
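Here is a minimal sketch of the round-trip mismatch, using a hand-rolled stand-in for statsmodels' default ECDF so it only needs numpy (the 5-point sample is made up for illustration, not the data from my run):

```python
import numpy as np

# Hypothetical mod sample, just to show the mismatch
mod = np.array([0.092, 0.13, 0.5, 1.0, 2.0])

def ecdf_right(sample, x):
    # What ECDF(sample) evaluates to by default (side="right"):
    # the fraction of sample points <= x
    return np.sum(sample <= x) / sample.size

p = ecdf_right(mod, mod.min())      # 1/5 = 0.2, not 0
x = np.nanpercentile(mod, p * 100)  # 0.1224, interpolated past the minimum 0.092
```

Passing the minimum through the ECDF and back through nanpercentile does not recover the minimum: 0.2 maps to the 20th percentile, which linear interpolation places between the two smallest sample values.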

Here is an example of how it went wrong for me:
mod sample: sample_min = 0.092, sample_2nd_smallest = 0.13
obs sample: sample_min = 0.018, sample_2nd_smallest = 0.019

p(0.092) = 0.014 and not 0, so percentile(0.014) gives sample_2nd_smallest rather than sample_min

So if I correct the value 0.092, I get:
X_bc = 0.092 + 0.019 - 0.13 = -0.019, instead of 0.018, which is what it should be.

Luckily, the fix is easy: by passing side="left" to ECDF, you get (a, b], which matches what numpy is using.
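The same sketch with the side="left" convention (again a hand-rolled stand-in, evaluating the fraction of points strictly below x, which is what I understand ECDF(sample, side="left") returns at sample points) recovers the minimum correctly:

```python
import numpy as np

# Same hypothetical mod sample as above
mod = np.array([0.092, 0.13, 0.5, 1.0, 2.0])

def ecdf_left(sample, x):
    # What ECDF(sample, side="left") evaluates to at sample points:
    # the fraction of sample points strictly below x
    return np.sum(sample < x) / sample.size

p = ecdf_left(mod, mod.min())       # 0.0: the minimum now maps to percentile 0
x = np.nanpercentile(mod, p * 100)  # 0.092: the round trip recovers the minimum
```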

This error only appears when you try to bias correct a value equal to the minimum of the mod sample, so it doesn't affect the results too much, but in my case, it does make a difference to the overall results.
