Fix epsilon_from_gdp to return a valid high-confidence lower bound#270
Fix epsilon_from_gdp to return a valid high-confidence lower bound#270vvv214 wants to merge 5 commits into
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
62b5ab7 to
022bad8
Compare
022bad8 to
65e2006
Compare
galenandrew-google
left a comment
There was a problem hiding this comment.
Thank you for your change. The code looks good but I believe the tests can be improved.
| eps = auditor.epsilon_from_gdp(significance, delta) | ||
| self.assertLessEqual(eps, 0.1) | ||
|
|
||
| def test_epsilon_from_gdp_separated_is_positive(self): |
There was a problem hiding this comment.
This test is less specific than the earlier test [test_epsilon_from_gdp_tight]. Please remove it.
| true_eps = dp_accounting.get_epsilon_gaussian(1 / mu, delta) | ||
| np.testing.assert_allclose(eps, true_eps, rtol=0.05) | ||
|
|
||
| def test_epsilon_from_gdp_null_is_zero(self): |
There was a problem hiding this comment.
If I understand correctly, the change in this PR means that if the in_canary_scores have smaller values on average than out_canary_scores, the returned epsilon is zero. If so, we should test that case like:
in_canary_scores = rng.normal(-1, 1, m)
out_canary_scores = rng.normal(0, 1, m)
I would suggest that instead of this test and test_epsilon_from_gdp_small_sample_null_is_zero below, we have a single parameterized product test that looks at
(large sample, small sample) x (mu 0, mu negative)
Summary
CanaryScoreAuditor.epsilon_from_gdpshould return a high-confidence lower bound. The previous implementation usedas a shortcut for the reverse
D', Ddirection. With Clopper-Pearson upper confidence bounds, that shortcut can be too optimistic: a negative forward-directionmuis not automatically valid reverse-direction evidence, because the reverse direction needs its own error-rate bounds and threshold frontier.This PR computes the two directions explicitly:
muon the originalD, D'score frontier;muon the swappedD', Dscore frontier;max(mu_forward, mu_reverse, 0)to(epsilon, delta).Tests
Added coverage for null, small-sample null, forward-separated, and reverse-separated scores. Against the previous implementation:
epsilon ~= 3.54)epsilon = 0)epsilon ~= 5.92)epsilon = 0)epsilon = 0)epsilon ~= 15.81)Also checked locally: