Fix epsilon_from_gdp to return a valid high-confidence lower bound #270

Open
vvv214 wants to merge 5 commits into google-deepmind:main from vvv214:fix-auditing-gdp-lower-bound

Conversation

vvv214 (Contributor) commented Jun 15, 2026

Summary

CanaryScoreAuditor.epsilon_from_gdp should return a high-confidence lower bound. The previous implementation used

np.abs(isf(FPR_ub) - ppf(FNR_ub))

as a shortcut for the reverse D', D direction. With Clopper-Pearson upper confidence bounds, that shortcut can be too optimistic: a negative forward-direction mu is not automatically valid reverse-direction evidence, because the reverse direction needs its own error-rate bounds and threshold frontier.
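A small numeric illustration of the problem, as a sketch only. The bounds below are hypothetical values chosen for scores that carry no signal, not numbers from this PR, and the reverse direction is assumed to keep the same threshold while swapping the two labels. Only `statistics.NormalDist` from the standard library is used.

```python
from statistics import NormalDist

norm = NormalDist()

# Hypothetical one-sided bounds at a single threshold when in/out scores are
# indistinguishable, so the true FPR and FNR are both near 0.5.
fpr_ub, fnr_ub = 0.55, 0.55  # upper confidence bounds (forward direction)
fpr_lb, fnr_lb = 0.45, 0.45  # lower confidence bounds, needed for the reverse

# Forward mu: isf(FPR_ub) - ppf(FNR_ub), where isf(p) = ppf(1 - p).
mu_forward = norm.inv_cdf(1 - fpr_ub) - norm.inv_cdf(fnr_ub)

# Old shortcut: reuse the magnitude as if it were reverse-direction evidence.
mu_shortcut = abs(mu_forward)

# Reverse direction (assumed: same threshold, labels swapped). Its error rates
# are 1 - FNR and 1 - FPR, so their upper bounds come from the forward *lower*
# bounds, not from the forward upper bounds.
rev_fpr_ub, rev_fnr_ub = 1 - fnr_lb, 1 - fpr_lb
mu_reverse = norm.inv_cdf(1 - rev_fpr_ub) - norm.inv_cdf(rev_fnr_ub)

print(mu_forward, mu_shortcut, mu_reverse)
```

Under these assumed bounds both one-sided values are negative, so neither direction supports a positive mu, while the shortcut reports a positive one.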

This PR computes the two directions explicitly:

  • one-sided GDP mu on the original D, D' score frontier;
  • one-sided GDP mu on the swapped D', D score frontier;
  • one Bonferroni budget over both frontiers' FPR/FNR confidence bounds;
  • one final conversion from max(mu_forward, mu_reverse, 0) to (epsilon, delta).
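The four steps above can be sketched in standard-library Python. This is not the code in `jax_privacy/auditing.py`: every function name here is hypothetical, the even Bonferroni split (2 directions x 2 rates x number of thresholds) is an assumption about how the budget is shared, and the Clopper-Pearson bound is found by bisection on the binomial CDF rather than through a beta quantile. The final step uses the standard GDP duality, delta(eps) = Phi(-eps/mu + mu/2) - e^eps * Phi(-eps/mu - mu/2).

```python
import math
from statistics import NormalDist

_NORM = NormalDist()


def _binom_cdf(k, n, p):
  """P[Binomial(n, p) <= k]; direct summation, fine for small n."""
  return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson_upper(k, n, alpha):
  """One-sided (1 - alpha) upper bound on a rate, given k events in n trials."""
  if k >= n:
    return 1.0
  lo, hi = k / n, 1.0
  for _ in range(60):  # bisection: the binomial CDF decreases in p
    mid = (lo + hi) / 2
    if _binom_cdf(k, n, mid) > alpha:
      lo = mid
    else:
      hi = mid
  return hi  # conservative end of the bracket


def one_sided_mu(pos_scores, neg_scores, thresholds, alpha_each):
  """Best mu lower bound over thresholds for the rule 'score > t means pos'."""
  best = -math.inf
  for t in thresholds:
    fp = sum(s > t for s in neg_scores)
    fn = sum(s <= t for s in pos_scores)
    fpr_ub = clopper_pearson_upper(fp, len(neg_scores), alpha_each)
    fnr_ub = clopper_pearson_upper(fn, len(pos_scores), alpha_each)
    # Skip thresholds whose bounds are vacuous (inv_cdf needs 0 < p < 1).
    if 0.0 < 1.0 - fpr_ub < 1.0 and 0.0 < fnr_ub < 1.0:
      best = max(best, _NORM.inv_cdf(1.0 - fpr_ub) - _NORM.inv_cdf(fnr_ub))
  return best


def gdp_mu_lower_bound(in_scores, out_scores, thresholds, significance):
  """max(mu_forward, mu_reverse, 0) under one shared Bonferroni budget."""
  # Assumed split: 2 directions x 2 rates (FPR, FNR) x number of thresholds.
  alpha_each = significance / (4 * len(thresholds))
  mu_forward = one_sided_mu(in_scores, out_scores, thresholds, alpha_each)
  mu_reverse = one_sided_mu(out_scores, in_scores, thresholds, alpha_each)
  return max(mu_forward, mu_reverse, 0.0)


def _phi(x):
  """Standard normal CDF via erfc, which stays accurate in the lower tail."""
  return 0.5 * math.erfc(-x / math.sqrt(2))


def delta_from_gdp(mu, eps):
  """delta(eps) of a mu-GDP mechanism (GDP to (eps, delta) duality)."""
  return _phi(-eps / mu + mu / 2) - math.exp(eps) * _phi(-eps / mu - mu / 2)


def epsilon_from_mu(mu, delta):
  """Largest eps (up to bisection error) with delta_from_gdp(mu, eps) > delta."""
  if mu <= 0 or delta_from_gdp(mu, 0.0) <= delta:
    return 0.0
  lo, hi = 0.0, 1.0
  while delta_from_gdp(mu, hi) > delta:  # grow the bracket; moderate mu only
    hi *= 2
  for _ in range(60):
    mid = (lo + hi) / 2
    if delta_from_gdp(mu, mid) > delta:
      lo = mid
    else:
      hi = mid
  return lo
```

A call such as `epsilon_from_mu(gdp_mu_lower_bound(in_scores, out_scores, thresholds, 0.05), 1e-5)` chains the steps. Because each direction gets its own counts and bounds, swapping `in_scores` and `out_scores` yields the same mu in this sketch, and scores with no separation yield mu = 0.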

Tests

Added coverage for null, small-sample null, forward-separated, and reverse-separated scores. Against the previous implementation:

| Scenario | Previous implementation | This PR |
| --- | --- | --- |
| null scores | fails (epsilon ~= 3.54) | passes (epsilon = 0) |
| small-sample null scores | fails (epsilon ~= 5.92) | passes (epsilon = 0) |
| forward-separated scores | passes | passes |
| reverse-separated scores | fails (epsilon = 0) | passes (epsilon ~= 15.81) |

Also checked locally:

uvx pyink --check --diff jax_privacy/auditing.py tests/auditing_test.py
git diff --check
python3 -m py_compile jax_privacy/auditing.py tests/auditing_test.py

google-cla Bot commented Jun 15, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

vvv214 force-pushed the fix-auditing-gdp-lower-bound branch 7 times, most recently from 62b5ab7 to 022bad8, on June 23, 2026 07:35
vvv214 force-pushed the fix-auditing-gdp-lower-bound branch from 022bad8 to 65e2006 on June 23, 2026 14:40
vvv214 marked this pull request as ready for review on June 24, 2026 03:00

galenandrew-google left a comment

Thank you for your change. The code looks good but I believe the tests can be improved.

Comment thread tests/auditing_test.py
eps = auditor.epsilon_from_gdp(significance, delta)
self.assertLessEqual(eps, 0.1)

def test_epsilon_from_gdp_separated_is_positive(self):

This test is less specific than the earlier test [test_epsilon_from_gdp_tight]. Please remove it.

Comment thread tests/auditing_test.py
true_eps = dp_accounting.get_epsilon_gaussian(1 / mu, delta)
np.testing.assert_allclose(eps, true_eps, rtol=0.05)

def test_epsilon_from_gdp_null_is_zero(self):

If I understand correctly, the change in this PR means that if the in_canary_scores have smaller values on average than out_canary_scores, the returned epsilon is zero. If so, we should test that case like:

in_canary_scores = rng.normal(-1, 1, m)
out_canary_scores = rng.normal(0, 1, m)

I would suggest that instead of this test and test_epsilon_from_gdp_small_sample_null_is_zero below, we have a single parameterized product test that looks at
(large sample, small sample) x (mu 0, mu negative)
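A minimal sketch of how that 2 x 2 grid could be enumerated, using only the standard library. The sample sizes, the seed, and the helper name `score_cases` are hypothetical; in the test file itself the same grid would more naturally be written with the test framework's parameterization. The expected epsilon for each case is deliberately left to the caller.

```python
import itertools
import random

# Hypothetical grid values; the thread does not fix the sample sizes.
SAMPLE_SIZES = {"large_sample": 1000, "small_sample": 10}
IN_CANARY_MEANS = {"mu_zero": 0.0, "mu_negative": -1.0}


def score_cases(seed=0):
  """Yields (name, in_canary_scores, out_canary_scores) for the 2 x 2 grid."""
  rng = random.Random(seed)
  for (size_name, m), (mu_name, mean) in itertools.product(
      SAMPLE_SIZES.items(), IN_CANARY_MEANS.items()
  ):
    # Mirrors rng.normal(mean, 1, m) and rng.normal(0, 1, m) from the comment.
    in_canary_scores = [rng.gauss(mean, 1) for _ in range(m)]
    out_canary_scores = [rng.gauss(0, 1) for _ in range(m)]
    yield f"{size_name}_{mu_name}", in_canary_scores, out_canary_scores


for name, in_scores, out_scores in score_cases():
  print(name, len(in_scores), len(out_scores))
```

Each yielded case can then be passed to the auditor and compared against the expected epsilon for that cell of the grid.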

2 participants