Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix consistency bug when storing aggregate shares #412

Merged
merged 1 commit into from
Oct 25, 2023

Conversation

mendess
Copy link
Collaborator

@mendess mendess commented Oct 16, 2023

Closes #408

@mendess mendess self-assigned this Oct 16, 2023
@mendess mendess force-pushed the mendess/fix-do-consistency-issue branch from a98e488 to c6ee81b Compare October 16, 2023 17:08
@mendess mendess changed the base branch from main to mendess/shard-aggregate-store October 16, 2023 17:08
@mendess mendess force-pushed the mendess/shard-aggregate-store branch from c152e15 to 780d38a Compare October 16, 2023 17:09
@mendess mendess force-pushed the mendess/fix-do-consistency-issue branch 3 times, most recently from 21bea11 to 2123855 Compare October 16, 2023 17:17
@mendess mendess force-pushed the mendess/shard-aggregate-store branch 2 times, most recently from 848d6ef to 9cb6aa5 Compare October 16, 2023 17:33
@mendess mendess force-pushed the mendess/fix-do-consistency-issue branch 2 times, most recently from 5402ab0 to 38f250a Compare October 16, 2023 18:03
Copy link
Contributor

@cjpatton cjpatton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Looks reasonable over all. However I'm wondering if you could help me with something: In Rename the try_put_agg_share_span() retry metric #411 I've tried to encapsulate this logic so that I can reuse it in draft07. (In draft07 we need to commit to the state change at an earlier point, just after AggregationJobInitReq.)
  2. It would be great to exercise the retry logic in unit tests. Could you set up MockAggregator to haave one of the batch buckets return a replay and another not?
  3. nit: It's good practice to link PRs to issues they close, e.g., by adding Closes #408 to the top comment.

daphne/src/lib.rs Outdated Show resolved Hide resolved
daphne/src/lib.rs Outdated Show resolved Hide resolved
daphne/src/lib.rs Show resolved Hide resolved
daphne/src/lib.rs Outdated Show resolved Hide resolved
daphne/src/lib.rs Show resolved Hide resolved
daphne/src/roles/helper.rs Outdated Show resolved Hide resolved
daphne/src/roles/helper.rs Outdated Show resolved Hide resolved
daphne/src/roles/helper.rs Outdated Show resolved Hide resolved
daphne/src/roles/helper.rs Outdated Show resolved Hide resolved
@mendess mendess force-pushed the mendess/fix-do-consistency-issue branch from 38f250a to bfa9443 Compare October 17, 2023 10:24
Base automatically changed from mendess/shard-aggregate-store to main October 17, 2023 11:07
@mendess mendess force-pushed the mendess/fix-do-consistency-issue branch from bfa9443 to 3684717 Compare October 17, 2023 11:16
Copy link
Contributor

@cjpatton cjpatton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tacked on a commit with a suggestion for cleaning up the retry logic. The basic idea is to add a table that tracks what we did with each report the last time we tried try_put_agg_share_span(). Feel free to iterate on this or replace it with something you think is better.

daphne/src/roles/aggregator.rs Outdated Show resolved Hide resolved
daphne/src/roles/aggregator.rs Outdated Show resolved Hide resolved
daphne/src/roles/helper.rs Outdated Show resolved Hide resolved
daphne/src/roles/helper.rs Show resolved Hide resolved
@mendess mendess force-pushed the mendess/fix-do-consistency-issue branch 2 times, most recently from f5be4cb to c12ea23 Compare October 19, 2023 09:15
Copy link
Contributor

@cjpatton cjpatton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🦆 - thanks for the hard work on a really tricky change.

Copy link
Contributor

@cjpatton cjpatton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed the changes to DOs. (Sorry, I should have caught this earlier.) There might be changes here that break assumptions in daphne, we need to carefully document anything that's broken.

Also, ReportsProcessed is now obsolete, so we should either delete it or at least document somewhere that it's not used and will be deleted.

let replayed: usize = put_agg_share_result
.into_iter()
.map(|(_, (r, _))| r.map(|replayed| replayed.len()).unwrap_or_default())
.sum();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an error occurs, we need to handle it as fatal.

It should either commit an aggregate share to storage and mark the
associated reports as aggregated or do nothing at all.

Due to the batching nature of the input this requires an equally complex
return type which needs to be handled by the helper.
@mendess mendess force-pushed the mendess/fix-do-consistency-issue branch from 1094133 to 90de281 Compare October 25, 2023 16:51
@mendess mendess merged commit a05963a into main Oct 25, 2023
4 checks passed
@mendess mendess deleted the mendess/fix-do-consistency-issue branch October 25, 2023 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Report metrics are over-counted if try_put_agg_share_span() is retried
2 participants