Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

persist-txn: compaction #22619

Merged
merged 1 commit into from
Oct 27, 2023
Merged

Conversation

danhhz
Copy link
Contributor

@danhhz danhhz commented Oct 24, 2023

Compaction of data shards is initially delegated to the txns user (the storage controller). Because txn writes intentionally never read data shards and in no way depend on the sinces, the since of a data shard is free to be arbitrarily far ahead of or behind the txns upper. Data shard reads, when run through the above process, then follow the usual rules (can read at times beyond the since but not beyond the upper).

Compaction of the txns shard relies on the following invariant that is carefully maintained: every write less than the since of the txns shard has been applied. Mechanically, this is accomplished by a critical since capability held internally by the txns system. Any txn writer is free to advance it to a time once it has proven that all writes before that time have been applied.

It is advantageous to compact the txns shard aggressively so that applied writes are promptly consolidated out, minimizing the size. For a snapshot read at as_of, we need to be able to distinguish when the latest write <= as_of has been applied. The above invariant enables this as follows:

  • If as_of <= txns_shard.since(), then the invariant guarantees that all writes <= as_of have been applied, so we're free to read as described in the section above.
  • Otherwise, we haven't compacted as_of in the txns shard yet, and still have perfect information about which writes happened when. We can look at the data shard upper to determine which have been applied.

Touches MaterializeInc/database-issues#6685

Motivation

  • This PR adds a known-desirable feature.

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@danhhz danhhz force-pushed the persist_txn_compaction branch 3 times, most recently from e217486 to ebcedf0 Compare October 26, 2023 17:29
@danhhz danhhz marked this pull request as ready for review October 26, 2023 17:29
@danhhz danhhz requested a review from a team as a code owner October 26, 2023 17:29
@danhhz danhhz requested a review from jkosh44 October 26, 2023 17:29
Comment on lines +479 to +485
self.txns_cache.update_gt(&since_ts).await;
self.txns_cache.compact_to(&since_ts);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter that we do this before determining the actual min_unapplied_ts? For example if a since_ts of 100 is passed in, but then we reduce it to 50. The in memory cache will be compacted to 100, while the physical txn shard will only be compacted to 50.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's actually exactly what the "It's always safe" comment above is trying to say is okay, so this is a good hint that I should reword it somehow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want a second pair of eyes on this that are more familiar with the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, it's just you and aljoscha, so you might have to be it. None of the timely/dataflow bits are changing in this PR, just debugging, the unblock_read change, and various compaction related things. Happy to walk you through it on a huddle though (maybe Monday?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just went through it, it seems pretty straightforward and uncontroversial.

Copy link
Contributor

@jkosh44 jkosh44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Comment on lines +479 to +485
self.txns_cache.update_gt(&since_ts).await;
self.txns_cache.compact_to(&since_ts);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/// Allows compaction to the txns shard as well as internal representations,
/// losing the ability to answer queries about times less_than since_ts.
///
/// only call this from the singleton controller process
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, fixed this fragment

@danhhz danhhz force-pushed the persist_txn_compaction branch from ebcedf0 to 343b96e Compare October 27, 2023 19:25
@danhhz danhhz enabled auto-merge October 27, 2023 19:27
@danhhz danhhz disabled auto-merge October 27, 2023 19:33
@danhhz
Copy link
Contributor Author

danhhz commented Oct 27, 2023

Whoops, pulled in something from another branch. Fixing

Compaction of data shards is initially delegated to the txns user (the
storage controller). Because txn writes intentionally never read data
shards and in no way depend on the sinces, the since of a data shard is
free to be arbitrarily far ahead of or behind the txns upper. Data shard
reads, when run through the above process, then follow the usual rules
(can read at times beyond the since but not beyond the upper).

Compaction of the txns shard relies on the following invariant that is
carefully maintained: every write less than the since of the txns shard
has been applied. Mechanically, this is accomplished by a critical since
capability held internally by the txns system. Any txn writer is free to
advance it to a time once it has proven that all writes before that time
have been applied.

It is advantageous to compact the txns shard aggressively so that
applied writes are promptly consolidated out, minimizing the size. For a
snapshot read at `as_of`, we need to be able to distinguish when the
latest write `<= as_of` has been applied. The above invariant enables
this as follows:

- If `as_of <= txns_shard.since()`, then the invariant guarantees that
  all writes `<= as_of` have been applied, so we're free to read as
  described in the section above.
- Otherwise, we haven't compacted `as_of` in the txns shard yet, and
  still have perfect information about which writes happened when. We
  can look at the data shard upper to determine which have been applied.
@danhhz danhhz force-pushed the persist_txn_compaction branch from 343b96e to d381993 Compare October 27, 2023 19:36
@danhhz danhhz enabled auto-merge October 27, 2023 19:37
@danhhz danhhz merged commit 0cba4cf into MaterializeInc:main Oct 27, 2023
@danhhz danhhz deleted the persist_txn_compaction branch October 27, 2023 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants