fix(mesh): wire EpochMaxWins into CRDT merge #1469
Teach the CRDT OR-map to choose merge behavior by key prefix instead of always applying timestamp LWW semantics. Rate-limit namespaces register EpochMaxWins so counter resets are compared by epoch first and count second, matching the v2 protocol contract. Update operation-log compaction to use the same merge strategy as apply-time merge. Without this, a compacted log could keep the wrong rate-limit operation and later peers would rehydrate stale counter state. Add focused CRDT tests for EpochMaxWins merge, compaction, and tombstone behavior so direct v2 cutover does not silently regress rate-limit correctness. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
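The "epoch first, count second" ordering can be sketched as follows. This is a minimal illustration, not the crate's code: the `EpochCount` struct, the `encode`/`decode`/`merge_epoch_max` helpers, and the big-endian 16-byte layout are assumptions made for the example.

```rust
/// Hypothetical stand-in for a rate-limit counter value.
/// Field order matters: the derived `Ord` compares `epoch` first, then `count`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct EpochCount {
    pub epoch: u64,
    pub count: u64,
}

/// Assumed layout: 8 bytes of epoch followed by 8 bytes of count, big-endian.
pub fn encode(epoch: u64, count: u64) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&epoch.to_be_bytes());
    out[8..].copy_from_slice(&count.to_be_bytes());
    out
}

/// Returns `None` for anything that is not exactly 16 bytes.
pub fn decode(bytes: &[u8]) -> Option<EpochCount> {
    if bytes.len() != 16 {
        return None;
    }
    let mut epoch = [0u8; 8];
    let mut count = [0u8; 8];
    epoch.copy_from_slice(&bytes[..8]);
    count.copy_from_slice(&bytes[8..]);
    Some(EpochCount {
        epoch: u64::from_be_bytes(epoch),
        count: u64::from_be_bytes(count),
    })
}

/// The incoming value wins only if both sides decode and incoming is
/// strictly greater by (epoch, count). Otherwise local is kept unchanged.
pub fn merge_epoch_max(local: &[u8], incoming: &[u8]) -> Vec<u8> {
    match (decode(local), decode(incoming)) {
        (Some(l), Some(i)) if i > l => incoming.to_vec(),
        _ => local.to_vec(),
    }
}
```

Under this ordering a reset to `(6, 0)` beats a stale `(5, 100)` regardless of arrival order, which timestamp LWW does not guarantee.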
📝 Walkthrough
Per-prefix MergeStrategy added; CrdtOrMap stores a longest-prefix registry and dispatches inserts/compaction by strategy. OperationLog append/compact/snapshot are strategy-aware. MeshKV registers per-namespace strategies, KvStore upsert removed, and tests validate EpochMaxWins behavior.
Changes
Per-Key-Prefix Merge Strategies
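A longest-prefix registry of the kind the walkthrough describes might look like the sketch below. `PrefixRegistry`, `register`, and `strategy_for` are hypothetical names, and the `rl:legacy:` prefix in the usage note exists only to show that the longest match wins; none of these are confirmed by the PR text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MergeStrategy {
    LastWriterWins,
    EpochMaxWins,
}

/// Hypothetical registry. Entries are kept sorted longest-prefix-first,
/// so the first matching entry is also the longest match.
#[derive(Debug, Default, Clone)]
pub struct PrefixRegistry {
    entries: Vec<(String, MergeStrategy)>,
}

impl PrefixRegistry {
    pub fn register(&mut self, prefix: &str, strategy: MergeStrategy) {
        // Re-registering a prefix replaces the earlier entry.
        self.entries.retain(|(p, _)| p.as_str() != prefix);
        self.entries.push((prefix.to_string(), strategy));
        // Longest first; equal lengths ordered lexicographically for determinism.
        self.entries
            .sort_by(|a, b| b.0.len().cmp(&a.0.len()).then_with(|| a.0.cmp(&b.0)));
    }

    /// Keys with no registered prefix fall back to LastWriterWins.
    pub fn strategy_for(&self, key: &str) -> MergeStrategy {
        self.entries
            .iter()
            .find(|(p, _)| key.starts_with(p.as_str()))
            .map(|(_, s)| *s)
            .unwrap_or(MergeStrategy::LastWriterWins)
    }
}
```

Registering `rl:` as `EpochMaxWins` and a longer `rl:legacy:` as `LastWriterWins` would send `rl:tenant-a` to the former and `rl:legacy:x` to the latter.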
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6149e363e0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Code Review
This pull request introduces a configurable MergeStrategy framework for the CRDT key-value store, enabling prefix-based conflict resolution policies such as LastWriterWins and EpochMaxWins. The implementation includes updates to the OperationLog for strategy-aware compaction and the addition of record_epoch_insert_metadata to handle epoch-based versioning. Review feedback highlights several performance optimization opportunities, including avoiding Vec allocations during value comparisons by decoding into structs and reducing lock contention by hoisting read locks out of tight loops during log compaction. Additionally, a correction was suggested for tombstone handling in metadata records to ensure consistency.
Solid PR — the EpochMaxWins strategy is correctly wired through the CRDT merge, compaction, and operation-log paths. The candidate_wins logic correctly falls through to LWW for Remove operations and non-Insert pairs, and the record_epoch_insert_metadata correctly only suppresses on duplicate entries or newer tombstones (not newer non-tombstone inserts, which is the right call for value-based merge).
Summary: 0 🔴 Important · 1 🟡 Nit · 0 🟣 Pre-existing
The one nit: snapshot_and_truncate was not updated with a _with_strategy variant like compact and append were. No current callers, but it's a public method that would silently use LWW for EpochMaxWins keys.
Tests cover the core epoch-wins-over-stale-count and tombstone-wins-over-stale-insert scenarios well.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/mesh/src/crdt_kv/crdt.rs (1)
181-191: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
Route local insert-like writes through the strategy-aware merge path.
These methods still call `record_insert_metadata()` and overwrite `self.store` directly before the `EpochMaxWins` logic in `apply_insert()` ever runs. For example, on an `rl:` key a later local write of `encode(5, 100)` will replace an existing `encode(6, 0)` even though merge/compaction would keep epoch 6. The local write path needs to reuse the same strategy-specific resolution as replicated inserts, and it needs a regression test for that case.
Also applies to: 220-230, 260-270, 299-310
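The shape of the requested fix can be sketched with a toy store in which local and replicated inserts share one resolver. This is an illustration under assumptions: `Store`, `resolve`, `insert_local`, and `apply_remote` are hypothetical names, values are modeled as decoded `(epoch, count)` pairs, and the LastWriterWins branch simply overwrites because timestamps are left out of the sketch.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MergeStrategy {
    LastWriterWins,
    EpochMaxWins,
}

/// Values are modeled as decoded (epoch, count) pairs to keep the sketch short.
pub type Value = (u64, u64);

#[derive(Default)]
pub struct Store {
    values: HashMap<String, Value>,
}

impl Store {
    /// Assumption: only the `rl:` prefix is registered as EpochMaxWins.
    fn strategy_for(key: &str) -> MergeStrategy {
        if key.starts_with("rl:") {
            MergeStrategy::EpochMaxWins
        } else {
            MergeStrategy::LastWriterWins
        }
    }

    /// Single resolver shared by both write paths.
    fn resolve(&mut self, key: &str, incoming: Value) -> Value {
        let winner = match (Self::strategy_for(key), self.values.get(key)) {
            // Tuples compare lexicographically: epoch first, then count.
            (MergeStrategy::EpochMaxWins, Some(&current)) => current.max(incoming),
            // Simplified LWW: the latest call overwrites (no timestamps here).
            _ => incoming,
        };
        self.values.insert(key.to_string(), winner);
        winner
    }

    pub fn insert_local(&mut self, key: &str, value: Value) -> Value {
        self.resolve(key, value)
    }

    pub fn apply_remote(&mut self, key: &str, value: Value) -> Value {
        self.resolve(key, value)
    }
}
```

With this arrangement a local write of `(5, 100)` against an existing `(6, 0)` on an `rl:` key leaves epoch 6 in place, which is the regression case the comment asks for.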
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/mesh/src/crdt_kv/merge_strategy.rs`:
- Around line 8-9: The public enum variant MergeStrategy::MaxValueWins must not
be usable until its semantics are implemented; either remove the MaxValueWins
variant from the public MergeStrategy API or add a validation check wherever
MergeStrategy is instantiated/configured to reject MaxValueWins with an explicit
error. Locate the MergeStrategy definition and either make MaxValueWins
non-public/temporary (remove or comment out) or add a guard in the
configuration/constructor path (the code that parses or returns a MergeStrategy)
that returns an Err or validation failure if MergeStrategy::MaxValueWins is
selected; also add a unit test to assert the new validation so callers cannot
silently pick MaxValueWins while apply/compaction still treat it like
LastWriterWins.
In `@crates/mesh/src/crdt_kv/operation.rs`:
- Around line 252-255: snapshot_and_truncate() currently hardcodes
MergeStrategy::LastWriterWins when calling
latest_operations_by_key_with_strategy; change it to look up the per-key
registered merge strategy (the same lookup logic used by
compact_with_strategy()) so each key uses its configured MergeStrategy (e.g.,
EpochMaxWins vs LastWriterWins) during snapshotting and compaction; update the
closure passed to latest_operations_by_key_with_strategy in
snapshot_and_truncate() to retrieve the strategy for the given key instead of
always returning LastWriterWins.
---
Outside diff comments:
In `@crates/mesh/src/crdt_kv/crdt.rs`:
- Around line 181-191: The local insert path currently calls
record_insert_metadata() and directly updates self.store via self.store.upsert
before the strategy-specific merge logic in apply_insert()/EpochMaxWins runs;
change the local insert-like handlers (the blocks that call
record_insert_metadata and self.store.upsert, and then append_operation) to
instead build the same Operation::insert and route it through the same
merge/application path used for replicated ops (e.g., call apply_insert or the
common apply_operation path after append_operation) so the EpochMaxWins strategy
and rl: key semantics are respected; update the code blocks around the current
self.store.upsert usage (and similar blocks at the other noted ranges) to stop
writing the store directly and reuse apply_insert, and add a regression test
that performs a local write encode(5,100) against an existing encode(6,0) on an
rl: key to assert epoch 6 is preserved.
📒 Files selected for processing (7)
- crates/mesh/src/crdt_kv/crdt.rs
- crates/mesh/src/crdt_kv/merge_strategy.rs
- crates/mesh/src/crdt_kv/mod.rs
- crates/mesh/src/crdt_kv/operation.rs
- crates/mesh/src/crdt_kv/tests.rs
- crates/mesh/src/kv.rs
- crates/mesh/src/lib.rs
Route local insert/upsert paths through the same strategy-aware insert resolver used by replicated operations so stale old-epoch writes cannot overwrite newer rate-limit windows. Keep EpochMaxWins metadata aligned with the semantic value winner instead of the highest Lamport timestamp, and reject duplicate operation ids regardless of whether the existing entry is a tombstone. This lets tombstones newer than the actual epoch-winning insert remove the shard even when a stale higher-timestamp insert was seen earlier. Add regression coverage for local stale-epoch writes and tombstones after an older-timestamp epoch winner. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
Drop the public MaxValueWins variant because the CRDT apply and compaction paths do not implement max-value semantics. Keeping it configurable would let callers select a strategy that silently behaves like LastWriterWins. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
Change snapshot_and_truncate to require the same per-key merge-strategy callback used by append and compaction, removing the hidden LastWriterWins fallback for EpochMaxWins keys. Add a regression test showing an older-timestamp higher-epoch rate-limit value survives snapshot truncation. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
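A snapshot that takes a per-key strategy callback, rather than assuming LastWriterWins, could be sketched like this. `InsertOp` and `snapshot_latest` are hypothetical names; the sketch covers inserts only and ignores removes and replica ids, so it is not the crate's `snapshot_and_truncate`.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MergeStrategy {
    LastWriterWins,
    EpochMaxWins,
}

/// Hypothetical insert-only operation; `value` is a decoded (epoch, count).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InsertOp {
    pub key: String,
    pub timestamp: u64,
    pub value: (u64, u64),
}

/// Keeps one surviving operation per key. The caller supplies the strategy
/// lookup, so there is no hidden LastWriterWins default inside the snapshot.
pub fn snapshot_latest<F>(ops: &[InsertOp], strategy_for: F) -> BTreeMap<String, InsertOp>
where
    F: Fn(&str) -> MergeStrategy,
{
    let mut latest: BTreeMap<String, InsertOp> = BTreeMap::new();
    for op in ops {
        let candidate_wins = match latest.get(&op.key) {
            None => true,
            Some(current) => match strategy_for(op.key.as_str()) {
                MergeStrategy::LastWriterWins => op.timestamp > current.timestamp,
                MergeStrategy::EpochMaxWins => op.value > current.value,
            },
        };
        if candidate_wins {
            latest.insert(op.key.clone(), op.clone());
        }
    }
    latest
}
```

Given an `rl:` key with `(6, 0)` at timestamp 1 and `(5, 100)` at timestamp 2, a prefix-aware callback keeps epoch 6, while a callback that always returns `LastWriterWins` keeps the stale epoch 5 value.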
Add an allocation-free winner helper for EpochMaxWins and use it when operation-log compaction only needs to decide which operation survives. This keeps the merge API for actual value writes while avoiding temporary Vec allocations in the hot compaction comparison path. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
Avoid taking the merge-strategy read lock once per key while compacting the operation log. The compaction path now clones the sorted prefix registry once and performs all per-key lookups against that stable snapshot. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
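The lock-hoisting idea can be illustrated with a small sketch: take the read lock once, clone the sorted registry, and run every per-key lookup against that owned snapshot. `Registry`, `snapshot`, `lookup`, and `strategies_for_keys` are hypothetical names chosen for the example.

```rust
use std::sync::RwLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MergeStrategy {
    LastWriterWins,
    EpochMaxWins,
}

pub struct Registry {
    /// Sorted longest-prefix-first.
    prefixes: RwLock<Vec<(String, MergeStrategy)>>,
}

impl Registry {
    pub fn new(mut prefixes: Vec<(String, MergeStrategy)>) -> Self {
        prefixes.sort_by(|a, b| b.0.len().cmp(&a.0.len()));
        Registry { prefixes: RwLock::new(prefixes) }
    }

    /// Acquires the read lock once and returns an owned copy.
    pub fn snapshot(&self) -> Vec<(String, MergeStrategy)> {
        self.prefixes.read().expect("registry lock poisoned").clone()
    }
}

/// Lock-free lookup against a previously taken snapshot.
pub fn lookup(snapshot: &[(String, MergeStrategy)], key: &str) -> MergeStrategy {
    snapshot
        .iter()
        .find(|(prefix, _)| key.starts_with(prefix.as_str()))
        .map(|(_, strategy)| *strategy)
        .unwrap_or(MergeStrategy::LastWriterWins)
}

/// One lock acquisition for the whole batch instead of one per key.
pub fn strategies_for_keys(registry: &Registry, keys: &[&str]) -> Vec<MergeStrategy> {
    let snapshot = registry.snapshot();
    keys.iter().map(|k| lookup(&snapshot, k)).collect()
}
```

The trade-off is one `Vec` clone per compaction in exchange for no lock traffic inside the loop, which is usually reasonable when the registry holds a handful of prefixes.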
Avoid repeated merge-strategy read-lock acquisition when an append crosses the operation-log compaction threshold. The append path now takes one strategy snapshot under the operation-log write guard and reuses it if compaction runs. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf16fcaf0d
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/mesh/src/crdt_kv/crdt.rs`:
- Around line 623-656: The merge logic currently treats any incoming bytes equal
to the incoming value as a "candidate wins" and rewrites stored epoch metadata
even when the merged bytes equal the existing local value, which can rewind the
stored (timestamp, replica_id); update the decision so we only replace
versions/new_metadata when the merged bytes actually differ from the current
stored bytes or when there is no current value. Concretely, change the
computation of candidate_wins_value and the subsequent check around
current/is_some() so that if current.as_deref() == Some(merged.as_slice()) you
do not clear/replace versions (i.e., preserve existing versions), referencing
the symbols candidate_wins_value, merged, current, versions, new_metadata,
epoch_max_wins::merge, and Self::compact_key_metadata to locate the code to
modify.
In `@crates/mesh/src/crdt_kv/operation.rs`:
- Around line 169-195: candidate_wins currently mixes epoch-based insert
comparisons with timestamp-based insert/remove comparisons causing
non-transitive ordering; change the MergeStrategy::EpochMaxWins branch (function
candidate_wins and the caller latest_operations_by_key_with_strategy) to stop
doing pairwise epoch-vs-timestamp reductions. Instead, for a given key compute
the epoch-max Insert winner across all Insert operations using
epoch_max_wins::winner semantics, separately compute the newest Tombstone/Remove
by (timestamp, replica_id), then compare those two deterministic winners
(epoch-max insert vs newest tombstone) using a single consistent rule to decide
survival; update candidate_wins to call those helpers or delegate to a new
function so the EpochMaxWins path makes a holistic decision rather than falling
back to the (timestamp, replica_id) pairwise comparison.
📒 Files selected for processing (6)
- crates/mesh/src/crdt_kv/crdt.rs
- crates/mesh/src/crdt_kv/epoch_max_wins.rs
- crates/mesh/src/crdt_kv/kv_store.rs
- crates/mesh/src/crdt_kv/merge_strategy.rs
- crates/mesh/src/crdt_kv/operation.rs
- crates/mesh/src/crdt_kv/tests.rs
Avoid treating an older incoming insert with unchanged EpochMaxWins bytes as a semantic winner. When merged bytes match the current store value, preserve or advance metadata by timestamp instead of clearing versions, so tombstone ordering cannot be rewound by delayed equal-value inserts. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
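The metadata rule in this commit can be sketched as a small pure function. `Version` and `next_version` are hypothetical names, and the sketch assumes that whenever the merged bytes differ from the current value they came from the incoming insert.

```rust
/// Assumed version shape: (Lamport timestamp, replica id).
pub type Version = (u64, u64);

/// Decides which version to record after an EpochMaxWins merge.
/// `current` is the stored value and its version, if any.
pub fn next_version(
    current: Option<(&[u8], Version)>,
    merged: &[u8],
    incoming: Version,
) -> Version {
    match current {
        // Value unchanged: preserve or advance, never move backwards.
        Some((bytes, version)) if bytes == merged => version.max(incoming),
        // Value changed, or nothing stored yet: record the incoming version.
        _ => incoming,
    }
}
```

A delayed insert carrying the same bytes but an older timestamp therefore leaves the stored version alone, so a tombstone ordered between the two timestamps cannot be bypassed.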
Replace pairwise mixed ordering for EpochMaxWins operation-log snapshots with a per-key survivor selection. The log now selects the epoch-winning insert, selects the newest tombstone, and compares those two once, avoiding order-dependent cycles between stale inserts, resets, and removes. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5c2b791e87
Treat the newest tombstone as the cutoff for EpochMaxWins operation-log compaction. Pre-tombstone inserts no longer compete with later inserts by epoch/count, so a valid post-delete write can revive the key even when its epoch is lower than a deleted historical value. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
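The tombstone-as-cutoff rule can be sketched as a per-key survivor selection. `Op` and `survivor` are hypothetical names; replica ids are omitted, and an insert whose timestamp equals the tombstone's is treated as pre-tombstone, which is an assumption of this sketch.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Op {
    Insert { timestamp: u64, epoch: u64, count: u64 },
    Remove { timestamp: u64 },
}

/// Picks the single operation that should survive compaction for one key.
pub fn survivor(ops: &[Op]) -> Option<Op> {
    // Step 1: the newest tombstone, if any, is the cutoff.
    let cutoff = ops
        .iter()
        .filter_map(|op| match op {
            Op::Remove { timestamp } => Some(*timestamp),
            Op::Insert { .. } => None,
        })
        .max();

    // Step 2: only inserts strictly newer than the cutoff compete,
    // ordered by (epoch, count) with timestamp as the final tie-break.
    let best_insert = ops
        .iter()
        .filter(|op| match op {
            Op::Insert { timestamp, .. } => cutoff.map_or(true, |c| *timestamp > c),
            Op::Remove { .. } => false,
        })
        .max_by_key(|op| match op {
            Op::Insert { timestamp, epoch, count } => (*epoch, *count, *timestamp),
            // Unreachable after the filter above; present for exhaustiveness.
            Op::Remove { timestamp } => (0, 0, *timestamp),
        });

    // Step 3: a post-tombstone insert revives the key; otherwise the tombstone stays.
    match (best_insert, cutoff) {
        (Some(op), _) => Some(op.clone()),
        (None, Some(timestamp)) => Some(Op::Remove { timestamp }),
        (None, None) => None,
    }
}
```

For a log of insert (epoch 9), remove, insert (epoch 4) in timestamp order, the epoch 4 insert survives even though the deleted value had a higher epoch.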
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8287ddf4bb
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/mesh/src/crdt_kv/tests.rs`:
- Around line 552-586: The test function name
test_operation_log_epoch_max_wins_equal_insert_uses_newer_timestamp is
misleading because a tombstone removes the older insert rather than exercising
equal-epoch LWW tie-breaking; rename the test to
test_operation_log_epoch_max_wins_post_tombstone_insert_wins_over_pre_tombstone_equal_epoch
(update the function identifier and any references) to reflect the scenario
being tested (the code paths around OperationLog::append,
OperationLog::snapshot_and_truncate and MergeStrategy::EpochMaxWins remain
unchanged), or alternatively modify the test to remove the tombstone (keeping
the original name) if you want to directly test equal-epoch LWW tie-breaking.
📒 Files selected for processing (2)
- crates/mesh/src/crdt_kv/operation.rs
- crates/mesh/src/crdt_kv/tests.rs
Replace the raw EpochMaxWins compaction path for rl: keys with a RateLimitShard state that stores a normalized live frontier plus the newest tombstone boundary. This keeps operation-log compaction from losing the post-tombstone value needed to reject stale pre-delete updates later. Normalize rl: writes before storing them, decode both raw epoch/count payloads and normalized shard state for the gateway adapter, and apply incoming operations before compacting the local log so synthetic compacted operations cannot hide unseen updates behind an existing operation id. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/mesh/src/crdt_kv/epoch_max_wins.rs`:
- Around line 342-369: compact_operations currently uses the try-operator on
state_from_insert_value(...) which returns None on a malformed `rl:` insert and
causes the whole compaction to drop the key; change the loop in
compact_operations so that decoding errors from state_from_insert_value are
handled locally (for Insert only) by skipping that operation (optionally
logging) and continuing to merge other operations instead of returning None,
then proceed to merge via current.merge(operation_state) and finally call
state.into_operation(key?) as before; reference the compact_operations function,
the match arm that calls state_from_insert_value, and the merge/into_operation
paths to locate where to replace the `?`-style early-return with a
continue-on-error behavior.
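The difference between the `?`-style early return and the continue-on-error behavior requested here can be shown side by side. This is a sketch: `ShardState`, `state_from_bytes`, `bytes_from_state`, `compact_strict`, and `compact_lenient` are hypothetical names, and the 16-byte big-endian layout is an assumption.

```rust
/// Field order gives (epoch, count) ordering via the derived `Ord`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ShardState {
    pub epoch: u64,
    pub count: u64,
}

/// Assumed layout: epoch then count, 8 bytes each, big-endian.
pub fn bytes_from_state(state: ShardState) -> Vec<u8> {
    let mut out = Vec::with_capacity(16);
    out.extend_from_slice(&state.epoch.to_be_bytes());
    out.extend_from_slice(&state.count.to_be_bytes());
    out
}

/// Returns `None` for malformed input (anything that is not 16 bytes).
pub fn state_from_bytes(bytes: &[u8]) -> Option<ShardState> {
    if bytes.len() != 16 {
        return None;
    }
    let mut epoch = [0u8; 8];
    let mut count = [0u8; 8];
    epoch.copy_from_slice(&bytes[..8]);
    count.copy_from_slice(&bytes[8..]);
    Some(ShardState {
        epoch: u64::from_be_bytes(epoch),
        count: u64::from_be_bytes(count),
    })
}

fn merge(acc: Option<ShardState>, next: ShardState) -> ShardState {
    acc.map_or(next, |current| current.max(next))
}

/// Early return: one malformed value discards the result for the whole key.
pub fn compact_strict(values: &[Vec<u8>]) -> Option<ShardState> {
    let mut acc = None;
    for value in values {
        let state = state_from_bytes(value)?; // exits the function on `None`
        acc = Some(merge(acc, state));
    }
    acc
}

/// Continue on error: malformed values are skipped, the rest still merge.
pub fn compact_lenient(values: &[Vec<u8>]) -> Option<ShardState> {
    values
        .iter()
        .filter_map(|value| state_from_bytes(value))
        .fold(None, |acc, state| Some(merge(acc, state)))
}
```

With one malformed payload between two valid ones, `compact_strict` yields `None` while `compact_lenient` still returns the epoch-winning state. A production version would probably log the skipped value as well.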
📒 Files selected for processing (6)
- crates/mesh/src/crdt_kv/crdt.rs
- crates/mesh/src/crdt_kv/epoch_max_wins.rs
- crates/mesh/src/crdt_kv/operation.rs
- crates/mesh/src/crdt_kv/replica.rs
- crates/mesh/src/crdt_kv/tests.rs
- model_gateway/src/mesh/adapters/rate_limit_sync.rs
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 15e781f06d
Remove the compatibility magic/version header from normalized rl: shard values now that v2 does not need to read old stored payloads. Raw 16-byte epoch/count payloads are accepted only at the insert boundary; stored and gossiped values are serialized RateLimitShard state. Rename the rate-limit Lamport wrapper and ValueMetadata conversion helpers so CRDT metadata code says RateLimitVersion explicitly instead of a generic version() / from_live_version pair. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00dd0ba3cf
Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c57f1ec86b
The re-exported `merge_epoch_max_wins` operates on the normalized stored shard payload, but `encode_epoch_count` returns the raw 16-byte wire payload. Calling `merge_epoch_max_wins(encode(5,30), encode(6,0))` therefore treats both inputs as malformed shards and returns local unchanged — epoch 5 wins when epoch 6 should. Production merges go through `merge_live_value` (called from `CrdtOrMap::record_epoch_insert_metadata`), not this byte-only form. Grep confirms zero callers outside the unit tests in `epoch_max_wins.rs` itself. Gate `fn merge` with `#[cfg(test)]` and drop the `merge_epoch_max_wins` re-exports from `lib.rs` and `crdt_kv/mod.rs`. The wire-side public helpers (`encode_epoch_count`, `decode_epoch_count`, `EpochCount`, `EPOCH_MAX_WINS_ENCODED_LEN`) stay public for the gateway adapter. Flagged by Codex (P2) on PR #1469. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
…Wins
When the source replica compacts a log of [pre-tombstone Insert,
Remove, post-tombstone Insert] into a single shard Insert, a peer
that merges only that snapshot never sees a Remove op. The
shard's embedded `tombstone_version` is the only thing standing
between the receiver and resurrection by a delayed pre-tombstone
high-epoch insert from a different peer.
Add a regression test that drives the exact path:
1. Source merges all three ops and compacts to one Insert.
2. Receiver applies that single op.
3. A late peer gossips its still-pre-tombstone high-epoch
Insert. Receiver must reject it.
Also document the tombstone duality on
`record_epoch_insert_metadata` so the next maintainer doesn't
treat the embedded `tombstone_version` as dead weight: it lives
alongside the `ValueMetadata { is_tombstone: true }` entry
because the two cover different propagation paths (local LWW
ordering + GC vs. cross-replica snapshot propagation).
Identified during PR #1469 design review.
Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
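The regression scenario in this commit message can be sketched with a toy shard state that carries its own tombstone boundary. `Live`, `Shard`, `from_insert`, `from_remove`, and `merge` are hypothetical; only the `tombstone_version` field name is taken from the commit message, and the real `RateLimitShard` layout is not shown in this conversation.

```rust
/// Assumed version shape: (Lamport timestamp, replica id).
pub type Version = (u64, u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Live {
    pub epoch: u64,
    pub count: u64,
    pub version: Version,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Shard {
    pub live: Option<Live>,
    pub tombstone_version: Option<Version>,
}

impl Shard {
    pub fn from_insert(epoch: u64, count: u64, version: Version) -> Shard {
        Shard { live: Some(Live { epoch, count, version }), tombstone_version: None }
    }

    pub fn from_remove(version: Version) -> Shard {
        Shard { live: None, tombstone_version: Some(version) }
    }

    pub fn merge(self, other: Shard) -> Shard {
        // `None < Some(_)`, so `max` keeps the newest tombstone boundary.
        let tombstone_version = self.tombstone_version.max(other.tombstone_version);
        let mut live: Option<Live> = None;
        for candidate in [self.live, other.live].iter().flatten() {
            // Values at or before the boundary were deleted and may not return.
            let after_tombstone = match tombstone_version {
                Some(t) => candidate.version > t,
                None => true,
            };
            let better = match live {
                Some(cur) => {
                    (candidate.epoch, candidate.count, candidate.version)
                        > (cur.epoch, cur.count, cur.version)
                }
                None => true,
            };
            if after_tombstone && better {
                live = Some(*candidate);
            }
        }
        Shard { live, tombstone_version }
    }
}
```

Because the boundary travels inside the compacted value, a receiver that never saw the Remove op can still reject a late pre-tombstone insert. Dropping the boundary from the same state lets the stale high-epoch insert win, which is the resurrection the test guards against.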
Description
Problem
Mesh v2 rate-limit state is documented as using EpochMaxWins semantics, where newer epochs beat older epochs and counters within the same epoch merge by max value. The CRDT OR-map merge path still used LastWriterWins for every key, so rate-limit counter resets could be lost once rate-limit namespaces route through MeshKV.
Solution
Teach the CRDT OR-map to select merge behavior by key prefix. Rate-limit namespaces can register EpochMaxWins, while existing namespaces keep LastWriterWins behavior. Operation-log compaction uses the same key-prefix strategy so compacted logs preserve the same winner that apply-time merge would choose.
Changes
- `MergeStrategy` type under `crdt_kv`.
- `MeshKV::configure_crdt_prefix`.
Test Plan
- `cargo test -p smg-mesh`
Checklist
- `cargo +nightly fmt` passes
- `cargo clippy --all-targets --all-features -- -D warnings` passes