Implement `determine_timestamp` using constraints #27815

frankmcsherry · 2024-06-23T18:53:02Z

This PR explores what it would look like to pilot the determine_timestamp functions using constraints, rather than the current interwoven arguments whose semantics are, at least to me, unclear. The implementation reverses out some of the intents from the settings of QueryWhen, the optional oracle timestamp, the isolation level, and the timeline, and translates them into constraints with associated "reasons".

The intent is to nudge the implementation to one that starts with actual constraints with reasons, ideally pushing those constraints up the stack to the moment they are expressed. There are also surprising (to me) moments called out in the code where e.g. we avoid choosing the freshest timestamp as a function of QueryWhen rather than our isolation (presumably elsewhere the isolation results in a setting of the QueryWhen?).

The logic around StrongSessionSerializable is interesting, and the Preference enum isn't rich enough to represent it. I think this is because it is the only logic at the moment that intentionally trades off freshness and responsiveness; the other code paths sort of end up being wired to do reasonable things, as a result of the contexts in which we call the function with various arguments.
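The following is a minimal sketch of constraints with reasons and a solver that makes the idea concrete. It rests on simplifying assumptions: timestamps are modeled as totally ordered `u64`s rather than antichains, and the `Reason` variants, the `Preference` enum, and `Constraints::solve` are hypothetical illustrations, not the PR's actual types. Lower bounds combine by taking their maximum and upper bounds by taking their minimum. An infeasible combination reports the reasons responsible.

```rust
/// Hypothetical, simplified reasons; the PR's own `Reason` enum differs.
#[derive(Debug, Clone, PartialEq)]
pub enum Reason {
    StorageInput(Vec<String>),
    ComputeInput(Vec<String>),
    IsolationLevel(&'static str),
    QueryAsOf,
}

/// Which feasible timestamp to pick once all constraints are satisfied.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Preference {
    /// The largest timestamp allowed by the upper bounds.
    Freshest,
    /// The smallest timestamp allowed by the lower bounds.
    Earliest,
}

/// Bounds on the chosen timestamp, each paired with the reason it exists.
#[derive(Debug, Default)]
pub struct Constraints {
    /// The chosen timestamp must be greater than or equal to each of these.
    pub lower: Vec<(u64, Reason)>,
    /// The chosen timestamp must be less than or equal to each of these.
    pub upper: Vec<(u64, Reason)>,
}

impl Constraints {
    pub fn solve(&self, preference: Preference) -> Result<u64, String> {
        // Tightest lower bound: the largest lower bound (0 if there are none).
        let lo = self.lower.iter().map(|(t, _)| *t).max().unwrap_or(0);
        // Tightest upper bound: the smallest upper bound, if any exist.
        let hi = self.upper.iter().map(|(t, _)| *t).min();
        if let Some(hi) = hi {
            if lo > hi {
                // Name the constraints on each side of the conflict.
                let raised_by: Vec<&Reason> =
                    self.lower.iter().filter(|(t, _)| *t > hi).map(|(_, r)| r).collect();
                let capped_by: Vec<&Reason> =
                    self.upper.iter().filter(|(t, _)| *t < lo).map(|(_, r)| r).collect();
                return Err(format!(
                    "no valid timestamp: lower bound {lo} {raised_by:?} exceeds upper bound {hi} {capped_by:?}"
                ));
            }
        }
        Ok(match preference {
            // With no upper bound there is no "freshest" point; fall back to the lower bound.
            Preference::Freshest => hi.unwrap_or(lo),
            Preference::Earliest => lo,
        })
    }
}
```

Because every bound carries its reason, an infeasible request can say which inputs or settings conflicted. That is the sort of detail that could surface in EXPLAIN TIMESTAMP.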

Motivation

Tips for reviewer

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
This PR includes the following user-facing behavior changes:

jkosh44 · 2024-06-24T18:46:03Z

src/adapter/src/coord/timestamp_selection.rs

+            // The specification of an `oracle_read_ts` may indicate that we must advance to it,
+            // except in one isolation mode, or if `when` does not indicate that we should.
+            // At the moment, only `QueryWhen::FreshestTableWrite` indicates that we should.
+            // TODO: Should this just depend on the isolation level?


maddyblue

This is great! So much improved clarity around constraints and preferences. I am in favor of getting it into non-draft readiness and merged.

Getting this information into EXPLAIN TIMESTAMP also seems super useful, and it's very amenable to printing.

maddyblue · 2024-06-24T21:48:20Z

src/adapter/src/coord/timestamp_selection.rs

+            let storage = id_bundle.storage_ids.iter().cloned().collect();
+            constraints
+                .lower
+                .push((since.clone(), Reason::StorageInput(storage)));


The since here (and for compute below) is over the entire bundle, so it could be overstating its needs. It perhaps makes sense to teach ReadHoldsInner about these separately, so that this constraint is expressed more accurately.
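This is a hypothetical sketch of the finer-grained alternative, which emits one lower-bound constraint per collection instead of one for the whole bundle. `since` frontiers are simplified to `u64`s here. `per_collection_lower_bounds`, `binding_reasons`, and the single-id `Reason` variants are illustrative names, not existing APIs. The benefit is that the explanation can name the specific collection that actually determines the bound.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-collection reasons (one id each, rather than the whole bundle).
#[derive(Debug, Clone, PartialEq)]
pub enum Reason {
    StorageInput(String),
    ComputeInput(String),
}

/// Turns per-collection `since` values into one lower-bound constraint each.
pub fn per_collection_lower_bounds(
    storage_sinces: &BTreeMap<String, u64>,
    compute_sinces: &BTreeMap<String, u64>,
) -> Vec<(u64, Reason)> {
    let mut lower = Vec::new();
    for (id, since) in storage_sinces {
        lower.push((*since, Reason::StorageInput(id.clone())));
    }
    for (id, since) in compute_sinces {
        lower.push((*since, Reason::ComputeInput(id.clone())));
    }
    lower
}

/// Returns the reasons whose bound equals the overall lower bound, i.e. the
/// collections that actually pin it.
pub fn binding_reasons(lower: &[(u64, Reason)]) -> Vec<Reason> {
    let Some(bound) = lower.iter().map(|(t, _)| *t).max() else {
        return Vec::new();
    };
    lower
        .iter()
        .filter(|(t, _)| *t == bound)
        .map(|(_, r)| r.clone())
        .collect()
}
```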


ParkMyCar

By and large this looks great! Really excited to get it merged!

The two things I would love to see before merging are:

Moving the constraint-based solver to a fallible function, so that we can make as many asserts as we would like but expose them as errors without disrupting existing users.
A dyncfg to toggle between disabled, validate, and enabled.
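Here is one way the requested rollout could look, as a sketch. The mode enum mirrors the `Enabled`/`Disabled`/`Verify` variants that appear later in this thread. `select_timestamp` and the two closures standing in for the legacy and constraint-based paths are hypothetical. In `Verify` mode the constraint-based result is only compared and logged, with `eprintln!` as a stand-in for real logging, so a failure on that path never reaches the user.

```rust
/// Rollout mode; the variant names follow the enum quoted later in this thread.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ConstraintBasedTimestampSelection {
    Enabled,
    Disabled,
    Verify,
}

/// Hypothetical dispatcher: `legacy` and `constraint_based` stand in for the
/// existing and the new timestamp determination paths.
pub fn select_timestamp(
    mode: ConstraintBasedTimestampSelection,
    legacy: impl FnOnce() -> Result<u64, String>,
    constraint_based: impl FnOnce() -> Result<u64, String>,
) -> Result<u64, String> {
    match mode {
        // Only the existing path runs.
        ConstraintBasedTimestampSelection::Disabled => legacy(),
        // The constraint-based result (or error) is what the user sees.
        ConstraintBasedTimestampSelection::Enabled => constraint_based(),
        // Run both, return the legacy answer, and only log disagreements or failures.
        ConstraintBasedTimestampSelection::Verify => {
            let chosen = legacy()?;
            match constraint_based() {
                Ok(ts) if ts != chosen => {
                    eprintln!("timestamp mismatch: legacy chose {chosen}, constraints chose {ts}")
                }
                Ok(_) => {}
                Err(err) => eprintln!("constraint-based selection failed: {err}"),
            }
            Ok(chosen)
        }
    }
}
```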


ParkMyCar · 2025-03-24T18:21:55Z

src/adapter/src/coord/timestamp_selection.rs

+                .iter()
+                .flat_map(|(anti, _)| anti.iter())
+                .cloned()
+                .collect()


I'm not super familiar with Antichain, but I'm assuming that the FromIterator impl is doing the bulk of the work to get the "greatest lower bound"?

According to Frank, this is so:

Yup. For better or worse, Antichain tracks the lower bound of inserted elements, and so blatting everything down and collecting it results in the lower bound of all the elements, which is the greatest lower bound of all the input antichains.
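A small stand-in for a lower-bound antichain can illustrate the same behavior. This is not timely's `Antichain`. It is a hypothetical `MinAntichain` over pairs under the product partial order, and its `insert` keeps only minimal elements. Collecting the flattened elements of several antichains, as in the snippet above, then yields their greatest lower bound.

```rust
use std::iter::FromIterator;

/// Product partial order on pairs: `a <= b` iff both coordinates are `<=`.
fn less_equal(a: &(u64, u64), b: &(u64, u64)) -> bool {
    a.0 <= b.0 && a.1 <= b.1
}

/// Hypothetical stand-in for a lower-bound antichain (not timely's `Antichain`):
/// it retains only elements that no other retained element is `<=`.
#[derive(Debug, Default)]
pub struct MinAntichain {
    elements: Vec<(u64, u64)>,
}

impl MinAntichain {
    /// Inserts `t` unless an existing element is already `<= t`; otherwise
    /// drops the existing elements that `t` is `<=` and keeps `t`.
    pub fn insert(&mut self, t: (u64, u64)) -> bool {
        if self.elements.iter().any(|e| less_equal(e, &t)) {
            return false;
        }
        self.elements.retain(|e| !less_equal(&t, e));
        self.elements.push(t);
        true
    }

    pub fn elements(&self) -> &[(u64, u64)] {
        &self.elements
    }
}

impl FromIterator<(u64, u64)> for MinAntichain {
    /// Inserting every element keeps exactly the minimal ones, which form
    /// the greatest lower bound of all the inputs.
    fn from_iter<I: IntoIterator<Item = (u64, u64)>>(iter: I) -> Self {
        let mut result = MinAntichain::default();
        for t in iter {
            result.insert(t);
        }
        result
    }
}
```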

ParkMyCar · 2025-03-24T18:24:26Z

src/adapter/src/coord/timestamp_selection.rs

+    /// An explanation of reasons for a timestamp constraint.
+    #[derive(Serialize, Deserialize, Clone)]
+    pub enum Reason {
+        /// A compute input at a compute instance.


Adding some more detail to these comments explaining how these reasons link to more user-facing concepts would be great. For example, right now I'm assuming a ComputeInput means some collection that is maintained on a replica, like an index? But I'm not totally sure.

It may also be that for the moment we aren't sure either. The PR only takes the arguments to determine_timestamp_for and projects out the "reasons", and they show up as "compute ids" without clearer identities or reasons attached to them.

Ahhh yeah true, we can definitely further improve the documentation as a follow-up!

Just thinking out loud, attaching a "reason" to the constraints would dovetail well with moving the constraints up a layer, like you mentioned previously @DAlperin


ParkMyCar · 2025-03-24T18:45:51Z

src/adapter/src/coord/timestamp_selection.rs

+                assert!(
+                    session.vars().real_time_recency()
+                        && isolation_level == &IsolationLevel::StrictSerializable,
+                    "real time recency timestamp should only be supplied when real time recency \
+                                is enabled and the isolation level is strict serializable"
+                );


Given the concept of a "shadow/validation rollout", assert-ing in this code seems bad since it can panic environmentd?

Could we move this logic to a separate error-able function and when validation is enabled, log the possible error?

Looks like we're still assert!-ing here?
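Here is a sketch of the suggested alternative. It expresses the same condition as the `assert!` as a check that returns an error instead of panicking. The trimmed-down `IsolationLevel` enum and `check_real_time_recency` are hypothetical. A caller could propagate the error or, in a validation rollout, just log it.

```rust
/// Trimmed-down, hypothetical isolation levels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IsolationLevel {
    Serializable,
    StrictSerializable,
    StrongSessionSerializable,
}

/// Same condition as the `assert!`, but reported as an error instead of a panic.
pub fn check_real_time_recency(
    real_time_recency_enabled: bool,
    isolation_level: IsolationLevel,
    real_time_recency_ts: Option<u64>,
) -> Result<(), String> {
    let allowed =
        real_time_recency_enabled && isolation_level == IsolationLevel::StrictSerializable;
    if real_time_recency_ts.is_some() && !allowed {
        return Err(format!(
            "real time recency timestamp supplied, but real time recency enabled = \
             {real_time_recency_enabled} and isolation level = {isolation_level:?}"
        ));
    }
    Ok(())
}
```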

ParkMyCar

~~Also, can we add at least one SLT test so we can exercise the printed format of the new output to EXPLAIN TIMESTAMP~~

Edit: Ignore this! I raced with your most recent push


frankmcsherry · 2025-03-25T17:58:49Z

src/sql/src/plan.rs

+    /// Returns whether the candidate's upper bound should be constrained.
+    /// This is only true for `AtTimestamp` since it is the only variant that
+    /// specifies a timestamp.
+    pub fn should_constrain_upper(&self) -> bool {


Nit, but it's hard to reason about this locally. It reads a bit like the other methods, in that it is instructions for code elsewhere, and the should_ prefix implies an obligation for someone to do something. Is there a way to dig us out of that hole a bit? Maybe the answer is "no; let's just aim to delete this enum entirely". But at the moment the comment and the method name seem redundant, and neither is super clear about what needs to be true about this function.
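One hypothetical way out of the `should_` framing is to have the enum report the bounds it imposes, rather than giving instructions to other code. The simplified `When` enum below stands in for `QueryWhen`. Its variants, the `u64` timestamps, and the `upper_bound`/`lower_bound` method names are assumptions for illustration.

```rust
/// Simplified, hypothetical stand-in for `QueryWhen`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum When {
    Immediately,
    FreshestTableWrite,
    AtTimestamp(u64),
    AtLeastTimestamp(u64),
}

impl When {
    /// The upper bound this request places on the chosen timestamp, if any.
    pub fn upper_bound(&self) -> Option<u64> {
        match self {
            When::AtTimestamp(t) => Some(*t),
            _ => None,
        }
    }

    /// The lower bound this request places on the chosen timestamp, if any.
    pub fn lower_bound(&self) -> Option<u64> {
        match self {
            When::AtTimestamp(t) | When::AtLeastTimestamp(t) => Some(*t),
            _ => None,
        }
    }
}
```

With this shape, the caller simply pushes whatever bounds are returned. A fixed-timestamp request then naturally shows up as both a lower and an upper bound.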

frankmcsherry · 2025-03-25T18:01:39Z

src/adapter-types/src/timestamp_oracle.rs

+/// Whether to use the constraint-based timestamp selection.
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
+pub enum ConstraintBasedTimestampSelection {
+    Enabled,
+    Disabled,
+    Verify,
+}


Is this the right file for this? It seems less like "timestamp oracle" material and more "timestamp selection" material, which uses the oracle as input, but uses various other things as input as well. Would timestamp_selection.rs be more appropriate, or is there a reason to put it here?

The reason is I misread it and thought it did say timestamp selection :) will fix

frankmcsherry · 2025-03-25T18:53:04Z

src/adapter/src/coord/timestamp_selection.rs

+pub struct RawTimestampSelection<T> {
+    pub timestamp: T,
+    pub constraints: Option<Constraints>,
+    pub session_oracle_read_ts: Option<T>,


It wasn't clear while reading this what role the oracle read timestamp plays. I can imagine it has one, but I could also imagine various other arguments playing a role too. Not sure what to conclude, but if the type is intended to be more than a bag of arguments, we should record that intent. I foresee it growing to look a lot like TimestampDetermination (apropos: should this be RawTimestampDetermination, or does Selection signify an important distinction?).

frankmcsherry · 2025-03-25T18:57:16Z

src/adapter/src/coord/timestamp_selection.rs

+                        constraints.lower.push((
+                            Antichain::from_elem(advance_to),
+                            Reason::IsolationLevel(*isolation_level),
+                        ));


This seems like something we'll want to revisit, as afaict this is a "preference" rather than a "constraint". My sense is that it is easiest to introduce behavior using constraints, and there's no harm in expressing this as a constraint for the moment, but does it sound right that this is less a constraint and more a preference?
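The distinction being drawn can be illustrated with two hypothetical functions over `u64` timestamps. Treated as a constraint, `advance_to` raises the lower bound and can make the query infeasible. Treated as a preference, it is only clamped into the interval that the real constraints already allow.

```rust
/// Treats `advance_to` as a hard constraint: it raises the lower bound and
/// fails if that leaves no timestamp at or below `upper`.
pub fn advance_as_constraint(lower: u64, upper: u64, advance_to: u64) -> Result<u64, String> {
    let lower = lower.max(advance_to);
    if lower > upper {
        Err(format!("no timestamp in [{lower}, {upper}]"))
    } else {
        Ok(lower)
    }
}

/// Treats `advance_to` as a preference: it only picks a point inside the
/// interval that the real constraints already allow.
pub fn advance_as_preference(lower: u64, upper: u64, advance_to: u64) -> Result<u64, String> {
    if lower > upper {
        return Err(format!("no timestamp in [{lower}, {upper}]"));
    }
    // `clamp` requires lower <= upper, which was checked above.
    Ok(advance_to.clamp(lower, upper))
}
```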

frankmcsherry · 2025-03-25T19:06:23Z

src/adapter/src/coord/timestamp_selection.rs

+    /// after `since` and sure to be available not after `upper`.
+    ///
+    /// The timeline that `id_bundle` belongs to is also returned, if one exists.
+    fn determine_timestamp_for(


This is perhaps future work, but the way that determine_timestamp calls into this method might be appropriate to blend in with this logic (iirc, it calls this method up to twice, with different isolation levels, to judge what would happen if you relaxed the isolation level). That logic was incorrect when I last looked at it (it had an await between the calls, so the two determinations might be unrelated). But flagging it for near-future thoughts.
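The double determination could avoid that race by capturing the inputs once and evaluating a pure function for each isolation level. The sketch below shows this. The `Snapshot` struct, the two-variant `Isolation` enum, and the selection rules in `determine` are simplified assumptions for illustration, not Materialize's actual logic.

```rust
/// Inputs captured once, before any `.await`. Hypothetical and simplified.
#[derive(Debug, Clone, Copy)]
pub struct Snapshot {
    pub since: u64,
    pub upper: u64,
    pub oracle_read_ts: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Isolation {
    Serializable,
    StrictSerializable,
}

/// A pure determination: no I/O, so repeated calls see identical inputs.
/// The rules here are simplified assumptions, not the real selection logic.
pub fn determine(s: &Snapshot, isolation: Isolation) -> u64 {
    // Largest timestamp not in advance of `upper`, but never below `since`.
    let largest_available = s.upper.saturating_sub(1).max(s.since);
    match isolation {
        Isolation::Serializable => largest_available,
        // Must also reflect every write the oracle has acknowledged.
        Isolation::StrictSerializable => s.oracle_read_ts.max(s.since),
    }
}

/// Evaluates both levels against the same snapshot, so the comparison
/// cannot be skewed by frontiers moving between two separate calls.
pub fn compare_isolation_levels(s: &Snapshot) -> (u64, u64) {
    (
        determine(s, Isolation::StrictSerializable),
        determine(s, Isolation::Serializable),
    )
}
```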

frankmcsherry · 2025-03-25T19:07:49Z

I cannot "approve" the PR because I created it. That's fine, but I wanted to flag that I would have, modulo the various comments, which are mostly nits, style thoughts, and near-future work.

ParkMyCar

Nice!! Thanks for making the changes!

ParkMyCar · 2025-03-25T21:42:45Z

src/adapter/src/coord/timestamp_selection.rs


Looks like we're still assert!-ing here?

ParkMyCar · 2025-03-25T21:46:06Z

src/adapter/src/coord/timestamp_selection.rs

Ahhh yeah true, we can definitely further improve the documentation as a follow-up!

Just thinking out loud, attaching a "reason" to the constraints would dovetail well with moving the constraints up a layer, like you mentioned previously @DAlperin

antiguru

Added some nits!

antiguru · 2025-03-26T08:02:31Z

src/adapter/src/coord/timestamp_selection.rs

+                let constraint_determination = self.determine_timestamp_via_constraints(
+                    session,
+                    &read_holds,
+                    id_bundle,
+                    when,
+                    oracle_read_ts,
+                    compute_instance,
+                    real_time_recency_ts,
+                    isolation_level,
+                    &timeline,
+                    largest_not_in_advance_of_upper,
+                )?;


In Verify mode, we probably don't want to fail the call if the constraint-based determination fails; instead, we should log the error (rather than propagating it with ?).


antiguru · 2025-03-26T08:14:18Z

src/adapter-types/src/timestamp_selection.rs

+    pub fn from_str(s: &str) -> Self {
+        match s {
+            "enabled" => Self::Enabled,
+            "disabled" => Self::Disabled,
+            "verify" => Self::Verify,
+            _ => {
+                tracing::error!("invalid value for ConstraintBasedTimestampSelection: {}", s);
+                ConstraintBasedTimestampSelection::default()
+            }
+        }
+    }


Could implement this as TryFrom instead, but totally up to you. It would have the benefit that the caller needs to worry about the error path, not the implementation.

Dov originally used TryFrom, but I recommended he push the error handling into this method so the caller doesn't have to worry about it :) In my experience, callers would generally just fall back to some default, so unifying that behavior across all callers seemed like a good idea.
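The two positions can also be combined, as in the following sketch. A `TryFrom<&str>` impl serves callers that want to handle the error, and a lenient wrapper logs and falls back to the default. The wrapper name `from_str_or_default`, the choice of `Disabled` as the default, and `eprintln!` standing in for `tracing::error!` are all assumptions.

```rust
use std::convert::TryFrom;

/// Rollout mode. Making `Disabled` the default is an assumption for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ConstraintBasedTimestampSelection {
    Enabled,
    #[default]
    Disabled,
    Verify,
}

impl TryFrom<&str> for ConstraintBasedTimestampSelection {
    type Error = String;

    /// Strict parse: the caller decides what to do with an invalid value.
    fn try_from(s: &str) -> Result<Self, Self::Error> {
        match s {
            "enabled" => Ok(Self::Enabled),
            "disabled" => Ok(Self::Disabled),
            "verify" => Ok(Self::Verify),
            other => Err(format!(
                "invalid value for ConstraintBasedTimestampSelection: {other}"
            )),
        }
    }
}

impl ConstraintBasedTimestampSelection {
    /// Lenient parse (hypothetical name): log the error and fall back to the
    /// default, so every caller handles invalid values the same way.
    pub fn from_str_or_default(s: &str) -> Self {
        Self::try_from(s).unwrap_or_else(|err| {
            eprintln!("{err}"); // stand-in for `tracing::error!`
            Self::default()
        })
    }
}
```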


Implement determine_timestamp using constraints

a543334

jkosh44 reviewed Jun 24, 2024


maddyblue reviewed Jun 24, 2024


DAlperin added 9 commits March 20, 2025 14:05

Merge remote-tracking branch 'origin/main' into determine_timestamp_constraints

a6a9a25

Better logging

f31e1c2

Fix minimization

36a5586

Fix strong session serializable timestamp selection logic in the constraint solver and logging

bf997b5

Add EXPLAIN TIMESTAMP support

d51ef8d

Improve output

e2e0d34

remove unused import.

7b8baf3

update sql test to reflect new explain contents

5552fb4

Fix testdrive

307843b

ParkMyCar self-requested a review March 24, 2025 18:09

ParkMyCar reviewed Mar 24, 2025



DAlperin added 11 commits March 24, 2025 16:08

Don't add unneeded constraints

20ce3a1

fix tests

284d2a4

fix more tests

dd20c58

dyncfg

f087206

fix rust tests

a25c4e1

Add metrics

3534a30

unbreak

11aa520

fix timestamp selection test

9442655

oops, return the right selection

0438397

clippy

ee6888a

Fix materialized_views.py

84b6c5f

frankmcsherry commented Mar 25, 2025


This test file is liable to be the death of me

c028fcf

DAlperin added 4 commits March 25, 2025 15:54

Address notes from frank

9531fea

add copyright header

dc672a5

fix test build

0bcb8b0

oh fine

7cb1536

ParkMyCar approved these changes Mar 25, 2025


antiguru reviewed Mar 26, 2025


DAlperin added 4 commits March 26, 2025 11:17

truncate if needed

cf406cc

Address feedback

08c5b16

I literally can't tell what this test wants anymore

81f7c8c

import default directly

4b59ced

DAlperin marked this pull request as ready for review March 26, 2025 17:27

DAlperin requested review from a team as code owners March 26, 2025 17:27

DAlperin requested a review from aljoscha March 26, 2025 17:27

DAlperin merged commit 79ef7f5 into MaterializeInc:main Mar 26, 2025
82 of 83 checks passed

def- added a commit to def-/materialize that referenced this pull request Mar 28, 2025

tests: Fix nightly for new EXPLAIN TIMESTAMP output

5092635

Follow-up to MaterializeInc#27815 Failure seen in https://buildkite.com/materialize/nightly/builds/11639

def- mentioned this pull request Mar 28, 2025

tests: Fix nightly for new EXPLAIN TIMESTAMP output #32034

Merged


def- added a commit to def-/materialize that referenced this pull request Mar 28, 2025

tests: Fix nightly for new EXPLAIN TIMESTAMP output

4b63329

Follow-up to MaterializeInc#27815 Failure seen in https://buildkite.com/materialize/nightly/builds/11639

def- added a commit to def-/materialize that referenced this pull request Mar 28, 2025

tests: Fix nightly for new EXPLAIN TIMESTAMP output

e91cc49

Follow-up to MaterializeInc#27815 Failure seen in https://buildkite.com/materialize/nightly/builds/11639
