Skip to content

Implement determine_timestamp using constraints #27815

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

frankmcsherry
Copy link
Contributor

This PR explores what it would look like to pilot the determine_timestamp functions using constraints, rather than the current interwoven arguments with .. at least unclear-to-me semantics. The implementation reverses out some of the intents from the settings of QueryWhen, the optional oracle timestamp, the isolation level, and the timeline, and translates them to constraints with associated "reasons".

The intent is to nudge the implementation to one that starts with actual constraints with reasons, ideally pushing those constraints up the stack to the moment they are expressed. There are also surprising (to me) moments called out in the code where e.g. we avoid choosing the freshest timestamp as a function of QueryWhen rather than our isolation (presumably elsewhere the isolation results in a setting of the QueryWhen?).

The logic around StrongSessionSerializable is interesting, and the Preference enum isn't rich enough to represent it. I think this is because it is the only logic at the moment that intentionally trades off freshness and responsiveness; the other code paths sort of end up being wired to do reasonable things, as a result of the contexts in which we call the function with various arguments.

Motivation

Tips for reviewer

Checklist

// The specification of an `oracle_read_ts` may indicates that we must advance to it,
// except in one isolation mode, or if `when` does not indicate that we should.
// At the moment, only `QueryWhen::FreshestTableWrite` indicates that we should.
// TODO: Should this just depend on the isolation level?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor

@maddyblue maddyblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! So much improved clarity around constraints and preferences. I am in favor of getting it into non-draft readiness and merged.

Getting this information into EXPLAIN TIMESTAMP seems also super useful and it's very amenable to printing.

let storage = id_bundle.storage_ids.iter().cloned().collect();
constraints
.lower
.push((since.clone(), Reason::StorageInput(storage)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The since here and for compute below is over the entire bundle, so could be over stating its needs. It perhaps makes sense to teach ReadHoldsInner about these separately so this constraint is more accurately expressed.

@ParkMyCar ParkMyCar self-requested a review March 24, 2025 18:09
Copy link
Contributor

@ParkMyCar ParkMyCar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By and large this looks great! Really excited to get it merged!

The two things I would love to see before merging are:

  1. Moving the constraint based solved to an error-able function, this way we can make as many asserts as we would like, but expose them as errors, without disrupting existing users.
  2. A dyncfg to toggle between disabled, validate, and enabled.

.iter()
.flat_map(|(anti, _)| anti.iter())
.cloned()
.collect()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with Antichain, but I'm assuming that the FromIterator impl is doing the bulk of the work to get the "greatest lower bound"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to frank, this is so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. For better or worse, Antichain tracks the lower bound of inserted elements, and so blatting everything down and collecting it results in the lower bound of all the elements, which is the greatest lower bound of all the input antichains.

/// An explanation of reasons for a timestamp constraint.
#[derive(Serialize, Deserialize, Clone)]
pub enum Reason {
/// A compute input at a compute instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some more detail to these comments explaining how these reasons link to more user facing concepts would be great. e.g. right now I'm assuming a ComputeInput means some collection that is maintained on a replica, like an index? But I'm not totally sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may also be that for the moment we aren't sure either. The PR only takes the arguments to determine_timestamp_for and projects out the "reasons", and they show up as "compute ids" without clearer identities or reasons attached to them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh yeah true, we can definitely further improve the documentation as a follow-up!

Just thinking out loud, attaching a "reason" to the constraints would dovetail well with moving the constraints up a layer, like you mentioned previously @DAlperin

Comment on lines +334 to +339
assert!(
session.vars().real_time_recency()
&& isolation_level == &IsolationLevel::StrictSerializable,
"real time recency timestamp should only be supplied when real time recency \
is enabled and the isolation level is strict serializable"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the concept of a "shadow/validation rollout", assert-ing in this code seems bad since it can panic environmentd?

Could we move this logic to a separate error-able function and when validation is enabled, log the possible error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're still assert!-ing here?

Copy link
Contributor

@ParkMyCar ParkMyCar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can we add at least one SLT test so we can exercise the printed format of the new output to EXPLAIN TIMESTAMP

Edit: Ignore this! I raced with your most recent push

Comment on lines 1831 to 1834
/// Returns whether the candidate's upper bound should be constrained.
/// This is only true for `AtTimestamp` since it is the only variant that
/// specifies a timestamp.
pub fn should_constrain_upper(&self) -> bool {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but it's hard to locally think about this. It reads a bit like the other methods, in that it is instructions for code elsewhere, and the should_ prefix indicates some obligation of someone to do something. Is there a way to dig us out of that hole a bit? Maybe "no; let's just aim to delete this enum entirely". But at the moment the comment and the method name seem redundant and not super clear about what needs to be true about this function.

Comment on lines 33 to 39
/// Whether to use the constraint-based timestamp selection.
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum ConstraintBasedTimestampSelection {
Enabled,
Disabled,
Verify,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right file for this? It seems less like "timestamp oracle" material and more "timestamp selection" material, which uses the oracle as input, but uses various other things as input as well. Would timestamp_selection.rs be more appropriate, or is there a reason to put it here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is I misread it and thought it did say timestamp selection :) will fix

pub struct RawTimestampSelection<T> {
pub timestamp: T,
pub constraints: Option<Constraints>,
pub session_oracle_read_ts: Option<T>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't clear while reading this what role this oracle read timestamp had. I can imagine it has one, but I could also imagine various other arguments play a role too. Not sure what to conclude, but if there is an intent for the type, to be more than a bag of arguments, we should record that. I foresee it growing to look a lot like TimestampDetermination (apropos: should this be RawTimestampDetermination, or is the Selection signifying an important distinction?).

Comment on lines +532 to +535
constraints.lower.push((
Antichain::from_elem(advance_to),
Reason::IsolationLevel(*isolation_level),
));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a thing we'll want to revisit, as afaict this is a "preference" rather than a "constraint". My sense is that it is easiest to introduce behavior using constraints, and .. no harm navigating this using constraints for the moment, but does it sound right that this is less a constraint and more a preference?

/// after `since` and sure to be available not after `upper`.
///
/// The timeline that `id_bundle` belongs to is also returned, if one exists.
fn determine_timestamp_for(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is perhaps future work, but the way that determine_timestamp calls in to this method might be appropriate to blend in with this logic (iirc, it calls this method up to twice, with different isolation levels, to judge what would happen if you relaxed the isolation level). That logic was incorrect when last I looked at it (it had an await between the calls, so the two determinations might be unrelated). But, flagging for near future thoughts.

@frankmcsherry
Copy link
Contributor Author

I cannot "approve" the PR on account of I created it. Which is fine, but I wanted to flag that I would have, modulo the various comments which are mostly nits, style thoughts, and near future work.

Copy link
Contributor

@ParkMyCar ParkMyCar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!! Thanks for making the changes!

Comment on lines +334 to +339
assert!(
session.vars().real_time_recency()
&& isolation_level == &IsolationLevel::StrictSerializable,
"real time recency timestamp should only be supplied when real time recency \
is enabled and the isolation level is strict serializable"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're still assert!-ing here?

/// An explanation of reasons for a timestamp constraint.
#[derive(Serialize, Deserialize, Clone)]
pub enum Reason {
/// A compute input at a compute instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh yeah true, we can definitely further improve the documentation as a follow-up!

Just thinking out loud, attaching a "reason" to the constraints would dovetail well with moving the constraints up a layer, like you mentioned previously @DAlperin

Copy link
Member

@antiguru antiguru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some nits!

Comment on lines 673 to 684
let constraint_determination = self.determine_timestamp_via_constraints(
session,
&read_holds,
id_bundle,
when,
oracle_read_ts,
compute_instance,
real_time_recency_ts,
isolation_level,
&timeline,
largest_not_in_advance_of_upper,
)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Verify, we probably don't want to error the call if the constraints-based determination fails, rather log the error (instead of ?).

Comment on lines +31 to +41
pub fn from_str(s: &str) -> Self {
match s {
"enabled" => Self::Enabled,
"disabled" => Self::Disabled,
"verify" => Self::Verify,
_ => {
tracing::error!("invalid value for ConstraintBasedTimestampSelection: {}", s);
ConstraintBasedTimestampSelection::default()
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could implement this as TryFrom instead, but totally up to you. It would have the benefit that the caller needs to worry about the error path, not the implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dov originally used TryFrom but I recommended he push the error handling into this method so the caller doesn't have to worry about it :) In my experience the caller would generally just fallback to some default, unifying that across all callers seemed like a good idea

@DAlperin DAlperin marked this pull request as ready for review March 26, 2025 17:27
@DAlperin DAlperin requested review from a team as code owners March 26, 2025 17:27
@DAlperin DAlperin requested a review from aljoscha March 26, 2025 17:27
@DAlperin DAlperin merged commit 79ef7f5 into MaterializeInc:main Mar 26, 2025
82 of 83 checks passed
def- added a commit to def-/materialize that referenced this pull request Mar 28, 2025
def- added a commit to def-/materialize that referenced this pull request Mar 28, 2025
def- added a commit to def-/materialize that referenced this pull request Mar 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants