-
Notifications
You must be signed in to change notification settings - Fork 472
Implement determine_timestamp
using constraints
#27815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement determine_timestamp
using constraints
#27815
Conversation
// The specification of an `oracle_read_ts` may indicates that we must advance to it, | ||
// except in one isolation mode, or if `when` does not indicate that we should. | ||
// At the moment, only `QueryWhen::FreshestTableWrite` indicates that we should. | ||
// TODO: Should this just depend on the isolation level? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! So much improved clarity around constraints and preferences. I am in favor of getting it into non-draft readiness and merged.
Getting this information into EXPLAIN TIMESTAMP
seems also super useful and it's very amenable to printing.
let storage = id_bundle.storage_ids.iter().cloned().collect(); | ||
constraints | ||
.lower | ||
.push((since.clone(), Reason::StorageInput(storage))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The since
here and for compute below is over the entire bundle, so could be over stating its needs. It perhaps makes sense to teach ReadHoldsInner
about these separately so this constraint is more accurately expressed.
…traint solver and logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By and large this looks great! Really excited to get it merged!
The two things I would love to see before merging are:
- Moving the constraint based solved to an error-able function, this way we can make as many asserts as we would like, but expose them as errors, without disrupting existing users.
- A dyncfg to toggle between disabled, validate, and enabled.
.iter() | ||
.flat_map(|(anti, _)| anti.iter()) | ||
.cloned() | ||
.collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super familiar with Antichain
, but I'm assuming that the FromIterator
impl is doing the bulk of the work to get the "greatest lower bound"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to frank, this is so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. For better or worse, Antichain
tracks the lower bound of inserted elements, and so blatting everything down and collecting it results in the lower bound of all the elements, which is the greatest lower bound of all the input antichains.
/// An explanation of reasons for a timestamp constraint. | ||
#[derive(Serialize, Deserialize, Clone)] | ||
pub enum Reason { | ||
/// A compute input at a compute instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some more detail to these comments explaining how these reasons link to more user facing concepts would be great. e.g. right now I'm assuming a ComputeInput
means some collection that is maintained on a replica, like an index? But I'm not totally sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may also be that for the moment we aren't sure either. The PR only takes the arguments to determine_timestamp_for
and projects out the "reasons", and they show up as "compute ids" without clearer identities or reasons attached to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh yeah true, we can definitely further improve the documentation as a follow-up!
Just thinking out loud, attaching a "reason" to the constraints would dovetail well with moving the constraints up a layer, like you mentioned previously @DAlperin
assert!( | ||
session.vars().real_time_recency() | ||
&& isolation_level == &IsolationLevel::StrictSerializable, | ||
"real time recency timestamp should only be supplied when real time recency \ | ||
is enabled and the isolation level is strict serializable" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the concept of a "shadow/validation rollout", assert
-ing in this code seems bad since it can panic environmentd?
Could we move this logic to a separate error-able function and when validation is enabled, log the possible error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we're still assert!
-ing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can we add at least one SLT test so we can exercise the printed format of the new output to EXPLAIN TIMESTAMP
Edit: Ignore this! I raced with your most recent push
src/sql/src/plan.rs
Outdated
/// Returns whether the candidate's upper bound should be constrained. | ||
/// This is only true for `AtTimestamp` since it is the only variant that | ||
/// specifies a timestamp. | ||
pub fn should_constrain_upper(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but it's hard to locally think about this. It reads a bit like the other methods, in that it is instructions for code elsewhere, and the should_
prefix indicates some obligation of someone to do something. Is there a way to dig us out of that hole a bit? Maybe "no; let's just aim to delete this enum entirely". But at the moment the comment and the method name seem redundant and not super clear about what needs to be true about this function.
/// Whether to use the constraint-based timestamp selection. | ||
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] | ||
pub enum ConstraintBasedTimestampSelection { | ||
Enabled, | ||
Disabled, | ||
Verify, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right file for this? It seems less like "timestamp oracle" material and more "timestamp selection" material, which uses the oracle as input, but uses various other things as input as well. Would timestamp_selection.rs
be more appropriate, or is there a reason to put it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is I misread it and thought it did say timestamp selection :) will fix
pub struct RawTimestampSelection<T> { | ||
pub timestamp: T, | ||
pub constraints: Option<Constraints>, | ||
pub session_oracle_read_ts: Option<T>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't clear while reading this what role this oracle read timestamp had. I can imagine it has one, but I could also imagine various other arguments play a role too. Not sure what to conclude, but if there is an intent for the type, to be more than a bag of arguments, we should record that. I foresee it growing to look a lot like TimestampDetermination
(apropos: should this be RawTimestampDetermination
, or is the Selection
signifying an important distinction?).
constraints.lower.push(( | ||
Antichain::from_elem(advance_to), | ||
Reason::IsolationLevel(*isolation_level), | ||
)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a thing we'll want to revisit, as afaict this is a "preference" rather than a "constraint". My sense is that it is easiest to introduce behavior using constraints, and .. no harm navigating this using constraints for the moment, but does it sound right that this is less a constraint and more a preference?
/// after `since` and sure to be available not after `upper`. | ||
/// | ||
/// The timeline that `id_bundle` belongs to is also returned, if one exists. | ||
fn determine_timestamp_for( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is perhaps future work, but the way that determine_timestamp
calls in to this method might be appropriate to blend in with this logic (iirc, it calls this method up to twice, with different isolation levels, to judge what would happen if you relaxed the isolation level). That logic was incorrect when last I looked at it (it had an await
between the calls, so the two determinations might be unrelated). But, flagging for near future thoughts.
I cannot "approve" the PR on account of I created it. Which is fine, but I wanted to flag that I would have, modulo the various comments which are mostly nits, style thoughts, and near future work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!! Thanks for making the changes!
assert!( | ||
session.vars().real_time_recency() | ||
&& isolation_level == &IsolationLevel::StrictSerializable, | ||
"real time recency timestamp should only be supplied when real time recency \ | ||
is enabled and the isolation level is strict serializable" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we're still assert!
-ing here?
/// An explanation of reasons for a timestamp constraint. | ||
#[derive(Serialize, Deserialize, Clone)] | ||
pub enum Reason { | ||
/// A compute input at a compute instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh yeah true, we can definitely further improve the documentation as a follow-up!
Just thinking out loud, attaching a "reason" to the constraints would dovetail well with moving the constraints up a layer, like you mentioned previously @DAlperin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some nits!
let constraint_determination = self.determine_timestamp_via_constraints( | ||
session, | ||
&read_holds, | ||
id_bundle, | ||
when, | ||
oracle_read_ts, | ||
compute_instance, | ||
real_time_recency_ts, | ||
isolation_level, | ||
&timeline, | ||
largest_not_in_advance_of_upper, | ||
)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Verify
, we probably don't want to error the call if the constraints-based determination fails, rather log the error (instead of ?
).
pub fn from_str(s: &str) -> Self { | ||
match s { | ||
"enabled" => Self::Enabled, | ||
"disabled" => Self::Disabled, | ||
"verify" => Self::Verify, | ||
_ => { | ||
tracing::error!("invalid value for ConstraintBasedTimestampSelection: {}", s); | ||
ConstraintBasedTimestampSelection::default() | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could implement this as TryFrom
instead, but totally up to you. It would have the benefit that the caller needs to worry about the error path, not the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dov originally used TryFrom
but I recommended he push the error handling into this method so the caller doesn't have to worry about it :) In my experience the caller would generally just fallback to some default, unifying that across all callers seemed like a good idea
This PR explores what it would look like to pilot the
determine_timestamp
functions using constraints, rather than the current interwoven arguments with .. at least unclear-to-me semantics. The implementation reverses out some of the intents from the settings ofQueryWhen
, the optional oracle timestamp, the isolation level, and the timeline, and translates them to constraints with associated "reasons".The intent is to nudge the implementation to one that starts with actual constraints with reasons, ideally pushing those constraints up the stack to the moment they are expressed. There are also surprising (to me) moments called out in the code where e.g. we avoid choosing the freshest timestamp as a function of
QueryWhen
rather than our isolation (presumably elsewhere the isolation results in a setting of theQueryWhen
?).The logic around
StrongSessionSerializable
is interesting, and thePreference
enum isn't rich enough to represent it. I think this is because it is the only logic at the moment that intentionally trades off freshness and responsiveness; the other code paths sort of end up being wired to do reasonable things, as a result of the contexts in which we call the function with various arguments.Motivation
Tips for reviewer
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.