sql: persist CanMutate on function descriptors #171075
Conversation
|
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here |
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
79098da to
fdae0f3
Compare
fdae0f3 to
ee38a1a
Compare
spilchen
left a comment
There was a problem hiding this comment.
looks good, mainly looked at it from a SQL foundations perspective
@spilchen reviewed 78 files and all commit messages, and made 4 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on DrewKimball, michae2, and ZhouXing19).
pkg/upgrade/upgrades/upgrades.go line 225 at r1 (raw file):
upgrade.NewTenantUpgrade( "add can_mutate field to function descriptors",
nit: this string is a bit misleading. The can_mutate field isn't added by the upgrade step. It's a no-op as you stated a few lines below.
You could even argue that we don't need the upgrade step and instead just lean on the clusterversion.V26_3 version check instead. But it's fine to leave too.
pkg/sql/create_function.go line 313 at r4 (raw file):
// caller resolves its nested callees' CanMutate at its own CALL time. if params.p.EvalContext().Settings.Version.IsActive(params.ctx, clusterversion.V26_3_FunctionDescCanMutate) && n.cf.CanMutate != tree.RoutineCanMutateUnknown {
if n.cf.CanMutate is set to tree.RoutineCanMutateUnknown because of late binding, does it make sense to propagate unknown to all of the caller? For instance, if one of the callers wasn't CanMutate originally, but the new replaced function has DML now, but we skip because of late binding, isn't that calling function have the wrong CanMutate value?
pkg/sql/create_function.go line 289 at r1 (raw file):
} // Only persist CanMutate after the version gate is active. During a
nit: this comment is largely a dup of the earlier one in this file. And it's kind of verbose. Suggest make it more concise and/or reducing the duplication in some way.
ee38a1a to
0af4056
Compare
ZhouXing19
left a comment
There was a problem hiding this comment.
@ZhouXing19 made 3 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on DrewKimball, michae2, and spilchen).
pkg/sql/create_function.go line 289 at r1 (raw file):
Previously, spilchen wrote…
nit: this comment is largely a dup of the earlier one in this file. And it's kind of verbose. Suggest make it more concise and/or reducing the duplication in some way.
Good point, replaced with a reference.
pkg/sql/create_function.go line 313 at r4 (raw file):
Previously, spilchen wrote…
if
n.cf.CanMutateis set totree.RoutineCanMutateUnknownbecause of late binding, does it make sense to propagate unknown to all of the caller? For instance, if one of the callers wasn't CanMutate originally, but the new replaced function has DML now, but we skip because of late binding, isn't that calling function have the wrong CanMutate value?
This is a good call, and make me realized I expanded the impact of late binding too much -- if late-binding + deferred opt-build => CAN_MUTATE, we are overkilling the parallism for plpgsql procedures that doesn't contain mutations or DDLs at all. So I changed it to decouple CAN_MUTATE with late-binding, with a ddlWalker + trial optbuild approach. Please see the updated last commit of this PR.
pkg/upgrade/upgrades/upgrades.go line 225 at r1 (raw file):
Previously, spilchen wrote…
nit: this string is a bit misleading. The
can_mutatefield isn't added by the upgrade step. It's a no-op as you stated a few lines below.You could even argue that we don't need the upgrade step and instead just lean on the
clusterversion.V26_3version check instead. But it's fine to leave too.
Good point, updated.
0af4056 to
c73a336
Compare
DrewKimball
left a comment
There was a problem hiding this comment.
Flushing some comments on the first commit.
| stmtScope := plBuilder.buildRootBlock(stmt.AST, triggerFuncScope, params) | ||
| udfDef.Body = []memo.RelExpr{stmtScope.expr} | ||
| udfDef.BodyProps = []*physical.Required{stmtScope.makePhysicalProps()} | ||
| udfDef.CanMutate = tree.RoutineCanMutateFromBool(stmtScope.expr.Relational().CanMutate) |
There was a problem hiding this comment.
For anonymous routines (DO blocks and trigger functions), CanMutate is
derived directly from the body expression at build time, since these
have no descriptor.
Note that trigger functions do have a descriptor, but they are lazily bound and therefore we can't know for sure when the trigger function is created whether it could have a mutation. That being said, we could probably store CanMutate on the trigger descriptor, which lives on the table the trigger is associated with. Here's where we fully build the routine for a CREATE TRIGGER:
There was a problem hiding this comment.
Good point, added another commit to persist CanMutate on triggers as well.
spilchen
left a comment
There was a problem hiding this comment.
latest changes look good from SQL Foundations point of view
@spilchen made 2 comments and resolved 3 discussions.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on michae2 and ZhouXing19).
pkg/sql/opt/optbuilder/create_function.go line 554 at r12 (raw file):
probeOK := func() (ok bool) { defer func() { if r := recover(); r != nil {
this will catch any error, including runtime errors or assertions. Is it safe to ignore those? I'm a little out of my depth here, so feel free to push back if this standard practice. I just thought it was concerning.
| if lateBinding && !skipSQL { | ||
| // Stage 2: probe whether optbuild succeeds. If it fails | ||
| // (e.g., non-existent reference), fall back to conservative | ||
| // CAN_MUTATE with skipSQL=true. |
There was a problem hiding this comment.
Why not leave it unknown?
There was a problem hiding this comment.
Good call, left it unknown. I also removed this probing part as I agree that it changed the sementics of late binding. It means that if late binding is enabled, all plpgsql functions would be optbuild eagerly when being called, except those with DDLs. I was torn about whether this could be an overkill, but with late-binding off by default, it seems acceptable.
0b103aa to
438bd73
Compare
When executing a routine, with deferred UDF body optimization, body statements are not built into RelExprs **until it reaches the execution layer**. However, **before entering the execution layer**, we need to know whether a routine can mutate to choose between LeafTxn and RootTxn, which requires RelExprs of the body statements. Fix this by computing the CanMutate property at `CREATE (OR REPLACE) FUNCTION` time persisting it on the function descriptor. At the execution time of the function, the persisted value is read through the Overload and UDFDefinition and used to determine whether LeafTxn or RootTxn should be used for the execution. (This commit is based on the assumption that function body statement is resolved at the CREATE OR REPLACE FUNCTION time, which could be broken if late-binding is enabled. Left TODOs to have them addressed in the follow-up commit.) Specifically, this new descriptor field uses a three-way enum (UNKNOWN_CAN_MUTATE, CAN_MUTATE, CANNOT_MUTATE). The zero value UNKNOWN_CAN_MUTATE means "not yet determined" and causes consumers to fall back to inspecting the eagerly-built body RelExprs. This handles pre-existing function descriptors created before this field was introduced without requiring a migration: they naturally have the zero value, which triggers the correct fallback behavior. Functions created or replaced once the cluster is on 26.3 get CAN_MUTATE or CANNOT_MUTATE, allowing consumers to skip the body inspection. For DO blocks (which are genuinely anonymous and have no descriptor) CanMutate is derived directly from the body at build time. Writing CanMutate is gated on the cluster being on 26.3 (V26_3_Start), since the can_mutate field does not exist on pre-26.3 binaries. No dedicated version key or migration is needed: pre-existing descriptors keep the zero value (UNKNOWN), and consumers fall back to inspecting the body. Release note: None
A trigger function's own descriptor can't hold a meaningful CanMutate: its body isn't built until the function is bound to a table, which only happens at CREATE TRIGGER time. This commit computes CanMutate from the trigger function body built in the trigger's table context and stores it on the trigger descriptor, mirroring the function descriptor field. Same three-way enum, and the write is gated on the cluster being on 26.3 (V26_3_Start), since the field doesn't exist on older binaries; pre-existing descriptors keep the zero value and fall back to the body. Persist-only for now: trigger functions are still eagerly optbuilt when the trigger fires, so buildTriggerFunction keeps deriving CanMutate from the fresh body. The persisted value only becomes load-bearing once deferred optbuild lands for triggers; a TODO there marks where to read it from the descriptor instead. Release note: None
…ger descriptors Split out from the previous commits to make the reviewer's life easier since it's all mechanical golden file and generated output updates (bootstrap testdata, schema changer side_effects/explain plans, settings docs, generated test harnesses, UML). Release note: None
When a function/trigger is replaced via CREATE OR REPLACE and becomes mutating, propagate CAN_MUTATE transitively to all caller functions/triggers via the DependedOnBy back-references. Only the non-mutating → mutating direction is propagated; the reverse is left conservative (callers keep CAN_MUTATE) because determining true non-mutating status would require re-analyzing each caller's entire body to ensure no child routine mutates at all. Release note: None Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
438bd73 to
702fb50
Compare
702fb50 to
a6524cb
Compare
A late-bound PL/pgSQL procedure body is not optbuilt at CREATE time, so we cannot determine there whether it mutates. So if late-binding is enabled, we traverse the body statements to see if they contain any DDL. If yes, such function would be tagged with CAN_MUTATE. If not, leave CanMutate UNKNOWN and persist it. When the function is called, the UNKNOWN CanMutate would force eager-building the body statements, as opposed to deferring the optbuild till executing the body stmt. Epic: none Release note: None
a6524cb to
c3359f7
Compare
ZhouXing19
left a comment
There was a problem hiding this comment.
Sorry for the delay, RFAL, thanks!
(Main changes from the last review cycle: 1. Now persist CanMutate on trigger descriptor as well; 2. Removed the probing optbuild when late binding is enabled)
@ZhouXing19 made 10 comments and resolved 1 discussion.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on DrewKimball, michae2, and spilchen).
pkg/sql/opt/optbuilder/create_function.go line 554 at r12 (raw file):
Previously, spilchen wrote…
this will catch any error, including runtime errors or assertions. Is it safe to ignore those? I'm a little out of my depth here, so feel free to push back if this standard practice. I just thought it was concerning.
Removed this probing part.
| if lateBinding && !skipSQL { | ||
| // Stage 2: probe whether optbuild succeeds. If it fails | ||
| // (e.g., non-existent reference), fall back to conservative | ||
| // CAN_MUTATE with skipSQL=true. |
There was a problem hiding this comment.
Good call, left it unknown. I also removed this probing part as I agree that it changed the sementics of late binding. It means that if late binding is enabled, all plpgsql functions would be optbuild eagerly when being called, except those with DDLs. I was torn about whether this could be an overkill, but with late-binding off by default, it seems acceptable.
| stmtScope := plBuilder.buildRootBlock(stmt.AST, triggerFuncScope, params) | ||
| udfDef.Body = []memo.RelExpr{stmtScope.expr} | ||
| udfDef.BodyProps = []*physical.Required{stmtScope.makePhysicalProps()} | ||
| udfDef.CanMutate = tree.RoutineCanMutateFromBool(stmtScope.expr.Relational().CanMutate) |
There was a problem hiding this comment.
Good point, added another commit to persist CanMutate on triggers as well.
informs: #167687
With deferred UDF body optimization, body statements are not built into RelExprs at plan time. The execution layer needs to know whether a routine can mutate before execution to choose between LeafTxn and RootTxn (via PlanFlagContainsMutation).
Fix this by computing the
CanMutateproperty atCREATE (or REPLACE) FUNCTIONtime from the optimizer's transitiveRelational().CanMutatelogical property and persisting it on the function descriptor. At query time, the persisted value is read through theOverloadandUDFDefinition. See the commit message of the 1st commit for more details.If a function is changing from
CANNOT_MUTATEtoCAN_MUTATE, all its ancestor functions will be marked asCAN_MUTATEas well. (see the 3rd commit)Release note: None