Skip to content

sql: persist CanMutate on function descriptors #171075

Closed
ZhouXing19 wants to merge 5 commits into
cockroachdb:masterfrom
ZhouXing19:can-mutate-desc
Closed

sql: persist CanMutate on function descriptors #171075
ZhouXing19 wants to merge 5 commits into
cockroachdb:masterfrom
ZhouXing19:can-mutate-desc

Conversation

@ZhouXing19

@ZhouXing19 ZhouXing19 commented May 28, 2026

Copy link
Copy Markdown
Collaborator

informs: #167687

With deferred UDF body optimization, body statements are not built into RelExprs at plan time. The execution layer needs to know whether a routine can mutate before execution to choose between LeafTxn and RootTxn (via PlanFlagContainsMutation).

Fix this by computing the CanMutate property at CREATE (or REPLACE) FUNCTION time from the optimizer's transitive Relational().CanMutate logical property and persisting it on the function descriptor. At query time, the persisted value is read through the Overload and UDFDefinition. See the commit message of the 1st commit for more details.

If a function is changing from CANNOT_MUTATE to CAN_MUTATE, all its ancestor functions will be marked as CAN_MUTATE as well. (see the 3rd commit)

Release note: None

@trunk-io

trunk-io Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@blathers-crl

blathers-crl Bot commented May 28, 2026

Copy link
Copy Markdown

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

@ZhouXing19 ZhouXing19 force-pushed the can-mutate-desc branch 3 times, most recently from 79098da to fdae0f3 Compare May 28, 2026 12:51
@ZhouXing19 ZhouXing19 marked this pull request as ready for review May 28, 2026 12:52
@ZhouXing19 ZhouXing19 requested review from a team as code owners May 28, 2026 12:52

@spilchen spilchen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, mainly looked at it from a SQL foundations perspective

@spilchen reviewed 78 files and all commit messages, and made 4 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on DrewKimball, michae2, and ZhouXing19).


pkg/upgrade/upgrades/upgrades.go line 225 at r1 (raw file):

	upgrade.NewTenantUpgrade(
		"add can_mutate field to function descriptors",

nit: this string is a bit misleading. The can_mutate field isn't added by the upgrade step. It's a no-op as you stated a few lines below.

You could even argue that we don't need the upgrade step and instead just lean on the clusterversion.V26_3 version check instead. But it's fine to leave too.


pkg/sql/create_function.go line 313 at r4 (raw file):

	// caller resolves its nested callees' CanMutate at its own CALL time.
	if params.p.EvalContext().Settings.Version.IsActive(params.ctx, clusterversion.V26_3_FunctionDescCanMutate) &&
		n.cf.CanMutate != tree.RoutineCanMutateUnknown {

if n.cf.CanMutate is set to tree.RoutineCanMutateUnknown because of late binding, does it make sense to propagate unknown to all of the caller? For instance, if one of the callers wasn't CanMutate originally, but the new replaced function has DML now, but we skip because of late binding, isn't that calling function have the wrong CanMutate value?


pkg/sql/create_function.go line 289 at r1 (raw file):

	}

	// Only persist CanMutate after the version gate is active. During a

nit: this comment is largely a dup of the earlier one in this file. And it's kind of verbose. Suggest make it more concise and/or reducing the duplication in some way.

@ZhouXing19 ZhouXing19 left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZhouXing19 made 3 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on DrewKimball, michae2, and spilchen).


pkg/sql/create_function.go line 289 at r1 (raw file):

Previously, spilchen wrote…

nit: this comment is largely a dup of the earlier one in this file. And it's kind of verbose. Suggest make it more concise and/or reducing the duplication in some way.

Good point, replaced with a reference.


pkg/sql/create_function.go line 313 at r4 (raw file):

Previously, spilchen wrote…

if n.cf.CanMutate is set to tree.RoutineCanMutateUnknown because of late binding, does it make sense to propagate unknown to all of the caller? For instance, if one of the callers wasn't CanMutate originally, but the new replaced function has DML now, but we skip because of late binding, isn't that calling function have the wrong CanMutate value?

This is a good call, and make me realized I expanded the impact of late binding too much -- if late-binding + deferred opt-build => CAN_MUTATE, we are overkilling the parallism for plpgsql procedures that doesn't contain mutations or DDLs at all. So I changed it to decouple CAN_MUTATE with late-binding, with a ddlWalker + trial optbuild approach. Please see the updated last commit of this PR.


pkg/upgrade/upgrades/upgrades.go line 225 at r1 (raw file):

Previously, spilchen wrote…

nit: this string is a bit misleading. The can_mutate field isn't added by the upgrade step. It's a no-op as you stated a few lines below.

You could even argue that we don't need the upgrade step and instead just lean on the clusterversion.V26_3 version check instead. But it's fine to leave too.

Good point, updated.

@ZhouXing19 ZhouXing19 requested a review from spilchen May 29, 2026 17:55

@DrewKimball DrewKimball left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flushing some comments on the first commit.

stmtScope := plBuilder.buildRootBlock(stmt.AST, triggerFuncScope, params)
udfDef.Body = []memo.RelExpr{stmtScope.expr}
udfDef.BodyProps = []*physical.Required{stmtScope.makePhysicalProps()}
udfDef.CanMutate = tree.RoutineCanMutateFromBool(stmtScope.expr.Relational().CanMutate)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anonymous routines (DO blocks and trigger functions), CanMutate is
derived directly from the body expression at build time, since these
have no descriptor.

Note that trigger functions do have a descriptor, but they are lazily bound and therefore we can't know for sure when the trigger function is created whether it could have a mutation. That being said, we could probably store CanMutate on the trigger descriptor, which lives on the table the trigger is associated with. Here's where we fully build the routine for a CREATE TRIGGER:

func (b *Builder) buildCreateTrigger(ct *tree.CreateTrigger, inScope *scope) (outScope *scope) {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added another commit to persist CanMutate on triggers as well.

Comment thread pkg/clusterversion/cockroach_versions.go Outdated
Comment thread pkg/sql/logictest/testdata/logic_test/mixed_version_udf_can_mutate Outdated
Comment thread pkg/sql/schemachanger/scbuild/internal/scbuildstmt/create_function.go Outdated
Comment thread pkg/sql/sem/tree/create_routine.go Outdated

@spilchen spilchen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: latest changes look good from SQL Foundations point of view

@spilchen made 2 comments and resolved 3 discussions.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on michae2 and ZhouXing19).


pkg/sql/opt/optbuilder/create_function.go line 554 at r12 (raw file):

			probeOK := func() (ok bool) {
				defer func() {
					if r := recover(); r != nil {

this will catch any error, including runtime errors or assertions. Is it safe to ignore those? I'm a little out of my depth here, so feel free to push back if this standard practice. I just thought it was concerning.

Comment thread pkg/sql/opt/optbuilder/create_function.go
Comment thread pkg/sql/opt/optbuilder/create_function.go Outdated
if lateBinding && !skipSQL {
// Stage 2: probe whether optbuild succeeds. If it fails
// (e.g., non-existent reference), fall back to conservative
// CAN_MUTATE with skipSQL=true.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not leave it unknown?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, left it unknown. I also removed this probing part as I agree that it changed the sementics of late binding. It means that if late binding is enabled, all plpgsql functions would be optbuild eagerly when being called, except those with DDLs. I was torn about whether this could be an overkill, but with late-binding off by default, it seems acceptable.

@ZhouXing19 ZhouXing19 force-pushed the can-mutate-desc branch 2 times, most recently from 0b103aa to 438bd73 Compare June 2, 2026 15:01
ZhouXing19 and others added 4 commits June 2, 2026 15:01
When executing a routine, with deferred UDF body optimization, body
statements are not built into RelExprs **until it reaches the execution
layer**. However, **before entering the execution layer**, we need to
know whether a routine can mutate to choose between LeafTxn and
RootTxn, which requires RelExprs of the body statements.

Fix this by computing the CanMutate property at `CREATE (OR REPLACE)
FUNCTION` time persisting it on the function descriptor. At the
execution time of the function, the persisted value is read through the
Overload and UDFDefinition and used to determine whether LeafTxn or RootTxn
should be used for the execution.

(This commit is based on the assumption that function body statement
is resolved at the CREATE OR REPLACE FUNCTION time, which could be broken
if late-binding is enabled. Left TODOs to have them addressed in the
follow-up commit.)

Specifically, this new descriptor field uses a three-way enum
(UNKNOWN_CAN_MUTATE, CAN_MUTATE, CANNOT_MUTATE). The
zero value UNKNOWN_CAN_MUTATE means "not yet determined" and causes
consumers to fall back to inspecting the eagerly-built body RelExprs.
This handles pre-existing function descriptors created before this
field was introduced without requiring a migration: they naturally have
the zero value, which triggers the correct fallback behavior. Functions
created or replaced once the cluster is on 26.3 get CAN_MUTATE or
CANNOT_MUTATE, allowing consumers to skip the body inspection.

For DO blocks (which are genuinely anonymous and have no descriptor)
CanMutate is derived directly from the body at build time.

Writing CanMutate is gated on the cluster being on 26.3 (V26_3_Start),
since the can_mutate field does not exist on pre-26.3 binaries. No
dedicated version key or migration is needed: pre-existing descriptors
keep the zero value (UNKNOWN), and consumers fall back to inspecting the
body.

Release note: None
A trigger function's own descriptor can't hold a meaningful CanMutate:
its body isn't built until the function is bound to a table, which only
happens at CREATE TRIGGER time. This commit computes CanMutate from the
trigger function body built in the trigger's table context and stores it
on the trigger descriptor, mirroring the function descriptor field.
Same three-way enum, and the write is gated on the cluster being on 26.3
(V26_3_Start), since the field doesn't exist on older binaries;
pre-existing descriptors keep the zero value and fall back to the body.

Persist-only for now: trigger functions are still eagerly optbuilt when
the trigger fires, so buildTriggerFunction keeps deriving CanMutate
from the fresh body. The persisted value only becomes load-bearing once
deferred optbuild lands for triggers; a TODO there marks where to read
it from the descriptor instead.

Release note: None
…ger descriptors

Split out from the previous commits to make the reviewer's life easier
since it's all mechanical golden file and generated output updates
(bootstrap testdata, schema changer side_effects/explain plans, settings
docs, generated test harnesses, UML).

Release note: None
When a function/trigger is replaced via CREATE OR REPLACE and becomes mutating,
propagate CAN_MUTATE transitively to all caller functions/triggers via the
DependedOnBy back-references. Only the non-mutating → mutating direction
is propagated; the reverse is left conservative (callers keep CAN_MUTATE)
because determining true non-mutating status would require re-analyzing
each caller's entire body to ensure no child routine mutates at all.

Release note: None

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
A late-bound PL/pgSQL procedure body is not optbuilt at CREATE time, so
we cannot determine there whether it mutates. So if late-binding is
enabled, we traverse the body statements to see if they contain any
DDL. If yes, such function would be tagged with CAN_MUTATE. If not,
leave CanMutate UNKNOWN and persist it. When the function is called,
the UNKNOWN CanMutate would force eager-building the body statements,
as opposed to deferring the optbuild till executing the body stmt.

Epic: none
Release note: None

@ZhouXing19 ZhouXing19 left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, RFAL, thanks!
(Main changes from the last review cycle: 1. Now persist CanMutate on trigger descriptor as well; 2. Removed the probing optbuild when late binding is enabled)

@ZhouXing19 made 10 comments and resolved 1 discussion.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on DrewKimball, michae2, and spilchen).


pkg/sql/opt/optbuilder/create_function.go line 554 at r12 (raw file):

Previously, spilchen wrote…

this will catch any error, including runtime errors or assertions. Is it safe to ignore those? I'm a little out of my depth here, so feel free to push back if this standard practice. I just thought it was concerning.

Removed this probing part.

Comment thread pkg/sql/opt/optbuilder/create_function.go
if lateBinding && !skipSQL {
// Stage 2: probe whether optbuild succeeds. If it fails
// (e.g., non-existent reference), fall back to conservative
// CAN_MUTATE with skipSQL=true.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, left it unknown. I also removed this probing part as I agree that it changed the sementics of late binding. It means that if late binding is enabled, all plpgsql functions would be optbuild eagerly when being called, except those with DDLs. I was torn about whether this could be an overkill, but with late-binding off by default, it seems acceptable.

Comment thread pkg/sql/opt/optbuilder/create_function.go Outdated
stmtScope := plBuilder.buildRootBlock(stmt.AST, triggerFuncScope, params)
udfDef.Body = []memo.RelExpr{stmtScope.expr}
udfDef.BodyProps = []*physical.Required{stmtScope.makePhysicalProps()}
udfDef.CanMutate = tree.RoutineCanMutateFromBool(stmtScope.expr.Relational().CanMutate)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added another commit to persist CanMutate on triggers as well.

Comment thread pkg/sql/schemachanger/scbuild/internal/scbuildstmt/create_function.go Outdated
Comment thread pkg/sql/sem/tree/create_routine.go Outdated
Comment thread pkg/sql/logictest/testdata/logic_test/mixed_version_udf_can_mutate Outdated
Comment thread pkg/clusterversion/cockroach_versions.go Outdated
@ZhouXing19 ZhouXing19 closed this Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants