Skip to content

sqlinstance: wait for active session before setting txn deadline#170162

Closed
Nukitt wants to merge 1 commit into
cockroachdb:masterfrom
Nukitt:start-timestamp-nonsensical
Closed

sqlinstance: wait for active session before setting txn deadline#170162
Nukitt wants to merge 1 commit into
cockroachdb:masterfrom
Nukitt:start-timestamp-nonsensical

Conversation

@Nukitt

@Nukitt Nukitt commented May 12, 2026

Copy link
Copy Markdown
Contributor

During server startup under heavy load, the sqlliveness session's expiration could fall behind the current clock time before the heartbeat loop extended it. When
createInstanceRow set the transaction deadline to this stale expiration, UpdateDeadline rejected it with "deadline below read timestamp is nonsensical", crashing the server.

Wait for the session expiration to advance past the current clock time before starting the transaction. The heartbeat loop (~5s interval) extends the session atomically, so the wait is brief. If the session never advances (e.g. it expired), return a clear error after 10s instead of an assertion failure.

Fixes: #135273
Epic: None
Release note: None

@Nukitt Nukitt requested a review from a team as a code owner May 12, 2026 06:30
@trunk-io

trunk-io Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

log.Dev.Infof(ctx,
"session %s expiration %s is behind current time, waiting for heartbeat to extend it",
session.ID(), session.Expiration())
const maxWait = 10 * time.Second

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the sqlliveness heartbeat is a config param, should we set it to 2 * config value (two cycles) instead of hardcoding it? It seems to be an internal setting and I'm not sure if it is practically ever set to a non-default value though.

return errors.Wrap(ctx.Err(), "waiting for active session")
}
return errors.Newf(
"session %s expiration %s did not advance past current time; session may have expired",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we log the current clock time as well?


ctx := context.Background()

t.Run("expiration already valid", func(t *testing.T) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a test for the case that this fix primarily addresses - a session with a stale expiration that is advanced when heartbeat extends it.

if ctx.Err() != nil {
return errors.Wrap(ctx.Err(), "waiting for active session")
}
return errors.Newf(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a test that exercises this condition

During server startup under heavy load, the sqlliveness session's
expiration could fall behind the current clock time before the heartbeat
loop extended it. When createInstanceRow set the transaction deadline to
this stale expiration, UpdateDeadline rejected it with "deadline below
read timestamp is nonsensical", crashing the server.

Add waitForActiveSession, which polls until the session expiration
advances past the current clock time. The heartbeat loop
(server.sqlliveness.heartbeat, ~5s default) extends the session
atomically, so the wait is brief. The timeout is 2x the heartbeat
interval, derived from the cluster setting. If the session never
advances, a clear error is returned instead of an assertion failure.

Fixes: cockroachdb#135273
Epic: none
Release note: None
@Nukitt Nukitt force-pushed the start-timestamp-nonsensical branch from 1c6b7d8 to 6933984 Compare June 5, 2026 09:54
@Nukitt Nukitt closed this Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cli: start can fail with 'deadline below read timestamp is nonsensical'

3 participants