sqlinstance: wait for active session before setting txn deadline#170162
sqlinstance: wait for active session before setting txn deadline#170162Nukitt wants to merge 1 commit into
Conversation
|
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here |
| log.Dev.Infof(ctx, | ||
| "session %s expiration %s is behind current time, waiting for heartbeat to extend it", | ||
| session.ID(), session.Expiration()) | ||
| const maxWait = 10 * time.Second |
There was a problem hiding this comment.
Since the sqlliveness heartbeat is a config param, should we set it to 2 * config value (two cycles) instead of hardcoding it? It seems to be an internal setting and I'm not sure if it is practically ever set to a non-default value though.
| return errors.Wrap(ctx.Err(), "waiting for active session") | ||
| } | ||
| return errors.Newf( | ||
| "session %s expiration %s did not advance past current time; session may have expired", |
There was a problem hiding this comment.
Can we log the current clock time as well?
|
|
||
| ctx := context.Background() | ||
|
|
||
| t.Run("expiration already valid", func(t *testing.T) { |
There was a problem hiding this comment.
We don't have a test for the case that this fix primarily addresses - a session with a stale expiration that is advanced when heartbeat extends it.
| if ctx.Err() != nil { | ||
| return errors.Wrap(ctx.Err(), "waiting for active session") | ||
| } | ||
| return errors.Newf( |
There was a problem hiding this comment.
Need a test that exercises this condition
During server startup under heavy load, the sqlliveness session's expiration could fall behind the current clock time before the heartbeat loop extended it. When createInstanceRow set the transaction deadline to this stale expiration, UpdateDeadline rejected it with "deadline below read timestamp is nonsensical", crashing the server. Add waitForActiveSession, which polls until the session expiration advances past the current clock time. The heartbeat loop (server.sqlliveness.heartbeat, ~5s default) extends the session atomically, so the wait is brief. The timeout is 2x the heartbeat interval, derived from the cluster setting. If the session never advances, a clear error is returned instead of an assertion failure. Fixes: cockroachdb#135273 Epic: none Release note: None
1c6b7d8 to
6933984
Compare
During server startup under heavy load, the sqlliveness session's expiration could fall behind the current clock time before the heartbeat loop extended it. When
createInstanceRowset the transaction deadline to this stale expiration, UpdateDeadline rejected it with "deadline below read timestamp is nonsensical", crashing the server.Wait for the session expiration to advance past the current clock time before starting the transaction. The heartbeat loop (~5s interval) extends the session atomically, so the wait is brief. If the session never advances (e.g. it expired), return a clear error after 10s instead of an assertion failure.
Fixes: #135273
Epic: None
Release note: None