spilchen
Contributor

Long-running INSPECT statements (e.g., >3h) have previously failed with connection resets even though no server crashed. A missed keep-alive or a client timeout is the likely cause, though the exact cause has not been confirmed.

This change updates the roachtest to:

  • Run the INSPECT statement as a background job using a short statement_timeout (5s).
  • Poll the job table for job completion instead of waiting synchronously.
  • Report progress at 10% intervals to improve visibility without overwhelming the logs.

Fixes #155610

Release note: none

Epic: none

@spilchen spilchen self-assigned this Oct 17, 2025
@cockroach-teamcity
Member

This change is Reviewable

@spilchen spilchen force-pushed the gh-155610/251017/1508/inspect-long-run/pr-ready branch from 3a5fe19 to a2a23a2 Compare October 20, 2025 11:26
@spilchen spilchen marked this pull request as ready for review October 20, 2025 11:26
@spilchen spilchen requested review from a team and bghal October 20, 2025 11:26

Development

Successfully merging this pull request may close these issues.

roachtest: inspect/throughput/bulkingest/nodes=12/cpu=8/rows=1000000000/checks=2 failed

2 participants