Skip to content

Conversation

rafiss
Copy link
Collaborator

@rafiss rafiss commented Oct 17, 2025

Backport:

Please see individual commits for details.

Release justification: low risk change that is gated by an opt-in cluster setting; will help reduce CPU usage during network timeouts.

In 39067de we added behavior to give up and close a network connection
if a threshold of repeated errors was reached. It retried on errors
since some network errors could be transient.

It was retrying tens of thousands of times, which is excessive. We lower
this to 256 now. This is motivated by a few tests that identifed the
error handling logic in this tight loop being quite expensive. Retrying
fewer times means that we'll reduce CPU usage during failure scenarios.

Release note: None
Previously, the maximum number of repeated network read errors before
aborting a connection was a hardcoded constant set to 256 (1 << 8).
This change makes the value configurable via a non-public cluster
setting `sql.pgwire.max_repeated_error_count`.

This allows operators to tune the threshold for aborting connections
experiencing repeated network errors, and allows us to backport this
change along with 5f562ad.

Epic: None
Release note: None
@rafiss rafiss requested a review from msbutler October 17, 2025 21:55
@rafiss rafiss requested review from a team as code owners October 17, 2025 21:55
Copy link

blathers-crl bot commented Oct 17, 2025

Thanks for opening a backport.

Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.

@blathers-crl blathers-crl bot added backport Label PR's that are backports to older release branches T-multitenant Issues owned by the multi-tenant virtual team labels Oct 17, 2025
@cockroach-teamcity
Copy link
Member

This change is Reviewable

// while reading from the network connection before the server decides to give
// up and abort the connection.
const maxRepeatedErrorCount = 1 << 15
const maxRepeatedErrorCount = 1 << 8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you added this commit to the backport, just to prevent merge conflicts, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like in the subsequent commit, the default is 1<<15, which i think makes sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that was my thinking with this

@rafiss rafiss merged commit 8059a81 into cockroachdb:release-25.4 Oct 20, 2025
24 of 25 checks passed
jeffswenson added a commit to jeffswenson/cockroach that referenced this pull request Oct 20, 2025
This is a temporary work around until cockroachdb#155658 is backported to the
minor versions that are used by the mixed version test framework.

Release note: none
Release justification: test only change
Fixes: cockroachdb#154125
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport Label PR's that are backports to older release branches T-multitenant Issues owned by the multi-tenant virtual team target-release-25.4.1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants