-
Notifications
You must be signed in to change notification settings - Fork 4k
release-25.4: pgwire: make max repeated error count configurable via cluster setting #155658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-25.4: pgwire: make max repeated error count configurable via cluster setting #155658
Conversation
In 39067de we added behavior to give up and close a network connection if a threshold of repeated errors was reached. It retried on errors since some network errors could be transient. It was retrying tens of thousands of times, which is excessive. We lower this to 256 now. This is motivated by a few tests that identifed the error handling logic in this tight loop being quite expensive. Retrying fewer times means that we'll reduce CPU usage during failure scenarios. Release note: None
Previously, the maximum number of repeated network read errors before aborting a connection was a hardcoded constant set to 256 (1 << 8). This change makes the value configurable via a non-public cluster setting `sql.pgwire.max_repeated_error_count`. This allows operators to tune the threshold for aborting connections experiencing repeated network errors, and allows us to backport this change along with 5f562ad. Epic: None Release note: None
Thanks for opening a backport. Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate. |
pkg/sql/pgwire/server.go
Outdated
// while reading from the network connection before the server decides to give | ||
// up and abort the connection. | ||
const maxRepeatedErrorCount = 1 << 15 | ||
const maxRepeatedErrorCount = 1 << 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you added this commit to the backport, just to prevent merge conflicts, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like in the subsequent commit, the default is 1<<15, which i think makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that was my thinking with this
This is a temporary work around until cockroachdb#155658 is backported to the minor versions that are used by the mixed version test framework. Release note: none Release justification: test only change Fixes: cockroachdb#154125
Backport:
Please see individual commits for details.
Release justification: low risk change that is gated by an opt-in cluster setting; will help reduce CPU usage during network timeouts.