Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerSpec #2102

Conversation

frobware
Copy link
Contributor

@frobware frobware commented Nov 18, 2024

Introduce a new knob, IdleConnectionTerminationPolicy, in the IngressController configuration to control how idle connections are handled during router reloads.

Context

In OCPBUGS-32044, the idle-close-on-response option was unconditionally added to the HAProxy confuguration to address issues with incoming HTTP requests failing during router reloads. This issue primarily affected Apache HTTPClient versions prior to 5.0, which do not gracefully handle connection resets. Adding the option ensured that idle connections were left open to handle one final request before being closed.

Historically, HAProxy 2.2 maintained idle connections during router reloads by default, allowing requests on those connections to complete even when routing configuration changes were applied. Starting with HAProxy 2.4, the default behaviour changed to close idle connections immediately during soft reloads.

To accommodate existing clients dependent on the HAProxy 2.2 behaviour, the unconditional addition of idle-close-on-response restored the previous OpenShift status quo, where customers upgrading their OpenShift clusters experienced a behaviour change due to the jump from HAProxy 2.2 to 2.6, which altered the default handling of idle connections during router reloads.

However, unconditionally enabling idle-close-on-response has now led to issues (OCPBUGS-43745) with Route backend switching. When a Route switches its service backend, requests on persistent connections could continue being routed to the previously active backend due to HAProxy handling these connections in the old process. This behaviour occurs until the connection is closed, either by a new request, the expiration of the client keep-alive, or the expiration of the HAProxy timeout http-keep-alive 300s. While this behaviour is desirable in some cases (e.g., for clients sensitive to connection resets), it can lead to temporary inconsistencies and unexpected routing behaviour during backend switching.

This PR addresses these regressions by making the behaviour configurable through a new knob.

Changes

  • Add a new knob, IdleConnectionTerminationPolicy, to the IngressController configuration.
  • Provide two modes:
    • Immediate: Closes idle connections immediately during router reloads.
    • Deferred: Maintains idle connections during router reloads.

Behavioural Differences

  • Immediate (New Default in OpenShift 4.19+):

    • Closes idle connections immediately during router reloads.
    • Ensures immediate propagation of route changes.
    • May reset connections for clients sensitive to connection interruptions.
  • Deferred (Default for backports to 4.14–4.18):

    • Keeps idle connections open until a terminating event, such as:
      • A new request on the connection (handled by the old HAProxy process).
      • Expiration of haproxy timeout http-keep-alive (300 seconds in OpenShift).
      • The client closing the connection.
    • Restores pre-2.4 HAProxy behaviour for idle connection handling.
    • May cause temporary inconsistencies during the first request on a persistent connection after a reload.
    • Can lead to resource accumulation if connections remain idle for extended periods, especially with frequent reloads.

References:

Copy link
Contributor

openshift-ci bot commented Nov 18, 2024

Hello @frobware! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 18, 2024
@openshift-ci openshift-ci bot requested review from bparees and knobunc November 18, 2024 13:41
@frobware frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch from e70b743 to 41b507e Compare November 18, 2024 16:28
@frobware frobware marked this pull request as draft November 18, 2024 16:44
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 18, 2024
@frobware frobware changed the title OCPBUGS 43745 idle close on response OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerSpec Nov 18, 2024
@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Nov 18, 2024
@openshift-ci-robot
Copy link

@frobware: This pull request references Jira Issue OCPBUGS-43745, which is invalid:

  • expected the bug to target the "4.18.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

  • OCPBUGS-43745: make generate-with-container
  • OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerTuningOptions

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@frobware
Copy link
Contributor Author

/jira refresh

@openshift-ci-robot
Copy link

@frobware: This pull request references Jira Issue OCPBUGS-43745, which is invalid:

  • expected the bug to target either version "4.18." or "openshift-4.18.", but it targets "4.19.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@frobware frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch 2 times, most recently from 708ac88 to 84689bf Compare November 19, 2024 10:15
frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Nov 19, 2024
Pickup openshift/api#2102

% go mod edit -replace github.com/openshift/api=github.com/frobware/api@84689bf6752251547541a87d3cfb891f9c6add29
% go mod tidy
% go mod vendor
@frobware frobware marked this pull request as ready for review November 21, 2024 07:40
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 21, 2024
@openshift-ci openshift-ci bot requested a review from JoelSpeed November 21, 2024 07:40
Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow why backports would set the default to Deferred rather than Immediate, if the change in behaviour has already been made as of 4.14 (HAProxy 2.2 to 2.4), doesn't this break a change that was done a while back? What did I miss?

operator/v1/types_ingress.go Outdated Show resolved Hide resolved
operator/v1/types_ingress.go Show resolved Hide resolved
@frobware frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch from 1c1e45f to 000e4bc Compare December 9, 2024 16:02
@frobware
Copy link
Contributor Author

frobware commented Dec 9, 2024

I'm not sure I follow why backports would set the default to Deferred rather than Immediate, if the change in behaviour has already been made as of 4.14 (HAProxy 2.2 to 2.4), doesn't this break a change that was done a while back? What did I miss?

The reason for defaulting to "Deferred" in backports is that OCPBUGS-32044 unconditionally made "Deferred" the default behaviour. Switching to "Immediate" in older releases would break customers by changing the behaviour in a patch release, which we need to avoid.

In hindsight, adding a knob initially would have been better, allowing customers to opt in if they wanted pre-2.4 HAProxy behaviour.

While I think "Immediate" makes the most sense as the default, using it now would break customers who rely on idle-close-on-response being enabled by default in all versions ≤ 4.18. Changing behaviour in a .z release isn’t something I want to do.

@frobware
Copy link
Contributor Author

/test integration

frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Dec 10, 2024
Pickup openshift/api#2102

% go mod edit -replace github.com/openshift/api=github.com/frobware/api@4ed96211ff956d09dd0fb1de8638547eb65a51c1
% go mod tidy
% go mod vendor
frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Dec 10, 2024
Pickup openshift/api#2102

% go mod edit -replace github.com/openshift/api=github.com/frobware/api@4ed96211ff956d09dd0fb1de8638547eb65a51c1
% go mod tidy
% go mod vendor
@alebedev87
Copy link
Contributor

/lgtm
/hold

Holding for @grzpiotrowski to have a final look.

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Dec 18, 2024
Copy link
Contributor

openshift-ci bot commented Dec 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alebedev87, frobware, JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@grzpiotrowski
Copy link

Everything looks good to me. I really appreciate the detailed description of the changes.

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 18, 2024
Copy link
Contributor

openshift-ci bot commented Dec 18, 2024

@frobware: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 2731647 into openshift:master Dec 18, 2024
11 checks passed
@openshift-ci-robot
Copy link

@frobware: Jira Issue OCPBUGS-43745: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-43745 has not been moved to the MODIFIED state.

In response to this:

Introduce a new knob, IdleConnectionTerminationPolicy, in the IngressController configuration to control how idle connections are handled during router reloads.

Context

In OCPBUGS-32044, the idle-close-on-response option was unconditionally added to the HAProxy confuguration to address issues with incoming HTTP requests failing during router reloads. This issue primarily affected Apache HTTPClient versions prior to 5.0, which do not gracefully handle connection resets. Adding the option ensured that idle connections were left open to handle one final request before being closed.

Historically, HAProxy 2.2 maintained idle connections during router reloads by default, allowing requests on those connections to complete even when routing configuration changes were applied. Starting with HAProxy 2.4, the default behaviour changed to close idle connections immediately during soft reloads.

To accommodate existing clients dependent on the HAProxy 2.2 behaviour, the unconditional addition of idle-close-on-response restored the previous OpenShift status quo, where customers upgrading their OpenShift clusters experienced a behaviour change due to the jump from HAProxy 2.2 to 2.6, which altered the default handling of idle connections during router reloads.

However, unconditionally enabling idle-close-on-response has now led to issues (OCPBUGS-43745) with Route backend switching. When a Route switches its service backend, requests on persistent connections could continue being routed to the previously active backend due to HAProxy handling these connections in the old process. This behaviour occurs until the connection is closed, either by a new request, the expiration of the client keep-alive, or the expiration of the HAProxy timeout http-keep-alive 300s. While this behaviour is desirable in some cases (e.g., for clients sensitive to connection resets), it can lead to temporary inconsistencies and unexpected routing behaviour during backend switching.

This PR addresses these regressions by making the behaviour configurable through a new knob.

Changes

  • Add a new knob, IdleConnectionTerminationPolicy, to the IngressController configuration.
  • Provide two modes:
  • Immediate: Closes idle connections immediately during router reloads.
  • Deferred: Maintains idle connections during router reloads.

Behavioural Differences

  • Immediate (New Default in OpenShift 4.19+):

  • Closes idle connections immediately during router reloads.

  • Ensures immediate propagation of route changes.

  • May reset connections for clients sensitive to connection interruptions.

  • Deferred (Default for backports to 4.14–4.18):

  • Keeps idle connections open until a terminating event, such as:

    • A new request on the connection (handled by the old HAProxy process).
    • Expiration of haproxy timeout http-keep-alive (300 seconds in OpenShift).
    • The client closing the connection.
  • Restores pre-2.4 HAProxy behaviour for idle connection handling.

  • May cause temporary inconsistencies during the first request on a persistent connection after a reload.

  • Can lead to resource accumulation if connections remain idle for extended periods, especially with frequent reloads.

References:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

Distgit: ose-cluster-config-api
This PR has been included in build ose-cluster-config-api-container-v4.19.0-202412181337.p0.g2731647.assembly.stream.el9.
All builds following this will include this PR.

frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Dec 19, 2024
Pickup openshift/api#2102

% git show 27316471eb72fe8fcf0d44fb5a0602f698f253dc
commit 27316471eb72fe8fcf0d44fb5a0602f698f253dc
Merge: de9de05a8 b7417509c
Author: openshift-merge-bot[bot] <148852131+openshift-merge-bot[bot]@users.noreply.github.com>
Date:   Wed Dec 18 10:31:50 2024 +0000

    Merge pull request #2102 from frobware/OCPBUGS-43745-idle-close-on-response

    OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerSpec

Vendoring steps:

$ go mod edit -replace github.com/openshift/api=github.com/openshift/api@27316471eb72fe8fcf0d44fb5a0602f698f253dc
$ go mod tidy
$ go mod vendor
$ make update
frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Jan 6, 2025
Pickup openshift/api#2102

% git show 27316471eb72fe8fcf0d44fb5a0602f698f253dc
commit 27316471eb72fe8fcf0d44fb5a0602f698f253dc
Merge: de9de05a8 b7417509c
Author: openshift-merge-bot[bot] <148852131+openshift-merge-bot[bot]@users.noreply.github.com>
Date:   Wed Dec 18 10:31:50 2024 +0000

    Merge pull request #2102 from frobware/OCPBUGS-43745-idle-close-on-response

    OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerSpec

Vendoring steps:

$ go mod edit -replace github.com/openshift/api=github.com/openshift/api@27316471eb72fe8fcf0d44fb5a0602f698f253dc
$ go mod tidy
$ go mod vendor
$ make update
frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Jan 6, 2025
Pickup openshift/api#2102

% git show 27316471eb72fe8fcf0d44fb5a0602f698f253dc
commit 27316471eb72fe8fcf0d44fb5a0602f698f253dc
Merge: de9de05a8 b7417509c
Author: openshift-merge-bot[bot] <148852131+openshift-merge-bot[bot]@users.noreply.github.com>
Date:   Wed Dec 18 10:31:50 2024 +0000

    Merge pull request #2102 from frobware/OCPBUGS-43745-idle-close-on-response

    OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerSpec

Vendoring steps:

$ go mod edit -replace github.com/openshift/api=github.com/openshift/api@27316471eb72fe8fcf0d44fb5a0602f698f253dc
$ go mod tidy
$ go mod vendor
$ make update
frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Jan 6, 2025
Pickup openshift/api#2102

% git show 27316471eb72fe8fcf0d44fb5a0602f698f253dc
commit 27316471eb72fe8fcf0d44fb5a0602f698f253dc
Merge: de9de05a8 b7417509c
Author: openshift-merge-bot[bot] <148852131+openshift-merge-bot[bot]@users.noreply.github.com>
Date:   Wed Dec 18 10:31:50 2024 +0000

    Merge pull request #2102 from frobware/OCPBUGS-43745-idle-close-on-response

    OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerSpec

Vendoring steps:

$ go mod edit -replace github.com/openshift/api=github.com/openshift/api@27316471eb72fe8fcf0d44fb5a0602f698f253dc
$ go mod tidy
$ go mod vendor
$ make update
frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Jan 8, 2025
Pickup openshift/api#2102

% git show 27316471eb72fe8fcf0d44fb5a0602f698f253dc
commit 27316471eb72fe8fcf0d44fb5a0602f698f253dc
Merge: de9de05a8 b7417509c
Author: openshift-merge-bot[bot] <148852131+openshift-merge-bot[bot]@users.noreply.github.com>
Date:   Wed Dec 18 10:31:50 2024 +0000

    Merge pull request #2102 from frobware/OCPBUGS-43745-idle-close-on-response

    OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerSpec

Vendoring steps:

$ go mod edit -replace github.com/openshift/api=github.com/openshift/api@27316471eb72fe8fcf0d44fb5a0602f698f253dc
$ go mod tidy
$ go mod vendor
$ make update
frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Jan 9, 2025
Pickup openshift/api#2102

% git show 27316471eb72fe8fcf0d44fb5a0602f698f253dc
commit 27316471eb72fe8fcf0d44fb5a0602f698f253dc
Merge: de9de05a8 b7417509c
Author: openshift-merge-bot[bot] <148852131+openshift-merge-bot[bot]@users.noreply.github.com>
Date:   Wed Dec 18 10:31:50 2024 +0000

    Merge pull request #2102 from frobware/OCPBUGS-43745-idle-close-on-response

    OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerSpec

Vendoring steps:

$ go mod edit -replace github.com/openshift/api=github.com/openshift/api@27316471eb72fe8fcf0d44fb5a0602f698f253dc
$ go mod tidy
$ go mod vendor
$ make update
frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Jan 13, 2025
Pickup openshift/api#2102

% git show 27316471eb72fe8fcf0d44fb5a0602f698f253dc
commit 27316471eb72fe8fcf0d44fb5a0602f698f253dc
Merge: de9de05a8 b7417509c
Author: openshift-merge-bot[bot] <148852131+openshift-merge-bot[bot]@users.noreply.github.com>
Date:   Wed Dec 18 10:31:50 2024 +0000

    Merge pull request #2102 from frobware/OCPBUGS-43745-idle-close-on-response

    OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerSpec

Vendor steps:

$ go mod edit -replace github.com/openshift/api=github.com/openshift/api@27316471eb72fe8fcf0d44fb5a0602f698f253dc
$ go mod tidy
$ go mod vendor
$ make update
@frobware
Copy link
Contributor Author

/cherry-pick release-4.18

@openshift-cherrypick-robot

@frobware: new pull request created: #2145

In response to this:

/cherry-pick release-4.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

alebedev87 added a commit to alebedev87/api that referenced this pull request Jan 29, 2025
Add an `IdleConnectionTerminationPolicy` field to control whether
HAProxy keeps idle frontend connections open during a soft stop (router
reload). Allow users to prevent errors in clients or load balancers that
do not properly handle connection resets.

This commit is based on openshift#2102 with a key in the default value.
For OCP versions <= 4.18, the default is `Deferred`, aligning with older releases
where HAProxy’s `idle-close-on-response` option was added unconditionally.
alebedev87 added a commit to alebedev87/api that referenced this pull request Jan 31, 2025
…ntrollerSpec

Add an `IdleConnectionTerminationPolicy` field to control whether
HAProxy keeps idle frontend connections open during a soft stop (router
reload). Allow users to prevent errors in clients or load balancers that
do not properly handle connection resets.

This commit is based on openshift#2102 with a key in the default value.
For OCP versions <= 4.18, the default is `Deferred`, aligning with older releases
where HAProxy’s `idle-close-on-response` option was added unconditionally.
alebedev87 added a commit to alebedev87/api that referenced this pull request Jan 31, 2025
…ntrollerSpec

Add an `IdleConnectionTerminationPolicy` field to control whether
HAProxy keeps idle frontend connections open during a soft stop (router
reload). Allow users to prevent errors in clients or load balancers that
do not properly handle connection resets.

This commit is based on openshift#2102 with a key in the default value.
For OCP versions <= 4.18, the default is `Deferred`, aligning with older releases
where HAProxy’s `idle-close-on-response` option was added unconditionally.
alebedev87 added a commit to alebedev87/api that referenced this pull request Feb 3, 2025
…ntrollerSpec

Add an `IdleConnectionTerminationPolicy` field to control whether
HAProxy keeps idle frontend connections open during a soft stop (router
reload). Allow users to prevent errors in clients or load balancers that
do not properly handle connection resets.

This commit is based on openshift#2102 with a key in the default value.
For OCP versions <= 4.18, the default is `Deferred`, aligning with older releases
where HAProxy’s `idle-close-on-response` option was added unconditionally.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants