-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Keep track and release Reactor Netty 4 Transport accepted Http Channels during the Node shutdown #20106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Sergei Ustimenko <[email protected]>
WalkthroughAdded channel tracking and release functionality for Reactor Netty 4 HTTP transport during node shutdown. Introduced integration tests to validate HTTP channel lifecycle management under concurrent requests, modified channel closure to handle async context coordination, and updated request consumers to register accepted channels with the transport. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Transport as ReactorNetty4HttpServerTransport
participant Consumer as ReactorNetty4StreamingRequestConsumer
participant Channel as HttpChannel
participant CloseContext
Client->>Transport: Incoming HTTP request
Transport->>Consumer: Create consumer (pass transport)
Consumer->>Transport: serverAcceptedChannel(channel)
Transport->>Channel: Register accepted channel
Consumer->>Consumer: Process request
Consumer->>Channel: Close on completion
alt closeContext not completed
Channel->>CloseContext: addListener(close future)
CloseContext->>Channel: Complete → triggers close
else closeContext already completed
Channel->>Channel: Close immediately
end
Note over Transport,Channel: On node shutdown
Transport->>Transport: Track and release all<br/>accepted channels
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
Comment |
Signed-off-by: Sergei Ustimenko <[email protected]>
…ky-msearch-error-reporting-smoke-tests Signed-off-by: Sergei Ustimenko <[email protected]>
|
Thanks for the quick follow up here, @fdesu! Why does this channel not get closed? Manually closing the client seems to violate the intent of the test framework per this comment: OpenSearch/test/framework/src/main/java/org/opensearch/test/OpenSearchIntegTestCase.java Lines 2381 to 2382 in 8f838f1
|
|
@andrross oh, good point, wasn’t aware of that! I’m not entirely sure why the channel doesn’t get closed automatically but it just stays open since the deregistration doesn’t fire up which would have caused the RestCancellableNodeClient housekeeping to kick in. Maybe it’s somehow related to the channel type which is Reactor Netty4 kind of channel but I’m not really sure. Another thought that I had in mind was that the test(s) don’t fully consume the Response body and hence the receiving side just keeps the channel alive but it doesn’t seem to be the case since we use EntityUtils.toString which tries-with-resources the underlying InputStream. I’ll look deeper and would really appreciate if anyone can point me in the right direction. |
|
❌ Gradle check result for 25166b1: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
@coderabbitai do you find @andrross comment above helpful to improve your review? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 933 🏁 Script executed: Length of output: 693 🏁 Script executed: Length of output: 17682 🏁 Script executed: Length of output: 3789 @peterzhuamazon Yes, @andrross's comment is very helpful and raises a valid architectural concern that should influence the review. After examining the framework code, I can confirm that the test framework is explicitly designed to:
The framework documentation at lines 2381-2382 explicitly states: "The returned client gets automatically closed when needed, it shouldn't be closed as part of tests otherwise it cannot be reused by other tests anymore." The Core Issue: The current fix uses try-with-resources to close the client in individual test methods, which:
What Needs Investigation: The real question is: Why doesn't the HttpChannel close automatically through the normal framework lifecycle? As @fdesu noted, it could be related to:
Recommendation: Rather than working around the framework's design, I'd suggest:
This would be a more robust solution that maintains the framework's architecture and benefits all tests using the framework. |
|
I mean, I don't think it's the test harness's design that causes the issue, it's something else. My expectation is, after the Note that the:
|
@fdesu I don't think you're missing anything. I think there might be a latent bug here with the Reactor Netty implementation. The channel gets added to a collection in AbstractHttpServerTransport#serverAcceptedChannel, which will guarantee that on node shutdown those channels get closed here. However, I don't see that |
Signed-off-by: Sergei Ustimenko <[email protected]>
…ky-msearch-error-reporting-smoke-tests Signed-off-by: Sergei Ustimenko <[email protected]>
Signed-off-by: Sergei Ustimenko <[email protected]>
|
❌ Gradle check result for 4abed82: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
|
@andrross @reta great point, lots of thanks for this hint! I can confirm the source of failures was randomness in what transport gets picked for the smoke tests: OpenSearch/qa/smoke-test-http/src/test/java/org/opensearch/http/HttpSmokeTestCase.java Line 56 in bd24685
when I've added a channel init lifecycle hook for the |
Signed-off-by: Sergei Ustimenko <[email protected]>
...etty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransport.java
Outdated
Show resolved
Hide resolved
...etty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransport.java
Outdated
Show resolved
Hide resolved
…r proper context access Signed-off-by: Sergei Ustimenko <[email protected]>
|
@fdesu I just figured out that I think we do have much simpler solution for the problem, by modifying The We don't need wrappers and initializers in this case |
Signed-off-by: Sergei Ustimenko <[email protected]>
|
@reta interesting, I assume the streaming request consumer would need to do the same for the I'm not entirely sure but the |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #20106 +/- ##
============================================
- Coverage 73.33% 73.26% -0.08%
+ Complexity 71679 71617 -62
============================================
Files 5790 5786 -4
Lines 327549 327645 +96
Branches 47181 47206 +25
============================================
- Hits 240217 240052 -165
- Misses 68080 68318 +238
- Partials 19252 19275 +23 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…onsumers Signed-off-by: Sergei Ustimenko <[email protected]>
|
❌ Gradle check result for 77606c7: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
…sposed already Signed-off-by: Andriy Redko <[email protected]>
...nsport-reactor-netty4/src/main/java/org/opensearch/transport/reactor/netty4/Netty4Utils.java
Outdated
Show resolved
Hide resolved
...or-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4BaseHttpChannel.java
Outdated
Show resolved
Hide resolved
...4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4NonStreamingHttpChannel.java
Show resolved
Hide resolved
|
❌ Gradle check result for d8ed05b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Sergei Ustimenko <[email protected]>
|
❌ Gradle check result for 84ee332: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 84ee332: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Sergei Ustimenko <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
modules/transport-netty4/src/internalClusterTest/java/org/opensearch/transport/netty4/Netty4HttpChannelsReleaseIntegTests.java (2)
54-72: Test validates channel tracking, not cleanup - consider clarifying the test name.The test method
testAcceptedChannelsGetCleanedUpOnTheNodeShutdownactually validates that channels are tracked inRestCancellableNodeClient(the channel count increases bynumChannels). The cleanup on node shutdown happens implicitly when the test framework tears down the cluster after the test completes.Consider renaming to something like
testAcceptedChannelsAreTrackedAndCanBeCleanedUpOnNodeShutdownor adding a comment clarifying that cleanup verification occurs during cluster teardown.- public void testAcceptedChannelsGetCleanedUpOnTheNodeShutdown() throws InterruptedException { + /** + * Validates that HTTP channels are properly tracked in RestCancellableNodeClient. + * Cleanup verification happens implicitly during cluster teardown when the test completes. + */ + public void testAcceptedChannelsGetCleanedUpOnTheNodeShutdown() throws InterruptedException {
80-90: Consider using POST for search requests.While using GET with a request body works for
_search, the conventional approach is to use POST when including a JSON body. This is a minor stylistic consideration.plugins/transport-reactor-netty4/src/internalClusterTest/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpChannelsReleaseIntegTests.java (1)
34-92: Consider extracting shared test logic to reduce duplication.This test is nearly identical to
Netty4HttpChannelsReleaseIntegTests. While some duplication is acceptable for transport-specific tests, you could consider extracting the common test logic (thread pool management, executeRequest helper, test body) into a shared abstract base class or utility to improve maintainability.This is optional since the duplication is limited and the tests are isolated to their respective transport modules.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
CHANGELOG.md(1 hunks)modules/transport-netty4/src/internalClusterTest/java/org/opensearch/transport/netty4/Netty4HttpChannelsReleaseIntegTests.java(1 hunks)plugins/transport-reactor-netty4/src/internalClusterTest/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpChannelsReleaseIntegTests.java(1 hunks)plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerChannel.java(1 hunks)plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransport.java(6 hunks)plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4NonStreamingHttpChannel.java(1 hunks)plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4NonStreamingRequestConsumer.java(2 hunks)plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4StreamingHttpChannel.java(1 hunks)plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4StreamingRequestConsumer.java(1 hunks)plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportStreamingTests.java(1 hunks)plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportTests.java(4 hunks)qa/smoke-test-http/src/test/java/org/opensearch/http/DetailedErrorsDisabledIT.java(3 hunks)server/src/main/java/org/opensearch/rest/action/RestCancellableNodeClient.java(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
qa/smoke-test-http/src/test/java/org/opensearch/http/DetailedErrorsDisabledIT.java (1)
server/src/main/java/org/opensearch/http/HttpTransportSettings.java (1)
HttpTransportSettings(58-263)
plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4NonStreamingHttpChannel.java (1)
plugins/transport-reactor-netty4/src/main/java/org/opensearch/transport/reactor/netty4/Netty4Utils.java (1)
Netty4Utils(34-142)
plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4StreamingHttpChannel.java (1)
plugins/transport-reactor-netty4/src/main/java/org/opensearch/transport/reactor/netty4/Netty4Utils.java (1)
Netty4Utils(34-142)
plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportStreamingTests.java (1)
plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorHttpClient.java (1)
ReactorHttpClient(69-307)
plugins/transport-reactor-netty4/src/internalClusterTest/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpChannelsReleaseIntegTests.java (3)
libs/core/src/main/java/org/opensearch/core/xcontent/MediaTypeRegistry.java (1)
MediaTypeRegistry(57-418)server/src/main/java/org/opensearch/index/query/MatchAllQueryBuilder.java (1)
MatchAllQueryBuilder(53-110)server/src/main/java/org/opensearch/rest/action/RestCancellableNodeClient.java (1)
RestCancellableNodeClient(64-197)
modules/transport-netty4/src/internalClusterTest/java/org/opensearch/transport/netty4/Netty4HttpChannelsReleaseIntegTests.java (2)
server/src/main/java/org/opensearch/index/query/MatchAllQueryBuilder.java (1)
MatchAllQueryBuilder(53-110)server/src/main/java/org/opensearch/rest/action/RestCancellableNodeClient.java (1)
RestCancellableNodeClient(64-197)
plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportTests.java (1)
plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorHttpClient.java (1)
ReactorHttpClient(69-307)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: detect-breaking-change
- GitHub Check: Analyze (java)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
🔇 Additional comments (17)
qa/smoke-test-http/src/test/java/org/opensearch/http/DetailedErrorsDisabledIT.java (3)
35-37: Static Hamcrest imports placement looks goodConsolidating
containsString,is, andnotas static imports at the top is consistent and improves readability; no behavioral impact.
68-70: ChainedSettingsbuilder indentation is a cosmetic-only changeThe adjusted indentation for the chained
.put(...)calls is stylistic only; use ofHttpTransportSettings.SETTING_HTTP_DETAILED_ERRORS_ENABLEDremains unchanged and correct.
81-82: Assertion formatting change is non-functionalRe-wrapping the
containsStringargument onto the next line is purely stylistic; the assertion logic and message text are unchanged.plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerChannel.java (1)
50-52: LGTM! Correct naming alignment.The
toString()now correctly reflects the class nameReactorNetty4HttpServerChannel, improving debugging and logging clarity.CHANGELOG.md (1)
92-92: LGTM! Changelog entry accurately describes the fix.The entry correctly categorizes this as a fix and provides a clear description of the change.
plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4StreamingHttpChannel.java (1)
58-66: Logic looks correct for coordinating channel closure with lifecycle tracking.The change properly ensures that:
- If
closeContextis already completed (channel closed naturally), just close the channel (idempotent).- If
closeContextis not yet completed, initiate close and attach a listener to complete the context.This avoids attempting to double-complete the
CompletableContextsince the constructor already registers a listener oncloseFuture()at line 49.plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4NonStreamingRequestConsumer.java (2)
74-76: Core fix: Channel registration for shutdown cleanup.Calling
transport.serverAcceptedChannel(channel)beforeincomingRequestensures the channel is tracked in the transport'shttpChannelscollection. This enables proper cleanup during node shutdown, which was the root cause of the flaky test failures.This aligns with the approach discussed in the PR comments by @reta.
32-42: I'm unable to complete the verification due to repository access issues. However, I can provide the rewritten review comment based on the information available:
Verify that the streaming consumer also registers accepted channels consistently.
The type narrowing to
ReactorNetty4HttpServerTransportenables access toserverAcceptedChannel(). Ensure thatReactorNetty4StreamingRequestConsumerhas equivalent channel registration logic to maintain consistency between streaming and non-streaming request handling.server/src/main/java/org/opensearch/rest/action/RestCancellableNodeClient.java (1)
85-88: LGTM! Visibility change for test observability.Making
getNumTasks()public is consistent withgetNumChannels()(line 79) and enables the new integration tests to verify that no tasks remain after request completion during channel lifecycle testing.plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportTests.java (1)
442-498: LGTM! Good test coverage for connection lifecycle.This test validates that the channel tracking mechanism works correctly by asserting that after all requests complete:
serverOpenis 0 (all connections closed)totalOpenmatches the number of requests madeThe test properly releases responses and uses try-with-resources for resource cleanup.
plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4NonStreamingHttpChannel.java (1)
49-55: Implementation correctly handles the closeContext coordination.The logic appropriately handles two cases:
- When
closeContextis not yet done: attaches a listener to complete it upon channel close- When
closeContextis already done: performs a direct close without redundant listener registrationNote: The constructor already registers a listener on
closeFuture()(line 37), which will also completecloseContext. SinceCompletableContextonly completes once (first completion wins), this dual-listener pattern is safe.plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4StreamingRequestConsumer.java (1)
27-35: Core fix: Streaming channels are now properly registered with the transport.This change addresses the root cause of the flaky test issue. By calling
transport.serverAcceptedChannel(httpChannel)during construction, the streaming channel is added toAbstractHttpServerTransport#httpChannels, ensuring it gets closed during node shutdown.plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransport.java (2)
351-360: Clean approach to expose channel registration to consumers.The override widens the visibility of
serverAcceptedChannel()fromprotectedtopublic, allowingReactorNetty4NonStreamingRequestConsumerandReactorNetty4StreamingRequestConsumerto register their channels with the transport. The delegation tosuper.serverAcceptedChannel()ensures the channel is tracked inAbstractHttpServerTransport#httpChannelsfor proper lifecycle management.
384-388: Unable to verify due to infrastructure constraints—manual code review required.The repository is currently inaccessible through multiple access methods, preventing verification of whether
ReactorNetty4NonStreamingRequestConsumercallstransport.serverAcceptedChannel()in its constructor.Please manually confirm that the non-streaming consumer (lines 397-402) implements the same channel registration pattern as the streaming consumer to ensure both consumer types properly track accepted channels with the transport.
plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportStreamingTests.java (2)
143-180: Good test coverage for streaming channel lifecycle.The test validates that streaming connections are properly tracked and closed, mirroring the non-streaming test pattern. The assertions on
serverOpen(0) andtotalOpen(numRequests) confirm the channel registration fix works for streaming requests.
182-243: Clean refactoring to extract dispatcher creation.Extracting
createStreamingDispatcher()as a helper method improves test maintainability and allows reuse across multiple streaming test methods.plugins/transport-reactor-netty4/src/internalClusterTest/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpChannelsReleaseIntegTests.java (1)
34-92: LGTM! Test properly validates Reactor Netty4 channel tracking.This integration test mirrors the Netty4 version and correctly validates that HTTP channels are tracked for the Reactor Netty4 transport implementation, ensuring they will be cleaned up on node shutdown.
server/src/main/java/org/opensearch/rest/action/RestCancellableNodeClient.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Sergei Ustimenko <[email protected]>
reta
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description
It appears that after executing Multi Search API, the HttpChannel stays registered with the
RestCancellableNodeClientas the underlying channel may not always get closed.The junit rule that supposed to close the Rest Client (the
OpenSearchTestClusterRule) during the cleanup phase doesn't always close the server-side channel which then pollutes theRestCancellableNodeClientand potentially causes more havoc.The solution is rather simple - close the Rest Client within the test case and reliably release all underlying resources. Running relevant tests multiple times with the fix applied seems to eliminate the issue.
Related Issues
Resolves #20034
Check List
[ ] API changes companion pull request created, if applicable.[ ] Public documentation issue/PR created, if applicable.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.