

@fdesu
Contributor

@fdesu fdesu commented Nov 26, 2025

Description

It appears that after a Multi Search API call, the HttpChannel stays registered with the RestCancellableNodeClient because the underlying channel is not always closed.

The JUnit rule that is supposed to close the REST client during the cleanup phase (OpenSearchTestClusterRule) does not always close the server-side channel, which then pollutes the RestCancellableNodeClient and can cause further failures.

The solution is simple: close the REST client within the test case so that all underlying resources are reliably released. Running the relevant tests repeatedly with the fix applied appears to eliminate the issue.

Related Issues

Resolves #20034

Check List

  • Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Bug Fixes

    • Reactor Netty 4 Transport now properly tracks and releases accepted HTTP channels during node shutdown, improving resource management.
  • Tests

    • Added integration tests to verify HTTP channel lifecycle and cleanup behavior under concurrent requests.
    • Enhanced connection handling tests for HTTP transport stability.


@fdesu fdesu requested a review from a team as a code owner November 26, 2025 16:40
@github-actions github-actions bot added the >test-failure (Test failure from CI, local build, etc.), autocut, and flaky-test (Random test failure that succeeds on second run) labels Nov 26, 2025
@coderabbitai

coderabbitai bot commented Nov 26, 2025

Walkthrough

Added channel tracking and release functionality for Reactor Netty 4 HTTP transport during node shutdown. Introduced integration tests to validate HTTP channel lifecycle management under concurrent requests, modified channel closure to handle async context coordination, and updated request consumers to register accepted channels with the transport.

Changes

  • Documentation & Testing Setup
    Files: CHANGELOG.md, qa/smoke-test-http/src/test/java/org/opensearch/http/DetailedErrorsDisabledIT.java
    Summary: Added changelog entry for channel tracking fix; minor formatting and static import reorganization in existing test
  • Channel Lifecycle Integration Tests
    Files: modules/transport-netty4/src/internalClusterTest/java/org/opensearch/transport/netty4/Netty4HttpChannelsReleaseIntegTests.java, plugins/transport-reactor-netty4/src/internalClusterTest/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpChannelsReleaseIntegTests.java
    Summary: New integration test classes validating HTTP channel tracking and cleanup during concurrent requests and node shutdown across both transports
  • Channel Management & Lifecycle Coordination
    Files: plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerChannel.java, plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4NonStreamingHttpChannel.java, plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4StreamingHttpChannel.java
    Summary: Updated toString representation and modified close implementations to handle async context coordination for proper channel lifecycle management
  • Transport & Consumer Channel Registration
    Files: plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransport.java, plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4NonStreamingRequestConsumer.java, plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4StreamingRequestConsumer.java
    Summary: Added serverAcceptedChannel override, updated transport type specificity, and modified consumer constructors to register channels and pass the transport instance through the request pipeline
  • Streaming Request Tests
    Files: plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportStreamingTests.java, plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportTests.java
    Summary: Refactored streaming test with helper method extraction and added new test for connection lifecycle validation during streaming and non-streaming scenarios

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Transport as ReactorNetty4HttpServerTransport
    participant Consumer as ReactorNetty4StreamingRequestConsumer
    participant Channel as HttpChannel
    participant CloseContext

    Client->>Transport: Incoming HTTP request
    Transport->>Consumer: Create consumer (pass transport)
    Consumer->>Transport: serverAcceptedChannel(channel)
    Transport->>Channel: Register accepted channel
    
    Consumer->>Consumer: Process request
    Consumer->>Channel: Close on completion
    
    alt closeContext not completed
        Channel->>CloseContext: addListener(close future)
        CloseContext->>Channel: Complete → triggers close
    else closeContext already completed
        Channel->>Channel: Close immediately
    end
    
    Note over Transport,Channel: On node shutdown
    Transport->>Transport: Track and release all<br/>accepted channels

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Files to focus on:
    • ReactorNetty4StreamingRequestConsumer.java and ReactorNetty4NonStreamingRequestConsumer.java — verify correct transport type narrowing and channel registration timing
    • ReactorNetty4StreamingHttpChannel.java and ReactorNetty4NonStreamingHttpChannel.java — carefully review async close coordination logic with closeContext completion state
    • ReactorNetty4HttpServerTransport.java — confirm serverAcceptedChannel override integrates properly with the request pipeline
    • Integration test classes — validate test assertions accurately capture channel lifecycle expectations

Suggested labels

transport, netty, test

Suggested reviewers

  • andrross
  • reta

Poem

A rabbit hops through channels bright,
Tracking requests, day and night,
Close contexts bind with async grace,
As channels find their resting place. 🐰✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 26.19%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title clearly and concisely summarizes the main change: tracking and releasing Reactor Netty 4 Transport HTTP channels during node shutdown to address channel lifecycle management issues.
  • Description check: ✅ Passed. The description provides a clear explanation of the problem (HttpChannel not getting closed after Multi Search API), the root cause (channel registration issue), and the solution approach. It includes a reference to the related issue (#20034) and confirms testing was added.
  • Linked Issues check: ✅ Passed. The PR addresses the root cause of flaky DetailedErrorsDisabledIT tests by implementing proper channel lifecycle tracking in ReactorNetty4HttpServerTransport, ensuring accepted channels are registered and released on node shutdown as required by issue #20034.
  • Out of Scope Changes check: ✅ Passed. All changes are directly scoped to fixing the HTTP channel lifecycle issue: channel tracking infrastructure in transport classes, channel release logic, test infrastructure for validation, and related integration tests. No unrelated modifications detected.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d91400a and 93a3457.

📒 Files selected for processing (2)
  • modules/transport-netty4/src/internalClusterTest/java/org/opensearch/transport/netty4/Netty4HttpChannelsReleaseIntegTests.java (1 hunks)
  • plugins/transport-reactor-netty4/src/internalClusterTest/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpChannelsReleaseIntegTests.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • modules/transport-netty4/src/internalClusterTest/java/org/opensearch/transport/netty4/Netty4HttpChannelsReleaseIntegTests.java
  • plugins/transport-reactor-netty4/src/internalClusterTest/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpChannelsReleaseIntegTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: Analyze (java)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: detect-breaking-change
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (25, windows-latest)


Signed-off-by: Sergei Ustimenko <[email protected]>
…ky-msearch-error-reporting-smoke-tests

Signed-off-by: Sergei Ustimenko <[email protected]>
@andrross
Member

Thanks for the quick follow up here, @fdesu! Why does this channel not get closed? Manually closing the client seems to violate the intent of the test framework per this comment:

* The returned client gets automatically closed when needed, it shouldn't be closed as part of tests otherwise
* it cannot be reused by other tests anymore.

@fdesu
Contributor Author

fdesu commented Nov 26, 2025

@andrross oh, good point, wasn’t aware of that!

I’m not entirely sure why the channel isn’t closed automatically. It simply stays open because the deregistration never fires, and that deregistration is what would trigger the RestCancellableNodeClient housekeeping. Perhaps it is related to the channel type, which is a Reactor Netty 4 channel, but I’m not really sure.

Another hypothesis was that the test(s) don’t fully consume the response body, so the receiving side keeps the channel alive. That doesn’t seem to be the case, though, since we use EntityUtils.toString, which reads the underlying InputStream inside a try-with-resources block.
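To make this hypothesis concrete: reading a response body inside try-with-resources fully drains and closes the client-side stream. The sketch below uses only the JDK; `readFully` is a hypothetical helper that is conceptually similar to, but not the implementation of, `EntityUtils.toString`, and `TrackingInputStream` exists only to observe `close()`. Closing the client-side stream says nothing by itself about whether the server-side HTTP channel gets closed, which is the open question in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class FullyConsumeBody {
    /** Hypothetical stream that records whether close() was called. */
    static final class TrackingInputStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingInputStream(byte[] bytes) {
            super(bytes);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Drains the whole body and closes the stream via try-with-resources. */
    static String readFully(InputStream in) throws IOException {
        try (InputStream body = in) {
            return new String(body.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingInputStream stream =
            new TrackingInputStream("{\"responses\":[]}".getBytes(StandardCharsets.UTF_8));
        String body = readFully(stream);
        // The client-side stream is consumed and closed; the server-side channel is out of scope here.
        System.out.println(body + " closed=" + stream.closed);
    }
}
```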

I’ll look deeper and would really appreciate it if anyone could point me in the right direction.

@github-actions
Contributor

❌ Gradle check result for 25166b1: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@peterzhuamazon
Member

Thanks for the quick follow up here, @fdesu! Why does this channel not get closed? Manually closing the client seems to violate the intent of the test framework per this comment:

* The returned client gets automatically closed when needed, it shouldn't be closed as part of tests otherwise
* it cannot be reused by other tests anymore.

@coderabbitai do you find @andrross comment above helpful to improve your review?

@coderabbitai

coderabbitai bot commented Nov 26, 2025


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the referenced file to understand the framework's intent
cat test/framework/src/main/java/org/opensearch/test/OpenSearchIntegTestCase.java | sed -n '2370,2395p'

Length of output: 933


🏁 Script executed:

#!/bin/bash
# Look for how the rest client is created and managed in the test framework
rg -n -A5 -B5 "createRestClient|getRestClient" test/framework/src/main/java/org/opensearch/test/OpenSearchIntegTestCase.java | head -80

Length of output: 693


🏁 Script executed:

#!/bin/bash
# Check the OpenSearchTestClusterRule class to understand how it manages resources
fd -e java "OpenSearchTestClusterRule" --exec cat {}

Length of output: 17682


🏁 Script executed:

#!/bin/bash
# Look at how RestCancellableNodeClient manages channel registrations
rg -n -A10 -B5 "class RestCancellableNodeClient" --type java

Length of output: 3789


@peterzhuamazon Yes, @andrross's comment is very helpful and raises a valid architectural concern that should influence the review.

After examining the framework code, I can confirm that the test framework is explicitly designed to:

  1. Reuse the RestClient across tests: The getRestClient() method creates a singleton client that's shared within the test scope
  2. Handle cleanup centrally: The clearClusters() method in OpenSearchTestClusterRule is responsible for closing the client and verifying all channels are deregistered from RestCancellableNodeClient

The framework documentation at lines 2381-2382 explicitly states: "The returned client gets automatically closed when needed, it shouldn't be closed as part of tests otherwise it cannot be reused by other tests anymore."
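The cleanup check described above relies on channels deregistering themselves when they close. A minimal sketch of that bookkeeping, assuming a close-listener design loosely modeled on `RestCancellableNodeClient` (`FakeChannel` and `ChannelRegistry` are hypothetical names, and this is not the OpenSearch implementation): a channel whose close event never fires stays registered indefinitely, which is the symptom `clearClusters()` trips over.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ChannelRegistrySketch {
    /** Hypothetical channel that runs its close listeners exactly once. */
    static final class FakeChannel {
        private final List<Runnable> closeListeners = new ArrayList<>();
        private boolean open = true;

        void addCloseListener(Runnable listener) {
            boolean runNow;
            synchronized (this) {
                runNow = !open;
                if (open) {
                    closeListeners.add(listener);
                }
            }
            if (runNow) {
                listener.run(); // already closed: notify immediately
            }
        }

        void close() {
            List<Runnable> toRun;
            synchronized (this) {
                if (!open) {
                    return;
                }
                open = false;
                toRun = new ArrayList<>(closeListeners);
                closeListeners.clear();
            }
            toRun.forEach(Runnable::run);
        }
    }

    /** Hypothetical registry: an entry is removed only when the channel's close listener fires. */
    static final class ChannelRegistry {
        private final Map<FakeChannel, Boolean> channels = new ConcurrentHashMap<>();

        void register(FakeChannel channel) {
            if (channels.putIfAbsent(channel, Boolean.TRUE) == null) {
                channel.addCloseListener(() -> channels.remove(channel));
            }
        }

        int getNumChannels() {
            return channels.size();
        }
    }

    public static void main(String[] args) {
        ChannelRegistry registry = new ChannelRegistry();
        FakeChannel closedProperly = new FakeChannel();
        FakeChannel leaked = new FakeChannel();
        registry.register(closedProperly);
        registry.register(leaked);
        closedProperly.close();
        // 'leaked' is never closed, so its listener never fires and it stays registered
        System.out.println("registered channels: " + registry.getNumChannels());
    }
}
```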

The Core Issue:

The current fix uses try-with-resources to close the client in individual test methods, which:

  • ❌ Violates the framework's design intent
  • ❌ Breaks client reusability across tests in the same scope
  • ❌ Treats the symptom (channels not closing) rather than the root cause

What Needs Investigation:

The real question is: Why doesn't the HttpChannel close automatically through the normal framework lifecycle? As @fdesu noted, it could be related to:

  • Reactor Netty4 channel behavior
  • Channel lifecycle management differences
  • Incomplete response consumption (though EntityUtils.toString should handle this)

Recommendation:

Rather than working around the framework's design, I'd suggest:

  1. Investigating why the channel remains open after clearClusters() is called
  2. Determining if the framework's cleanup logic needs enhancement for Reactor Netty4 channels
  3. Fixing the root cause so the framework's intended lifecycle works correctly

This would be a more robust solution that maintains the framework's architecture and benefits all tests using the framework.

@fdesu
Contributor Author

fdesu commented Nov 26, 2025

I mean, I don't think the test harness's design is causing the issue; it's something else. My expectation is that, after restClient.performRequest(MultisearchRequest) is called and the response is fully consumed, the channel should be closed, which doesn't happen. Am I missing anything?

Note that the:

  1. org.opensearch.client.SearchIT#testMultiSearch works okay with the high level client
  2. and the SearchRestCancellationIT#testAutomaticCancellationMultiSearchDuringQueryPhase works okay with getRestClient().performRequestAsync() which gets cancelled.

@andrross
Member

the channel should get closed which doesn't happen. Am I missing anything?

@fdesu I don't think you're missing anything. I think there might be a latent bug here with the Reactor Netty implementation. The channel gets added to a collection in AbstractHttpServerTransport#serverAcceptedChannel, which will guarantee that on node shutdown those channels get closed here. However, I don't see that AbstractHttpServerTransport#serverAcceptedChannel is ever called by ReactorNetty4HttpServerTransport, so when using the reactor netty implementation that httpChannels collection is always empty. I don't understand why the failure is intermittent though. @reta Do you have any idea what might be happening?
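The mechanism described here can be modeled with a concurrent set: channels handed to `serverAcceptedChannel` are closed on shutdown, while channels the transport never registered survive it. This is a simplified, hypothetical sketch (`Channel` and `BaseTransport` are invented for illustration), not the actual `AbstractHttpServerTransport` code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class AcceptedChannelTracking {
    /** Hypothetical channel that only tracks whether it is open. */
    static final class Channel {
        private volatile boolean open = true;

        boolean isOpen() {
            return open;
        }

        void close() {
            open = false;
        }
    }

    /** Hypothetical base transport illustrating the idea behind serverAcceptedChannel. */
    static class BaseTransport {
        private final Set<Channel> httpChannels = ConcurrentHashMap.newKeySet();

        protected void serverAcceptedChannel(Channel channel) {
            httpChannels.add(channel);
        }

        /** On shutdown, closes every tracked channel; untracked channels are missed entirely. */
        void stop() {
            for (Channel channel : httpChannels) {
                channel.close();
            }
            httpChannels.clear();
        }
    }

    public static void main(String[] args) {
        BaseTransport transport = new BaseTransport();
        Channel tracked = new Channel();
        Channel untracked = new Channel();
        transport.serverAcceptedChannel(tracked); // the registration path the thread expects
        // 'untracked' mimics a channel the transport never registered
        transport.stop();
        System.out.println("tracked open=" + tracked.isOpen() + ", untracked open=" + untracked.isOpen());
    }
}
```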

@fdesu fdesu marked this pull request as draft November 27, 2025 08:47
@fdesu fdesu changed the title from "Fix flaky DetailedErrorsDisabledIT and DetailedErrorsEnabledIT" to "Keep track and release Reactor Netty 4 Transport accepted Http Channels during the Node shutdown" Nov 27, 2025
Signed-off-by: Sergei Ustimenko <[email protected]>
@github-actions
Contributor

❌ Gradle check result for 4abed82: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@reta
Contributor

reta commented Nov 27, 2025

However, I don't see that AbstractHttpServerTransport#serverAcceptedChannel is ever called by ReactorNetty4HttpServerTransport, so when using the reactor netty implementation that httpChannels collection is always empty. I don't understand why the failure is intermittent though. @reta Do you have any idea what might be happening?

Looking into it, @andrross. Thanks for taking this on, @fdesu!

@fdesu
Contributor Author

fdesu commented Nov 27, 2025

@andrross @reta great point, lots of thanks for this hint!

I can confirm that the source of the failures was randomness in which transport gets picked for the smoke tests:

nodeHttpTypeKey = getHttpTypeKey(randomFrom(Netty4ModulePlugin.class, ReactorNetty4Plugin.class));

When ReactorNetty4Plugin was picked, some request channels might not be closed on node shutdown, which caused failures in the OpenSearchTestClusterRule#clearClusters rule.

I've added a channel initialization lifecycle hook to ReactorNetty4HttpServerTransport, similar to the one in Netty4HttpServerTransport. The added channel wrapper is registered with AbstractHttpServerTransport#httpChannels whenever a new channel is accepted, so all channels are closed on node shutdown, which is the expected behaviour.

…r proper context access

Signed-off-by: Sergei Ustimenko <[email protected]>
@reta
Contributor

reta commented Nov 27, 2025

@fdesu I think we actually have a much simpler solution to the problem: modify the ReactorNetty4NonStreamingRequestConsumer::process method to call transport.serverAcceptedChannel(channel):

void process(HttpContent in, FluxSink<HttpContent> emitter) {
    // Consume request body in full before dispatching it
    content.addComponent(true, in.content().retain());

    if (in instanceof LastHttpContent) {
        final ReactorNetty4NonStreamingHttpChannel channel = new ReactorNetty4NonStreamingHttpChannel(request, response, emitter);
        final HttpRequest r = createRequest(request, content);

        try {
            transport.serverAcceptedChannel(channel);
            transport.incomingRequest(r, channel);
        } catch (Exception ex) {
            emitter.error(ex);
            transport.onException(channel, ex);
        } finally {
            r.release();
            if (disposed.compareAndSet(false, true)) {
                this.content.release();
            }
        }
    }
}

serverAcceptedChannel is protected and therefore not visible to the consumer, but we could add a public override to ReactorNetty4HttpServerTransport and have the consumer use ReactorNetty4HttpServerTransport instead of AbstractHttpServerTransport:

    @Override
    public void serverAcceptedChannel(HttpChannel httpChannel) {
        super.serverAcceptedChannel(httpChannel);
    }

We don't need wrappers or initializers in this case.
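A rough, self-contained model of this suggestion, assuming simplified stand-in types (`AbstractTransport`, `ReactorTransport`, `NonStreamingConsumer` and the string-based request are all hypothetical): the subclass widens the protected method to public, which Java permits in an override, and the consumer, holding the narrowed transport type, registers the channel just before dispatching the fully buffered request. The error handling and buffer release from the snippet above are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class RegisterOnLastContent {
    interface HttpChannel {}

    /** Hypothetical abstract transport with a protected registration hook. */
    abstract static class AbstractTransport {
        final Set<HttpChannel> httpChannels = ConcurrentHashMap.newKeySet();
        final List<String> dispatched = new ArrayList<>(); // single-threaded demo only

        protected void serverAcceptedChannel(HttpChannel channel) {
            httpChannels.add(channel);
        }

        void incomingRequest(String request, HttpChannel channel) {
            dispatched.add(request);
        }
    }

    /** Override widens visibility from protected to public. */
    static final class ReactorTransport extends AbstractTransport {
        @Override
        public void serverAcceptedChannel(HttpChannel channel) {
            super.serverAcceptedChannel(channel);
        }
    }

    /** Hypothetical consumer: buffers chunks and dispatches once the last chunk arrives. */
    static final class NonStreamingConsumer {
        private final ReactorTransport transport; // narrowed type exposes the public override
        private final StringBuilder content = new StringBuilder();

        NonStreamingConsumer(ReactorTransport transport) {
            this.transport = transport;
        }

        void process(String chunk, boolean last) {
            content.append(chunk);
            if (last) {
                HttpChannel channel = new HttpChannel() {};
                transport.serverAcceptedChannel(channel); // track before dispatching
                transport.incomingRequest(content.toString(), channel);
            }
        }
    }

    public static void main(String[] args) {
        ReactorTransport transport = new ReactorTransport();
        NonStreamingConsumer consumer = new NonStreamingConsumer(transport);
        consumer.process("{\"query\":", false);
        consumer.process("{}}", true);
        System.out.println(transport.httpChannels.size() + " channel(s) tracked, dispatched " + transport.dispatched);
    }
}
```

Registering at the last-content step, rather than on the first chunk, means a channel object only exists once there is a complete request to dispatch, which matches where the snippet creates it.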

Signed-off-by: Sergei Ustimenko <[email protected]>
@fdesu
Contributor Author

fdesu commented Nov 27, 2025

@reta interesting, I assume the streaming request consumer would need to do the same for the LastHttpContent branch, correct?

I'm not entirely sure, but the ReactorNetty4HttpChannel solution also looks fairly simple, and it is applied at the ReactorNetty4HttpServerTransport level. That keeps it independent of the consumers, which I assume are only responsible for consuming the inbound request. What do you think?

@codecov

codecov bot commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.26%. Comparing base (97d3864) to head (93a3457).
⚠️ Report is 14 commits behind head on main.

Files with missing lines (patch coverage, lines):
  • ...ctor/netty4/ReactorNetty4StreamingHttpChannel.java: 0.00%, 5 missing ⚠️
  • ...r/netty4/ReactorNetty4NonStreamingHttpChannel.java: 60.00%, 1 missing and 1 partial ⚠️
  • ...reactor/netty4/ReactorNetty4HttpServerChannel.java: 0.00%, 1 missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20106      +/-   ##
============================================
- Coverage     73.33%   73.26%   -0.08%     
+ Complexity    71679    71617      -62     
============================================
  Files          5790     5786       -4     
  Lines        327549   327645      +96     
  Branches      47181    47206      +25     
============================================
- Hits         240217   240052     -165     
- Misses        68080    68318     +238     
- Partials      19252    19275      +23     


@github-actions
Contributor

❌ Gradle check result for 77606c7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for d8ed05b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Sergei Ustimenko <[email protected]>
@github-actions
Contributor

❌ Gradle check result for 84ee332: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 84ee332: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

✅ Gradle check result for 84ee332: SUCCESS

@fdesu fdesu marked this pull request as ready for review November 28, 2025 10:09
@fdesu fdesu requested a review from peternied as a code owner November 28, 2025 10:09

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
modules/transport-netty4/src/internalClusterTest/java/org/opensearch/transport/netty4/Netty4HttpChannelsReleaseIntegTests.java (2)

54-72: Test validates channel tracking, not cleanup - consider clarifying the test name.

The test method testAcceptedChannelsGetCleanedUpOnTheNodeShutdown actually validates that channels are tracked in RestCancellableNodeClient (the channel count increases by numChannels). The cleanup on node shutdown happens implicitly when the test framework tears down the cluster after the test completes.

Consider renaming to something like testAcceptedChannelsAreTrackedAndCanBeCleanedUpOnNodeShutdown or adding a comment clarifying that cleanup verification occurs during cluster teardown.

-    public void testAcceptedChannelsGetCleanedUpOnTheNodeShutdown() throws InterruptedException {
+    /**
+     * Validates that HTTP channels are properly tracked in RestCancellableNodeClient.
+     * Cleanup verification happens implicitly during cluster teardown when the test completes.
+     */
+    public void testAcceptedChannelsGetCleanedUpOnTheNodeShutdown() throws InterruptedException {

80-90: Consider using POST for search requests.

While using GET with a request body works for _search, the conventional approach is to use POST when including a JSON body. This is a minor stylistic consideration.

plugins/transport-reactor-netty4/src/internalClusterTest/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpChannelsReleaseIntegTests.java (1)

34-92: Consider extracting shared test logic to reduce duplication.

This test is nearly identical to Netty4HttpChannelsReleaseIntegTests. While some duplication is acceptable for transport-specific tests, you could consider extracting the common test logic (thread pool management, executeRequest helper, test body) into a shared abstract base class or utility to improve maintainability.

This is optional since the duplication is limited and the tests are isolated to their respective transport modules.
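If the shared logic were extracted as suggested, one option is a template-method base class that owns the thread pool and the concurrent request loop, with each transport-specific test supplying only `executeRequest`. The sketch below is hypothetical (all class names are invented, and `executeRequest` returns a string instead of issuing a real HTTP request); it only illustrates the shape of such a refactoring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedChannelsReleaseBase {
    /** Hypothetical shared base: owns the pool and the concurrent-request loop. */
    abstract static class AbstractChannelsReleaseTest {
        /** Transport-specific hook supplied by each subclass. */
        protected abstract String executeRequest(int id) throws Exception;

        final List<String> runConcurrently(int numRequests) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            try {
                List<Future<String>> futures = new ArrayList<>();
                for (int i = 0; i < numRequests; i++) {
                    final int id = i;
                    futures.add(pool.submit((Callable<String>) () -> executeRequest(id)));
                }
                List<String> results = new ArrayList<>();
                for (Future<String> future : futures) {
                    results.add(future.get()); // preserves submission order
                }
                return results;
            } finally {
                pool.shutdownNow();
            }
        }
    }

    /** Hypothetical Netty 4 variant. */
    static final class Netty4Variant extends AbstractChannelsReleaseTest {
        @Override
        protected String executeRequest(int id) {
            return "netty4-" + id;
        }
    }

    /** Hypothetical Reactor Netty 4 variant. */
    static final class ReactorNetty4Variant extends AbstractChannelsReleaseTest {
        @Override
        protected String executeRequest(int id) {
            return "reactor-netty4-" + id;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new Netty4Variant().runConcurrently(3));
        System.out.println(new ReactorNetty4Variant().runConcurrently(3));
    }
}
```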

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 25166b1 and d91400a.

📒 Files selected for processing (13)
  • CHANGELOG.md (1 hunks)
  • modules/transport-netty4/src/internalClusterTest/java/org/opensearch/transport/netty4/Netty4HttpChannelsReleaseIntegTests.java (1 hunks)
  • plugins/transport-reactor-netty4/src/internalClusterTest/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpChannelsReleaseIntegTests.java (1 hunks)
  • plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerChannel.java (1 hunks)
  • plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransport.java (6 hunks)
  • plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4NonStreamingHttpChannel.java (1 hunks)
  • plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4NonStreamingRequestConsumer.java (2 hunks)
  • plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4StreamingHttpChannel.java (1 hunks)
  • plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4StreamingRequestConsumer.java (1 hunks)
  • plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportStreamingTests.java (1 hunks)
  • plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportTests.java (4 hunks)
  • qa/smoke-test-http/src/test/java/org/opensearch/http/DetailedErrorsDisabledIT.java (3 hunks)
  • server/src/main/java/org/opensearch/rest/action/RestCancellableNodeClient.java (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
qa/smoke-test-http/src/test/java/org/opensearch/http/DetailedErrorsDisabledIT.java (1)
server/src/main/java/org/opensearch/http/HttpTransportSettings.java (1)
  • HttpTransportSettings (58-263)
plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4NonStreamingHttpChannel.java (1)
plugins/transport-reactor-netty4/src/main/java/org/opensearch/transport/reactor/netty4/Netty4Utils.java (1)
  • Netty4Utils (34-142)
plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4StreamingHttpChannel.java (1)
plugins/transport-reactor-netty4/src/main/java/org/opensearch/transport/reactor/netty4/Netty4Utils.java (1)
  • Netty4Utils (34-142)
plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportStreamingTests.java (1)
plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorHttpClient.java (1)
  • ReactorHttpClient (69-307)
plugins/transport-reactor-netty4/src/internalClusterTest/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpChannelsReleaseIntegTests.java (3)
libs/core/src/main/java/org/opensearch/core/xcontent/MediaTypeRegistry.java (1)
  • MediaTypeRegistry (57-418)
server/src/main/java/org/opensearch/index/query/MatchAllQueryBuilder.java (1)
  • MatchAllQueryBuilder (53-110)
server/src/main/java/org/opensearch/rest/action/RestCancellableNodeClient.java (1)
  • RestCancellableNodeClient (64-197)
modules/transport-netty4/src/internalClusterTest/java/org/opensearch/transport/netty4/Netty4HttpChannelsReleaseIntegTests.java (2)
server/src/main/java/org/opensearch/index/query/MatchAllQueryBuilder.java (1)
  • MatchAllQueryBuilder (53-110)
server/src/main/java/org/opensearch/rest/action/RestCancellableNodeClient.java (1)
  • RestCancellableNodeClient (64-197)
plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportTests.java (1)
plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorHttpClient.java (1)
  • ReactorHttpClient (69-307)
🔇 Additional comments (17)
qa/smoke-test-http/src/test/java/org/opensearch/http/DetailedErrorsDisabledIT.java (3)

35-37: Static Hamcrest imports placement looks good

Consolidating containsString, is, and not as static imports at the top is consistent and improves readability; no behavioral impact.


68-70: Chained Settings builder indentation is a cosmetic-only change

The adjusted indentation for the chained .put(...) calls is stylistic only; use of HttpTransportSettings.SETTING_HTTP_DETAILED_ERRORS_ENABLED remains unchanged and correct.


81-82: Assertion formatting change is non-functional

Re-wrapping the containsString argument onto the next line is purely stylistic; the assertion logic and message text are unchanged.

plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerChannel.java (1)

50-52: LGTM! Correct naming alignment.

The toString() now correctly reflects the class name ReactorNetty4HttpServerChannel, improving debugging and logging clarity.

CHANGELOG.md (1)

92-92: LGTM! Changelog entry accurately describes the fix.

The entry correctly categorizes this as a fix and provides a clear description of the change.

plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4StreamingHttpChannel.java (1)

58-66: Logic looks correct for coordinating channel closure with lifecycle tracking.

The change properly ensures that:

  1. If closeContext is already completed (channel closed naturally), just close the channel (idempotent).
  2. If closeContext is not yet completed, initiate close and attach a listener to complete the context.

This avoids attempting to double-complete the CompletableContext since the constructor already registers a listener on closeFuture() at line 49.
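A minimal Java sketch of this close coordination follows. It assumes `CompletableFuture` as a stand-in for OpenSearch's `CompletableContext`, and `FakeNettyChannel` / `SketchStreamingHttpChannel` are hypothetical names, not the plugin's classes. It shows both branches: closing again after the context is done, and closing while the context is still pending.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the underlying Netty channel. It counts close()
// calls and exposes a future that completes when the channel is closed.
final class FakeNettyChannel {
    final AtomicInteger closeCalls = new AtomicInteger();
    private final CompletableFuture<Void> closeFuture = new CompletableFuture<>();

    CompletableFuture<Void> closeFuture() {
        return closeFuture;
    }

    // Idempotent: completing an already-completed future is a no-op.
    CompletableFuture<Void> close() {
        closeCalls.incrementAndGet();
        closeFuture.complete(null);
        return closeFuture;
    }
}

// Sketch of the close coordination; CompletableFuture stands in for
// OpenSearch's CompletableContext.
final class SketchStreamingHttpChannel {
    private final FakeNettyChannel channel;
    private final CompletableFuture<Void> closeContext = new CompletableFuture<>();

    SketchStreamingHttpChannel(FakeNettyChannel channel) {
        this.channel = channel;
        // Constructor listener: a natural close also completes closeContext.
        channel.closeFuture().whenComplete((v, e) -> closeContext.complete(null));
    }

    void close() {
        if (closeContext.isDone()) {
            // Case 1: already closed; closing the channel again is harmless.
            channel.close();
        } else {
            // Case 2: initiate the close and complete closeContext once it finishes.
            channel.close().whenComplete((v, e) -> closeContext.complete(null));
        }
    }

    boolean isOpen() {
        return !closeContext.isDone();
    }
}
```

Because the constructor listener and the `close()` listener both complete the same context, whichever fires first wins and the other is a no-op.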

plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4NonStreamingRequestConsumer.java (2)

74-76: Core fix: Channel registration for shutdown cleanup.

Calling transport.serverAcceptedChannel(channel) before incomingRequest ensures the channel is tracked in the transport's httpChannels collection. This enables proper cleanup during node shutdown, which was the root cause of the flaky test failures.

This aligns with the approach discussed in the PR comments by @reta.
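To illustrate why the ordering matters, here is a small sketch under stated assumptions: `SketchTransport` is a hypothetical stand-in for the transport's tracked-channel set, and the `incomingRequest` callback stands in for the real dispatch call. The point is that the channel is already tracked by the time any handler runs, so shutdown can find it even if the handler never closes it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical transport that tracks accepted channels, standing in for
// AbstractHttpServerTransport#httpChannels.
final class SketchTransport {
    final Set<Object> httpChannels = ConcurrentHashMap.newKeySet();

    void serverAcceptedChannel(Object channel) {
        httpChannels.add(channel);
    }
}

// Sketch of the ordering in the non-streaming consumer: register first, dispatch second.
final class SketchNonStreamingConsumer {
    private final SketchTransport transport;
    private final Consumer<Object> incomingRequest; // hypothetical stand-in for the dispatch call

    SketchNonStreamingConsumer(SketchTransport transport, Consumer<Object> incomingRequest) {
        this.transport = transport;
        this.incomingRequest = incomingRequest;
    }

    void accept(Object channel) {
        // Track the channel before any handler runs, so shutdown can always
        // find it even if the handler never closes it.
        transport.serverAcceptedChannel(channel);
        incomingRequest.accept(channel);
    }
}
```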




Verify that the streaming consumer also registers accepted channels consistently.

The type narrowing to ReactorNetty4HttpServerTransport enables access to serverAcceptedChannel(). Ensure that ReactorNetty4StreamingRequestConsumer has equivalent channel registration logic to maintain consistency between streaming and non-streaming request handling.

server/src/main/java/org/opensearch/rest/action/RestCancellableNodeClient.java (1)

85-88: LGTM! Visibility change for test observability.

Making getNumTasks() public is consistent with getNumChannels() (line 79) and enables the new integration tests to verify that no tasks remain after request completion during channel lifecycle testing.
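As a rough illustration of how a test might use such public counters, here is a hypothetical polling helper (`AwaitZero` is not part of OpenSearch). It waits until a counter such as `getNumTasks()` drops to zero or a timeout expires. In the actual test suite, a helper such as `assertBusy` from the OpenSearch test framework would normally play this role.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

// Hypothetical test helper: polls a counter (e.g. a public getter such as
// getNumTasks()) until it reaches zero or the timeout expires.
final class AwaitZero {
    static boolean awaitZero(LongSupplier counter, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        // Overflow-safe comparison of nanoTime values.
        while (System.nanoTime() - deadline < 0) {
            if (counter.getAsLong() == 0) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        // One final check after the deadline.
        return counter.getAsLong() == 0;
    }
}
```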

plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportTests.java (1)

442-498: LGTM! Good test coverage for connection lifecycle.

This test validates that the channel tracking mechanism works correctly by asserting that after all requests complete:

  • serverOpen is 0 (all connections closed)
  • totalOpen matches the number of requests made

The test properly releases responses and uses try-with-resources for resource cleanup.
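The two assertions can be modeled with a tiny hypothetical tracker, `SketchChannelStats`. It assumes each request uses its own connection, as the `totalOpen == numRequests` assertion implies. `serverOpen` counts channels currently open and `totalOpen` counts every channel ever accepted, so a channel that is never released would leave `serverOpen` above zero.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stats tracker mirroring the two numbers the test asserts on:
// serverOpen (currently open channels) and totalOpen (channels ever opened).
final class SketchChannelStats {
    private final Set<Object> open = ConcurrentHashMap.newKeySet();
    private final AtomicLong totalOpened = new AtomicLong();

    void onAccepted(Object channel) {
        if (open.add(channel)) {
            totalOpened.incrementAndGet(); // count each channel once
        }
    }

    void onClosed(Object channel) {
        open.remove(channel);
    }

    long serverOpen() {
        return open.size();
    }

    long totalOpen() {
        return totalOpened.get();
    }
}
```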

plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4NonStreamingHttpChannel.java (1)

49-55: Implementation correctly handles the closeContext coordination.

The logic appropriately handles two cases:

  1. When closeContext is not yet done: attaches a listener to complete it upon channel close
  2. When closeContext is already done: performs a direct close without redundant listener registration

Note: The constructor already registers a listener on closeFuture() (line 37), which will also complete closeContext. Since CompletableContext only completes once (first completion wins), this dual-listener pattern is safe.
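The "first completion wins" argument can be checked directly with `CompletableFuture`, used here as a stand-in for `CompletableContext` under the assumption that both share once-only completion semantics. `DualListenerDemo` is a hypothetical name. Two listeners on the channel's close future both try to complete the context, but only one call returns `true`, and listeners on the context run exactly once.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Shows why two listeners completing the same context is safe:
// only the first complete() call transitions it.
final class DualListenerDemo {
    // Returns {successful completions, times the context's own listener ran}.
    static int[] run() {
        CompletableFuture<Void> closeFuture = new CompletableFuture<>();  // the channel's close future
        CompletableFuture<Void> closeContext = new CompletableFuture<>(); // stand-in for closeContext
        AtomicInteger transitions = new AtomicInteger();
        AtomicInteger contextListenerRuns = new AtomicInteger();

        closeContext.whenComplete((v, e) -> contextListenerRuns.incrementAndGet());

        // Listener registered in the constructor.
        closeFuture.whenComplete((v, e) -> {
            if (closeContext.complete(null)) transitions.incrementAndGet();
        });
        // Listener registered by close() while the context was not yet done.
        closeFuture.whenComplete((v, e) -> {
            if (closeContext.complete(null)) transitions.incrementAndGet();
        });

        closeFuture.complete(null); // the channel finishes closing
        return new int[] {transitions.get(), contextListenerRuns.get()};
    }
}
```

Both listeners fire, but `complete` returns `true` only for the first, so `run()` yields `{1, 1}`.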

plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4StreamingRequestConsumer.java (1)

27-35: Core fix: Streaming channels are now properly registered with the transport.

This change addresses the root cause of the flaky test issue. By calling transport.serverAcceptedChannel(httpChannel) during construction, the streaming channel is added to AbstractHttpServerTransport#httpChannels, ensuring it gets closed during node shutdown.
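A minimal sketch of the shutdown path this enables is shown below. `TrackedChannel`, `SketchShutdownTransport`, and `SketchStreamingConsumer` are hypothetical names, and the stop logic is an assumption modeled on the description above. Channels registered at construction time are closed on stop, while a channel that was never registered stays open, which is the leak pattern behind the flaky test.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical closeable channel used only for this sketch.
final class TrackedChannel {
    private volatile boolean open = true;

    boolean isOpen() {
        return open;
    }

    void close() {
        open = false;
    }
}

// Sketch of the shutdown path: every tracked channel is closed on stop.
final class SketchShutdownTransport {
    final Set<TrackedChannel> httpChannels = ConcurrentHashMap.newKeySet();

    void serverAcceptedChannel(TrackedChannel channel) {
        httpChannels.add(channel);
    }

    void doStop() {
        for (TrackedChannel channel : httpChannels) {
            channel.close();
        }
        httpChannels.clear();
    }
}

// Sketch of the streaming consumer: the channel is registered at construction time.
final class SketchStreamingConsumer {
    final TrackedChannel httpChannel = new TrackedChannel();

    SketchStreamingConsumer(SketchShutdownTransport transport) {
        transport.serverAcceptedChannel(httpChannel);
    }
}
```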

plugins/transport-reactor-netty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransport.java (2)

351-360: Clean approach to expose channel registration to consumers.

The override widens the visibility of serverAcceptedChannel() from protected to public, allowing ReactorNetty4NonStreamingRequestConsumer and ReactorNetty4StreamingRequestConsumer to register their channels with the transport. The delegation to super.serverAcceptedChannel() ensures the channel is tracked in AbstractHttpServerTransport#httpChannels for proper lifecycle management.
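The override pattern can be sketched as follows. `SketchAbstractTransport` and `SketchReactorTransport` are hypothetical stand-ins for the real classes. Java permits an override to widen access (`protected` to `public`) but not narrow it, and calling `super` keeps the base-class bookkeeping. Within a single file all classes share one package, so this sketch shows only the shape of the override, not the cross-package access rule that motivates it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical base class standing in for AbstractHttpServerTransport.
abstract class SketchAbstractTransport {
    protected final Set<Object> httpChannels = ConcurrentHashMap.newKeySet();

    // Protected in the base class: intended for subclasses only.
    protected void serverAcceptedChannel(Object channel) {
        httpChannels.add(channel);
    }

    int trackedChannels() {
        return httpChannels.size();
    }
}

// An override may widen access (protected -> public) but never narrow it.
final class SketchReactorTransport extends SketchAbstractTransport {
    @Override
    public void serverAcceptedChannel(Object channel) {
        super.serverAcceptedChannel(channel); // keep the base-class tracking
    }
}
```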


384-388: Unable to verify due to infrastructure constraints—manual code review required.

The repository is currently inaccessible through multiple access methods, preventing verification of whether ReactorNetty4NonStreamingRequestConsumer calls transport.serverAcceptedChannel() in its constructor.

Please manually confirm that the non-streaming consumer (lines 397-402) implements the same channel registration pattern as the streaming consumer to ensure both consumer types properly track accepted channels with the transport.

plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransportStreamingTests.java (2)

143-180: Good test coverage for streaming channel lifecycle.

The test validates that streaming connections are properly tracked and closed, mirroring the non-streaming test pattern. The assertions on serverOpen (0) and totalOpen (numRequests) confirm the channel registration fix works for streaming requests.


182-243: Clean refactoring to extract dispatcher creation.

Extracting createStreamingDispatcher() as a helper method improves test maintainability and allows reuse across multiple streaming test methods.

plugins/transport-reactor-netty4/src/internalClusterTest/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpChannelsReleaseIntegTests.java (1)

34-92: LGTM! Test properly validates Reactor Netty4 channel tracking.

This integration test mirrors the Netty4 version and correctly validates that HTTP channels are tracked for the Reactor Netty4 transport implementation, ensuring they will be cleaned up on node shutdown.

@github-actions
Contributor

✅ Gradle check result for d91400a: SUCCESS

@fdesu fdesu requested a review from reta November 28, 2025 11:32
@github-actions
Contributor

✅ Gradle check result for 93a3457: SUCCESS

Contributor

@reta reta left a comment


Thanks a lot @fdesu ! @andrross anything you would like to mention?


Labels

autocut flaky-test Random test failure that succeeds on second run >test-failure Test failure from CI, local build, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for DetailedErrorsDisabledIT

4 participants