Skip to content

Conversation

@guojialiang92
Copy link
Contributor

Description

This PR is to address the issues mentioned in issue #[19435].

Analysis

private void innerOpenEngineAndTranslog(LongSupplier globalCheckpointSupplier, boolean syncFromRemote) throws IOException {
    // ...

    final Engine newEngine = engineFactory.newReadWriteEngine(config);
    onNewEngine(newEngine);
    currentEngineReference.set(newEngine);

    // ...
}

Engine getEngine() {
    Engine engine = getEngineOrNull();
    if (engine == null) {
        throw new AlreadyClosedException("engine is closed");
    }
    return engine;
}

During restart, the recovery process will invoke IndexShard#innerOpenEngineAndTranslog.
There is a time interval between creating InternalEngine and setting IndexShard#currentEngineReference. Once the InternalEngine is created, segment merge may occur and trigger the warmer. If IndexShard#getEngine is invoked before setting IndexShard#currentEngineReference, an exception will be thrown. Throwing an exception during the merge process of Lucene is dangerous and can lead to shard failure.

Solution

To avoid the above issues and prevent other situations that may throw exceptions during the warmer process and lead to shard failure, we need to catch exceptions to ensure that the merge operation of the primary shard can be successfully completed.

Related Issues

Resolves #[19435]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

❌ Gradle check result for 0405cff: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@guojialiang92 guojialiang92 force-pushed the dev/avoid-primary-shard-fail-caused-by-merged-segment-warmer-exceptions branch from 0405cff to 0e84bd5 Compare September 26, 2025 14:51
@github-actions
Copy link
Contributor

❕ Gradle check result for 0e84bd5: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov
Copy link

codecov bot commented Sep 26, 2025

Codecov Report

❌ Patch coverage is 80.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.09%. Comparing base (006a53f) to head (125766a).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
...g/opensearch/index/engine/MergedSegmentWarmer.java 80.00% 1 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19436      +/-   ##
============================================
+ Coverage     73.05%   73.09%   +0.04%     
- Complexity    70627    70680      +53     
============================================
  Files          5723     5723              
  Lines        323489   323494       +5     
  Branches      46851    46851              
============================================
+ Hits         236311   236453     +142     
+ Misses        68174    67975     -199     
- Partials      19004    19066      +62     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: guojialiang <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for c5de961: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

…rd-fail-caused-by-merged-segment-warmer-exceptions
@github-actions
Copy link
Contributor

❌ Gradle check result for 1d3353d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for c3ea914: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: guojialiang <[email protected]>
@guojialiang92 guojialiang92 force-pushed the dev/avoid-primary-shard-fail-caused-by-merged-segment-warmer-exceptions branch from c3ea914 to b9dcb6b Compare September 30, 2025 04:03
@github-actions
Copy link
Contributor

✅ Gradle check result for b9dcb6b: SUCCESS

…rd-fail-caused-by-merged-segment-warmer-exceptions
@github-actions
Copy link
Contributor

❌ Gradle check result for 5245ff8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@guojialiang92 guojialiang92 force-pushed the dev/avoid-primary-shard-fail-caused-by-merged-segment-warmer-exceptions branch from 5245ff8 to 987d266 Compare September 30, 2025 07:30
Signed-off-by: guojialiang <[email protected]>
@guojialiang92 guojialiang92 force-pushed the dev/avoid-primary-shard-fail-caused-by-merged-segment-warmer-exceptions branch from 987d266 to 5a41087 Compare September 30, 2025 09:12
@github-actions
Copy link
Contributor

✅ Gradle check result for 5a41087: SUCCESS

Copy link
Contributor

@kkewwei kkewwei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kkewwei
Copy link
Contributor

kkewwei commented Oct 9, 2025

@guojialiang92 Can you please add the changelog?

…rd-fail-caused-by-merged-segment-warmer-exceptions
Signed-off-by: guojialiang <[email protected]>
@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2025

✅ Gradle check result for 125766a: SUCCESS

@kkewwei kkewwei merged commit 5167410 into opensearch-project:main Oct 10, 2025
33 checks passed
rgsriram pushed a commit to rgsriram/OpenSearch that referenced this pull request Oct 11, 2025
opensearch-project#19436)

* Avoid primary shard failure caused by merge segment warmer exceptions

Signed-off-by: guojialiang <[email protected]>

* add test

Signed-off-by: guojialiang <[email protected]>

* update

Signed-off-by: guojialiang <[email protected]>

* update

Signed-off-by: guojialiang <[email protected]>

* update

Signed-off-by: guojialiang <[email protected]>

* add change log

Signed-off-by: guojialiang <[email protected]>

---------

Signed-off-by: guojialiang <[email protected]>
peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Oct 15, 2025
opensearch-project#19436)

* Avoid primary shard failure caused by merge segment warmer exceptions

Signed-off-by: guojialiang <[email protected]>

* add test

Signed-off-by: guojialiang <[email protected]>

* update

Signed-off-by: guojialiang <[email protected]>

* update

Signed-off-by: guojialiang <[email protected]>

* update

Signed-off-by: guojialiang <[email protected]>

* add change log

Signed-off-by: guojialiang <[email protected]>

---------

Signed-off-by: guojialiang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants