Avoid primary shard failure caused by merged segment warmer exceptions #19436
❌ Gradle check result for 0405cff: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: guojialiang <[email protected]>
0405cff to
0e84bd5
Compare
❕ Gradle check result for 0e84bd5: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #19436 +/- ##
============================================
+ Coverage 73.05% 73.09% +0.04%
- Complexity 70627 70680 +53
============================================
Files 5723 5723
Lines 323489 323494 +5
Branches 46851 46851
============================================
+ Hits 236311 236453 +142
+ Misses 68174 67975 -199
- Partials 19004 19066 +62
Signed-off-by: guojialiang <[email protected]>
❌ Gradle check result for c5de961: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
…rd-fail-caused-by-merged-segment-warmer-exceptions
❌ Gradle check result for 1d3353d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for c3ea914: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: guojialiang <[email protected]>
c3ea914 to
b9dcb6b
Compare
…rd-fail-caused-by-merged-segment-warmer-exceptions
❌ Gradle check result for 5245ff8: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
5245ff8 to
987d266
Compare
Signed-off-by: guojialiang <[email protected]>
987d266 to
5a41087
Compare
kkewwei
left a comment
LGTM
@guojialiang92 Can you please add the changelog?
…rd-fail-caused-by-merged-segment-warmer-exceptions
Signed-off-by: guojialiang <[email protected]>
opensearch-project#19436) * Avoid primary shard failure caused by merge segment warmer exceptions Signed-off-by: guojialiang <[email protected]> * add test Signed-off-by: guojialiang <[email protected]> * update Signed-off-by: guojialiang <[email protected]> * update Signed-off-by: guojialiang <[email protected]> * update Signed-off-by: guojialiang <[email protected]> * add change log Signed-off-by: guojialiang <[email protected]> --------- Signed-off-by: guojialiang <[email protected]>
Description
This PR addresses the problem described in issue #19435.
Analysis
During restart, the recovery process invokes IndexShard#innerOpenEngineAndTranslog. There is a time interval between creating the InternalEngine and setting IndexShard#currentEngineReference. Once the InternalEngine is created, a segment merge may occur and trigger the warmer. If IndexShard#getEngine is invoked before IndexShard#currentEngineReference is set, an exception is thrown. Throwing an exception during Lucene's merge process is dangerous and can lead to shard failure.
Solution
To avoid this problem, and to guard against any other exception thrown during warming that could lead to shard failure, we need to catch exceptions in the warmer so that the merge operation on the primary shard can complete successfully.
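The race and the guard described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: ShardStub, publishEngine, and warmSafely are hypothetical names, and IllegalStateException is only a stand-in for whatever exception the real IndexShard#getEngine throws when the engine reference has not been set.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: ShardStub and warmSafely are hypothetical, not OpenSearch APIs.
class WarmerGuardSketch {

    /** Stand-in for a shard whose engine reference is published some time after the engine is created. */
    static final class ShardStub {
        private final AtomicReference<Object> currentEngineReference = new AtomicReference<>();

        Object getEngine() {
            Object engine = currentEngineReference.get();
            if (engine == null) {
                // Assumed failure mode: the reference has not been set yet.
                throw new IllegalStateException("engine reference not set yet");
            }
            return engine;
        }

        void publishEngine(Object engine) {
            currentEngineReference.set(engine);
        }
    }

    /**
     * Runs the warmer and reports whether it succeeded. Any exception is caught and
     * printed here (a real implementation would use a logger) so that the caller,
     * standing in for the merge, is never aborted by a failed warm-up.
     */
    static boolean warmSafely(Runnable warmer) {
        try {
            warmer.run();
            return true;
        } catch (Exception e) {
            System.err.println("warmer failed, continuing merge: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        ShardStub shard = new ShardStub();
        // A merge finishes before the engine reference is published: warming fails, nothing propagates.
        System.out.println(warmSafely(shard::getEngine)); // prints false
        shard.publishEngine(new Object());
        // After publication the same warmer succeeds.
        System.out.println(warmSafely(shard::getEngine)); // prints true
    }
}
```

The point of the design is that a failed warm-up only costs the pre-copy optimization for that merged segment, whereas an exception escaping into the merge would fail the whole shard.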
Related Issues
Resolves #19435
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.