Skip to content

Conversation

dzane17
Copy link
Member

@dzane17 dzane17 commented Sep 18, 2025

Description

We identified a bug where negative search stats are causing an exception when processed using VLong in the following line:

https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/search/stats/SearchStats.java#L89

out.writeVLong(current);

The issue occurs because the VLong is used as a counter for tracking number of active phases. However, since there can be multiple paths for exiting a phase, the counters can be decremented more than once.

Solution

Check for negative values before read/write using VLong. If negative, write 0 instead.

I explored using ZLong, but this can cause serde issues between different OpenSearch versions.

Related Issues

Resolves #16598

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for 6560f5e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 751d5ac: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for ed3c69e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 65b5bc9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

@jainankitk jainankitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we should fix the issue of negative stats itself, but node should not be going down due to serialization issue, so LGTM!

Copy link
Contributor

@jainankitk jainankitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test failures look related to me. I am wondering if this can break backward compatibility is some way:

[Test Result](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/) (14 failures / +14)

    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/10_basic/Nodes stats}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_10_basic_Nodes_stats_/)
    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/20_response_filtering/Nodes Stats filtered using both includes and excludes filters}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_20_response_filtering_Nodes_Stats_filtered_using_both_includes_and_excludes_filters_/)
    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/20_response_filtering/Nodes Stats with response filtering}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_20_response_filtering_Nodes_Stats_with_response_filtering_/)
    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/11_indices_metrics/Metric - _all include_segment_file_sizes}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_11_indices_metrics_Metric____all_include_segment_file_sizes_/)
    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/11_indices_metrics/Metric - indices _all}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_11_indices_metrics_Metric___indices__all_/)
    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/10_basic/Nodes stats}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_10_basic_Nodes_stats__2/)
    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/20_response_filtering/Nodes Stats with response filtering}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_20_response_filtering_Nodes_Stats_with_response_filtering__2/)
    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/11_indices_metrics/Metric - _all include_segment_file_sizes}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_11_indices_metrics_Metric____all_include_segment_file_sizes__2/)
    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/10_basic/Nodes stats}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_10_basic_Nodes_stats__3/)
    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/20_response_filtering/Nodes Stats with response filtering}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_20_response_filtering_Nodes_Stats_with_response_filtering__3/)
    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/11_indices_metrics/Metric - _all include_segment_file_sizes}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_11_indices_metrics_Metric____all_include_segment_file_sizes__3/)
    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/10_basic/Nodes stats}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_10_basic_Nodes_stats__4/)
    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/20_response_filtering/Nodes Stats with response filtering}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_20_response_filtering_Nodes_Stats_with_response_filtering__4/)
    [org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=nodes.stats/11_indices_metrics/Metric - _all include_segment_file_sizes}](https://build.ci.opensearch.org/job/gradle-check/64475/testReport/junit/org.opensearch.backwards/MixedClusterClientYamlTestSuiteIT/test__p0_nodes_stats_11_indices_metrics_Metric____all_include_segment_file_sizes__4/)

@dzane17 dzane17 force-pushed the negative-stats branch 3 times, most recently from d8bcd50 to 092cb9d Compare September 29, 2025 17:56
Copy link
Contributor

❌ Gradle check result for 092cb9d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@dzane17 dzane17 changed the title Use ZLong to prevent negative search stats exception Handle negative search request nodes stats Sep 29, 2025
@dzane17
Copy link
Member Author

dzane17 commented Sep 29, 2025

@jainankitk @sgup432 I revised the solution to avoid serde issues. No longer using ZLong

Copy link
Contributor

✅ Gradle check result for cbc5507: SUCCESS

Copy link

codecov bot commented Sep 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.92%. Comparing base (b4583c3) to head (8e664f8).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19340      +/-   ##
============================================
- Coverage     72.96%   72.92%   -0.04%     
- Complexity    70117    70125       +8     
============================================
  Files          5687     5687              
  Lines        321659   321669      +10     
  Branches      46525    46527       +2     
============================================
- Hits         234684   234587      -97     
- Misses        68041    68171     +130     
+ Partials      18934    18911      -23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

❌ Gradle check result for fb422fd: TIMEOUT

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❕ Gradle check result for 8e664f8: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@jainankitk jainankitk merged commit 0819197 into opensearch-project:main Sep 30, 2025
33 checks passed
@dzane17 dzane17 deleted the negative-stats branch September 30, 2025 17:20
asimmahmood1 pushed a commit to asimmahmood1/OpenSearch that referenced this pull request Sep 30, 2025
peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Oct 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working Search Search query, autocomplete ...etc v3.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Negative Search Stats

4 participants