[Pull-based ingestion] Support multi-threaded writes in pull based ingestion #17912

Merged

varunbharadwaj
Contributor

Description

This PR adds multi-threaded writer support to the pull-based ingestion flow. Each incoming message is hashed by its ID and written to one of the blocking queue partitions. A processor thread is started for each blocking queue partition to consume and process its updates, and it hands the updates off to the engine to update the index.
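The flow described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the class `PartitionedQueueSketch`, its methods, the `Message` shape, and the in-memory map standing in for the engine are all hypothetical, and the real implementation may hash, queue, and shut down differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names come from the PR.
final class PartitionedQueueSketch {
    // Minimal message shape assumed for illustration.
    private static final class Message {
        final String id;
        final String payload;

        Message(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    private final List<BlockingQueue<Message>> partitions = new ArrayList<>();
    private final List<Thread> processors = new ArrayList<>();
    // Stand-in for the engine: maps a document ID to its latest payload.
    private final Map<String, String> index = new ConcurrentHashMap<>();
    private volatile boolean running = true;

    PartitionedQueueSketch(int numProcessorThreads, int queueCapacity) {
        for (int i = 0; i < numProcessorThreads; i++) {
            partitions.add(new ArrayBlockingQueue<>(queueCapacity));
        }
    }

    // Starts one processor thread per queue partition.
    void start() {
        for (int i = 0; i < partitions.size(); i++) {
            BlockingQueue<Message> queue = partitions.get(i);
            Thread t = new Thread(() -> drain(queue), "processor-" + i);
            processors.add(t);
            t.start();
        }
    }

    // Hashes the ID to a partition; floorMod keeps the result non-negative.
    int partitionFor(String id) {
        return Math.floorMod(id.hashCode(), partitions.size());
    }

    // Blocks when the target partition is full, which applies backpressure to the caller.
    void submit(String id, String payload) {
        try {
            partitions.get(partitionFor(id)).put(new Message(id, payload));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while enqueueing", e);
        }
    }

    // Processor loop: consume from one partition and apply updates to the stand-in engine.
    private void drain(BlockingQueue<Message> queue) {
        try {
            while (running || !queue.isEmpty()) {
                Message m = queue.poll(50, TimeUnit.MILLISECONDS);
                if (m != null) {
                    index.put(m.id, m.payload);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stops accepting work, lets each processor drain its queue, then waits for it to exit.
    void shutdown() {
        running = false;
        for (Thread t : processors) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while shutting down", e);
            }
        }
    }

    String get(String id) {
        return index.get(id);
    }
}
```

Because every message with the same ID lands in the same partition and each partition has a single consumer, updates to one document are applied in arrival order in this sketch, while different documents can be processed in parallel.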

The number of processor threads can be defined at index creation by setting ingestion_source.num_processor_threads. If it is not set, a default value of 1 is used. During shard recovery, the minimum shard pointer tracked across the processor threads is used as the start point.
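The recovery rule can be sketched as follows. This is only an illustration under stated assumptions: shard pointers are modeled as plain `long` offsets, the class `RecoveryPointerSketch` and its methods are hypothetical, and how the PR treats a thread that has not processed anything yet is not stated here, so the sketch simply skips such threads.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: names and the long-offset pointer model are assumptions.
final class RecoveryPointerSketch {
    static final long UNSET = -1L;

    // One slot per processor thread holding the last pointer that thread processed.
    private final AtomicLongArray lastProcessed;

    RecoveryPointerSketch(int numProcessorThreads) {
        lastProcessed = new AtomicLongArray(numProcessorThreads);
        for (int i = 0; i < numProcessorThreads; i++) {
            lastProcessed.set(i, UNSET);
        }
    }

    // Called by a processor thread after it has handed an update to the engine.
    void markProcessed(int threadIndex, long pointer) {
        lastProcessed.set(threadIndex, pointer);
    }

    // Returns the minimum pointer across threads that have processed something,
    // or UNSET when no thread has processed anything yet.
    long recoveryStartPoint() {
        long min = UNSET;
        for (int i = 0; i < lastProcessed.length(); i++) {
            long p = lastProcessed.get(i);
            if (p != UNSET && (min == UNSET || p < min)) {
                min = p;
            }
        }
        return min;
    }
}
```

Starting from the minimum means a message still pending on the slowest thread is not skipped after recovery; the trade-off in this sketch is that faster threads may see some messages again.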

Related Issues

Resolves #17875

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing labels Apr 11, 2025
@varunbharadwaj varunbharadwaj force-pushed the vb/multithreadwrite branch 4 times, most recently from ec2b1ed to bd15ddb Compare April 12, 2025 00:25

❌ Gradle check result for bd15ddb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


❌ Gradle check result for f5027e7: TIMEOUT

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


❌ Gradle check result for f5027e7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

@yupeng9 yupeng9 left a comment

LGTM


❕ Gradle check result for 7e0dd19: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov bot commented Apr 17, 2025

Codecov Report

Attention: Patch coverage is 83.80282% with 23 lines in your changes missing coverage. Please review.

Project coverage is 72.54%. Comparing base (cbaddd3) to head (7e0dd19).
Report is 8 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ndices/pollingingest/MessageProcessorRunnable.java | 69.23% | 6 Missing and 2 partials ⚠️ |
| ...llingingest/PartitionedBlockingQueueContainer.java | 89.61% | 6 Missing and 2 partials ⚠️ |
| ...rch/indices/pollingingest/DefaultStreamPoller.java | 76.19% | 4 Missing and 1 partial ⚠️ |
| ...g/opensearch/cluster/metadata/IngestionSource.java | 77.77% | 0 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17912      +/-   ##
============================================
+ Coverage     72.51%   72.54%   +0.03%     
+ Complexity    67108    67074      -34     
============================================
  Files          5475     5478       +3     
  Lines        309916   310013      +97     
  Branches      45060    45065       +5     
============================================
+ Hits         224725   224911     +186     
+ Misses        66895    66685     -210     
- Partials      18296    18417     +121     

@andrross andrross merged commit d18982c into opensearch-project:main Apr 17, 2025
31 checks passed
x-INFiN1TY-x pushed a commit to x-INFiN1TY-x/OpenSearch_Local that referenced this pull request Apr 24, 2025
Harsh-87 pushed a commit to Harsh-87/OpenSearch that referenced this pull request May 7, 2025
Harsh-87 pushed a commit to Harsh-87/OpenSearch that referenced this pull request May 7, 2025
Development

Successfully merging this pull request may close these issues.

[Feature Request] Multi-threaded writes in pull-based ingestion
3 participants