
Conversation


@shadowshot-x commented Sep 3, 2025

The retry_limit parameter is not honored and is hard-coded to 5. This change provides a dynamic retry_limit based on configuration for the out_s3 plugin.

This change fixes the infinite retries that occur when buffers are cleared and the bucket name does not exist. It also honors retry_limit when users set it at the out_s3 plugin level, as sketched below.
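
In rough terms, the check now compares a per-chunk error counter against the instance-configured limit instead of a compile-time macro. A minimal sketch of the pattern, reusing names that appear in the review comments below; treat it as illustrative rather than the exact diff:

/* Sketch only: ctx->ins->retry_limit comes from the output instance
 * configuration, replacing the old fixed MAX_UPLOAD_ERRORS ceiling. */
upload_contents->retry_counter++;
if (upload_contents->retry_counter >= ctx->ins->retry_limit) {
    flb_plg_warn(ctx->ins,
                 "Chunk file failed to send %d times, will not retry",
                 upload_contents->retry_counter);
    /* park the buffered chunk instead of retrying it forever */
    s3_store_file_inactive(ctx, upload_contents->upload_file);
}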


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change

[SERVICE]
    flush        1
    daemon       Off
    log_level    info
    http_server  Off
    http_listen  0.0.0.0
    http_port    2020

[INPUT]
        Name dummy
        Tag dummy.data
        Dummy {"this is":"dummy data"}
        Rate 1
        Samples 1

[OUTPUT]
        Name                         s3
        Match                        dummy.data
        bucket                       NON_EXISTENT_BUCKET
        region                       us-west-2
        retry_limit                  2
        total_file_size              60M
        upload_timeout               20s
        log_level                    info
        net.keepalive_idle_timeout   60
        tls.verify                   False
        compression                  gzip
        use_put_object               True
        store_dir_limit_size         5G
        role_arn                     <ROLE_NAME>
        s3_key_format                /testpath/zstd4/%H%M%S-$UUID.gz
        workers                      1

[OUTPUT]
        Name                         s3
        Match                        dummy.data
        bucket                       NON_EXISTENT_BUCKET
        region                       us-west-2
        total_file_size              60M
        retry_limit                  5
        upload_timeout               20s
        log_level                    info
        net.keepalive_idle_timeout   60
        tls.verify                   False
        compression                  gzip
        use_put_object               True
        store_dir_limit_size         5G
        role_arn                     arn:aws:iam::654907767108:role/oil-publisher-role-irsa
        s3_key_format                /testpath/zstd0/%H%M%S-$UUID.gz
        workers                      1

  • Debug log output from testing the change

flb.log

  • Attached Valgrind output that shows no leaks or memory corruption was found

This run is for the scenario where we do not push the records.
valgrind_report.txt

This report is justified because we do not want to delete the file; we call flb_fstore_file_inactive(ctx->fs, fsf) instead of flb_fstore_destroy(ctx->fs).

It correlates with the current master version:
valgrind_mem_check_4.0.8.txt
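
For context on that choice, the two calls differ in scope. A rough sketch, using the same handles as the note above rather than the exact call sites in the plugin:

/* Parking a single chunk: mark only its backing file inactive and keep
 * the store allocated. */
flb_fstore_file_inactive(ctx->fs, fsf);

/* Destroying the store would release the store and every file in it at
 * once, which is not what we want per-chunk: */
/* flb_fstore_destroy(ctx->fs); */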

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [NA] Run local packaging test showing all targets (including any new ones) build.
  • [NA] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • [NA] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Additional Tests Run with Logs:

[Buffers Cleared Every Time]
BUCKET DOES NOT EXIST

  1. 1 s3 output plugin; retry_limit 3 :: stops after 3 attempts :: SUCCESS -> NON_EXISTENT_TEST_1s3_3retry.txt
  2. 1 s3 output plugin; retry_limit 2 :: stops after 2 attempts :: SUCCESS -> NON_EXISTENT_TEST_1s3_2retry.txt
  3. 1 s3 output plugin; no retry_limit :: stops after 1 attempt :: SUCCESS -> NON_EXISTENT_TEST_1s3_1retry.txt
  4. 2 s3 output plugins; 1 correct, 1 incorrect, retry_limit 2 :: 1 uploads successfully, 1 stops after 3 attempts :: SUCCESS -> NON_EXISTENT_TEST_2s3_1correct_2retry.txt
  5. 2 s3 output plugins; 2 incorrect, retry_limit 1 and 2 :: 1 stops after 3 retries and the other after 2 :: SUCCESS -> NON_EXISTENT_TEST_2s3_2retry_3retry.txt

[Buffers Cleared Every Time]
BUCKET EXISTS

  1. 1 s3 output plugin; retry_limit 2 :: uploads successfully :: NORMAL_BUCKET_1s3_2retry.txt
  2. 1 s3 output plugin; retry_limit 0 :: uploads successfully :: NORMAL_BUCKET_1s3_1retry.txt
  3. 2 s3 output plugins; retry_limit 1 :: upload successfully :: NORMAL_BUCKET_2s3_2retry.txt

Summary by CodeRabbit

  • New Features

    • Added a configurable retry_limit for the S3 output plugin (default: 5), allowing per-instance control of upload retry attempts.
    • Uploads/chunks that exceed the configured retry_limit are discarded/inactivated to avoid endless retries.
    • Log messages now reference the configured retry_limit for clearer diagnostics.
  • Documentation

    • Added configuration description for retry_limit and its default behavior.

fixes: #10819

retry_limit parameter is not honored and is set to 5. This feature provides dynamic retry_limit based on configuration for out_s3 plugin.

Signed-off-by: usharma <[email protected]>

coderabbitai bot commented Sep 3, 2025

Walkthrough

Replaces the global MAX_UPLOAD_ERRORS with a per-instance retry_limit (accessed via ctx->ins->retry_limit). All retry/discard checks and log messages in S3 upload, multipart, queueing, and flush paths now use the instance-configured retry_limit; the MAX_UPLOAD_ERRORS macro is removed.

Changes

Retry logic updates: plugins/out_s3/s3.c
  Replaced uses of MAX_UPLOAD_ERRORS with ctx->ins->retry_limit across put_all_chunks, get_upload, s3_upload_queue, cb_s3_upload, multipart complete checks, and cb_s3_flush. Updated log messages and discard/inactivate behavior to reference the instance retry_limit.

Header & config change: plugins/out_s3/s3.h
  Removed #define MAX_UPLOAD_ERRORS. Updated comments to note that the retry limit is configurable via retry_limit, and adjusted the struct multipart_upload / struct flb_s3 documentation to reference the configurable limit.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor FLB as S3 Output Instance
  participant CH as Chunk / Multipart Upload
  participant S3 as AWS S3
  participant INS as Instance (retry_limit)

  FLB->>CH: initiate upload/flush/part
  CH->>S3: send request
  S3-->>CH: success / error

  alt Success
    CH-->>FLB: mark complete
  else Error
    CH->>INS: increment failure counter
    INS->>INS: compare failures >= retry_limit
    alt failures < retry_limit
      INS-->>CH: schedule retry
    else failures >= retry_limit
      INS-->>CH: inactivate/discard upload
      CH-->>FLB: log exceeded retry_limit
    end
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit taps the log with care,
Counting tries in evening air.
When hops exceed the burrow's cap,
We tuck the packet in the lap.
Soft thumps, neat limits — off they go. 🥕


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0ca2ff6 and b6b73da.

📒 Files selected for processing (2)
  • plugins/out_s3/s3.c (6 hunks)
  • plugins/out_s3/s3.h (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugins/out_s3/s3.h
  • plugins/out_s3/s3.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: PR - fuzzing test



coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
plugins/out_s3/s3.c (3)

1873-1881: Gate retry logic by FLB_OUT_RETRY_UNLIMITED
Wrap the existing retry-limit check so FLB_OUT_RETRY_UNLIMITED does not prematurely stop retries (otherwise, with retry_limit set to -1, any non-negative counter satisfies counter >= -1 and retries stop immediately):

--- a/plugins/out_s3/s3.c
+++ b/plugins/out_s3/s3.c
@@ -1873,7 +1873,9 @@
             upload_contents->retry_counter++;
-            if (upload_contents->retry_counter >= ctx->ins->retry_limit) {
+            if (ctx->ins->retry_limit != FLB_OUT_RETRY_UNLIMITED &&
+                upload_contents->retry_counter >= ctx->ins->retry_limit) {
                 flb_plg_warn(ctx->ins, "Chunk file failed to send %d times, will not "
                              "retry", upload_contents->retry_counter);
                 s3_store_file_inactive(ctx, upload_contents->upload_file);

This change aligns with other plugins’ unlimited-retry gating and restores true unlimited behavior.


3291-3296: Gate complete_errors check by FLB_OUT_RETRY_UNLIMITED
Wrap the existing threshold test so it’s only applied when retry_limit isn’t unlimited:

-        if (m_upload->complete_errors >= ctx->ins->retry_limit) {
+        if (ctx->ins->retry_limit != FLB_OUT_RETRY_UNLIMITED &&
+            m_upload->complete_errors >= ctx->ins->retry_limit) {

1628-1633: Gate multipart upload error threshold when retry_limit is finite
Modify plugins/out_s3/s3.c:1628-1633 as follows:

-        if (tmp_upload->upload_errors >= ctx->ins->retry_limit) {
+        if (ctx->ins->retry_limit != FLB_OUT_RETRY_UNLIMITED &&
+            tmp_upload->upload_errors >= ctx->ins->retry_limit) {

This skips the error threshold when retry_limit is FLB_OUT_RETRY_UNLIMITED (-1), avoiding immediate completion for unlimited retries. (fossies.org)

🧹 Nitpick comments (1)
plugins/out_s3/s3.h (1)

59-61: Clarify that retry_limit comes from the output instance (Retry_Limit).

To avoid confusion with plugin-local fields, consider wording this note to reference the generic output Retry_Limit (ins->retry_limit) that Fluent Bit core already provides, since that’s what the code uses.
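
For reference, the generic option is set per output like any other property; a minimal sketch of the configuration (the bucket value is a placeholder):

[OUTPUT]
    # Retry_Limit is the generic output option that core parses into
    # ins->retry_limit; property keys are case-insensitive, so the
    # lowercase retry_limit used elsewhere in this PR is equivalent.
    Name         s3
    Match        dummy.data
    bucket       example-bucket
    region       us-west-2
    Retry_Limit  2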

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 35885ce and 0ca2ff6.

📒 Files selected for processing (2)
  • plugins/out_s3/s3.c (7 hunks)
  • plugins/out_s3/s3.h (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
plugins/out_s3/s3.c (1)
src/flb_fstore.c (1)
  • flb_fstore_file_inactive (218-235)
🔇 Additional comments (1)
plugins/out_s3/s3.h (1)

100-103: Comment tweak looks fine.

Fixed the retry_limit option as it was not needed; it is part of the default output instance

Signed-off-by: usharma <[email protected]>
Development

Successfully merging this pull request may close these issues.

S3 output retries ignore retry_limit