
[Deferred] Improve default retry strategy #251

@rsamoilov

Description

Currently, Rage::Deferred retries failed tasks up to 5 times, waiting rand(5 * 2**attempt) + 1 seconds before each retry. This formula prioritises spreading failed tasks out over time, but the delays are too short: on average, a task exhausts all 5 retries in under 3 minutes:

| Attempt | Avg delay |
| --- | --- |
| 1 | ~6s |
| 2 | ~11s |
| 3 | ~21s |
| 4 | ~41s |
| 5 | ~81s |
| Total | ~2.7 minutes |
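The averages in the table can be derived exactly rather than estimated: `Kernel#rand(n)` returns an integer in `0...n`, so its mean is `(n - 1) / 2`. A minimal Ruby sketch (the helper name `current_expected_delay` is hypothetical and not part of Rage):

```ruby
# Expected delay (in seconds) of the current formula rand(5 * 2**attempt) + 1.
# Kernel#rand(n) returns an integer in 0...n, whose mean is (n - 1) / 2.0.
def current_expected_delay(attempt)
  n = 5 * 2**attempt
  (n - 1) / 2.0 + 1
end

per_attempt = (1..5).map { |a| current_expected_delay(a) }
total = per_attempt.sum

per_attempt.each.with_index(1) do |delay, attempt|
  puts format("attempt %d: %.1fs", attempt, delay)
end
puts format("total: %.1fs (%.1f minutes)", total, total / 60)
```

The five expected delays are 5.5, 10.5, 20.5, 40.5 and 80.5 seconds, which sum to 157.5 seconds, comfortably under 3 minutes.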

The problem is that most transient failures (service outages, deployment issues, configuration errors) take much longer than 3 minutes to resolve. By the time a developer discovers the issue, the task has already exhausted all its retries and won't be attempted again.

The goal of this issue is to increase the default retry count and use a formula that spaces retries further apart with each attempt, giving developers time to discover and fix the underlying issue.

Suggested formula

(attempt**4) + 10 + (rand(15) * attempt)

This uses polynomial growth (attempt⁴) instead of exponential growth (2^attempt), combined with a jitter term, rand(15) * attempt, that stays small relative to the polynomial term. The average delays look like this:

| Attempt | Avg delay |
| --- | --- |
| 1 | ~18s |
| 2 | ~40s |
| 3 | ~2m |
| 4 | ~5m |
| 5 | ~11m |
| 10 | ~2.8h |
| 15 | ~14.1h |
| 20 | ~44.5h |

With 15 retries, the total time to exhaust all attempts is roughly 2 days. With 20 retries, it's roughly 8 days.
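To check these totals, here is a minimal Ruby sketch of the suggested formula alongside its expected value. It assumes delays are in seconds and that the retry count simply bounds `attempt`; the method names are hypothetical and not Rage::Deferred's actual API.

```ruby
# Suggested retry delay (seconds): polynomial growth plus per-attempt jitter.
# Hypothetical helper name; Rage::Deferred's real implementation may differ.
def proposed_delay(attempt)
  (attempt**4) + 10 + (rand(15) * attempt)
end

# Expected value of proposed_delay: rand(15) yields 0..14, whose mean is 7.
def proposed_expected_delay(attempt)
  attempt**4 + 10 + 7 * attempt
end

# Expected time (seconds) until all `retries` attempts are exhausted.
def expected_total(retries)
  (1..retries).sum { |a| proposed_expected_delay(a) }
end

[15, 20].each do |retries|
  secs = expected_total(retries)
  puts format("%d retries: ~%.1f days", retries, secs / 86_400.0)
end
```

`expected_total(15)` comes to 179,302 seconds (about 2.1 days) and `expected_total(20)` to 724,336 seconds (about 8.4 days). Computing the expected value in closed form keeps the check deterministic, whereas calling `proposed_delay` directly gives a different result on every run.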

The default retry count should be increased to either 15 or 20 (see design considerations below).

Design considerations

  • 15 vs 20 retries. 15 retries (~2 days total) may be enough for issues caught during the work week, but tight if something breaks on a Friday evening. 20 retries (~8 days total) is more forgiving and covers weekend scenarios, but means failed tasks occupy memory for longer. Worth discussing which default makes more sense.
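One way to frame the 15 vs 20 choice is to ask how many retries are needed to span a given outage window. A minimal Ruby sketch, assuming the suggested formula, a jitter-free lower bound (`rand(15)` returning 0), and negligible task execution time; the 63-hour window (Friday 18:00 to Monday 09:00) is an illustrative assumption, and the helper names are hypothetical:

```ruby
# Lower bound on the time (seconds) before `retries` attempts are exhausted,
# ignoring jitter (rand(15) == 0) and task execution time.
def min_total(retries)
  (1..retries).sum { |a| a**4 + 10 }
end

# Smallest retry count whose jitter-free schedule spans at least `window` seconds.
def retries_to_cover(window)
  retries = 1
  retries += 1 while min_total(retries) < window
  retries
end

# Illustrative window: Friday 18:00 to Monday 09:00 (63 hours).
weekend = 63 * 3600
puts retries_to_cover(weekend)
```

With the jitter-free bound, 15 retries span about 49.6 hours (178,462 seconds), so this window needs 16; 20 retries span more than 8 days.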

Tips

  • Look at the existing retry logic in the codebase to find where the current formula and retry count are defined.
  • Check the Deferred docs to understand how Rage::Deferred works.
  • Check the architecture doc that shows how Rage's core components interact with each other and outlines the design principles.
  • Feel free to ask any questions or request help in the comments below!
