
Conversation

@alexandraoberaigner (Contributor) commented Dec 1, 2025

This PR

adds configurable retry backoff options to the flagd in-process provider, allowing users to configure the retry policy of gRPC connections.

aims to make the implementation consistent with the Java provider. See this change, which prevents tight loops when the retry policy doesn't take effect.

Related Issues

Details

Configuration and API Enhancements:

  • Added RetryBackoffMs and RetryBackoffMaxMs to ProviderConfiguration, with corresponding default values, environment variable support, and provider options (WithRetryBackoffMs, WithRetryBackoffMaxMs). These allow users to configure retry backoff durations for connection retries (see the sketch after this list). [1] [2] [3] [4] [5] [6]
  • Updated the in-process sync and service layers to accept and use the new retry backoff configuration values, passing them through all relevant constructors and methods. [1] [2] [3] [4]
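
As a concrete illustration, here is a minimal sketch of the functional-option pattern the list above describes. The option names and the WithRetryBackoffMs signature follow the PR, but the package name, the ProviderOption/ProviderConfiguration shapes, and the field layout are assumptions for illustration, not the provider's actual code.

```go
// Illustrative sketch only; not the provider's real package layout.
package flagdsketch

// ProviderConfiguration holds the (assumed) retry backoff settings.
type ProviderConfiguration struct {
	RetryBackoffMs    int // initial backoff between connection retries, in milliseconds
	RetryBackoffMaxMs int // upper bound for the retry backoff, in milliseconds
}

// ProviderOption mutates the configuration; the real provider may use a
// different underlying type.
type ProviderOption func(*ProviderConfiguration)

// WithRetryBackoffMs sets the initial backoff duration (in milliseconds)
// for retrying failed connections.
func WithRetryBackoffMs(retryBackoffMs int) ProviderOption {
	return func(cfg *ProviderConfiguration) {
		cfg.RetryBackoffMs = retryBackoffMs
	}
}

// WithRetryBackoffMaxMs sets the maximum backoff duration (in milliseconds)
// for retrying failed connections.
func WithRetryBackoffMaxMs(retryBackoffMaxMs int) ProviderOption {
	return func(cfg *ProviderConfiguration) {
		cfg.RetryBackoffMaxMs = retryBackoffMaxMs
	}
}
```

An application would then pass these options alongside the existing provider options when constructing the in-process provider, e.g. WithRetryBackoffMs(500) and WithRetryBackoffMaxMs(2000).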

Retry Logic and Policy:

  • Modified the gRPC retry policy builder to use the configured backoff values for InitialBackoff and MaxBackoff instead of hardcoded durations, making retry behavior fully configurable (a sketch follows this list).
  • Updated the sync retry logic to log and sleep for the configured maximum backoff duration after a failed sync cycle, improving observability and control.
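
To make the first bullet concrete, below is a hedged sketch of how a retry policy builder could assemble the gRPC service-config JSON from the configured values. The initialBackoff/maxBackoff fields and the overall service-config shape are standard gRPC; the service name, attempt count, multiplier, and status codes here are placeholders, not necessarily what the provider uses.

```go
package flagdsketch

import "fmt"

// buildRetryPolicy renders a gRPC service-config JSON string using the
// configured backoff values (given in milliseconds). Everything except
// the initialBackoff/maxBackoff wiring is a placeholder for illustration.
func buildRetryPolicy(retryBackoffMs, retryBackoffMaxMs int) string {
	return fmt.Sprintf(`{
  "methodConfig": [{
    "name": [{"service": "flagd.sync.v1.FlagSyncService"}],
    "retryPolicy": {
      "maxAttempts": 3,
      "initialBackoff": "%.1fs",
      "maxBackoff": "%.1fs",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["UNAVAILABLE"]
    }
  }]
}`, float64(retryBackoffMs)/1000, float64(retryBackoffMaxMs)/1000)
}
```

Such a policy would typically be applied to the connection via the grpc.WithDefaultServiceConfig dial option.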

Testing and Validation:

  • Added a new unit test (TestBuildRetryPolicy) for the retry policy builder, verifying that the generated JSON policy reflects the configured backoff values and other retry parameters (see the sketch after this list).
  • Updated integration and unit tests to include the new retry backoff options, ensuring end-to-end coverage. [1] [2] [3] [4] [5]
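
For reference, a test along these lines could look like the sketch below. It assumes the illustrative buildRetryPolicy helper from the previous sketch; the actual test in the PR may structure its assertions differently.

```go
package flagdsketch

import (
	"encoding/json"
	"strings"
	"testing"
)

// TestBuildRetryPolicy checks that the generated policy is valid JSON and
// reflects the configured backoff values (sketch only).
func TestBuildRetryPolicy(t *testing.T) {
	policy := buildRetryPolicy(500, 2000)

	// gRPC rejects service configs that are not valid JSON.
	var parsed map[string]any
	if err := json.Unmarshal([]byte(policy), &parsed); err != nil {
		t.Fatalf("retry policy is not valid JSON: %v", err)
	}

	// The configured values must show up as gRPC duration strings.
	if !strings.Contains(policy, `"initialBackoff": "0.5s"`) {
		t.Errorf("policy does not contain configured initial backoff: %s", policy)
	}
	if !strings.Contains(policy, `"maxBackoff": "2.0s"`) {
		t.Errorf("policy does not contain configured max backoff: %s", policy)
	}
}
```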

@alexandraoberaigner alexandraoberaigner marked this pull request as ready for review December 9, 2025 14:17
@alexandraoberaigner alexandraoberaigner requested review from a team as code owners December 9, 2025 14:17

@aepfli (Member) left a comment

I am still not the biggest fan of a sleep, as this will block everything. Even if we do some kind of context cancellation and want to stop execution, we are stopped by a pretty long sleep. Hence I think we should really try to use a different construct than a sleep.

The rest of the pull request looks good to me, and is ready to merge.

Disclaimer: For me the sleep is a reason not to approve this, but not a big enough reason to block this pull request from getting merged, if the rest of the approvers are fine with it.

}

// WithRetryBackoffMs sets the initial backoff duration (in milliseconds) for retrying failed connections
func WithRetryBackoffMs(retryBackoffMs int) ProviderOption {
A contributor commented:

A parameter that represents a duration should be of type time.Duration. The name of the parameter should be changed to remove any reference to a specific unit like milliseconds.
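
A minimal sketch of what that suggestion could look like; the option name, the RetryBackoff field, and the stand-in types are illustrative assumptions, not code from this PR:

```go
package optionsketch

import "time"

// Illustrative stand-ins for the provider's configuration and option types.
type ProviderConfiguration struct {
	RetryBackoff time.Duration // initial backoff between connection retries
}

type ProviderOption func(*ProviderConfiguration)

// WithRetryBackoff sets the initial backoff for retrying failed connections
// as a time.Duration, with no unit baked into the name.
func WithRetryBackoff(retryBackoff time.Duration) ProviderOption {
	return func(cfg *ProviderConfiguration) {
		cfg.RetryBackoff = retryBackoff
	}
}
```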

}
}

cfg.RetryGracePeriod = getIntFromEnvVarOrDefault(flagdGracePeriodVariableName, defaultGracePeriod, cfg.log)
A contributor commented:

Use time.ParseDuration to parse durations.
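
A small sketch of how that could look, using time.ParseDuration with a fallback to a default; the helper name and error handling are illustrative, not the provider's actual code:

```go
package envsketch

import (
	"os"
	"time"
)

// durationFromEnvOrDefault reads an environment variable as a Go duration
// string (e.g. "500ms", "5s") and falls back to the default when the
// variable is unset or malformed.
func durationFromEnvOrDefault(name string, fallback time.Duration) time.Duration {
	raw, ok := os.LookupEnv(name)
	if !ok {
		return fallback
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return fallback
	}
	return d
}
```

With this shape, the grace-period line above would become something like cfg.RetryGracePeriod = durationFromEnvOrDefault(flagdGracePeriodVariableName, defaultGracePeriod), with RetryGracePeriod typed as time.Duration (again, an assumed shape).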

@toddbaert (Member) commented Dec 10, 2025

> I am still not the biggest fan of a sleep, as this will block everything. Even if we do some kind of context cancellation and want to stop execution, we are stopped by a pretty long sleep. Hence I think we should really try to use a different construct than a sleep.
>
> The rest of the pull request looks good to me, and is ready to merge.
>
> Disclaimer: For me the sleep is a reason not to approve this, but not a big enough reason to block this pull request from getting merged, if the rest of the approvers are fine with it.

Ya, personally I'm fine with it. We already have the same pattern in Java, so I would go with it for now. This sleep only happens during what's already an unusual error scenario (the connection is healthy, but the stream keeps returning errors immediately), within a retry loop, so I think it's acceptable... but it looks like @alexandraoberaigner already used a timer.
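
For reference, a cancellable wait is typically built from a timer and a select on the context rather than time.Sleep; a minimal sketch (not the PR's exact code):

```go
package backoffsketch

import (
	"context"
	"time"
)

// waitWithBackoff blocks for the given backoff, but returns early with the
// context's error as soon as the context is cancelled, so shutdown is not
// held up for the full backoff duration.
func waitWithBackoff(ctx context.Context, backoff time.Duration) error {
	timer := time.NewTimer(backoff)
	defer timer.Stop()

	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}
```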

@toddbaert (Member) left a comment

I added a recommended comment explaining why we are doing our own simple backoff for unusual situations. I also agree with @sahidvelji's duration comment.

Otherwise LGTM.

Co-authored-by: Todd Baert <[email protected]>
Signed-off-by: alexandraoberaigner <[email protected]>