
Conversation

@zacharycmontoya
Contributor

Motivation

The Test_Config_RuntimeMetrics_Enabled test cases have been flaky

Changes

Add retries to the get_runtime_metrics helper totaling over 10s of waiting, which is the standard runtime metrics reporting interval. This means the "runtime metrics disabled" tests will run for at least 10s, but it should make the "runtime metrics enabled" tests more reliable.
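
For context, the retried helper is roughly shaped like the sketch below. Only the get_runtime_metrics name, the retry loop, and interfaces.agent.get_metrics() appear in the change; the gauge filter, the attempt count/delay, and the return value are assumptions rather than the exact implementation:

import time

from utils import interfaces  # system-tests interface helpers (assumed import path)


def get_runtime_metrics(retry: int = 10, delay_s: float = 1.0) -> list:
    # Poll the metrics captured from the agent, retrying until runtime metrics
    # gauges show up or ~10s (one reporting interval) has elapsed.
    for attempt in range(retry):
        runtime_metrics_gauges = [
            metric
            for _, metric in interfaces.agent.get_metrics()
            if metric.get("type") == "gauge"  # placeholder filter, not the real predicate
        ]
        if runtime_metrics_gauges:
            return runtime_metrics_gauges
        if attempt < retry - 1:
            time.sleep(delay_s)
    return []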

Workflow

  1. ⚠️ Create your PR as draft ⚠️
  2. Work on your PR until the CI passes
  3. Mark it as ready for review
    • Test logic is modified? -> Get a review from the RFC owner.
    • Framework is modified, or used in a non-obvious way? -> Get a review from the R&P team

🚀 Once your PR is reviewed and the CI is green, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

  • If PR title starts with [<language>], double-check that only <language> is impacted by the change
  • No system-tests internals are modified. Otherwise, I have approval from the R&P team
  • A docker base image is modified?
    • the relevant build-XXX-image label is present
  • A scenario is added (or removed)?

@github-actions
Contributor

CODEOWNERS have been resolved as:

tests/test_config_consistency.py                                        @DataDog/system-tests-core

@zacharycmontoya zacharycmontoya marked this pull request as ready for review November 18, 2025 21:29
@zacharycmontoya zacharycmontoya requested a review from a team as a code owner November 18, 2025 21:29
for _ in range(retry):
    runtime_metrics_gauges = [
        metric
        for _, metric in interfaces.agent.get_metrics()
Collaborator

Unless I've missed a point, when this code is executed the agent container is already down and the data are loaded from files on the filesystem, so adding a retry here should not change anything.

The proper way to harden this test is to:

  1. get from the agent team the timeout beyond which missing data is considered a bug
  2. then add a call to interfaces.agent.wait_for() in the setup function, using this timeout as an argument.

wait_for takes two arguments:

  1. a callable that takes a data object (a payload sent by the agent) and returns true to stop waiting
  2. a timeout (in seconds)

wait_for does not return anything, nor does it fail. It's the responsibility of the callable to store whatever is needed for the assertion.

A typical pattern would be:

def setup(self):
    self.metric_found = False

    def is_metric(data):
        self.metric_found = data["type"] == "metric"
        return self.metric_found

    interfaces.agent.wait_for(is_metric, 10)

def test(self):
    assert self.metric_found
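
Adapted to this runtime-metrics test, the pattern could look roughly like the following sketch; wait_for(callable, timeout) is the only piece taken from the framework as described above, while the method names, the payload keys, and the 10s timeout are assumptions:

def setup_runtime_metrics_enabled(self):
    self.runtime_metrics_gauges = []

    def save_runtime_metrics(data):
        # Store whatever the assertion needs; the "type"/"metrics" keys are
        # assumptions about the payload shape, not the real agent schema.
        if data.get("type") == "metric":
            self.runtime_metrics_gauges.extend(data.get("metrics", []))
        return bool(self.runtime_metrics_gauges)

    # Placeholder 10s timeout, pending the value agreed with the agent team.
    interfaces.agent.wait_for(save_runtime_metrics, 10)

def test_runtime_metrics_enabled(self):
    assert self.runtime_metrics_gauges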

Feel free to DM me if anything is not clear enough :)
