Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Collection Timeout #3270

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

mitanshu001
Copy link

@mitanshu001 mitanshu001 commented Mar 16, 2025

Summary

This PR introduces a collection timeout feature to the Node Exporter. Posting the draft PR to get initial feedback, I will add an option to configure the timeout value. This will provide users with the flexibility to adjust the timeout based on their specific needs and environments(scrape timeout)

Need

If the time to collect metrics from any one collector is greater than scrape timeout than None of the other metrics are collected and results in false Node down alerts.

Timeout Implementation

  • Added a constant collectorTimeout
  • Utilized the context package to manage the timeout for each collector operation.
  • Implemented a buffered channel collectorCh to reduce blocking during metric collection.
  • Introduced an error channel errCh to capture errors from the Update function.
  • Enhanced error handling to log timeout-specific errors.

Testing
Ensure that the timeout is enforced correctly and that appropriate logs are generated when a timeout occurs.
Verify that metrics are collected and sent without blocking or errors.

Signed-off-by: Mitanshu Mittal <[email protected]>
Signed-off-by: Mitanshu Mittal <[email protected]>
@@ -27,6 +28,7 @@ import (

// Namespace defines the common namespace to be used by all metrics.
const namespace = "node"
const collectorTimeout = 20 * time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout should be read from the Prometheus X-Prometheus-Scrape-Timeout-Seconds header.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SuperQ I was thinking if we should expose this as a config variable, If not passed we can have X-Prometheus-Scrape-Timeout-Seconds - .5sec as default.

If overall implementation of timeout looks good to you I will go ahead and make the above change as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants