Add autoupdate controller metrics #50807

Merged: 2 commits merged into master from hugo/autoupdate-rollout-metrics on Jan 15, 2025

hugoShaka (Contributor) commented on Jan 7, 2025

Part of: RFD-184

Goal (internal): https://github.com/gravitational/cloud/issues/10289

This PR adds metrics to monitor and troubleshoot automatic agent rollouts. Two of these metrics could increase cardinality:

  • the metrics labeled with the start or target versions. A cleanup mechanism removes old time series and stale version labels so they don't pile up.
  • the metric containing the stage of each group. This metric is per-group, so as long as we control the number of groups, we control its cardinality. The metric is not labeled with the group name, so renaming groups does not increase cardinality, and a misconfigured metrics server cannot accidentally disclose the agent group names.
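The two cardinality controls above can be illustrated with a minimal, stdlib-only Go sketch. It is not the code from this PR and does not use the Prometheus client: plain maps stand in for labeled time series, and the type names, the version strings, the integer stage encoding, and the choice of the group's position as its label value are all hypothetical. The PR does not spell out its exact cleanup policy; this sketch simply keeps the current version and drops every other one.

```go
package main

import (
	"sort"
	"strconv"
)

// versionGauge is a hypothetical stand-in for a gauge vector labeled by
// version: each map key is one exported time series.
type versionGauge struct {
	series map[string]float64
}

func newVersionGauge() *versionGauge {
	return &versionGauge{series: make(map[string]float64)}
}

// SetCurrent exports version with value 1 and deletes every other version
// series, so stale label values do not accumulate across rollouts.
func (g *versionGauge) SetCurrent(version string) {
	for v := range g.series {
		if v != version {
			delete(g.series, v) // deleting during range is allowed in Go
		}
	}
	g.series[version] = 1
}

// Labels returns the version label values currently exported, sorted.
func (g *versionGauge) Labels() []string {
	out := make([]string, 0, len(g.series))
	for v := range g.series {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

// group is a hypothetical view of one rollout group; Stage is an
// illustrative integer encoding of the group's current stage.
// Name is deliberately never used as a label value.
type group struct {
	Name  string
	Stage int
}

// groupStageGauge keys each series by the group's position in the
// schedule (hypothetical choice), never by its name.
type groupStageGauge struct {
	stages map[string]int
}

// Update rebuilds the series from the current groups: at most one series
// per group, and renaming a group reuses the same label value.
func (g *groupStageGauge) Update(groups []group) {
	g.stages = make(map[string]int, len(groups))
	for i, grp := range groups {
		g.stages[strconv.Itoa(i)] = grp.Stage
	}
}
```

With the Prometheus Go client, the deletion step in `SetCurrent` would correspond to calling `DeleteLabelValues` on a `GaugeVec`; the sketch avoids that dependency so it stays self-contained.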

Depends on:

hugoShaka force-pushed the hugo/autoupdate-rollout-metrics branch from 37ff41f to fcfc8fd January 9, 2025 23:06
hugoShaka changed the base branch from master to hugo/teleport-use-non-global-metrics-registry January 9, 2025 23:07
hugoShaka force-pushed the hugo/autoupdate-rollout-metrics branch from 1264570 to b3ba472 January 9, 2025 23:27
hugoShaka marked this pull request as ready for review January 9, 2025 23:27
hugoShaka requested review from sclevine and vapopov January 9, 2025 23:27
github-actions (bot) requested review from fheinecke and tigrato January 9, 2025 23:27
Base automatically changed from hugo/teleport-use-non-global-metrics-registry to master January 10, 2025 16:03
hugoShaka force-pushed the hugo/autoupdate-rollout-metrics branch from b3ba472 to 4be59b9 January 10, 2025 21:07
public-teleport-github-review-bot (bot) removed the request for review from fheinecke January 10, 2025 21:15
hugoShaka force-pushed the hugo/autoupdate-rollout-metrics branch 2 times, most recently from e7159a6 to e0ada45 January 14, 2025 18:42
hugoShaka changed the base branch from master to hugo/diagnostics-service-use-local-metrics-registry January 14, 2025 18:42
hugoShaka added the no-changelog label (Indicates that a PR does not require a changelog entry) Jan 15, 2025
Base automatically changed from hugo/diagnostics-service-use-local-metrics-registry to master January 15, 2025 16:55
hugoShaka force-pushed the hugo/autoupdate-rollout-metrics branch from e0ada45 to e9e6bb2 January 15, 2025 18:56
hugoShaka added this pull request to the merge queue Jan 15, 2025
github-merge-queue (bot) removed this pull request from the merge queue due to failed status checks Jan 15, 2025
hugoShaka added this pull request to the merge queue Jan 15, 2025
Merged via the queue into master with commit c3881e1 Jan 15, 2025
41 checks passed
hugoShaka deleted the hugo/autoupdate-rollout-metrics branch January 15, 2025 22:28
mvbrock pushed a commit that referenced this pull request Jan 18, 2025
* Add autoupdate controller metrics

* Do not panic in case of error conflict
Labels: no-changelog (Indicates that a PR does not require a changelog entry), size/lg

4 participants