Conversation

Member

@rexagod rexagod commented Sep 18, 2025

Metric rules and metrics exporters have not been opted in, so that the telemetry rules keep functioning.

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Sep 18, 2025
Contributor

openshift-ci-robot commented Sep 18, 2025

@rexagod: This pull request references MON-4361 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

In response to this:

Metric rules and metrics exporters have not been opted in, so that the telemetry rules keep functioning.

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Sep 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rexagod

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Sep 18, 2025
@rexagod rexagod force-pushed the MON-4361 branch 3 times, most recently from c4db7d8 to a322ff8, on September 18, 2025 14:43
Contributor

@simonpasquier simonpasquier left a comment


It might be good to wait for #2649 since it's migrating all the dashboards to static assets.

On the jsonnet implementation side, I wonder if it wouldn't be easier to read and maintain if we injected the annotation into each component that needs it.
E.g. here, for components whose resources are all OptionalMonitoring:

{ ['alertmanager/' + name]: inCluster.alertmanager[name] for name in std.objectFields(inCluster.alertmanager) } +
{ ['alertmanager-user-workload/' + name]: userWorkload.alertmanager[name] for name in std.objectFields(userWorkload.alertmanager) } +
{ ['cluster-monitoring-operator/' + name]: inCluster.clusterMonitoringOperator[name] for name in std.objectFields(inCluster.clusterMonitoringOperator) } +
{ ['dashboards/' + name]: inCluster.dashboards[name] for name in std.objectFields(inCluster.dashboards) } +
{ ['kube-state-metrics/' + name]: inCluster.kubeStateMetrics[name] for name in std.objectFields(inCluster.kubeStateMetrics) } +
{ ['node-exporter/' + name]: inCluster.nodeExporter[name] for name in std.objectFields(inCluster.nodeExporter) } +
{ ['openshift-state-metrics/' + name]: inCluster.openshiftStateMetrics[name] for name in std.objectFields(inCluster.openshiftStateMetrics) } +
{ ['prometheus-k8s/' + name]: inCluster.prometheus[name] for name in std.objectFields(inCluster.prometheus) } +
{ ['admission-webhook/' + name]: inCluster.admissionWebhook[name] for name in std.objectFields(inCluster.admissionWebhook) } +
{ ['prometheus-operator/' + name]: inCluster.prometheusOperator[name] for name in std.objectFields(inCluster.prometheusOperator) } +
{ ['prometheus-operator-user-workload/' + name]: userWorkload.prometheusOperator[name] for name in std.objectFields(userWorkload.prometheusOperator) } +
{ ['prometheus-user-workload/' + name]: userWorkload.prometheus[name] for name in std.objectFields(userWorkload.prometheus) } +
{ ['metrics-server/' + name]: inCluster.metricsServer[name] for name in std.objectFields(inCluster.metricsServer) } +
// needs to be removed once remote-write is allowed for sending telemetry
{ ['telemeter-client/' + name]: inCluster.telemeterClient[name] for name in std.objectFields(inCluster.telemeterClient) } +
{ ['monitoring-plugin/' + name]: inCluster.monitoringPlugin[name] for name in std.objectFields(inCluster.monitoringPlugin) } +
{ ['thanos-querier/' + name]: inCluster.thanosQuerier[name] for name in std.objectFields(inCluster.thanosQuerier) } +
{ ['thanos-ruler/' + name]: inCluster.thanosRuler[name] for name in std.objectFields(inCluster.thanosRuler) } +
{ ['control-plane/' + name]: inCluster.controlPlane[name] for name in std.objectFields(inCluster.controlPlane) } +
{ ['manifests/' + name]: inCluster.manifests[name] for name in std.objectFields(inCluster.manifests) } +

Or at the level of the jsonnet component file, in case the annotation is per resource.
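The per-component injection suggested above could be sketched roughly as follows. This is illustrative only, not existing CMO code: `withOptionalMonitoring` is a hypothetical helper, and the stub component stands in for the real `inCluster.alertmanager` object.

```jsonnet
// Hypothetical helper: given a component (an object mapping resource names to
// Kubernetes manifests), return the same object with the capability
// annotation merged into every resource's metadata.
local withOptionalMonitoring(component) = {
  [name]: component[name] {
    metadata+: {
      annotations+: {
        'capability.openshift.io/name': 'OptionalMonitoring',
      },
    },
  }
  for name in std.objectFields(component)
};

// Stub component for illustration (the real code would pass e.g.
// inCluster.alertmanager before aggregating it as in the snippet above).
local alertmanager = {
  service: {
    apiVersion: 'v1',
    kind: 'Service',
    metadata: { name: 'alertmanager-main' },
  },
};

withOptionalMonitoring(alertmanager)
```

The `+:` field merges mean existing annotations on a resource are preserved rather than overwritten, which keeps the helper safe to apply on top of manifests that already carry other annotations.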

CHANGELOG.md Outdated
- `KubePdbNotEnoughHealthyPods`
- `KubeNodePressure`
- `KubeNodeEviction`
- []() Allow cluster-admins to opt into optional monitoring using the `OptionalMonitoring` capability.
Contributor

I realize that adding the annotation to the manifests under the assets/ directory will have no direct effect since there's no logic in CMO to deploy these resources conditionally, right?

Member Author
Contributor

Thanks!

@rexagod rexagod force-pushed the MON-4361 branch 2 times, most recently from 461f6b9 to cbdd203, on September 30, 2025 15:59
@rexagod rexagod changed the title from "MON-4361: [WIP] Annotate optional monitoring manifests" to "MON-4361: Annotate optional monitoring manifests" on Sep 30, 2025
Member Author

rexagod commented Sep 30, 2025

Reverted the capability.openshift.io/name: Console annotation so that dashboards can still be supported under optional monitoring, since we'll keep scraping all targets anyway (to avoid breaking any telemetry rules).

Metric rules and metrics exporters have not been opted-in to keep the
telemetry rules functioning. Optional components include:
* Alertmanager
* AlertmanagerUWM
* ClusterMonitoringOperatorDeps (partially, for AM)
* MonitoringPlugin
* PrometheusOperator (partially, for AM)
* PrometheusOperatorUWM
* ThanosRuler

Signed-off-by: Pranshu Srivastava <[email protected]>
Contributor

openshift-ci bot commented Sep 30, 2025

@rexagod: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/ginkgo-tests | 366ad50 | link | false | /test ginkgo-tests |
| ci/prow/okd-scos-e2e-aws-ovn | 366ad50 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-agnostic-operator | 366ad50 | link | true | /test e2e-agnostic-operator |
| ci/prow/e2e-aws-ovn-single-node | 366ad50 | link | false | /test e2e-aws-ovn-single-node |
| ci/prow/go-fmt | 366ad50 | link | true | /test go-fmt |
| ci/prow/e2e-hypershift-conformance | 366ad50 | link | true | /test e2e-hypershift-conformance |
| ci/prow/generate | 366ad50 | link | true | /test generate |
| ci/prow/e2e-aws-ovn-techpreview | 366ad50 | link | true | /test e2e-aws-ovn-techpreview |
| ci/prow/e2e-aws-ovn | 366ad50 | link | true | /test e2e-aws-ovn |
| ci/prow/e2e-aws-ovn-upgrade | 366ad50 | link | true | /test e2e-aws-ovn-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

kind: ValidatingWebhookConfiguration
metadata:
  annotations:
    capability.openshift.io/name: OptionalMonitoring
Contributor

If the service is optional (*), shouldn't we apply the annotation to all admission-webhook resources?

(*) There could be an argument that we still want the admission webhook for PrometheusRule resources because of telemetry.

Contributor

Not directly related to this change, but if the console is disabled, wouldn't it be logical to avoid deploying the monitoring plugin resources?

annotations:
  api-approved.openshift.io: https://github.com/openshift/api/pull/1406
  api.openshift.io/merged-by-featuregates: "true"
  capability.openshift.io/name: OptionalMonitoring
Contributor

Not sure that CMO will start if the CRDs aren't present.

kind: CustomResourceDefinition
metadata:
  annotations:
    capability.openshift.io/name: OptionalMonitoring
Contributor

Same here: IIRC the Prometheus operator will (at the minimum) log errors if the CRDs aren't installed.
