Conversation

Member

@rexagod rexagod commented Sep 18, 2025

Metric rules and metrics exporters have not been opted in, so that the telemetry rules keep functioning.

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Sep 18, 2025
Contributor

openshift-ci-robot commented Sep 18, 2025

@rexagod: This pull request references MON-4361 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

In response to this:

Metric rules and metrics exporters have not been opted in, so that the telemetry rules keep functioning.

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Sep 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rexagod

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Sep 18, 2025
@rexagod rexagod force-pushed the MON-4361 branch 3 times, most recently from c4db7d8 to a322ff8, on September 18, 2025 14:43
Contributor

@simonpasquier simonpasquier left a comment


It might be good to wait for #2649 since it's migrating all the dashboards to static assets.

On the jsonnet implementation side, I wonder if it wouldn't be easier to read and maintain if we injected the annotation into each component that needs it.
E.g. here, for components whose resources are all OptionalMonitoring:

{ ['alertmanager/' + name]: inCluster.alertmanager[name] for name in std.objectFields(inCluster.alertmanager) } +
{ ['alertmanager-user-workload/' + name]: userWorkload.alertmanager[name] for name in std.objectFields(userWorkload.alertmanager) } +
{ ['cluster-monitoring-operator/' + name]: inCluster.clusterMonitoringOperator[name] for name in std.objectFields(inCluster.clusterMonitoringOperator) } +
{ ['dashboards/' + name]: inCluster.dashboards[name] for name in std.objectFields(inCluster.dashboards) } +
{ ['kube-state-metrics/' + name]: inCluster.kubeStateMetrics[name] for name in std.objectFields(inCluster.kubeStateMetrics) } +
{ ['node-exporter/' + name]: inCluster.nodeExporter[name] for name in std.objectFields(inCluster.nodeExporter) } +
{ ['openshift-state-metrics/' + name]: inCluster.openshiftStateMetrics[name] for name in std.objectFields(inCluster.openshiftStateMetrics) } +
{ ['prometheus-k8s/' + name]: inCluster.prometheus[name] for name in std.objectFields(inCluster.prometheus) } +
{ ['admission-webhook/' + name]: inCluster.admissionWebhook[name] for name in std.objectFields(inCluster.admissionWebhook) } +
{ ['prometheus-operator/' + name]: inCluster.prometheusOperator[name] for name in std.objectFields(inCluster.prometheusOperator) } +
{ ['prometheus-operator-user-workload/' + name]: userWorkload.prometheusOperator[name] for name in std.objectFields(userWorkload.prometheusOperator) } +
{ ['prometheus-user-workload/' + name]: userWorkload.prometheus[name] for name in std.objectFields(userWorkload.prometheus) } +
{ ['metrics-server/' + name]: inCluster.metricsServer[name] for name in std.objectFields(inCluster.metricsServer) } +
// needs to be removed once remote-write is allowed for sending telemetry
{ ['telemeter-client/' + name]: inCluster.telemeterClient[name] for name in std.objectFields(inCluster.telemeterClient) } +
{ ['monitoring-plugin/' + name]: inCluster.monitoringPlugin[name] for name in std.objectFields(inCluster.monitoringPlugin) } +
{ ['thanos-querier/' + name]: inCluster.thanosQuerier[name] for name in std.objectFields(inCluster.thanosQuerier) } +
{ ['thanos-ruler/' + name]: inCluster.thanosRuler[name] for name in std.objectFields(inCluster.thanosRuler) } +
{ ['control-plane/' + name]: inCluster.controlPlane[name] for name in std.objectFields(inCluster.controlPlane) } +
{ ['manifests/' + name]: inCluster.manifests[name] for name in std.objectFields(inCluster.manifests) } +

Or at the level of the jsonnet component file, in case the annotation is per resource.
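The per-component injection suggested above could be sketched roughly as follows. This is illustrative only, not existing CMO code: `withOptionalMonitoring` is a hypothetical helper, and the stub component stands in for the real `inCluster.alertmanager` object.

```jsonnet
// Hypothetical helper: given a component (an object mapping resource names to
// Kubernetes manifests), return the same object with the capability
// annotation merged into every resource's metadata.
local withOptionalMonitoring(component) = {
  [name]: component[name] {
    metadata+: {
      annotations+: {
        'capability.openshift.io/name': 'OptionalMonitoring',
      },
    },
  }
  for name in std.objectFields(component)
};

// Stub component for illustration (the real code would pass e.g.
// inCluster.alertmanager before aggregating it as in the snippet above).
local alertmanager = {
  service: {
    apiVersion: 'v1',
    kind: 'Service',
    metadata: { name: 'alertmanager-main' },
  },
};

withOptionalMonitoring(alertmanager)
```

The `+:` field merges mean existing annotations on a resource are preserved rather than overwritten, which keeps the helper safe to apply on top of manifests that already carry other annotations.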

CHANGELOG.md Outdated
- `KubePdbNotEnoughHealthyPods`
- `KubeNodePressure`
- `KubeNodeEviction`
- []() Allow cluster-admins to opt into optional monitoring using the `OptionalMonitoring` capability.
Contributor

I realize that adding the annotation to the manifests under the assets/ directory will have no direct effect since there's no logic in CMO to deploy these resources conditionally, right?

Member Author
Contributor

Thanks!

@rexagod rexagod force-pushed the MON-4361 branch 2 times, most recently from 461f6b9 to cbdd203, on September 30, 2025 15:59
@rexagod rexagod changed the title from "MON-4361: [WIP] Annotate optional monitoring manifests" to "MON-4361: Annotate optional monitoring manifests" on Sep 30, 2025
Member Author

rexagod commented Sep 30, 2025

Reverted the capability.openshift.io/name: Console annotation so that dashboards can still be supported under optional monitoring, since we'll keep scraping all targets anyway (to avoid breaking any telemetry rules).

Metric rules and metrics exporters have not been opted-in to keep the
telemetry rules functioning. Optional components include:
* Alertmanager
* AlertmanagerUWM
* ClusterMonitoringOperatorDeps (partially, for AM)
* MonitoringPlugin
* PrometheusOperator (partially, for AM)
* PrometheusOperatorUWM
* ThanosRuler

Signed-off-by: Pranshu Srivastava <[email protected]>
Contributor

openshift-ci bot commented Sep 30, 2025

@rexagod: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/ginkgo-tests | 366ad50 | link | false | /test ginkgo-tests |
| ci/prow/okd-scos-e2e-aws-ovn | 366ad50 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-agnostic-operator | 366ad50 | link | true | /test e2e-agnostic-operator |
| ci/prow/e2e-aws-ovn-single-node | 366ad50 | link | false | /test e2e-aws-ovn-single-node |
| ci/prow/go-fmt | 366ad50 | link | true | /test go-fmt |
| ci/prow/e2e-hypershift-conformance | 366ad50 | link | true | /test e2e-hypershift-conformance |
| ci/prow/generate | 366ad50 | link | true | /test generate |
| ci/prow/e2e-aws-ovn-techpreview | 366ad50 | link | true | /test e2e-aws-ovn-techpreview |
| ci/prow/e2e-aws-ovn | 366ad50 | link | true | /test e2e-aws-ovn |
| ci/prow/e2e-aws-ovn-upgrade | 366ad50 | link | true | /test e2e-aws-ovn-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

kind: ValidatingWebhookConfiguration
metadata:
  annotations:
    capability.openshift.io/name: OptionalMonitoring
Contributor

If the service is optional (*), shouldn't we apply the annotation to all admission-webhook resources?

(*) There could be an argument that we still want the admission webhook for PrometheusRule resources because of telemetry.

Contributor

Not directly related to this change, but if the console is disabled, wouldn't it be logical to avoid deploying the monitoring plugin resources?

annotations:
  api-approved.openshift.io: https://github.com/openshift/api/pull/1406
  api.openshift.io/merged-by-featuregates: "true"
  capability.openshift.io/name: OptionalMonitoring
Contributor

Not sure that CMO will start if the CRDs aren't present.

kind: CustomResourceDefinition
metadata:
  annotations:
    capability.openshift.io/name: OptionalMonitoring
Contributor

Same here: IIRC the Prometheus operator will (at the minimum) log errors if the CRDs aren't installed.
