Skip to content

Conversation

marioferh
Copy link
Contributor

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 1, 2025
Copy link
Contributor

openshift-ci bot commented Sep 1, 2025

Hello @marioferh! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 1, 2025
@marioferh marioferh force-pushed the prometheus_k8s_config_api branch from 4b7b3a6 to 5cef22c Compare September 15, 2025 13:11
@openshift-ci-robot
Copy link

/test remaining-required

Scheduling tests matching the pipeline_run_if_changed parameter:
/test e2e-aws-ovn
/test e2e-aws-ovn-hypershift
/test e2e-aws-ovn-hypershift-conformance
/test e2e-aws-ovn-techpreview
/test e2e-aws-serial-1of2
/test e2e-aws-serial-2of2
/test e2e-aws-serial-techpreview-1of2
/test e2e-aws-serial-techpreview-2of2
/test e2e-azure
/test e2e-gcp
/test e2e-upgrade
/test e2e-upgrade-out-of-change

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 15, 2025
@marioferh marioferh force-pushed the prometheus_k8s_config_api branch from 5cef22c to 86df303 Compare September 16, 2025 08:52
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 16, 2025
@JoelSpeed
Copy link
Contributor

@marioferh You appear to have some conflict markers here? Can you PTAL and make sure the correct content is pushed.

Let me know when you want this API reviewed

@marioferh
Copy link
Contributor Author

yep I've mixed the commit, let me fix it

@marioferh marioferh force-pushed the prometheus_k8s_config_api branch 2 times, most recently from 483f697 to bf0b7b7 Compare September 16, 2025 13:50
Signed-off-by: Mario Fernandez <[email protected]>
@marioferh marioferh force-pushed the prometheus_k8s_config_api branch from bf0b7b7 to 64ce7b7 Compare September 16, 2025 17:23
@marioferh marioferh changed the title WIP: Monitoring API prometheus k8s config API Monitoring API: prometheus k8s config API Sep 17, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 17, 2025
@everettraven
Copy link
Contributor

/assign

Comment on lines 92 to 95
// prometheusK8sConfig provides configuration options for the Prometheus instance
// Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrations.
// prometheusK8sConfig is optional.
// When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have this follow the existing pattern of related fields like the metricsServerConfig field?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, some of the questions I have from reading this as-is are:

  • What is "the Prometheus instance"?
  • Where does it run? Is it run by default?
  • Is Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrations all the things I can configure? Why might I care, as a user, to configure these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include helpful information, for end-users that may read this API documentation, in the GoDoc here.

All of the information you included in your response here is helpful context in understanding what changes to this configuration impacts.

Please also include some information in the description as to why users may want to configure this field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// prometheusK8sConfig is optional.
// When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
// +optional
PrometheusK8sConfig PrometheusK8sConfig `json:"prometheusK8sConfig,omitempty,omitzero"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the K8s in the name here? Feels a bit redundant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe to differentiate from prometheusOperator, coming from config map. Fixed.

Comment on lines +430 to +431
// additionalAlertmanagerConfigs configures additional Alertmanager instances that receive alerts from
// the Prometheus component. By default, no additional Alertmanager instances are configured.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As someone unfamiliar with configuring Prometheus, I don't really understand what any of this means. Why would I care to configure additional alertmanager instances that receive alerts from Prometheus?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AdditionalAlertmanagerConfig is used if the client has other Alertmanager and needs there the prometheus Alerts. Also to decouple alerts if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, additionalAlertmanagerConfigs is an optional field that configures Prometheus to send alerts to this set of Alertmanager instances. This is useful because it decouples alerts from what?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not decouple alerts from nothing, sorry. It just sends the same alerts from Prometheus to another Alertmanager provided by the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would something like this potentially clarify what this field does for end users?:

additionalAlertmanagerConfigs is an optional field used to configure how Prometheus communicates
with Alertmanager instances when sending alerts.

We should also include information as to what happens when this field is omitted in the GoDoc. Based on my current understanding, that would likely be something like:

When omitted, Prometheus will not send alerts to any Alertmanager instances.

Is this correct? Is there a default Alertmanager instance that will always be included in the configuration for Prometheus instances?

// the Prometheus component. By default, no additional Alertmanager instances are configured.
// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=atomic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why atomic? Using atomic makes it so you can't have multiple-owner entries here, which doesn't seem like something unreasonable.

Would this be more appropriate as a listType=map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. Thx

// url is the URL of the remote write endpoint.
// +required
// +kubebuilder:validation:MaxLength=2048
URL *string `json:"url,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What other constraints can we put in place to ensure the URL is valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// +kubebuilder:validation:MaxLength=2048
URL *string `json:"url,omitempty"`
// name is the name of the remote write configuration.
// +optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean for this name to not be provided?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, if you need to add and endpoint and you don't fill the name, data is not sent correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So then this should be required?

Comment on lines 637 to 640
// remoteTimeout is the timeout for requests to the remote write endpoint.
// +optional
// +kubebuilder:validation:MaxLength=20
RemoteTimeout *string `json:"remoteTimeout,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean for this to not be provided? what are the allowed values here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// +kubebuilder:validation:MaxLength=20
RemoteTimeout *string `json:"remoteTimeout,omitempty"`
// writeRelabelConfigs is a list of relabeling rules to apply before sending data to the remote endpoint.
// +optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean for this to be omitted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if is optional and you dont fill it, then no relabel is done.

Comment on lines 649 to 676
type RelabelConfig struct {
// sourceLabels is a list of source label names.
// +optional
// +kubebuilder:validation:MaxItems=10
// +kubebuilder:validation:items:MaxLength=63
// +listType=set
SourceLabels []string `json:"sourceLabels,omitempty"`
// separator is the separator used to join source label values.
// +optional
// +kubebuilder:validation:MaxLength=10
Separator *string `json:"separator,omitempty"`
// regex is the regular expression to match against the concatenated source label values.
// +optional
// +kubebuilder:validation:MaxLength=1000
Regex *string `json:"regex,omitempty"`
// targetLabel is the target label name.
// +optional
// +kubebuilder:validation:MaxLength=63
TargetLabel *string `json:"targetLabel,omitempty"`
// replacement is the replacement value for the target label.
// +optional
// +kubebuilder:validation:MaxLength=255
Replacement *string `json:"replacement,omitempty"`
// action is the action to perform.
// +optional
// +kubebuilder:validation:MaxLength=20
Action *string `json:"action,omitempty"`
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please elaborate in the godoc more on why a user may care to configure these fields, constraints, and what it means for them to be omitted.

As-is I don't really understand what any of these fields result in when configured.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that explanations are pretty similar to prometheus ones:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config

Copy link
Contributor

openshift-ci bot commented Sep 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from everettraven. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Mario Fernandez <[email protected]>
@marioferh
Copy link
Contributor Author

Some comments addressed, I will continue tomorrow

Signed-off-by: Mario Fernandez <[email protected]>
Signed-off-by: Mario Fernandez <[email protected]>
@marioferh marioferh changed the title Monitoring API: prometheus k8s config API MON-4030: prometheus k8s config API Sep 23, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 23, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 23, 2025

@marioferh: This pull request references MON-4030 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.21.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Signed-off-by: Mario Fernandez <[email protected]>
Copy link
Contributor

openshift-ci bot commented Sep 23, 2025

@marioferh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn 5cef22c link true /test e2e-aws-ovn
ci/prow/e2e-gcp 5cef22c link false /test e2e-gcp
ci/prow/e2e-aws-serial-1of2 5cef22c link true /test e2e-aws-serial-1of2
ci/prow/okd-scos-e2e-aws-ovn 505533b link false /test okd-scos-e2e-aws-ovn
ci/prow/minor-e2e-upgrade-minor 505533b link true /test minor-e2e-upgrade-minor

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Contributor

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another round of comments

// prometheusK8sConfig provides configuration options for the Prometheus instance, which is the pod running Prometheus in the cluster.
// By default, at least one Prometheus pod is deployed in the `openshift-monitoring` namespace to collect and store metrics for the platform.
//
// This field allows you to customize how Prometheus is deployed and operated, including:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this configure how all Prometheus instances are deployed and operated or just the default one?

Comment on lines +109 to +111
// For more information on Prometheus configuration, see:
// https://prometheus.io/docs/prometheus/latest/configuration/configuration/
// https://prometheus.io/docs/prometheus/latest/storage/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally avoid linking to documentation where possible because these docs can be updated and the references may no longer make sense for users.

I don't know that these links are necessary either. We should be appropriately documenting each field in the configuration here such that users can understand what they are configuring and why.

Comment on lines +430 to +431
// additionalAlertmanagerConfigs configures additional Alertmanager instances that receive alerts from
// the Prometheus component. By default, no additional Alertmanager instances are configured.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would something like this potentially clarify what this field does for end users?:

additionalAlertmanagerConfigs is an optional field used to configure how Prometheus communicates
with Alertmanager instances when sending alerts.

We should also include information as to what happens when this field is omitted in the GoDoc. Based on my current understanding, that would likely be something like:

When omitted, Prometheus will not send alerts to any Alertmanager instances.

Is this correct? Is there a default Alertmanager instance that will always be included in the configuration for Prometheus instances?

// +optional
// +kubebuilder:validation:MaxItems=10
// +listType=map
// +listMapKey=apiVersion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this must be unique based on the apiVersion?

// +kubebuilder:validation:MaxLength=50
// +kubebuilder:validation:Pattern=`^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$`
// +optional
EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it actually make sense for this to be a string value? This makes it more complex to evaluate valid values.

For example, would it make sense for me to specify 3,000,000,000B here? The "integer" value there is higher than the maximum possible int32 value - is that problematic? What if I specified higher than the maximum possible int64 value in bytes - is that problematic?

Something like:

enforcedBodySizeLimit:
  value: 4000
  units: KB

would probably be easier to put constraints on that forces end users to make reasonable decisions here.

Alternatively, we decide a reasonable base unit that all values provided by a user will use and name the field accordingly (i.e enforcedBodySizeLimitMB)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, would it make sense to nest this under something like a scrapeConfig field?

Looking at https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config there are a lot of options that we aren't yet exposing. Do you ever intend to expand the set of scraping-related configuration options a user can configure?

Comment on lines +767 to +771
// replacement is the value against which a regex replace is performed if the
// regular expression matches. Regex capture groups are available.
// +optional
// +kubebuilder:validation:MaxLength=255
Replacement *string `json:"replacement,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if I don't supply this value?

What is the difference between omitting and providing an empty string?

Comment on lines +772 to +775
// action is the action to perform.
// +optional
// +kubebuilder:validation:MaxLength=20
Action *string `json:"action,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if I don't specify an action?

What are the allowed actions that I can specify?

// +optional
// +kubebuilder:validation:Enum=Verify;InsecureSkipVerify
// +kubebuilder:default=Verify
InsecureSkipVerify string `json:"insecureSkipVerify,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insecureSkipVerify: Verify and insecureSkipVerify: InsecureSkipVerify read weird and could be confusing for users. Consider using a new field name that better reflects what this field is used for.

// Allowed values are "Verify" (default, secure) and "InsecureSkipVerify" (skip certificate verification, insecure).
// By default, certificate verification is performed ("Verify").
// +optional
// +kubebuilder:validation:Enum=Verify;InsecureSkipVerify
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using an enum, create a type alias with constants for the allowed values.

Comment on lines +805 to +847
// LocalObjectReference contains enough information to let you locate the
// referenced object inside the same namespace.
// ---
// New uses of this type are discouraged because of difficulty describing its usage when embedded in APIs.
// 1. Invalid usage help. It is impossible to add specific help for individual usage. In most embedded usages, there are particular
// restrictions like, "must refer only to types A and B" or "UID not honored" or "name must be restricted".
// Those cannot be well described when embedded.
// 2. Inconsistent validation. Because the usages are different, the validation rules are different by usage, which makes it hard for users to predict what will happen.
// 3. We cannot easily change it. Because this type is embedded in many locations, updates to this type
// will affect numerous schemas. Don't make new APIs embed an underspecified API type they do not control.
//
// Instead of using this type, create a locally provided and used type that is well-focused on your reference.
// For example, ServiceReferences for admission registration: https://github.com/kubernetes/api/blob/release-1.17/admissionregistration/v1/types.go#L533 .
// +structType=atomic
type LocalObjectReference struct {
// name of the referent.
// This field is effectively required, but due to backwards compatibility is
// allowed to be empty. Instances of this type with an empty value here are
// almost certainly wrong.
// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// +optional
// +default=""
// +kubebuilder:default=""
// +kubebuilder:validation:MaxLength=253
// TODO: Drop `kubebuilder:default` when controller-gen doesn't need it https://github.com/kubernetes-sigs/kubebuilder/issues/3896.
Name *string `json:"name,omitempty" protobuf:"bytes,1,opt,name=name"`
}

// SecretKeySelector selects a key of a Secret.
// +structType=atomic
type SecretKeySelector struct {
// The name of the secret in the pod's namespace to select from.
LocalObjectReference `json:",inline" protobuf:"bytes,1,opt,name=localObjectReference"`
// key of the secret to select from. Must be a valid secret key.
// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
Key string `json:"key,omitempty" protobuf:"bytes,2,opt,name=key"`
// optional specifies whether the Secret or its key must be defined
// +optional
// +kubebuilder:validation:Enum=true;false
Optional string `json:"optional,omitempty" protobuf:"varint,3,opt,name=optional"`
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different than the existing SecretKeyReference type you've created earlier in this file?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants