-
Notifications
You must be signed in to change notification settings - Fork 567
MON-4030: prometheus k8s config API #2463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Hello @marioferh! Some important instructions when contributing to openshift/api: |
4b7b3a6
to
5cef22c
Compare
/test remaining-required Scheduling tests matching the |
5cef22c
to
86df303
Compare
@marioferh You appear to have some conflict markers here? Can you PTAL and make sure the correct content is pushed. Let me know when you want this API reviewed |
yep I've mixed the commit, let me fix it |
483f697
to
bf0b7b7
Compare
Signed-off-by: Mario Fernandez <[email protected]>
bf0b7b7
to
64ce7b7
Compare
/assign |
// prometheusK8sConfig provides configuration options for the Prometheus instance | ||
// Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrations. | ||
// prometheusK8sConfig is optional. | ||
// When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have this follow the existing pattern of related fields like the metricsServerConfig
field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, some of the questions I have from reading this as-is are:
- What is "the Prometheus instance"?
- Where does it run? Is it run by default?
- Is
Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrations
all the things I can configure? Why might I care, as a user, to configure these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- What is "the Prometheus instance"?
Prometheus instance is the pod that contains Prometheus. - Where does it run? Is it run by default?
By default there are at least one Prometheus pod in openshift-monitoring - Is Prometheus deployment, pod scheduling, resource allocation, retention policies, and external integrations all the things I can configure? Why might I care, as a user, to configure these?
How to configure Prometheus is important and depends on the user's needs.
Some of the config:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/
https://prometheus.io/docs/prometheus/latest/storage/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include helpful information, for end-users that may read this API documentation, in the GoDoc here.
All of the information you included in your response here is helpful context in understanding what changes to this configuration impacts.
Please also include some information in the description as to why users may want to configure this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// prometheusK8sConfig is optional. | ||
// When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time. | ||
// +optional | ||
PrometheusK8sConfig PrometheusK8sConfig `json:"prometheusK8sConfig,omitempty,omitzero"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the K8s
in the name here? Feels a bit redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe to differentiate from prometheusOperator, coming from config map. Fixed.
// additionalAlertmanagerConfigs configures additional Alertmanager instances that receive alerts from | ||
// the Prometheus component. By default, no additional Alertmanager instances are configured. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As someone unfamiliar with configuring Prometheus, I don't really understand what any of this means. Why would I care to configure additional alertmanager instances that receive alerts from Prometheus?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AdditionalAlertmanagerConfig is used if the client has other Alertmanager and needs there the prometheus Alerts. Also to decouple alerts if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, additionalAlertmanagerConfigs
is an optional field that configures Prometheus to send alerts to this set of Alertmanager instances. This is useful because it decouples alerts from what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not decouple alerts from nothing, sorry. It just sends the same alerts from Prometheus to another Alertmanager provided by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would something like this potentially clarify what this field does for end users?:
additionalAlertmanagerConfigs is an optional field used to configure how Prometheus communicates
with Alertmanager instances when sending alerts.
We should also include information as to what happens when this field is omitted in the GoDoc. Based on my current understanding, that would likely be something like:
When omitted, Prometheus will not send alerts to any Alertmanager instances.
Is this correct? Is there a default Alertmanager instance that will always be included in the configuration for Prometheus instances?
// the Prometheus component. By default, no additional Alertmanager instances are configured. | ||
// +optional | ||
// +kubebuilder:validation:MaxItems=10 | ||
// +listType=atomic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why atomic
? Using atomic makes it so you can't have multiple-owner entries here, which doesn't seem like something unreasonable.
Would this be more appropriate as a listType=map
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct. Thx
// url is the URL of the remote write endpoint. | ||
// +required | ||
// +kubebuilder:validation:MaxLength=2048 | ||
URL *string `json:"url,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What other constraints can we put in place to ensure the URL is valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// +kubebuilder:validation:MaxLength=2048 | ||
URL *string `json:"url,omitempty"` | ||
// name is the name of the remote write configuration. | ||
// +optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean for this name to not be provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, if you need to add and endpoint and you don't fill the name, data is not sent correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then this should be required?
// remoteTimeout is the timeout for requests to the remote write endpoint. | ||
// +optional | ||
// +kubebuilder:validation:MaxLength=20 | ||
RemoteTimeout *string `json:"remoteTimeout,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean for this to not be provided? what are the allowed values here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// +kubebuilder:validation:MaxLength=20 | ||
RemoteTimeout *string `json:"remoteTimeout,omitempty"` | ||
// writeRelabelConfigs is a list of relabeling rules to apply before sending data to the remote endpoint. | ||
// +optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean for this to be omitted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if is optional and you dont fill it, then no relabel is done.
type RelabelConfig struct { | ||
// sourceLabels is a list of source label names. | ||
// +optional | ||
// +kubebuilder:validation:MaxItems=10 | ||
// +kubebuilder:validation:items:MaxLength=63 | ||
// +listType=set | ||
SourceLabels []string `json:"sourceLabels,omitempty"` | ||
// separator is the separator used to join source label values. | ||
// +optional | ||
// +kubebuilder:validation:MaxLength=10 | ||
Separator *string `json:"separator,omitempty"` | ||
// regex is the regular expression to match against the concatenated source label values. | ||
// +optional | ||
// +kubebuilder:validation:MaxLength=1000 | ||
Regex *string `json:"regex,omitempty"` | ||
// targetLabel is the target label name. | ||
// +optional | ||
// +kubebuilder:validation:MaxLength=63 | ||
TargetLabel *string `json:"targetLabel,omitempty"` | ||
// replacement is the replacement value for the target label. | ||
// +optional | ||
// +kubebuilder:validation:MaxLength=255 | ||
Replacement *string `json:"replacement,omitempty"` | ||
// action is the action to perform. | ||
// +optional | ||
// +kubebuilder:validation:MaxLength=20 | ||
Action *string `json:"action,omitempty"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please elaborate in the godoc more on why a user may care to configure these fields, constraints, and what it means for them to be omitted.
As-is I don't really understand what any of these fields result in when configured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that explanations are pretty similar to prometheus ones:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Mario Fernandez <[email protected]>
Some comments addressed, I will continue tomorrow |
Signed-off-by: Mario Fernandez <[email protected]>
Signed-off-by: Mario Fernandez <[email protected]>
@marioferh: This pull request references MON-4030 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.21.0" version, but no target version was set. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Signed-off-by: Mario Fernandez <[email protected]>
@marioferh: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another round of comments
// prometheusK8sConfig provides configuration options for the Prometheus instance, which is the pod running Prometheus in the cluster. | ||
// By default, at least one Prometheus pod is deployed in the `openshift-monitoring` namespace to collect and store metrics for the platform. | ||
// | ||
// This field allows you to customize how Prometheus is deployed and operated, including: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this configure how all Prometheus instances are deployed and operated or just the default one?
// For more information on Prometheus configuration, see: | ||
// https://prometheus.io/docs/prometheus/latest/configuration/configuration/ | ||
// https://prometheus.io/docs/prometheus/latest/storage/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally avoid linking to documentation where possible because these docs can be updated and the references may no longer make sense for users.
I don't know that these links are necessary either. We should be appropriately documenting each field in the configuration here such that users can understand what they are configuring and why.
// additionalAlertmanagerConfigs configures additional Alertmanager instances that receive alerts from | ||
// the Prometheus component. By default, no additional Alertmanager instances are configured. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would something like this potentially clarify what this field does for end users?:
additionalAlertmanagerConfigs is an optional field used to configure how Prometheus communicates
with Alertmanager instances when sending alerts.
We should also include information as to what happens when this field is omitted in the GoDoc. Based on my current understanding, that would likely be something like:
When omitted, Prometheus will not send alerts to any Alertmanager instances.
Is this correct? Is there a default Alertmanager instance that will always be included in the configuration for Prometheus instances?
// +optional | ||
// +kubebuilder:validation:MaxItems=10 | ||
// +listType=map | ||
// +listMapKey=apiVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this must be unique based on the apiVersion?
// +kubebuilder:validation:MaxLength=50 | ||
// +kubebuilder:validation:Pattern=`^(automatic|[0-9]+(B|KB|MB|GB|TB)?)$` | ||
// +optional | ||
EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it actually make sense for this to be a string value? This makes it more complex to evaluate valid values.
For example, would it make sense for me to specify 3,000,000,000B
here? The "integer" value there is higher than the maximum possible int32 value - is that problematic? What if I specified higher than the maximum possible int64 value in bytes - is that problematic?
Something like:
enforcedBodySizeLimit:
value: 4000
units: KB
would probably be easier to put constraints on that forces end users to make reasonable decisions here.
Alternatively, we decide a reasonable base unit that all values provided by a user will use and name the field accordingly (i.e enforcedBodySizeLimitMB
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, would it make sense to nest this under something like a scrapeConfig
field?
Looking at https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config there are a lot of options that we aren't yet exposing. Do you ever intend to expand the set of scraping-related configuration options a user can configure?
// replacement is the value against which a regex replace is performed if the | ||
// regular expression matches. Regex capture groups are available. | ||
// +optional | ||
// +kubebuilder:validation:MaxLength=255 | ||
Replacement *string `json:"replacement,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if I don't supply this value?
What is the difference between omitting and providing an empty string?
// action is the action to perform. | ||
// +optional | ||
// +kubebuilder:validation:MaxLength=20 | ||
Action *string `json:"action,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if I don't specify an action?
What are the allowed actions that I can specify?
// +optional | ||
// +kubebuilder:validation:Enum=Verify;InsecureSkipVerify | ||
// +kubebuilder:default=Verify | ||
InsecureSkipVerify string `json:"insecureSkipVerify,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
insecureSkipVerify: Verify
and insecureSkipVerify: InsecureSkipVerify
read weird and could be confusing for users. Consider using a new field name that better reflects what this field is used for.
// Allowed values are "Verify" (default, secure) and "InsecureSkipVerify" (skip certificate verification, insecure). | ||
// By default, certificate verification is performed ("Verify"). | ||
// +optional | ||
// +kubebuilder:validation:Enum=Verify;InsecureSkipVerify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using an enum, create a type alias with constants for the allowed values.
// LocalObjectReference contains enough information to let you locate the | ||
// referenced object inside the same namespace. | ||
// --- | ||
// New uses of this type are discouraged because of difficulty describing its usage when embedded in APIs. | ||
// 1. Invalid usage help. It is impossible to add specific help for individual usage. In most embedded usages, there are particular | ||
// restrictions like, "must refer only to types A and B" or "UID not honored" or "name must be restricted". | ||
// Those cannot be well described when embedded. | ||
// 2. Inconsistent validation. Because the usages are different, the validation rules are different by usage, which makes it hard for users to predict what will happen. | ||
// 3. We cannot easily change it. Because this type is embedded in many locations, updates to this type | ||
// will affect numerous schemas. Don't make new APIs embed an underspecified API type they do not control. | ||
// | ||
// Instead of using this type, create a locally provided and used type that is well-focused on your reference. | ||
// For example, ServiceReferences for admission registration: https://github.com/kubernetes/api/blob/release-1.17/admissionregistration/v1/types.go#L533 . | ||
// +structType=atomic | ||
type LocalObjectReference struct { | ||
// name of the referent. | ||
// This field is effectively required, but due to backwards compatibility is | ||
// allowed to be empty. Instances of this type with an empty value here are | ||
// almost certainly wrong. | ||
// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names | ||
// +optional | ||
// +default="" | ||
// +kubebuilder:default="" | ||
// +kubebuilder:validation:MaxLength=253 | ||
// TODO: Drop `kubebuilder:default` when controller-gen doesn't need it https://github.com/kubernetes-sigs/kubebuilder/issues/3896. | ||
Name *string `json:"name,omitempty" protobuf:"bytes,1,opt,name=name"` | ||
} | ||
|
||
// SecretKeySelector selects a key of a Secret. | ||
// +structType=atomic | ||
type SecretKeySelector struct { | ||
// The name of the secret in the pod's namespace to select from. | ||
LocalObjectReference `json:",inline" protobuf:"bytes,1,opt,name=localObjectReference"` | ||
// key of the secret to select from. Must be a valid secret key. | ||
// +required | ||
// +kubebuilder:validation:MinLength=1 | ||
// +kubebuilder:validation:MaxLength=253 | ||
Key string `json:"key,omitempty" protobuf:"bytes,2,opt,name=key"` | ||
// optional specifies whether the Secret or its key must be defined | ||
// +optional | ||
// +kubebuilder:validation:Enum=true;false | ||
Optional string `json:"optional,omitempty" protobuf:"varint,3,opt,name=optional"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different than the existing SecretKeyReference
type you've created earlier in this file?
No description provided.