-
Notifications
You must be signed in to change notification settings - Fork 832
Add Overrides API component and rename old overrides to overrides-configs #6975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Bogdan Stancu <[email protected]>
29f5299
to
1dbe402
Compare
Signed-off-by: Bogdan Stancu <[email protected]>
1dbe402
to
60d09fb
Compare
Signed-off-by: Bogdan Stancu <[email protected]>
Signed-off-by: Bogdan Stancu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am liking it. I have do have a couple of suggestions
Signed-off-by: Bogdan Stancu <[email protected]>
90512a2
to
608f1ab
Compare
Signed-off-by: Bogdan Stancu <[email protected]>
Signed-off-by: Bogdan Stancu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave it another pass. Thanks for your hard work!
# CLI flag: -query-scheduler.grpc-client-config.connect-timeout | ||
[connect_timeout: <duration> | default = 5s] | ||
|
||
overrides: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You added a new target "overrides", that's good. No need to add new flags. Use the flags available: https://cortexmetrics.io/docs/configuration/configuration-file/#runtime_configuration_storage_config
pkg/overrides/overrides.go
Outdated
if c.Backend == bucket.Filesystem { | ||
return errors.New(ErrFilesystemBackendNotSupported) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filesystem is supported by the runtime configuration overrides. The API should support it. Remove this.
pkg/overrides/overrides.go
Outdated
) | ||
|
||
// Config holds configuration for the overrides module | ||
type Config struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this config. Use
cortex/pkg/util/runtimeconfig/manager.go
Line 31 in 2c07f00
type Config struct { |
pkg/cortex/cortex.go
Outdated
RuntimeConfig runtimeconfig.Config `yaml:"runtime_config"` | ||
MemberlistKV memberlist.KVConfig `yaml:"memberlist"` | ||
QueryScheduler scheduler.Config `yaml:"query_scheduler"` | ||
Overrides overrides.Config `yaml:"overrides"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overrides overrides.Config `yaml:"overrides"` |
pkg/cortex/cortex.go
Outdated
c.RuntimeConfig.RegisterFlags(f) | ||
c.MemberlistKV.RegisterFlags(f) | ||
c.QueryScheduler.RegisterFlags(f) | ||
c.Overrides.RegisterFlags(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c.Overrides.RegisterFlags(f) |
docs/api/_index.md
Outdated
| [Tenant delete request](#tenant-delete-request) | Purger || `POST /purger/delete_tenant` | | ||
| [Tenant delete status](#tenant-delete-status) | Purger || `GET /purger/delete_tenant_status` | | ||
| [Get user overrides](#get-user-overrides) | Overrides || `GET /api/v1/user-overrides` | | ||
| [Set user overrides](#set-user-overrides) | Overrides || `PUT /api/v1/user-overrides` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| [Set user overrides](#set-user-overrides) | Overrides || `PUT /api/v1/user-overrides` | | |
| [Set user overrides](#set-user-overrides) | Overrides || `POST /api/v1/user-overrides` | |
pkg/overrides/api.go
Outdated
type RuntimeConfigFile struct { | ||
Overrides map[string]map[string]interface{} `yaml:"overrides"` | ||
HardOverrides map[string]map[string]interface{} `yaml:"hard_overrides"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type RuntimeConfigFile struct { | |
Overrides map[string]map[string]interface{} `yaml:"overrides"` | |
HardOverrides map[string]map[string]interface{} `yaml:"hard_overrides"` | |
} |
This overrides the other values saved in the runtime like ingester_limits
Use
cortex/pkg/cortex/runtime_config.go
Line 26 in 2c07f00
type RuntimeConfigValues struct { |
And make sure the API doesn't delete ingester_limits values and similar
pkg/overrides/limits.go
Outdated
var AllowedLimits = []string{ | ||
"max_global_series_per_user", | ||
"max_global_series_per_metric", | ||
"ingestion_rate", | ||
"ingestion_burst_size", | ||
"ruler_max_rules_per_rule_group", | ||
"ruler_max_rule_groups_per_tenant", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put this configuration in the runtime config
https://cortexmetrics.io/docs/configuration/configuration-file/#runtime_configuration_storage_config
pkg/overrides/overrides.go
Outdated
} | ||
|
||
// RegisterFlags registers the overrides module flags | ||
func (c *Config) RegisterFlags(f *flag.FlagSet) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this
pkg/overrides/overrides.go
Outdated
} | ||
|
||
// Validate validates the configuration and returns an error if validation fails | ||
func (c *Config) Validate() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this
Also add the overrides API to https://github.com/cortexproject/cortex/blob/master/docs/configuration/v1-guarantees.md |
Signed-off-by: Bogdan Stancu <[email protected]>
86bd67e
to
948fd7a
Compare
Signed-off-by: Bogdan Stancu <[email protected]>
948fd7a
to
61632a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, long needed feature 🙇
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. I left some questions and suggestions to think about.
I didn't review all of overrides_test.go
but reviewed everything else.
integration/overrides_test.go
Outdated
require.NoError(t, err) | ||
defer resp.Body.Close() | ||
|
||
assert.Equal(t, http.StatusOK, resp.StatusCode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A 4xx of some kind is probably more appropriate in this case.
pkg/overrides/api.go
Outdated
) | ||
|
||
const ( | ||
// HTTP status codes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the existing http.StatusOK
, etc. constants?
pkg/overrides/api.go
Outdated
|
||
var config runtimeconfig.RuntimeConfigValues | ||
if err := yaml.NewDecoder(reader).Decode(&config); err != nil { | ||
return []string{}, nil // No allowed limits if config can't be decoded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function returns an error
but it is always nil
.
I agree that a non-existing or invalid config file should be equivalent to having no allowed limits, but it could be expressed better. A couple options:
- Remove the
error
from the return, but log each error. - Return the empty slice and an appropriate error. It is then the caller's responsibility to handle the error.
|
||
// Read overrides from bucket storage | ||
overrides, err := a.getOverridesFromBucket(r.Context(), userID) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error should be logged.
Also, it might be safer not to return the raw error to the user. A 500 and maybe a general error message would be sufficient. The raw error could expose underlying detail (like storage configurations, etc.) that some providers might consider sensitive. Also, while unlikely, attackers can sometimes learn more than they should from such error messages. This applies anywhere we are returning an error message to the user.
pkg/overrides/api.go
Outdated
func (a *API) getOverridesFromBucket(ctx context.Context, userID string) (map[string]interface{}, error) { | ||
reader, err := a.bucketClient.Get(ctx, a.runtimeConfigPath) | ||
if err != nil { | ||
return map[string]interface{}{}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how widely this opinion is shared, but I'd prefer this error to be returned. Currently, as a user of getOverridesFromBucket()
I can't distinguish between an empty configuration and a situation where there is a configuration but it couldn't be read for some reason (network error, etc.).
pkg/overrides/limits.go
Outdated
var invalidLimits []string | ||
|
||
for limitName := range overrides { | ||
if !IsLimitAllowed(limitName, allowedLimits) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use slices.Contains()
.
pkg/overrides/limits.go
Outdated
|
||
// GetAllowedLimits returns the allowed limits from runtime config | ||
// If no allowed limits are configured, returns empty slice (no limits allowed) | ||
func GetAllowedLimits(allowedLimits []string) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
pkg/overrides/limits.go
Outdated
// Read the runtime config to get hard limits | ||
reader, err := a.bucketClient.Get(context.Background(), a.runtimeConfigPath) | ||
if err != nil { | ||
// If we can't read the config, skip hard limit validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now I think this function is too lenient when the hard limits can't be read. If we corrupt (or don't set) the hard limits, users could give themselves unreasonable limits (e.g. 5 trillion active series).
Two options:
- If hard limits can't be read, fail requests to update user overrides (this shouldn't really happen anyway).
- Create a default hard limit, maybe configured via CLI, so if a hard limit isn't set for a tenant, it will default to something that will prevent completely unreasonable values.
Also, these scenarios might be good to have in the e2e tests.
// validateHardLimits checks if the provided overrides exceed any hard limits from the runtime config | ||
func (a *API) validateHardLimits(overrides map[string]interface{}, userID string) error { | ||
// Read the runtime config to get hard limits | ||
reader, err := a.bucketClient.Get(context.Background(), a.runtimeConfigPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do cortex operators set these hard limits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the same way overrides work today, manually in the runtime config file. This just adds the option for users to change a subset of limits within a set threshold through the api, cortex operator overrides stuff stays manual.
Signed-off-by: Bogdan Stancu <[email protected]>
0346f20
to
dff25e3
Compare
Signed-off-by: Bogdan Stancu <[email protected]>
e1bc9be
to
8e50e9a
Compare
Signed-off-by: Bogdan Stancu <[email protected]>
Signed-off-by: Bogdan Stancu <[email protected]>
Adds initial draft for the User Overrides API, described in https://cortexmetrics.io/docs/proposals/overrides-api/
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]