@lc525 lc525 commented Oct 3, 2025

Why

Motivation

We need a smooth upgrade path from 2.9.x to 2.10.0. Previously:

  • existing SeldonConfig CRs did not work smoothly after upgrading to 2.10
  • changes in SeldonConfig (for example, to scalability options) were not picked up by the rest of the install

The upgrade plan is as follows:

  • When using an existing SeldonConfig CR, everything stays the same as in 2.9.1 (i.e. dataflow and model-gw read the number of partitions from env vars). The scheduler accounts for the old numPartitions setting by fetching it, via a ConfigMap, from SeldonConfig.config.scalingConfig.Pipelines.MaxShardCountMultiplier. An empty scaling config in SeldonConfig.config.scalingConfig is interpreted as if SeldonConfig.config.scalingConfig.Pipelines.MaxShardCountMultiplier == 1 (one processing replica per pipeline/model).
  • Any other settings in scalingConfig can be set but have no effect in 2.10. Similarly, settings in SeldonConfig.config.kafkaConfig.Topics can be set but have no effect. This gives customers an opportunity to upgrade their CRs in preparation for 2.11, which will use those settings.
  • In particular, we will ask customers to upgrade their SeldonConfig to contain settings such as general autoscaling options, the Kafka number of partitions, etc., in sync with the values they have already set in other places (env vars for various pods). Again, modifying those will have no effect until 2.11.
  • In 2.11, we switch to activating those values, or to using defaults if they are missing. Customers that haven't upgraded their CRs incrementally in 2.10 will need to take manual steps to configure the upgrade to 2.11. Customers that have already set the right options in SeldonConfig will be able to upgrade smoothly to 2.11.
  • In 2.10, customers will be able to set or modify SeldonConfig.config.scalingConfig.Pipelines.MaxShardCountMultiplier, and the change will propagate automatically through our microservices. This setting changes the way pipelines/models are sharded over the various microservices: it controls how many replicas of pipeline-gw process a given pipeline, how many dataflow-engine replicas process a given pipeline, and how many model-gw replicas process a given model.
  • In 2.11, the other settings will work the same way (in terms of propagating changes throughout the Core 2 install).
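
The empty-config rule from the first bullet can be illustrated with a minimal sketch. The type and function names below (`ScalingConfig`, `PipelinesScaling`, `effectiveMultiplier`) are hypothetical stand-ins chosen for illustration; the real definitions live in operator/apis/mlops/v1alpha1/seldonconfig_types.go and may well be shaped differently. The sketch also assumes that a zero value means "unset".

```go
package main

import "fmt"

// PipelinesScaling and ScalingConfig are hypothetical stand-ins for the
// scalingConfig section of SeldonConfig. They are not the real CRD types.
type PipelinesScaling struct {
	MaxShardCountMultiplier uint32
}

type ScalingConfig struct {
	Pipelines *PipelinesScaling
}

// effectiveMultiplier returns the multiplier to apply. A missing config,
// a missing Pipelines section, or a zero value (assumed here to mean
// "unset") all fall back to 1, i.e. one processing replica per
// pipeline/model.
func effectiveMultiplier(cfg *ScalingConfig) uint32 {
	if cfg == nil || cfg.Pipelines == nil || cfg.Pipelines.MaxShardCountMultiplier == 0 {
		return 1
	}
	return cfg.Pipelines.MaxShardCountMultiplier
}

func main() {
	explicit := &ScalingConfig{Pipelines: &PipelinesScaling{MaxShardCountMultiplier: 3}}
	fmt.Println(effectiveMultiplier(nil))              // no scaling config at all
	fmt.Println(effectiveMultiplier(&ScalingConfig{})) // empty scaling config
	fmt.Println(effectiveMultiplier(explicit))         // explicitly configured
}
```

Centralising the fallback in one function would mean that every consumer of the setting sees the same default, which is what keeps an untouched 2.9.x CR behaving as before.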

What

Summary of changes

  • Introduce a ScalingConfig section in SeldonConfig; its settings are read only by the scheduler
  • No longer rely on SeldonConfig.Config.KafkaConfig.Topics.* settings for microservices (the aim is to port over to those in 2.11); leave the config options in place so that customers can upgrade their SeldonConfig progressively
  • Add configuration watchers to react to ScalingConfig changes
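
As a rough illustration of what "reacting to ScalingConfig changes" can look like, here is a minimal polling sketch that re-reads a config file and fires a callback only when the content changes. It is not the watcher added in scheduler/pkg/config/watcher.go: `fileWatcher`, `poll`, `demo`, the file name and the JSON key are all hypothetical, and polling with the standard library is used here purely to keep the example self-contained.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// fileWatcher is a hypothetical, polling-based stand-in for a config
// watcher. It remembers the last content it saw for a single file.
type fileWatcher struct {
	path     string
	last     []byte
	seen     bool
	onChange func(content []byte)
}

// poll re-reads the file and calls onChange only when the content differs
// from the previous read (the first successful read counts as a change).
// It reports whether onChange was invoked.
func (w *fileWatcher) poll() (bool, error) {
	data, err := os.ReadFile(w.path)
	if err != nil {
		return false, err
	}
	if w.seen && bytes.Equal(data, w.last) {
		return false, nil
	}
	w.last, w.seen = data, true
	w.onChange(data)
	return true, nil
}

// demo writes a config file twice and returns the contents that were
// reported to the callback, in order.
func demo() []string {
	dir, err := os.MkdirTemp("", "seldon-scaling")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "scaling.json") // hypothetical file name
	var reported []string
	w := &fileWatcher{path: path, onChange: func(c []byte) {
		reported = append(reported, string(c))
	}}
	write := func(s string) {
		if err := os.WriteFile(path, []byte(s), 0o600); err != nil {
			panic(err)
		}
	}
	mustPoll := func() {
		if _, err := w.poll(); err != nil {
			panic(err)
		}
	}

	write(`{"maxShardCountMultiplier":1}`) // hypothetical key
	mustPoll()                             // first read: reported
	mustPoll()                             // unchanged: not reported
	write(`{"maxShardCountMultiplier":2}`)
	mustPoll() // changed: reported
	return reported
}

func main() {
	for _, c := range demo() {
		fmt.Println("config changed:", c)
	}
}
```

Comparing content rather than modification times avoids spurious callbacks when a file is rewritten with identical data; an event-driven watcher would replace the explicit `poll` calls with notifications.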

Checklist

  • Added/updated unit tests
  • Added/updated documentation
  • Checked for typos in variable names, comments, etc.
  • Added licences for new files

Notes for reviewers

The crux of the changes is in the following files (it would help to review those first):

  • operator/apis/mlops/v1alpha1/seldonconfig_types.go
  • scheduler/pkg/config/watcher.go
  • scheduler/pkg/scaling/config/config.go
  • scheduler/cmd/scheduler/main.go
  • scheduler/pkg/kafka/dataflow/server.go
  • scheduler/pkg/server/server.go

@lc525 lc525 marked this pull request as draft October 3, 2025 06:11
@lc525 lc525 added the v2 label Oct 3, 2025
@lc525 lc525 force-pushed the quickfix/fix-seldonconfig-kafkaconfig-topics branch 3 times, most recently from 6f28a51 to d52eec1 Compare October 7, 2025 14:01
lc525 added 9 commits October 7, 2025 17:41
The new generic config watcher is able to watch configs residing in plain
files, files mounted via k8s ConfigMap volumes, and k8s ConfigMaps directly
(via informers) when configured.
The operator reads the SeldonConfig and updates a ConfigMap called
`seldon-scaling`, which is then mounted by components and watched for
changes.

The scaling settings from SeldonConfig are also used to validate the
replica count of components in SeldonRuntime.
automatically-generated files only, safe to ignore during review
….topics

Before Core 2.11, keep the number of partitions and replication factor as
microservice environment variables. Those settings are also present in the
newly introduced ScalingConfig, but they have no effect for now.

This is to ensure a smooth upgrade path, and make it possible in the future
to react to changes in those variables by watching the ConfigMap.
…chart values

This gives Helm chart users the smoothest possible upgrade:
the values will be set correctly in 2.10, and can then be changed at
runtime in 2.11.
@lc525 lc525 force-pushed the quickfix/fix-seldonconfig-kafkaconfig-topics branch from 30d0ad5 to 92e3f4b Compare October 7, 2025 21:21
@lc525 lc525 marked this pull request as ready for review October 7, 2025 21:23
@lc525 lc525 merged commit ef24bf9 into SeldonIO:v2 Oct 7, 2025
5 checks passed