Skip to content

Conversation

@qurname2
Copy link
Collaborator

@qurname2 qurname2 commented Oct 14, 2025

PR is aiming to solve Issue #374.
Overview
This PR implements comprehensive StatefulSet spec change detection to enable the operator to detect and respond to configuration changes (resources, env vars, volumes, etc.) beyond just replica count changes and images.

Changes:

  1. Field-by-Field Pod Template Comparison, link
    That logic compares:
  • Containers & InitContainers: Image, command, args, env vars, resources, volume mounts
  • Volumes: ConfigMap, Secret, PersistentVolumeClaim, EmptyDir, HostPath
  • Scheduling: Node selectors, tolerations, affinity rules
  • Service Account: Service account name
    I tried to implement comparison using k8s.io/apimachinery/pkg/api/equality package, but it led to the infinite recreating loop. The same applies to the strategicpatch.CreateTwoWayMergePatch (it uses json document merging, not semantic equality, but the problem is similar)
    Both of them detects Kubernetes default values, that doesn't matter for us to compare, but led to the false-positive Update status on the operator.
    Here you can see how postgres-operator compares sts's fields: https://github.com/zalando/postgres-operator/blob/3bc244fe397ad084de8ddc3e02a1670f6b4c0949/pkg/cluster/cluster.go#L648
    tl;dr - also field-by-field
  1. Component-Specific Modifications
  • Fixed YqlAgent false positive detection.

Problem: YqlAgent adds environment variables (YT_FORCE_IPV4, YT_FORCE_IPV6, token) and modifies commands AFTER building the base spec, causing mismatches during comparison
Solution: Extracted modifications into applyYqlAgentContainerModifications()
Overrode needUpdate() and needStatefulSetSpecUpdate() to apply modifications before comparison

  • execNode - ExecNode combines NodeResources + JobResources, requiring special handling. Overrode needStatefulSetSpecUpdate() to compute expected total resources before comparison

  • httProxy - because httpProxy adds volumes for TLS encryption after the buildStatefulSet() and it will cause infinite update loop, cause during every reconcilation loop operator will think there is difference in the statefulSet spec.

  • rpcProxy - the same as for http-proxy

  1. Server Infrastructure Updates: pkg/components/server.go
    Added needUpdateWithSpecCheck() to allow components to provide custom spec comparison logic.
    Updated needUpdate to use the new comparison function
  2. E2E Tests (test/e2e/ytsaurus_controller_test.go)
    Added several tests:
  • Discovery Resource Update: Tests CPU/memory limit changes with proper UpdateSelector set.
  • Discovery Environment Variables Update: Verifies env var changes trigger updates with proper UpdateSelector set.
  • Discovery nodeSelector update: Verifies nodeSelector will be updated with proper UpdateSelector set.
  • Blocked Updates: Validates that UpdatePlan correctly blocks updates when configured

@qurname2 qurname2 force-pushed the qurname2/nytsre-922 branch from 1b65b03 to 99ceb81 Compare October 15, 2025 21:20
@koct9i
Copy link
Collaborator

koct9i commented Oct 20, 2025

Why we cannot just generate new sts spec, marshal .Spec and compare?

If you want to ignore some difference in order that could be normalization which sorts some list before serialization into json

@l0kix2
Copy link
Collaborator

l0kix2 commented Oct 20, 2025

Why we cannot just generate new sts spec, marshal .Spec and compare?

If you want to ignore some difference in order that could be normalization which sorts some list before serialization into json

Agree, I think with so many fields it would be a painful experience to support explicit comparison of each one.

Also do the mutable/immutable sts fields considered? Do we ignore immutable or everything will just stuck on error?

@qurname2
Copy link
Collaborator Author

qurname2 commented Oct 20, 2025

Why we cannot just generate new sts spec, marshal .Spec and compare?

If you want to ignore some difference in order that could be normalization which sorts some list before serialization into json

@koct9i @l0kix2
The answer in the description:
I tried to implement comparison using k8s.io/apimachinery/pkg/api/equality package, but it led to the infinite recreating loop. The same applies to the strategicpatch.CreateTwoWayMergePatch (it uses json document merging, not semantic equality, but the problem is similar)
Both of them detects Kubernetes default values, that doesn't matter for us to compare, but led to the false-positive Update status on the operator.

Also do the mutable/immutable sts fields considered? Do we ignore immutable or everything will just stuck on error?

I added only mutable fields to the comparison, for the immutable ones we need to create additional Issue and implementation with "delete sts --cascade=orphan" command under the hood, but it is different story

@koct9i
Copy link
Collaborator

koct9i commented Oct 23, 2025

Maybe we could completely eliminate content comparison of managed resources.

"Controller" resorce (ytsaurus) has generation which is updated at any changes.
We could save this generation into managed resources metadata.

We just compare controller resource generation and generation saved in managed resource.
At difference we generate new managed resource and apply.
If nothing have changed -> next level controller will do nothing.

@l0kix2
Copy link
Collaborator

l0kix2 commented Oct 24, 2025

For now, I think that field-by-field comparison approach is not our best option.

  • it looks very hard to maintain,
  • some components need extra code for compare,
  • we explicitly need to add each new field to detect
  • it probably fragile — if someone changes build logic without changing comparision logic — we may end up with ugly situation that ytop is in endless loop trying to hopefully update something.
  • all those extra need-update-methods in server/sts code are adding complexity and I'm not sure that we can afford that because no one would be able to understand all this code.

I suggest we go back to the issue and discuss our options #374

@koct9i koct9i added the Update Cluster update and reconfiguration label Oct 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Update Cluster update and reconfiguration

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants