@fabriziopandini
Member

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

/area documentation

@k8s-ci-robot k8s-ci-robot added area/documentation Issues or PRs related to documentation cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 1, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chrischdi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Dec 1, 2025
@fabriziopandini fabriziopandini requested review from alexander-demicev and furkatgofurov7 and removed request for elmiko and richardcase December 1, 2025 13:50
@fabriziopandini fabriziopandini force-pushed the Update-in-place-update-proposal branch from 873a1ce to cffc8a3 Compare December 1, 2025 14:00
@fabriziopandini
Member Author

/cherry-pick release-1.12

@k8s-infra-cherrypick-robot

@fabriziopandini: once the present PR merges, I will cherry-pick it on top of release-1.12 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@fabriziopandini fabriziopandini force-pushed the Update-in-place-update-proposal branch from cffc8a3 to e9b8505 Compare December 1, 2025 14:03
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 1, 2025
@fabriziopandini
Member Author

/cherry-pick release-1.12

@k8s-infra-cherrypick-robot

@fabriziopandini: once the present PR merges, I will cherry-pick it on top of release-1.12 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@fabriziopandini fabriziopandini added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Dec 2, 2025
@fabriziopandini fabriziopandini force-pushed the Update-in-place-update-proposal branch from ef31d28 to 595ae83 Compare December 2, 2025 11:57
Contributor

@alexander-demicev alexander-demicev left a comment

thanks a lot! We should add a note to not forget to update infra machine contract too.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 18d3cbb6c27cde15c6e9dab3dee09e56952ad61e

@fabriziopandini
Member Author

We should add a note to not forget to update infra machine contract too.

Added to the tracking issue

Member

Should this be a separate PR / Bugfix so it shows up in the release notes properly too? (ok to keep it on this, just curious)

Member Author

yes definitely (and this PR already merged)

* [Implementing Runtime Extensions](./implement-extensions.md)
* [Implementing Lifecycle Hook Extensions](./implement-lifecycle-hooks.md)
* [Implementing Topology Mutation Hook Extensions](./implement-topology-mutation-hook.md)
* [Implementing In-Place Update Hooks Extensions](./implement-in-place-update-hooks.md)
Member

Nit: we should either use In-place or In-Place everywhere :-)


## Introduction

The proposal for [n-place updates in Cluster API](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/20240807-in-place-updates.md)
Member

Suggested change
The proposal for [n-place updates in Cluster API](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/20240807-in-place-updates.md)
The proposal for [in-place updates in Cluster API](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/20240807-in-place-updates.md)

is “buffer” for in-place, in-place update can proceed.
- When in-place is possible, the system should try to in-place update as many machines as possible.
In practice, this means that maxSurge might be not fully used (it is used only for scale up by one if maxUnavailable=0).
- No in-place updates are performed for workers machines when using rollout strategy on delete.
Member

Suggested change
- No in-place updates are performed for workers machines when using rollout strategy on delete.
- No in-place updates are performed for workers machines when using rollout strategy `OnDelete`.


This hook is called by KCP when performing the "can update in-place" for a control plane machine.

Example request
Member

Suggested change
Example request
Example request:

- Only spec is provided, status fields are not included
- When more than one extension will be supported, the current state will already include changes that can handle in-place by other runtime extensions.
Example Response
Member

Suggested change
Example Response
Example Response:

Note:
- All the objects will have the latest API version known by Cluster API.
- Only spec is provided, status fields are not included
- When more than one extension will be supported, the current state will already include changes that can handle in-place by other runtime extensions.
Member

"will be supported" is unclear to me, can we be more specific?

Member Author

clarified, PTAL

This hook is called by the MachineDeployment controller when performing the "can update in-place" for all the Machines controlled by
a MachineSet.
Example request
Member

Suggested change
Example request
Example request:

- Only spec is provided, status fields are not included
- When more than one extension will be supported, the current state will already include changes that can handle in-place by other runtime extensions.
Example Response
Member

Suggested change
Example Response
Example Response:

This hook is called by the Machine controller when performing the in-place updates for a Machine.
Example request
Member

Suggested change
Example request
Example request:

- Only desired is provided (the external updater extension should know current state of the Machine).
- Only spec is provided, status fields are not included
Example Response
Member

Suggested change
Example Response
Example Response:


# High complexity

Implementing the in-place update transition in a race condition free, re-entrant way is more complex that it might seem.
Member

Suggested change
Implementing the in-place update transition in a race condition free, re-entrant way is more complex that it might seem.
Implementing the in-place update transition in a race condition-free, re-entrant way is more complex than it might seem.

As a first step, CAPI controllers will compute the set of desired changes (current and desired state).

If any of the desired changes cannot be covered by the updaters capabilities, CAPI will determine the desired state cannot be reached through external updaters. In this case, it will fallback to the rolling update strategy, replacing machines as needed.
Then CAPI controllers will then iterate over the registered external updaters, requesting to each updater if it can handle required changes through the `CanUpdateMachineSet` (MachineDeployment) and `CanUpdateMachine` (KCP).
Member

Suggested change
Then CAPI controllers will then iterate over the registered external updaters, requesting to each updater if it can handle required changes through the `CanUpdateMachineSet` (MachineDeployment) and `CanUpdateMachine` (KCP).
Then CAPI controllers will iterate over the registered external updaters, requesting each updater if it can handle required changes through the `CanUpdateMachineSet` (MachineDeployment) and `CanUpdateMachine` (KCP).
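
For readers following the thread, the flow described in the passage under review (compute the desired changes, ask each registered updater what it can handle, fall back to a rolling update if anything is left uncovered) can be sketched roughly as below. This is a minimal illustration and not Cluster API code: `Updater`, `fieldUpdater` and `CanUpdateInPlace` are hypothetical names, changes are modelled as plain field paths, and the single `CanUpdate` method stands in for the `CanUpdateMachineSet` / `CanUpdateMachine` hooks, whose real request and response types are not shown in this conversation.

```go
package main

import "fmt"

// Updater is a hypothetical stand-in for a registered external updater.
// CanUpdate returns the subset of the given changes it can apply in place.
type Updater interface {
	CanUpdate(changes []string) []string
}

// fieldUpdater is a toy updater that handles a fixed set of field paths.
type fieldUpdater struct {
	handles map[string]bool
}

func (u fieldUpdater) CanUpdate(changes []string) []string {
	var covered []string
	for _, c := range changes {
		if u.handles[c] {
			covered = append(covered, c)
		}
	}
	return covered
}

// CanUpdateInPlace reports whether the updaters together cover every change.
// A false result corresponds to falling back to the rolling update strategy.
func CanUpdateInPlace(changes []string, updaters []Updater) bool {
	remaining := make(map[string]bool, len(changes))
	for _, c := range changes {
		remaining[c] = true
	}
	for _, u := range updaters {
		if len(remaining) == 0 {
			break
		}
		// Ask this updater only about the changes nobody has covered yet.
		pending := make([]string, 0, len(remaining))
		for c := range remaining {
			pending = append(pending, c)
		}
		for _, c := range u.CanUpdate(pending) {
			delete(remaining, c)
		}
	}
	return len(remaining) == 0
}

func main() {
	updaters := []Updater{
		fieldUpdater{handles: map[string]bool{"spec.version": true}},
		fieldUpdater{handles: map[string]bool{"spec.bootstrap": true}},
	}
	fmt.Println(CanUpdateInPlace([]string{"spec.version", "spec.bootstrap"}, updaters))         // true
	fmt.Println(CanUpdateInPlace([]string{"spec.version", "spec.infrastructureRef"}, updaters)) // false
}
```

The sketch only covers the decision step; it says nothing about how the update itself is performed or about ordering between updaters.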

@fabriziopandini fabriziopandini force-pushed the Update-in-place-update-proposal branch from 66f9925 to f577682 Compare December 3, 2025 11:57
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 3, 2025
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 3, 2025
@fabriziopandini fabriziopandini force-pushed the Update-in-place-update-proposal branch from f577682 to b3e756d Compare December 3, 2025 12:01
@fabriziopandini
Member Author

fabriziopandini commented Dec 3, 2025

Feedback addressed; also found two other places to add a few words about in-place

@stmcginnis
Contributor

Looks like all comments have been addressed, and overall looks good to me.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 3, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 6e60c5f91e39ba912b6cfef8b988dbc6c9e9e94b

Labels

area/documentation Issues or PRs related to documentation cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

7 participants