MachinePool: avoid SetNotReady during normal processing #5537

Open · wants to merge 1 commit into base: main
7 changes: 4 additions & 3 deletions azure/scope/machinepool.go
@@ -615,19 +615,20 @@ func (m *MachinePoolScope) setProvisioningStateAndConditions(v infrav1.ProvisioningState) {
} else {
conditions.MarkFalse(m.AzureMachinePool, infrav1.ScaleSetDesiredReplicasCondition, infrav1.ScaleSetScaleDownReason, clusterv1.ConditionSeverityInfo, "")
}
- m.SetNotReady()
+ m.SetReady()
Member

This change probably stems from the observation described in #5515, i.e.

We are observing issues with a customer that sees NICs sometimes enter a ProvisioningFailed state yet continue operating, which then cascades to prevent any further action on the dependent resources, such as the virtual machines.

Shouldn't the right approach to addressing this be something along the lines of the following?

These states are metadata properties of the resource. They're independent from the functionality of the resource itself. Being in the failed state doesn't necessarily mean that the resource isn't functional. In most cases, it can continue operating and serving traffic without issues.

In several scenarios, if the resource is in the failed state, further operations on the resource or on other resources that depend on it might fail. You need to revert the state back to succeeded before running other operations.
...
To restore succeeded state, run another write (PUT) operation on the resource.

The issue that caused the previous operation might no longer be current. The newer write operation should be successful and restore the provisioning state.

Reference: #5515
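
For reference, the remediation those docs describe (re-issuing a write with the resource's current body) would look roughly like the sketch below. This is a minimal illustration only, not CAPZ code; the armnetwork module version and the DefaultAzureCredential wiring are assumptions.

```go
package main

import (
	"context"
	"log"

	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/network/armnetwork/v4"
)

// restoreNICProvisioningState re-reads a NIC and writes it back unchanged.
// Per the Azure docs quoted above, a write (PUT) with the current resource
// body is enough to move a resource out of the Failed provisioning state
// if the original error condition is gone. Sketch only; module version and
// credential setup are assumptions.
func restoreNICProvisioningState(ctx context.Context, subscriptionID, resourceGroup, nicName string) error {
	cred, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		return err
	}
	client, err := armnetwork.NewInterfacesClient(subscriptionID, cred, nil)
	if err != nil {
		return err
	}
	// GET the current resource body.
	resp, err := client.Get(ctx, resourceGroup, nicName, nil)
	if err != nil {
		return err
	}
	// PUT it back as-is; this re-runs provisioning on the NIC.
	poller, err := client.BeginCreateOrUpdate(ctx, resourceGroup, nicName, resp.Interface, nil)
	if err != nil {
		return err
	}
	_, err = poller.PollUntilDone(ctx, nil)
	return err
}

func main() {
	if err := restoreNICProvisioningState(context.Background(), "<subscription-id>", "<resource-group>", "<nic-name>"); err != nil {
		log.Fatal(err)
	}
}
```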

Contributor Author

I saw the referenced issue but didn't investigate whether the underlying issue is similar. From my understanding (what I wrote in the other comment reply), a VMSS might be different in that it makes its provisioningState dependent on the provisioningStates of the running VMs/instances in it.
Marking the VMSS as not ready has many implications, most notably that providerIdList is no longer processed, which leaves dangling VMs that never make it into CAPI/CAPZ at all.

Member

a VMSS might be different in that it makes its provisioningState dependent on the provisioningStates of the running VMs/instances in it.
Marking the VMSS as not ready has many implications, most notably that providerIdList is no longer processed, which leaves dangling VMs that never make it into CAPI/CAPZ at all.

Thank you for providing context on this.


As a user, it seems quite odd to me that the CAPZ controller would mark the AzureMachinePool's Status as Ready when

  1. *m.MachinePool.Spec.Replicas != m.AzureMachinePool.Status.Replicas
  2. infrav1.ProvisioningState == infrav1.Updating

If I were to stretch my understanding, it is sort of acceptable for the AzureMachinePool to mark itself ready when infrav1.ProvisioningState == infrav1.Succeeded and *m.MachinePool.Spec.Replicas != m.AzureMachinePool.Status.Replicas. This implies to me that Azure has acknowledged the AzureMachinePool's request (hence infrav1.ProvisioningState == infrav1.Succeeded) and is working to get the desired replicas.

However, when infrav1.ProvisioningState == infrav1.Updating, it does not seem right to be broadcasting that AzureMachinePool is Ready when Azure is clearly working to get the VMs assigned and added to the VMSS.

Is my understanding wrong here?

Member

most notably that providerIdList is no longer processed, which leaves dangling VMs that never make it into CAPI/CAPZ at all.

I need to probe this further, but relying on its own (AMP) status to progress in its reconciliation logic is not a good pattern. It will only lead to more dependence on its own status.

Contributor Author (@mweibel, Apr 25, 2025)

I think I outlined what happens when the AMP is not ready in the issue here: #4982 (comment)

see also the code from CAPI: https://github.com/kubernetes-sigs/cluster-api/blob/8d639f1fad564eecf5bda0a2ee03c8a38896a184/exp/internal/controllers/machinepool_controller_phases.go#L290-L319

From what I understand:

  • The .Status.Ready is used by CAPI to determine whether the MachinePool should be reconciled.
  • If it is not ready, the whole MachinePool does not get reconciled anymore.

The new v1beta2 status makes this much clearer (Initialization status).

This change ensures that we don't stop processing the AMP/MP with the current API version.
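
To make the CAPI-side effect concrete, the linked reconcileInfrastructure logic behaves roughly like the paraphrase below: when the infra object reports status.ready == false the controller returns early, so spec.providerIDList is never copied into the MachinePool. This is a simplified sketch, not the actual CAPI implementation.

```go
// Simplified paraphrase of the linked machinepool_controller_phases.go
// behavior; not the real CAPI code.
package sketch

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	expv1 "sigs.k8s.io/cluster-api/exp/api/v1beta1"
)

// reconcileInfrastructureSketch mirrors the infra object's readiness and,
// only when it is ready, copies spec.providerIDList into the MachinePool.
func reconcileInfrastructureSketch(infra *unstructured.Unstructured, mp *expv1.MachinePool) error {
	ready, _, err := unstructured.NestedBool(infra.Object, "status", "ready")
	if err != nil {
		return err
	}
	mp.Status.InfrastructureReady = ready
	if !ready {
		// Not ready: return before providerIDList is mirrored, so any VMs the
		// provider already created stay invisible to CAPI until Ready flips back.
		return nil
	}
	providerIDs, _, err := unstructured.NestedStringSlice(infra.Object, "spec", "providerIDList")
	if err != nil {
		return err
	}
	mp.Spec.ProviderIDList = providerIDs
	return nil
}
```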

As a user, it seems quite odd to me that the CAPZ controller would mark the AzureMachinePool's Status as Ready when

I always read the AMP Ready status as: it is possible to work with the AMP, i.e. scaling up and down is possible. Which is the case even with a replica mismatch or an Updating provisioningState.

The AMP Ready status is read by the CAPI MP controller. Therefore it has implications broader than just for the CAPZ controller. I think the Replicas difference or the ProvisioningState difference should not be reflected in the Ready flag but instead with conditions.

Does that make sense, or am I overlooking something?

Member

Sorry for the delay on getting back on this.

The AMP Ready status is read by the CAPI MP controller. Therefore it has implications broader than just for the CAPZ controller.

That makes sense. It is not fault-tolerant for a resource controller to determine next steps based on its own status rather than its spec.

I think the Replicas difference or the ProvisioningState difference should not be reflected in the Ready flag but instead with conditions.

I agree with this.
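
For illustration, the conditions-based direction we seem to agree on could look roughly like the sketch below for the Succeeded branch. It is assumed to sit in azure/scope next to setProvisioningStateAndConditions and only reuses helpers already present in this diff, plus conditions.MarkTrue from the cluster-api conditions package; it is a sketch, not a drop-in patch.

```go
// Sketch only: surface replica divergence through conditions while leaving
// the Ready flag true so CAPI keeps reconciling the MachinePool.
func (m *MachinePoolScope) setSucceededConditionsSketch() {
	desired := int32(0)
	if m.MachinePool.Spec.Replicas != nil {
		desired = *m.MachinePool.Spec.Replicas
	}
	switch {
	case desired > m.AzureMachinePool.Status.Replicas:
		conditions.MarkFalse(m.AzureMachinePool, infrav1.ScaleSetDesiredReplicasCondition,
			infrav1.ScaleSetScaleUpReason, clusterv1.ConditionSeverityInfo, "")
	case desired < m.AzureMachinePool.Status.Replicas:
		conditions.MarkFalse(m.AzureMachinePool, infrav1.ScaleSetDesiredReplicasCondition,
			infrav1.ScaleSetScaleDownReason, clusterv1.ConditionSeverityInfo, "")
	default:
		conditions.MarkTrue(m.AzureMachinePool, infrav1.ScaleSetDesiredReplicasCondition)
	}
	// Ready stays true: the scale set can still scale up and down, so CAPI
	// keeps processing providerIDList and the rest of the MachinePool.
	m.SetReady()
}
```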

Member

The logic for updating the status of a MachinePool needs to be revisited, in my opinion; it needs to be better. Sorry to push back on this so much.

The area of controversy for me is the way we update the status of the MachinePool even when it is in the Updating state, or when the total replicas requested is higher than the current count.
For instance, is it valid to set the status of the AzureMachinePool to Ready when the actual number of replicas is 0 but the desired replica count is greater than 1? Meaning, when the Azure Machine Pool is still spinning up?

case v == infrav1.Updating:
conditions.MarkFalse(m.AzureMachinePool, infrav1.ScaleSetModelUpdatedCondition, infrav1.ScaleSetModelOutOfDateReason, clusterv1.ConditionSeverityInfo, "")
- m.SetNotReady()
+ m.SetReady()
Member

Could you please add a comment on the rationale behind changing this behavior?

case v == infrav1.Creating:
conditions.MarkFalse(m.AzureMachinePool, infrav1.ScaleSetRunningCondition, infrav1.ScaleSetCreatingReason, clusterv1.ConditionSeverityInfo, "")
m.SetNotReady()
case v == infrav1.Deleting:
conditions.MarkFalse(m.AzureMachinePool, infrav1.ScaleSetRunningCondition, infrav1.ScaleSetDeletingReason, clusterv1.ConditionSeverityInfo, "")
m.SetNotReady()
case v == infrav1.Failed:
conditions.MarkFalse(m.AzureMachinePool, infrav1.ScaleSetRunningCondition, infrav1.ScaleSetProvisionFailedReason, clusterv1.ConditionSeverityInfo, "")
Comment on lines +628 to +629
Member

Shouldn't we be marking the resource as NotReady when the resource's provisioning state is infrav1.Failed?

Contributor Author (@mweibel, Apr 22, 2025)

From what I gathered in my experiments, the VMSS marks itself as failed if one VM has a failed provisioning state. If we mark the VMSS as failed in this case, further reconciliation is prevented until that one VM is either removed from the VMSS or goes back into the Succeeded state.

Because a VMSS can still scale up and down even if its provisioningState is failed, it's wrong to mark the whole AMP as failed and prevent reconciling.

I explicitly didn't add m.SetReady() here because, if the VMSS is already NotReady due to something else, we shouldn't reset that flag when the provisioningState is failed.
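
One way to confirm which instance drags the VMSS into Failed is to list the per-instance provisioning states. A minimal sketch with the armcompute SDK follows; the module version and credential wiring are assumptions, and this is not CAPZ code.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v5"
)

// printInstanceProvisioningStates lists each VMSS instance with its own
// provisioningState; a single Failed instance is enough to flip the whole
// VMSS to Failed even though the scale set can still scale. Sketch only.
func printInstanceProvisioningStates(ctx context.Context, subscriptionID, resourceGroup, vmssName string) error {
	cred, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		return err
	}
	client, err := armcompute.NewVirtualMachineScaleSetVMsClient(subscriptionID, cred, nil)
	if err != nil {
		return err
	}
	pager := client.NewListPager(resourceGroup, vmssName, nil)
	for pager.More() {
		page, err := pager.NextPage(ctx)
		if err != nil {
			return err
		}
		for _, vm := range page.Value {
			state := "<unknown>"
			if vm.Properties != nil && vm.Properties.ProvisioningState != nil {
				state = *vm.Properties.ProvisioningState
			}
			fmt.Printf("instance %s: %s\n", *vm.InstanceID, state)
		}
	}
	return nil
}

func main() {
	if err := printInstanceProvisioningStates(context.Background(), "<subscription-id>", "<resource-group>", "<vmss-name>"); err != nil {
		log.Fatal(err)
	}
}
```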

Member

Self notes:
I agree that a VMSS can still scale up and down even if provisioningState is failed, and that marking it as failed is incorrect.
But is setting an AMP's status to NotReady equivalent to marking it as failed?
Will probe.

Member

Also, in the scenario where AzureMachinePool.Status.ProvisioningState == infrav1.Failed, shouldn't we be calling m.SetReady() so that the reconciler does not carry over the earlier state?

default:
conditions.MarkFalse(m.AzureMachinePool, infrav1.ScaleSetRunningCondition, string(v), clusterv1.ConditionSeverityInfo, "")
m.SetNotReady()
}
}

174 changes: 174 additions & 0 deletions azure/scope/machinepool_test.go
@@ -40,6 +40,7 @@ import (
"k8s.io/utils/ptr"
clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
expv1 "sigs.k8s.io/cluster-api/exp/api/v1beta1"
"sigs.k8s.io/cluster-api/util/conditions"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/client/fake"

@@ -1577,6 +1578,179 @@ func TestMachinePoolScope_applyAzureMachinePoolMachines(t *testing.T) {
}
}

func TestMachinePoolScope_setProvisioningStateAndConditions(t *testing.T) {
scheme := runtime.NewScheme()
_ = clusterv1.AddToScheme(scheme)
_ = infrav1exp.AddToScheme(scheme)

tests := []struct {
Name string
Setup func(mp *expv1.MachinePool, amp *infrav1exp.AzureMachinePool, cb *fake.ClientBuilder)
Verify func(g *WithT, amp *infrav1exp.AzureMachinePool, c client.Client)
ProvisioningState infrav1.ProvisioningState
}{
{
Name: "if provisioning state is set to Succeeded and replicas match, MachinePool is ready and conditions match",
Setup: func(mp *expv1.MachinePool, amp *infrav1exp.AzureMachinePool, cb *fake.ClientBuilder) {
mp.Spec.Replicas = ptr.To[int32](1)
amp.Status.Replicas = 1
},
Verify: func(g *WithT, amp *infrav1exp.AzureMachinePool, c client.Client) {
g.Expect(amp.Status.Ready).To(BeTrue())
g.Expect(conditions.Get(amp, infrav1.ScaleSetRunningCondition).Status).To(Equal(corev1.ConditionTrue))
g.Expect(conditions.Get(amp, infrav1.ScaleSetModelUpdatedCondition).Status).To(Equal(corev1.ConditionTrue))
g.Expect(conditions.Get(amp, infrav1.ScaleSetDesiredReplicasCondition).Status).To(Equal(corev1.ConditionTrue))
},
ProvisioningState: infrav1.Succeeded,
},
{
Name: "if provisioning state is set to Succeeded and replicas are higher on AzureMachinePool, MachinePool is ready and ScalingDown",
Setup: func(mp *expv1.MachinePool, amp *infrav1exp.AzureMachinePool, cb *fake.ClientBuilder) {
mp.Spec.Replicas = ptr.To[int32](1)
amp.Status.Replicas = 2
},
Verify: func(g *WithT, amp *infrav1exp.AzureMachinePool, c client.Client) {
g.Expect(amp.Status.Ready).To(BeTrue())
condition := conditions.Get(amp, infrav1.ScaleSetDesiredReplicasCondition)
g.Expect(condition.Status).To(Equal(corev1.ConditionFalse))
g.Expect(condition.Reason).To(Equal(infrav1.ScaleSetScaleDownReason))
},
ProvisioningState: infrav1.Succeeded,
},
{
Name: "if provisioning state is set to Succeeded and replicas are lower on AzureMachinePool, MachinePool is ready and ScalingUp",
Setup: func(mp *expv1.MachinePool, amp *infrav1exp.AzureMachinePool, cb *fake.ClientBuilder) {
mp.Spec.Replicas = ptr.To[int32](2)
amp.Status.Replicas = 1
},
Verify: func(g *WithT, amp *infrav1exp.AzureMachinePool, c client.Client) {
g.Expect(amp.Status.Ready).To(BeTrue())
condition := conditions.Get(amp, infrav1.ScaleSetDesiredReplicasCondition)
g.Expect(condition.Status).To(Equal(corev1.ConditionFalse))
g.Expect(condition.Reason).To(Equal(infrav1.ScaleSetScaleUpReason))
},
ProvisioningState: infrav1.Succeeded,
},
{
Name: "if provisioning state is set to Updating, MachinePool is ready and scale set model is set to OutOfDate",
Setup: func(mp *expv1.MachinePool, amp *infrav1exp.AzureMachinePool, cb *fake.ClientBuilder) {},
Verify: func(g *WithT, amp *infrav1exp.AzureMachinePool, c client.Client) {
g.Expect(amp.Status.Ready).To(BeTrue())
condition := conditions.Get(amp, infrav1.ScaleSetModelUpdatedCondition)
g.Expect(condition.Status).To(Equal(corev1.ConditionFalse))
g.Expect(condition.Reason).To(Equal(infrav1.ScaleSetModelOutOfDateReason))
},
ProvisioningState: infrav1.Updating,
},
{
Name: "if provisioning state is set to Creating, MachinePool is NotReady and scale set running condition is set to Creating",
Setup: func(mp *expv1.MachinePool, amp *infrav1exp.AzureMachinePool, cb *fake.ClientBuilder) {},
Verify: func(g *WithT, amp *infrav1exp.AzureMachinePool, c client.Client) {
g.Expect(amp.Status.Ready).To(BeFalse())
condition := conditions.Get(amp, infrav1.ScaleSetRunningCondition)
g.Expect(condition.Status).To(Equal(corev1.ConditionFalse))
g.Expect(condition.Reason).To(Equal(infrav1.ScaleSetCreatingReason))
},
ProvisioningState: infrav1.Creating,
},
{
Name: "if provisioning state is set to Deleting, MachinePool is NotReady and scale set running condition is set to Deleting",
Setup: func(mp *expv1.MachinePool, amp *infrav1exp.AzureMachinePool, cb *fake.ClientBuilder) {},
Verify: func(g *WithT, amp *infrav1exp.AzureMachinePool, c client.Client) {
g.Expect(amp.Status.Ready).To(BeFalse())
condition := conditions.Get(amp, infrav1.ScaleSetRunningCondition)
g.Expect(condition.Status).To(Equal(corev1.ConditionFalse))
g.Expect(condition.Reason).To(Equal(infrav1.ScaleSetDeletingReason))
},
ProvisioningState: infrav1.Deleting,
},
{
Name: "if provisioning state is set to Failed, MachinePool ready state is not adjusted, and scale set running condition is set to Failed",
Setup: func(mp *expv1.MachinePool, amp *infrav1exp.AzureMachinePool, cb *fake.ClientBuilder) {},
Verify: func(g *WithT, amp *infrav1exp.AzureMachinePool, c client.Client) {
condition := conditions.Get(amp, infrav1.ScaleSetRunningCondition)
g.Expect(condition.Status).To(Equal(corev1.ConditionFalse))
g.Expect(condition.Reason).To(Equal(infrav1.ScaleSetProvisionFailedReason))
},
ProvisioningState: infrav1.Failed,
},
{
Name: "if provisioning state is set to something not explicitly handled, MachinePool ready state is not adjusted, and scale set running condition is set to the ProvisioningState",
Setup: func(mp *expv1.MachinePool, amp *infrav1exp.AzureMachinePool, cb *fake.ClientBuilder) {},
Verify: func(g *WithT, amp *infrav1exp.AzureMachinePool, c client.Client) {
condition := conditions.Get(amp, infrav1.ScaleSetRunningCondition)
g.Expect(condition.Status).To(Equal(corev1.ConditionFalse))
g.Expect(condition.Reason).To(Equal(string(infrav1.Migrating)))
},
ProvisioningState: infrav1.Migrating,
},
}
for _, tt := range tests {
t.Run(tt.Name, func(t *testing.T) {
var (
g = NewWithT(t)
mockCtrl = gomock.NewController(t)
cb = fake.NewClientBuilder().WithScheme(scheme)
cluster = &clusterv1.Cluster{
ObjectMeta: metav1.ObjectMeta{
Name: "cluster1",
Namespace: "default",
},
Spec: clusterv1.ClusterSpec{
InfrastructureRef: &corev1.ObjectReference{
Name: "azCluster1",
},
},
Status: clusterv1.ClusterStatus{
InfrastructureReady: true,
},
}
mp = &expv1.MachinePool{
ObjectMeta: metav1.ObjectMeta{
Name: "mp1",
Namespace: "default",
OwnerReferences: []metav1.OwnerReference{
{
Name: "cluster1",
Kind: "Cluster",
APIVersion: clusterv1.GroupVersion.String(),
},
},
},
}
amp = &infrav1exp.AzureMachinePool{
ObjectMeta: metav1.ObjectMeta{
Name: "amp1",
Namespace: "default",
OwnerReferences: []metav1.OwnerReference{
{
Name: "mp1",
Kind: "MachinePool",
APIVersion: expv1.GroupVersion.String(),
},
},
},
}
vmssState = &azure.VMSS{}
)
defer mockCtrl.Finish()

tt.Setup(mp, amp, cb.WithObjects(amp, cluster))
s := &MachinePoolScope{
client: cb.Build(),
ClusterScoper: &ClusterScope{
Cluster: cluster,
},
MachinePool: mp,
AzureMachinePool: amp,
vmssState: vmssState,
}
s.setProvisioningStateAndConditions(tt.ProvisioningState)
tt.Verify(g, s.AzureMachinePool, s.client)
})
}
}

func TestBootstrapDataChanges(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()