eznix86 commented Dec 23, 2025

Fixes #86. The shim now flips its scaledDown flag before calling CRIU.

Owner

ctrox commented Dec 27, 2025

I don't really understand how this would fix liveness probes preventing scaledown. The referenced issue was most likely already fixed in #89, it's just that I never heard back from the reporter.

If the checkpoint is successful it will set the state to scaled down, in all other cases not. If we set it to scaled down before the checkpoint is done, we are changing the state too early and even if we roll back it will lead to a flapping state on failed checkpoints.

beforeCheckpoint := time.Now()
c.setScaledDownFlag(true)
defer func() {
	if retErr != nil {
Owner

retErr is always nil or am I missing something?

Author

Sorry, I was testing something and forgot to remove it.

Author

eznix86 commented Dec 27, 2025

> I don't really understand how this would fix liveness probes preventing scaledown. The referenced issue was most likely already fixed in #89, it's just that I never heard back from the reporter.
>
> If the checkpoint is successful it will set the state to scaled down, in all other cases not. If we set it to scaled down before the checkpoint is done, we are changing the state too early and even if we roll back it will lead to a flapping state on failed checkpoints.

Actually, the problem appeared when I tested nginx with a liveness probe: once nginx was scaled down, the pod went into CrashLoopBackOff. I then went looking for the cause and ended up here. (At least, that's the reason for the fix.)

Development

Successfully merging this pull request may close these issues.

Liveness probes are preventing scaling to zero