design: Node Image Version Pinning in Karpenter#1720

Open
rakechill wants to merge 13 commits into main from rakechill/node-image-pinning-design
@rakechill rakechill commented Jun 10, 2026

Contributor

Summary

Design document for adding imageVersion pinning support to AKSNodeClass, allowing customers to pin node image selection to a specific AKS node image version (e.g., 202604.24.0).

Motivation

Karpenter users can set imageFamily but cannot currently:

  • Pin to a known-good version when the latest image has a regression
  • Decouple node image updates from Kubernetes version upgrades
  • Stay on a stable image version for a bounded period

The only workaround today is disabling Karpenter entirely and falling back to AKS autoscaling groups.

Key Design Points

  • New imageVersion field on AKSNodeClassSpec:
    • Pins node image selection to a specific version
    • Validated at admission via regex ^\d{6}\.\d{2}\.\d+$ (current YYYYMM.DD.patch format only)
  • imageCreateDate per image in status.images[]:
    • Immutable release date stored once at reconciliation time
    • Consumers compute age on read, avoiding daily status churn
  • ImageWithinSupportWindow status condition:
    • Warning-only (never blocks provisioning)
    • Not a readiness-dependent condition
  • karpenter_image_age_days gauge metric:
    • Computed at scrape time from imageCreateDate
  • Cache key fix:
    • imageVersion is included in cacheKey()
    • Pin changes force a fresh lookup so status reflects .spec.imageVersion correctly
  • Maintenance window behavior with pinning:
    • When imageVersion is set, pin is authoritative
    • Reconciler skips maintenance-window update gating/override logic for latest selection
  • Drift detection:
    • Continues to work via existing status.images[] vs node image comparison
  • AKSNodeClassSpec.ImageID cleanup:
    • Existing hidden/unused stub is removed as part of this design
  • Alternatives considered:
    • do-not-disrupt annotation
    • EKS-style amiSelectorTerms pattern
    • NodePool.expireAfter
    • All evaluated against rollback, k8s/image decoupling, and bounded stability-window goals
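
The "pin is authoritative" rule from the list above can be sketched in a few lines of Go. This is only an illustration: `resolveImageVersion` and its inputs are hypothetical names, not functions from the Karpenter codebase, and the sketch assumes the list of available versions is already sorted newest first.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveImageVersion is a hypothetical helper (not part of the codebase).
// It returns the version the reconciler should target:
//   - the newest available version when no pin is set
//   - the pinned version when it is set and present in the list
//
// available is assumed to be sorted newest first.
func resolveImageVersion(pin string, available []string) (string, error) {
	if len(available) == 0 {
		return "", errors.New("no image versions available")
	}
	if pin == "" {
		// Unpinned: keep today's behavior and follow the latest version.
		return available[0], nil
	}
	for _, v := range available {
		if v == pin {
			// Pinned: the pin wins, even if a newer version exists.
			return v, nil
		}
	}
	return "", fmt.Errorf("pinned image version %q not found", pin)
}

func main() {
	available := []string{"202605.12.0", "202604.24.0", "202604.10.1"}

	latest, err := resolveImageVersion("", available)
	fmt.Println(latest, err)

	pinned, err := resolveImageVersion("202604.24.0", available)
	fmt.Println(pinned, err)

	_, err = resolveImageVersion("202601.01.0", available)
	fmt.Println(err)
}
```

Returning an error for an unknown pin, rather than silently falling back to latest, keeps the pin authoritative; how the real reconciler surfaces that case is up to the design.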

Open Questions (pending Node SIG response)

  • Is there a supported compatibility contract between Kubernetes version and node image version?
  • If a user pins a node image version and upgrades Kubernetes, should pinning across that upgrade be allowed, blocked, or explicitly validated?
  • On Kubernetes upgrades that trigger image refresh (ImagesReady=False), should pinned versions be re-validated against the new Kubernetes version?
  • Is ListNodeImageVersions presence the intended compatibility signal, or is there another authoritative source?

Related

rakechill added 5 commits May 21, 2026 21:39
- Reframe support window without hardcoding 90-day duration
- 90 days is now a warning threshold (matching AppLens), not policy
- Add 'Path to enforcement' explaining how to promote condition to readiness dependent
- Add open questions for threshold configurability and enforcement strategy
- Fix SLA references to use 'AKS support window' throughout
- Move Prepared Image Spec background to section 6 (future extensibility)
- Replace imageAge (metav1.Duration) with imageAgeDays (int32) for readability
- Clarify image version existence validation flow across definitions
- Fix relationship map alignment
- Renumber all sections and cross-references
@rakechill rakechill marked this pull request as ready for review June 10, 2026 19:12
AKS image versions exist in two observed formats:
- Current: YYYYMM.DD.patch (e.g. 202512.06.0)
- Legacy:  YYYY.MM.DD     (e.g. 2022.10.03)

The existing isNewerVersion() in nodeimageversionsclient.go already
handles both (added in commit 3a9cbc0, PR #526). The admission regex
and age-parsing description are updated to match.
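
For illustration, a two-branch date parser for the two formats above might look like the following. It is a sketch, not the existing `isNewerVersion()` code: `parseImageVersionDate` is a hypothetical name, and it assumes the date embedded in the version string is a good enough proxy for the release date.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseImageVersionDate is a hypothetical helper that extracts the date
// embedded in an AKS image version string. It handles the two observed formats:
//
//	current: YYYYMM.DD.patch (e.g. 202512.06.0)
//	legacy:  YYYY.MM.DD      (e.g. 2022.10.03)
func parseImageVersionDate(version string) (time.Time, error) {
	parts := strings.Split(version, ".")
	if len(parts) != 3 {
		return time.Time{}, fmt.Errorf("unexpected image version format: %q", version)
	}
	switch len(parts[0]) {
	case 6:
		// Current format: the third component is a patch number, not a date part.
		if _, err := strconv.ParseUint(parts[2], 10, 32); err != nil {
			return time.Time{}, fmt.Errorf("invalid patch in %q: %w", version, err)
		}
		return time.Parse("200601.02", parts[0]+"."+parts[1])
	case 4:
		// Legacy format: the whole string is a dotted date.
		return time.Parse("2006.01.02", version)
	default:
		return time.Time{}, fmt.Errorf("unexpected image version format: %q", version)
	}
}

func main() {
	for _, v := range []string{"202512.06.0", "2022.10.03", "latest"} {
		d, err := parseImageVersionDate(v)
		if err != nil {
			fmt.Println(v, "->", err)
			continue
		}
		fmt.Println(v, "->", d.Format("2006-01-02"))
	}
}
```

Using `time.Parse` rather than assembling the date from integers means out-of-range months or days are rejected instead of being normalized into a different date.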

Contributor Author

Finding: imageVersion regex should accept both AKS version formats

While cross-referencing the design against the existing codebase, we found that AKS image versions exist in two distinct formats, not just one:

| Format | Example | Used for |
|---|---|---|
| YYYYMM.DD.patch | 202512.06.0 | Current AKS releases (since ~2023) |
| YYYY.MM.DD | 2022.10.03 | Legacy AKS releases |

The original design had a single-format regex (^\d{6}\.\d{2}\.\d+$) that would reject legacy version strings as pin targets. For example, a customer attempting to roll back to an older image using a YYYY.MM.DD version would get an admission error.

Evidence in the codebase:

Fix applied: Updated the admission regex to an OR pattern:

^(\d{6}\.\d{2}\.\d+|\d{4}\.\d{2}\.\d{2})$

This matches both formats at admission time. Gallery existence validation in the status reconciler remains the authoritative check regardless of format.
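
As a quick check of the OR pattern, the sketch below runs it through Go's `regexp` package against both formats and a couple of invalid strings. `isValidImageVersionFormat` is a hypothetical name used only here; in the CRD itself the pattern would presumably live in a validation marker rather than in Go code.

```go
package main

import (
	"fmt"
	"regexp"
)

// imageVersionPattern is the OR pattern quoted above:
// YYYYMM.DD.patch (current) or YYYY.MM.DD (legacy).
var imageVersionPattern = regexp.MustCompile(`^(\d{6}\.\d{2}\.\d+|\d{4}\.\d{2}\.\d{2})$`)

// isValidImageVersionFormat is a hypothetical helper. It checks shape only;
// it does not prove that the version exists in the gallery.
func isValidImageVersionFormat(v string) bool {
	return imageVersionPattern.MatchString(v)
}

func main() {
	for _, v := range []string{"202512.06.0", "2022.10.03", "latest", "2025.12"} {
		fmt.Printf("%-12s valid=%v\n", v, isValidImageVersionFormat(v))
	}
}
```

The pattern accepts shapes such as `209913.99.0`, which is why the gallery existence check stays the authoritative validation.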

- Cache key: require imageVersion in cacheKey() to force immediate
  refresh on pin set/change (was silently stale for up to 3 days)
- Non-goals: AKS Machine API + CIG pinning is explicitly out of scope;
  CIG + bootstrapping client path works naturally via full ID comparison
- Drift table: document per-path drift support clearly, call out
  AKS Machine API + CIG as non-goal
- Age parsing: add two-branch pseudocode for YYYYMM.DD.patch vs
  YYYY.MM.DD formats, following isNewerVersion() pattern
- Support window threshold: resolve as controller flag
  (--image-support-window-warning-days, default 90), not Helm value,
  so self-hosted users can configure it too
|---|---|
| **In-window** | Image age ≤ 90 days from release date. No restrictions; full support. |
| **Outside-supported-window** | Image age > 90 days. Support may require upgrading to a newer image as a prerequisite for resolution. |
| **Unsupported** | Potential future enforcement state where specific operations (e.g., scale-up) could be blocked. Not part of initial rollout. |
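
A warning-only evaluation of the first two rows of this table could look like the sketch below. The `condition` struct, the reason strings, and `supportWindowCondition` are hypothetical stand-ins for illustration; the threshold is passed in as a parameter, mirroring the proposed `--image-support-window-warning-days` flag.

```go
package main

import (
	"fmt"
	"time"
)

// condition is a simplified, hypothetical stand-in for a status condition.
type condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// mustDate parses a YYYY-MM-DD string and panics on bad input (demo helper).
func mustDate(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

// supportWindowCondition reports whether the image is within the warning
// threshold. It only describes state and never returns an error, so it
// cannot block provisioning.
func supportWindowCondition(createDate, now time.Time, warningDays int) condition {
	age := now.Sub(createDate)
	limit := time.Duration(warningDays) * 24 * time.Hour

	c := condition{Type: "ImageWithinSupportWindow", Status: "True", Reason: "InWindow"}
	if age > limit {
		c.Status = "False"
		c.Reason = "OutsideSupportedWindow"
	}
	c.Message = fmt.Sprintf("image is %d days old (warning threshold: %d days)",
		int(age/(24*time.Hour)), warningDays)
	return c
}

func main() {
	created := mustDate("2026-01-01")
	fmt.Printf("%+v\n", supportWindowCondition(created, mustDate("2026-04-01"), 90))
	fmt.Printf("%+v\n", supportWindowCondition(created, mustDate("2026-04-02"), 90))
}
```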

Member

This is something we need to talk/work with node sig on, but I saw Jorge pushing back on this and I agree with that.

I guess not critical here as we're punting on it though.

Contributor Author

Yes, I already discussed with node sig folks and PMs (including Jorge). They are still ironing out the exact long-term policy, but are aligned with adding the 90-day warning at the moment.

1. AKS ships node images frequently: **weekly for Linux, monthly for Windows**.
2. Customers are expected to update regularly — via auto-upgrade channels or manual upgrades.
3. **Rollback and pinning are supported flexibility mechanisms**, but they **do not override lifecycle expectations**. They are meant for short-term operational recovery, not long-term avoidance of updates.
4. Within the supported window, scale-up is expected to be **unconditionally allowed** regardless of cluster state.

Member

and at least for now, outside of the supported window scale up is also allowed?

Contributor Author

Correct

Member

Can you clarify that here in the doc?

If we expose a version-pinning mechanism, it should:

- Allow customers to pin within the supported window.
- Surface image age as observable state (e.g., status conditions).

Member

Age, or publish date?

Is there a pattern for computing an always-increasing counter in k8s resources? I think usually you see something like: lastUpdatedAt, rather than timeSinceLastUpdated.

Member

Ahh I see you propose imageAgeDays, which we could do, but I'd suggest doing a date (2026-06-12) or datetime (2026-06-12 00:00:00).

I think that's more standard?

Member

AI agrees:
🤖

Image Creation Date vs. Image Age in Days

Option 1: imageCreateDate: 2026-06-12 is the better choice for a CRD field.

Why Creation Date wins:

| Aspect | Creation Date | Age in Days |
|---|---|---|
| Staleness | Always accurate: an immutable fact | Stale immediately after being written; requires continuous reconciliation to stay correct |
| Reconciliation cost | No updates needed | Must be updated daily or becomes wrong |
| Consistency | Same value regardless of when you read it | Value depends on when it was last reconciled vs. when you read it |
| Conflict/churn | No status churn | Daily updates cause unnecessary resource version bumps, watch events, and potential conflicts |
| Expressiveness | Consumer can compute age, compare dates, bucket by month, etc. | Loses the original date; can't derive it back without knowing when the field was last updated |
| Time zones / clock skew | Standard metav1.Time or date string, well understood | Integer that's only meaningful relative to "now", which differs across controllers |
| Kubernetes convention | Matches patterns like creationTimestamp, lastTransitionTime | No precedent in upstream APIs for "age in days" fields |

When age might be preferable:

  • If you want a user-facing display hint (like kubectl get columns showing "3d") — but that's better done in a printer column computed from the date.
  • If the source data genuinely only provides age without an absolute timestamp (rare).

Recommendation:

Store the creation date (metav1.Time or a date string). Let consumers (controllers, CEL validation, printer columns) compute age on read. This follows the same principle as ConditionTypeConsistentStateFound in the file you're looking at — conditions store lastTransitionTime, not "duration since last transition."
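
To make the "store the date, compute age on read" recommendation concrete, here is a small sketch. The `nodeImage` type and its `ageDays` method are hypothetical; a real status field would more likely be a `metav1.Time`, and plain `time.Time` is used here only to keep the example standard-library-only.

```go
package main

import (
	"fmt"
	"time"
)

// nodeImage is a simplified, hypothetical stand-in for a status.images[] entry.
// ImageCreateDate is written once at reconciliation time and never updated.
type nodeImage struct {
	ID              string
	ImageCreateDate time.Time
}

// ageDays derives the age in whole days at read time from the stored date.
func (n nodeImage) ageDays(now time.Time) int {
	return int(now.Sub(n.ImageCreateDate) / (24 * time.Hour))
}

// mustDate parses a YYYY-MM-DD string and panics on bad input (demo helper).
func mustDate(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	img := nodeImage{ID: "example-image", ImageCreateDate: mustDate("2026-06-12")}

	// Two readers at different times get different ages from the same
	// stored value, with no status write in between.
	fmt.Println(img.ageDays(mustDate("2026-06-15")))
	fmt.Println(img.ageDays(mustDate("2026-07-12")))
}
```

The same derivation could back a printer column or a metric computed at scrape time, since neither needs the status to change.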

Contributor Author

Yeah that makes sense

Contributor Author

TODO: update this to be a static date rather than calculated days since

Comment thread designs/0012-node-image-version-pinning.md
AKSNodeClass.status.images[] (reconciled by nodeclass status controller)
│ e.g. [{ ID: "/CommunityGalleries/.../versions/202604.24.0",

Member

Contributor Author

ack, will reflect

When `imageVersion` is **unset** (default), drift detection works as
today — comparing against the latest available version.

No changes are needed to `isImageVersionDrifted()` itself — it already

Member

We will need to make sure that the Status controller takes the .spec field for pinning into account when doing the 72h cache thing it does.
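
One way to picture this: if the pin is part of the cache key, setting or changing it produces a new key and therefore a cache miss, so the 72h entry for the old selection is not reused. The sketch below is an assumption-heavy illustration. The real `cacheKey()` inputs are not shown in this thread, so `imageCacheInputs` and its fields are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// imageCacheInputs lists hypothetical inputs to the image lookup cache.
// The actual fields used by cacheKey() in the codebase may differ.
type imageCacheInputs struct {
	ImageFamily       string
	KubernetesVersion string
	ImageVersion      string // empty when no pin is set
}

// cacheKey builds a key that changes whenever any input changes,
// including the pinned image version.
func cacheKey(in imageCacheInputs) string {
	return strings.Join([]string{in.ImageFamily, in.KubernetesVersion, in.ImageVersion}, "|")
}

func main() {
	unpinned := imageCacheInputs{ImageFamily: "Ubuntu2204", KubernetesVersion: "1.33.0"}
	pinned := unpinned
	pinned.ImageVersion = "202604.24.0"

	fmt.Println(cacheKey(unpinned))
	fmt.Println(cacheKey(pinned))
	fmt.Println(cacheKey(unpinned) != cacheKey(pinned)) // a pin change forces a fresh lookup
}
```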

Contributor Author

TODO: add this detail into the design

- Prepared Image Spec operates **outside** the AKS-managed galleries —
it references a customer-owned image resource.

A well-shaped API keeps these as distinct, non-overlapping fields

Member

Agree

`ImagesReady` to `False`, which triggers a full image refresh
(Scenario A, case 2 in the existing code).

**When `imageVersion` is set**, this refresh should still resolve to the

Member

Does this mean the nodes roll or they don't roll?

I think they have to roll? Technically they can roll to the same image but they have to roll.

Contributor Author

They'll still roll, but node image (likely) shouldn't change -- still need to check w/ node sig on this

TODO: add that the nodes still get upgraded, but same image version

included to demonstrate that the API shape in section 5 is
forward-compatible with upcoming AKS capabilities.

### 6.0 Background: Prepared Image Specification

Member

Good we have this -- just noting I didn't review it as closely.

Contributor Author

Yeah, I'm still deciding if adding this as a secondary (related) design might be better. Maybe a different PR left in draft until we return to this. I think @comtalyst's meeting this week will help inform/shape this design, too.

Remaining questions not covered by the design sections above:

### Operational
1. Should we emit Kubernetes events when a pinned version crosses the

Member

Where/on what resource?

Contributor Author

Probably AKSNodeClass.


## 1. Background: The Node Image Support Window

AKS is formalizing its **support window for node images**. While no

Member

after review: We also don't discuss different options here. I personally tend towards pinning as you have proposed here, because I think it solves the problem and addresses some other user complaints as well, but we should at least have some discussion/investigation about what other possible solutions look like, where we enable the user to control when the node image version is applied, but:

  • Don't let them pick the specific version
  • Don't let them pin across k8s version updates

Maybe in the appendix/at the end we can discuss those options at a high level and why we feel that they do not suit?

@rakechill rakechill requested a review from matthchr June 20, 2026 00:34
Development

Successfully merging this pull request may close these issues.

Add support for setting ImageID for nodeClass
