joshfrench

@joshfrench joshfrench commented Jul 2, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:
Support setting EKS AuthenticationMode.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Partially addresses #4854

Special notes for your reviewer:
This PR just finishes the work started in #5108 by rebasing, removing the API changes to v1beta1, and fixing the fuzz tests. There were two seemingly unrelated changes in the original PR that I've removed:

  • Add CGFLAGS to build command (I'm dropping this as unrelated)
  • One documentation fix that I'll resubmit in its own PR

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

Added support for setting EKS AuthenticationMode (required for migrating to EKS Access Entries)

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/contains-merge-commits cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority labels Jul 2, 2025
@k8s-ci-robot k8s-ci-robot requested review from AndiDog and faiq July 2, 2025 13:24
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fabriziopandini for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Welcome @joshfrench!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-aws 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-aws has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 2, 2025
@k8s-ci-robot
Contributor

Hi @joshfrench. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Contributor

@punkwalker punkwalker left a comment

@joshfrench
Thank you for working on this. This will unblock my work on EKS Hybrid Node Cluster #5245.
Just a few suggestions/questions. Otherwise LGTM.

Comment on lines 257 to 269
type AccessConfig struct {
    // AuthenticationMode specifies the desired authentication mode for the cluster
    // Defaults to CONFIG_MAP
    // +kubebuilder:default=CONFIG_MAP
    // +kubebuilder:validation:Enum=CONFIG_MAP;API;API_AND_CONFIG_MAP
    AuthenticationMode EKSAuthenticationMode `json:"authenticationMode,omitempty"`
}

Contributor

As we are creating an AccessConfig, I suggest we add bootstrapClusterCreatorAdminPermissions as well.
If we do not plan to support this field, do we really need the AccessConfig struct?

WDYT @nrb @damdo ?

Author

bootstrapClusterCreatorAdminPermissions can only be set at cluster creation; the UpdateAccessConfigRequest API doesn't allow changing it. I don't have a strong opinion on adding the new field vs. removing the struct, but we'll need to account for that.

(For additional context, #5583 adds AccessEntry definitions to this struct but now I'm second-guessing where those should live.)

Contributor

@punkwalker punkwalker Aug 8, 2025

Here is what I think:
We have spec.iamAuthenticatorConfig in ManagedControlPlane. This is used for adding entries to the aws-auth ConfigMap.

Similarly, we should have a separate spec.AccessEntries for role/user mapping and spec.authenticationMode to specify the access mode, because the default mode is CONFIG_MAP, which is required for creating mappings from spec.iamAuthenticatorConfig.

@joshfrench Let me know what you think.

Also, bootstrapClusterCreatorAdminPermissions can be part of the AuthenticationMode struct and default to true, since in CONFIG_MAP mode the cluster creator will have admin permissions regardless of the value.

@AndiDog Please share your thoughts.

Author

@punkwalker Agreed that spec.AccessEntries should be a separate field (still planning to handle that in a separate PR).

To clarify, you are suggesting we add bootstrapClusterCreatorAdminPermissions to AccessConfig and default it to true?

type AccessConfig struct {
    // +kubebuilder:default=CONFIG_MAP
    // +kubebuilder:validation:Enum=CONFIG_MAP;API;API_AND_CONFIG_MAP
    AuthenticationMode EKSAuthenticationMode `json:"authenticationMode,omitempty"`

    // +kubebuilder:default=true
    BootstrapClusterCreatorAdminPermissions bool `json:"bootstrapClusterCreatorAdminPermissions"`
}

That feels right to me, since it matches the shape of the EKS API.

My question was what we should do if the user changes bootstrapClusterCreatorAdminPermissions on an existing cluster. The UpdateAccessConfig API doesn't allow it, because it makes no sense once the cluster has already been bootstrapped. The only valid use of UpdateAccessConfig is to change the authentication mode.

So if a user changes bootstrapClusterCreatorAdminPermissions on an existing cluster we could:

  1. Ignore bootstrapClusterCreatorAdminPermissions entirely and only submit the request if it changes the authentication mode
  2. Submit the request if it changes the authentication mode and log a warning if bootstrapClusterCreatorAdminPermissions also changes
  3. Fail validation and return an error if bootstrapClusterCreatorAdminPermissions changes, regardless of whether the authentication mode also changes

1 feels bad, 2 is low friction, 3 is the most explicit. Thoughts?
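
For illustration, option 2 from the list above could look roughly like the following. This is a minimal sketch, not the code in this PR: `AccessConfig` here is a simplified stand-in for the API type, and `UpdatePlan` and `planAccessConfigUpdate` are hypothetical names invented for the example.

```go
package main

import "fmt"

// AuthenticationMode and AccessConfig are simplified stand-ins for the types
// discussed in this thread, not the real CAPA or AWS SDK definitions.
type AuthenticationMode string

type AccessConfig struct {
	AuthenticationMode                      AuthenticationMode
	BootstrapClusterCreatorAdminPermissions bool
}

// UpdatePlan is a hypothetical result type describing what a reconciler would do.
type UpdatePlan struct {
	SubmitUpdate bool     // true when the authentication mode differs
	Warnings     []string // non-fatal messages to log
}

// planAccessConfigUpdate sketches option 2: submit an update only when the
// authentication mode changes, and warn (rather than fail) when the
// create-only bootstrap field differs from the live cluster.
func planAccessConfigUpdate(current, desired AccessConfig) UpdatePlan {
	var plan UpdatePlan
	if desired.AuthenticationMode != current.AuthenticationMode {
		plan.SubmitUpdate = true
	}
	if desired.BootstrapClusterCreatorAdminPermissions != current.BootstrapClusterCreatorAdminPermissions {
		plan.Warnings = append(plan.Warnings,
			"bootstrapClusterCreatorAdminPermissions cannot be changed after cluster creation; ignoring")
	}
	return plan
}

func main() {
	current := AccessConfig{AuthenticationMode: "CONFIG_MAP", BootstrapClusterCreatorAdminPermissions: true}
	desired := AccessConfig{AuthenticationMode: "API_AND_CONFIG_MAP", BootstrapClusterCreatorAdminPermissions: false}

	plan := planAccessConfigUpdate(current, desired)
	fmt.Println("submit update:", plan.SubmitUpdate)
	for _, w := range plan.Warnings {
		fmt.Println("warning:", w)
	}
}
```

Option 3 would differ only in returning an error instead of appending a warning.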

Contributor

@punkwalker punkwalker Aug 12, 2025

To clarify, you are suggesting we add bootstrapClusterCreatorAdminPermissions to AccessConfig and default it to true?

Yes.

Option 2 feels like a good middle ground compared to failing outright, but the behavior should be documented explicitly in the CR.

Comment on lines 565 to 566
expectedAuthenticationMode := string(s.scope.ControlPlane.Spec.AccessConfig.AuthenticationMode)
currentAuthenticationMode := string(accessConfig.AuthenticationMode)
Contributor

Do we need this type conversion? It should be enough to compare the values directly, since we already have a type for AuthenticationMode.

Author

One of these is from the AWS ekstypes package; the other is from our own v1beta2.
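
To show why the conversion is needed: Go does not allow comparing values of two distinct named string types directly, even when both have `string` as their underlying type. The sketch below uses two stand-in types, `sdkAuthenticationMode` and `apiAuthenticationMode`, which are assumptions made for the example rather than the real ekstypes or v1beta2 definitions.

```go
package main

import "fmt"

// sdkAuthenticationMode stands in for the authentication mode type from the
// AWS ekstypes package (assumed here to be a named string type).
type sdkAuthenticationMode string

// apiAuthenticationMode stands in for the provider's own v1beta2 type.
type apiAuthenticationMode string

// modesMatch compares the two values by converting both to plain strings.
// Writing `expected == current` would fail to compile with a mismatched
// types error, because the operands have different named types.
func modesMatch(expected apiAuthenticationMode, current sdkAuthenticationMode) bool {
	return string(expected) == string(current)
}

func main() {
	fmt.Println(modesMatch("API", "API"))
	fmt.Println(modesMatch("API", "CONFIG_MAP"))
}
```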

@joshfrench
Author

@punkwalker Thanks for the feedback, I'll be able to circle back to this next week.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 4, 2025
@k8s-ci-robot k8s-ci-robot removed do-not-merge/contains-merge-commits needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 8, 2025
@nrb
Contributor

nrb commented Aug 12, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 12, 2025
@joshfrench
Copy link
Author

Added BootstrapClusterCreatorAdminPermissions to the AccessConfig struct (default = true). We don't include the new field in subsequent calls to UpdateClusterConfig but we do log if it has changed.

Contributor

@punkwalker punkwalker left a comment

/lgtm 🚀


var (
    // EKSAuthenticationModeConfigMap indicates that only `aws-auth` ConfigMap will be used for authentication
    EKSAuthenticationModeConfigMap = EKSAuthenticationMode("CONFIG_MAP")
Member

It would be good if we used values like:

  • configmap
  • api
  • configmap_and_api

We generally try not to have direct copies of the values like this as all caps is a bit jarring in the manifests. It would mean we'd need to add a "conversion" function.
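
A conversion function of the kind mentioned above might look like this. It is only a sketch: the lowercase spellings are taken from the suggestion in this comment and may not match what was finally merged, the uppercase values are the ones in the original kubebuilder enum, and `toEKSValue` is a hypothetical helper name.

```go
package main

import "fmt"

// EKSAuthenticationMode is a simplified stand-in for the CAPA API type.
type EKSAuthenticationMode string

// Lowercase spellings follow the reviewer's suggestion and are illustrative.
const (
	ModeConfigMap       EKSAuthenticationMode = "configmap"
	ModeAPI             EKSAuthenticationMode = "api"
	ModeConfigMapAndAPI EKSAuthenticationMode = "configmap_and_api"
)

// toEKSValue is a hypothetical helper that maps the manifest-friendly value
// to the uppercase value the EKS API expects. An explicit switch is used
// instead of strings.ToUpper because the names do not map one-to-one
// (for example, configmap_and_api becomes API_AND_CONFIG_MAP).
func toEKSValue(mode EKSAuthenticationMode) (string, error) {
	switch mode {
	case ModeConfigMap:
		return "CONFIG_MAP", nil
	case ModeAPI:
		return "API", nil
	case ModeConfigMapAndAPI:
		return "API_AND_CONFIG_MAP", nil
	default:
		return "", fmt.Errorf("unsupported authentication mode %q", mode)
	}
}

func main() {
	for _, m := range []EKSAuthenticationMode{ModeConfigMap, ModeAPI, ModeConfigMapAndAPI} {
		v, err := toEKSValue(m)
		fmt.Println(m, "->", v, err)
	}
}
```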

Contributor

@joshfrench Would you get a chance to make this change so we can merge this PR?

Author

Updated!

@punkwalker
Contributor

/test ?

@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: d0deb3bdc08d5d856d6d5cc699e75ee0c02a7196

@faiq
Contributor

faiq commented Sep 10, 2025

All the changes look fine to me, and the e2e failure looks entirely unrelated and due to infra flakes.

@faiq
Contributor

faiq commented Sep 10, 2025

/test ?

@k8s-ci-robot
Contributor

@faiq: The following commands are available to trigger required jobs:

/test pull-cluster-api-provider-aws-build
/test pull-cluster-api-provider-aws-build-docker
/test pull-cluster-api-provider-aws-e2e-blocking
/test pull-cluster-api-provider-aws-test
/test pull-cluster-api-provider-aws-verify

The following commands are available to trigger optional jobs:

/test pull-cluster-api-provider-aws-apidiff-main
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-clusterclass
/test pull-cluster-api-provider-aws-e2e-conformance
/test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks-gc
/test pull-cluster-api-provider-aws-e2e-eks-testing

Use /test all to run the following jobs that were automatically triggered:

pull-cluster-api-provider-aws-apidiff-main
pull-cluster-api-provider-aws-build
pull-cluster-api-provider-aws-build-docker
pull-cluster-api-provider-aws-e2e-blocking
pull-cluster-api-provider-aws-test
pull-cluster-api-provider-aws-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@faiq
Contributor

faiq commented Sep 10, 2025

/test pull-cluster-api-provider-aws-e2e-eks

@faiq
Contributor

faiq commented Sep 11, 2025

@joshfrench can you look into the eks test failures?

@punkwalker
Contributor

@joshfrench can you look into the eks test failures?

@faiq
I think the failures are flakes; the pull-cluster-api-provider-aws-e2e-eks job had already passed, and the changes after that should not impact the test.
The main concern is pull-cluster-api-provider-aws-test, which is failing due to the Max VPC issue.

@faiq
Contributor

faiq commented Sep 11, 2025

@punkwalker

@faiq
I think the failures are flakes; the pull-cluster-api-provider-aws-e2e-eks job had already passed, and the changes after that should not impact the test.
The main concern is pull-cluster-api-provider-aws-test, which is failing due to the Max VPC issue.

Because of the squashed commits and hidden comments I didn't see that a previous run had passed. Regardless, we should make sure the flake isn't caused by any of the changes made here, so it's still worth a look.

Regarding the VPC issue, I think it's entirely unrelated.

@joshfrench
Author

The e2e-eks tests currently failing have all passed previously, even after the squash commit. I'm not sure how to interpret these failures as anything other than infrastructure-related; they're all some flavor of timeout:

  INFO: clusterctl describe cluster eks-upgrade-m1opde --show-conditions=all --show-machinesets=true --grouping=false --echo=true --v1beta2
  [FAILED] in [It] - /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:373 @ 09/10/25 22:03:12.039
• [FAILED] [1646.845 seconds]
EKS Cluster upgrade test [It] [managed] [upgrade] should create a cluster and upgrade the kubernetes version
/home/prow/go/src/sigs.k8s.io/cluster-api-provider-aws/test/e2e/shared/common.go:250
  [FAILED] Failed to run clusterctl describe
  Unexpected error:
      <*errors.withStack | 0xc000af85d0>: 
      get InfraCluster reference from Cluster: failed to retrieve AWSManagedCluster eks-upgrade-hfju12/eks-upgrade-m1opde: awsmanagedclusters.infrastructure.cluster.x-k8s.io "eks-upgrade-m1opde" not found
  [TIMEDOUT] A suite timeout occurred
  In [It] at: /home/prow/go/src/sigs.k8s.io/cluster-api-provider-aws/test/e2e/suites/managed/gc_test.go:161 @ 09/11/25 00:55:13.567
  Full Stack Trace
  This is the Progress Report generated when the suite timeout occurred:
    [managed] [gc] EKS Cluster external resource GC tests [managed] [gc] should cleanup a cluster that has ELB/NLB load balancers using AlternativeGCStrategy (Spec Runtime: 14m50.136s)
      /home/prow/go/src/sigs.k8s.io/cluster-api-provider-aws/test/e2e/suites/managed/gc_test.go:161
  [TIMEDOUT] A grace period timeout occurred
  In [SynchronizedAfterSuite] at: /home/prow/go/src/sigs.k8s.io/cluster-api-provider-aws/test/e2e/suites/managed/managed_suite_test.go:67 @ 09/11/25 00:55:43.579
  Full Stack Trace
  This is the Progress Report generated when the grace period timeout occurred:
    In [SynchronizedAfterSuite] (Node Runtime: 30s)
      /home/prow/go/src/sigs.k8s.io/cluster-api-provider-aws/test/e2e/suites/managed/managed_suite_test.go:67

@joshfrench
Author

/test pull-cluster-api-provider-aws-e2e-eks

@joshfrench
Author

Timeouts in different tests this run.

@joshfrench
Author

/test pull-cluster-api-provider-aws-e2e-eks

@damdo
Member

damdo commented Sep 23, 2025

/retest

@damdo
Member

damdo commented Sep 23, 2025

@richardcase if you are happy with this, are you able to approve?

@richardcase
Member

/test pull-cluster-api-provider-aws-test

@richardcase
Member

@richardcase if you are happy with this, are you able to approve?

@damdo - we need the tests to be passing first.

@damdo
Member

damdo commented Sep 23, 2025

/test pull-cluster-api-provider-aws-test

@damdo
Member

damdo commented Sep 23, 2025

Ah, looking deeper at this, it looks like this is not a flake after all:

--- FAIL: TestWebhookCreate (0.41s)
    ...
    --- FAIL: TestWebhookCreate/BootstrapClusterCreatorAdminPermissions_false_with_EKSAuthenticationModeConfigMap (0.01s)
=== RUN   TestWebhookCreate/BootstrapClusterCreatorAdminPermissions_false_with_EKSAuthenticationModeConfigMap
I0923 10:39:11.390761   46693 awsmanagedcontrolplane_webhook.go:599] "AWSManagedControlPlane setting defaults" logger="awsmanagedcontrolplane-resource" control-plane="default/"
I0923 10:39:11.397086   46693 awsmanagedcontrolplane_webhook.go:90] "AWSManagedControlPlane validate create" logger="awsmanagedcontrolplane-resource" control-plane="default/mcp-dqq8z"
    awsmanagedcontrolplane_webhook_test.go:420: 
        Expected
            <string>: admission webhook "validation.awsmanagedcontrolplanes.controlplane.cluster.x-k8s.io" denied the request: AWSManagedControlPlane.controlplane.cluster.x-k8s.io "mcp-dqq8z" is invalid: spec.accessConfig.bootstrapClusterCreatorAdminPermissions: Invalid value: false: bootstrapClusterCreatorAdminPermissions must be true if cluster authentication mode is set to config_map
        to contain substring
            <string>: bootstrapClusterCreatorAdminPermissions must be true if cluster authentication mode is set to CONFIG_MAP

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 23, 2025
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@joshfrench
Author

/test pull-cluster-api-provider-aws-test

@damdo
Member

damdo commented Sep 23, 2025

Thanks @joshfrench

@richardcase they are passing now :)

@damdo
Member

damdo commented Sep 23, 2025

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Sep 23, 2025
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-priority ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
9 participants