test: EKS e2e test using eksctl #667

Open

wants to merge 23 commits into base: main

Conversation

whatnick
Contributor

@whatnick whatnick commented Aug 27, 2024

Description

NOTE: Since this will take a bit of CI and other account provisioning, I am planning to keep this synced with upstream once a week until it is across the line or I run out of juice.

Add EKS-based e2e tests by exec'ing eksctl to provision and delete a temporary cluster. This is currently at the POC stage, since account setup etc. is needed to run it in practice, in conjunction with the secrets and variables associated with this repository.

The AWS integration should be set up via OIDC as shown here: https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services

with the IAM roles relevant to eksctl as shown here:

https://eksctl.io/usage/minimum-iam-policies/
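
For reference, below is a minimal sketch of what exec'ing eksctl from a Go e2e step can look like. The helper names and the fixed node count are illustrative only and are not the actual step types used by this PR's test framework.

```go
// Minimal sketch of shelling out to eksctl from Go; helper names and the
// fixed node count are illustrative, not the PR's actual framework types.
package e2e

import (
	"context"
	"fmt"
	"os"
	"os/exec"
)

// createEKSCluster provisions a temporary cluster and writes its kubeconfig.
func createEKSCluster(ctx context.Context, name, region, kubeconfig string) error {
	cmd := exec.CommandContext(ctx, "eksctl", "create", "cluster",
		"--name", name,
		"--region", region,
		"--nodes", "2",
		"--kubeconfig", kubeconfig,
	)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("eksctl create cluster: %w", err)
	}
	return nil
}

// deleteEKSCluster tears the temporary cluster down after the suite finishes.
func deleteEKSCluster(ctx context.Context, name, region string) error {
	cmd := exec.CommandContext(ctx, "eksctl", "delete", "cluster",
		"--name", name,
		"--region", region,
	)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}
```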

Related Issue

Partially addresses #451

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

With the drop-packets metrics Scenario disabled as per #746, the AWS e2e test suite runs successfully.

go test -run TestE2ERetinaAWS ./test/e2e/ -timeout 40m
ok      github.com/microsoft/retina/test/e2e    1866.467s

For failing test runs, cluster creation and teardown output is as below.

go test -run TestE2ERetinaAWS ./test/e2e/ -timeout 30m
CreateCluster setting stored value for parameter [AccountID] set as [XXXXXXXXXXXXXX]
CreateCluster setting stored value for parameter [Region] set as [us-west-2]
CreateCluster setting stored value for parameter [ClusterName] set as [whatnick-e2e-netobs-1724757102]
CreateCluster setting stored value for parameter [KubeConfigFilePath] set as [/home/whatnick/dev/retina/test/e2e/test.pem]
#################### CreateCluster ######################################################################
2024-08-27 20:41:44 [ℹ]  eksctl version 0.189.0
2024-08-27 20:41:44 [ℹ]  using region us-west-2
2024-08-27 20:41:45 [ℹ]  setting availability zones to [us-west-2d us-west-2c us-west-2b]
2024-08-27 20:41:45 [ℹ]  subnets for us-west-2d - public:192.168.0.0/19 private:192.168.96.0/19
2024-08-27 20:41:45 [ℹ]  subnets for us-west-2c - public:192.168.32.0/19 private:192.168.128.0/19
2024-08-27 20:41:45 [ℹ]  subnets for us-west-2b - public:192.168.64.0/19 private:192.168.160.0/19
2024-08-27 20:41:45 [ℹ]  nodegroup "ng-2a2471d7" will use "" [AmazonLinux2/1.30]
2024-08-27 20:41:45 [ℹ]  using Kubernetes version 1.30
2024-08-27 20:41:45 [ℹ]  creating EKS cluster "whatnick-e2e-netobs-1724757102" in "us-west-2" region with managed nodes
2024-08-27 20:41:45 [ℹ]  will create 2 separate CloudFormation stacks for cluster itself and the initial managed nodegroup
2024-08-27 20:41:45 [ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-west-2 --cluster=whatnick-e2e-netobs-1724757102'
2024-08-27 20:41:45 [ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "whatnick-e2e-netobs-1724757102" in "us-west-2"
2024-08-27 20:41:45 [ℹ]  CloudWatch logging will not be enabled for cluster "whatnick-e2e-netobs-1724757102" in "us-west-2"
2024-08-27 20:41:45 [ℹ]  you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-west-2 --cluster=whatnick-e2e-netobs-1724757102'
2024-08-27 20:41:45 [ℹ]  default addons vpc-cni, kube-proxy, coredns were not specified, will install them as EKS addons
2024-08-27 20:41:45 [ℹ]  
2 sequential tasks: { create cluster control plane "whatnick-e2e-netobs-1724757102", 
    2 sequential sub-tasks: { 
        2 sequential sub-tasks: { 
            1 task: { create addons },
            wait for control plane to become ready,
        },
        create managed nodegroup "ng-2a2471d7",
    } 
}
2024-08-27 20:41:45 [ℹ]  building cluster stack "whatnick-e2e-netobs-1724757102-cluster"
2024-08-27 20:41:46 [ℹ]  deploying stack "whatnick-e2e-netobs-1724757102-cluster"
2024-08-27 20:42:16 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-cluster"
2024-08-27 20:42:47 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-cluster"
2024-08-27 20:43:48 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-cluster"
2024-08-27 20:44:49 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-cluster"
2024-08-27 20:45:50 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-cluster"
2024-08-27 20:46:52 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-cluster"
2024-08-27 20:47:53 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-cluster"
2024-08-27 20:48:54 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-cluster"
2024-08-27 20:49:55 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-cluster"
2024-08-27 20:50:56 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-cluster"
2024-08-27 20:51:00 [!]  recommended policies were found for "vpc-cni" addon, but since OIDC is disabled on the cluster, eksctl cannot configure the requested permissions; the recommended way to provide IAM permissions for "vpc-cni" addon is via pod identity associations; after addon creation is completed, add all recommended policies to the config file, under `addon.PodIdentityAssociations`, and run `eksctl update addon`
2024-08-27 20:51:00 [ℹ]  creating addon
2024-08-27 20:51:01 [ℹ]  successfully created addon
2024-08-27 20:51:02 [ℹ]  creating addon
2024-08-27 20:51:02 [ℹ]  successfully created addon
2024-08-27 20:51:03 [ℹ]  creating addon
2024-08-27 20:51:03 [ℹ]  successfully created addon
2024-08-27 20:53:08 [ℹ]  building managed nodegroup stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 20:53:09 [ℹ]  deploying stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 20:53:09 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 20:53:40 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 20:54:24 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 20:55:30 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 20:55:31 [ℹ]  waiting for the control plane to become ready
2024-08-27 20:55:31 [✔]  saved kubeconfig as "/home/whatnick/dev/retina/test/e2e/test.pem"
2024-08-27 20:55:31 [ℹ]  no tasks
2024-08-27 20:55:31 [✔]  all EKS cluster resources for "whatnick-e2e-netobs-1724757102" have been created
2024-08-27 20:55:31 [✔]  created 0 nodegroup(s) in cluster "whatnick-e2e-netobs-1724757102"
2024-08-27 20:55:33 [ℹ]  nodegroup "ng-2a2471d7" has 2 node(s)
2024-08-27 20:55:33 [ℹ]  node "ip-192-168-39-250.us-west-2.compute.internal" is ready
2024-08-27 20:55:33 [ℹ]  node "ip-192-168-75-149.us-west-2.compute.internal" is ready
2024-08-27 20:55:33 [ℹ]  waiting for at least 2 node(s) to become ready in "ng-2a2471d7"
2024-08-27 20:55:33 [ℹ]  nodegroup "ng-2a2471d7" has 2 node(s)
2024-08-27 20:55:33 [ℹ]  node "ip-192-168-39-250.us-west-2.compute.internal" is ready
2024-08-27 20:55:33 [ℹ]  node "ip-192-168-75-149.us-west-2.compute.internal" is ready
2024-08-27 20:55:33 [✔]  created 1 managed nodegroup(s) in cluster "whatnick-e2e-netobs-1724757102"
2024-08-27 20:55:34 [ℹ]  kubectl command should work with "/home/whatnick/dev/retina/test/e2e/test.pem", try 'kubectl --kubeconfig=/home/whatnick/dev/retina/test/e2e/test.pem get nodes'
2024-08-27 20:55:34 [✔]  EKS cluster "whatnick-e2e-netobs-1724757102" in "us-west-2" region is ready
2024/08/27 20:55:34 Cluster created successfully!
InstallHelmChart setting stored value for parameter [Namespace] set as [kube-system]
InstallHelmChart setting stored value for parameter [ReleaseName] set as [retina]
...
#################### DeleteCluster ######################################################################
2024-08-27 20:59:55 [ℹ]  deleting EKS cluster "whatnick-e2e-netobs-1724757102"
2024-08-27 20:59:57 [ℹ]  will drain 0 unmanaged nodegroup(s) in cluster "whatnick-e2e-netobs-1724757102"
2024-08-27 20:59:57 [ℹ]  starting parallel draining, max in-flight of 1
2024-08-27 20:59:57 [✖]  failed to acquire semaphore while waiting for all routines to finish: context canceled
2024-08-27 20:59:58 [ℹ]  deleted 0 Fargate profile(s)
2024-08-27 21:00:00 [ℹ]  cleaning up AWS load balancers created by Kubernetes objects of Kind Service or Ingress
2024-08-27 21:00:05 [ℹ]  
2 sequential tasks: { delete nodegroup "ng-2a2471d7", delete cluster control plane "whatnick-e2e-netobs-1724757102" [async] 
}
2024-08-27 21:00:05 [ℹ]  will delete stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 21:00:05 [ℹ]  waiting for stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7" to get deleted
2024-08-27 21:00:05 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 21:00:36 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 21:01:15 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 21:02:38 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 21:03:55 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 21:05:46 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 21:06:34 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 21:08:28 [ℹ]  waiting for CloudFormation stack "eksctl-whatnick-e2e-netobs-1724757102-nodegroup-ng-2a2471d7"
2024-08-27 21:08:28 [ℹ]  will delete stack "eksctl-whatnick-e2e-netobs-1724757102-cluster"
2024-08-27 21:08:29 [✔]  all cluster resources were deleted
2024/08/27 21:08:29 Cluster deleted successfully!

Additional Notes

The Helm chart install portion of this test fails in practice, presumably due to an unreachable image registry. We may need to push images to a corresponding ECR repository or debug GHCR access.

Opening this PR for feedback and discussion on the AWS e2e testing approach. In practice I have successfully deployed Retina legacy charts on EKS.


Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.

@whatnick whatnick requested a review from a team as a code owner August 27, 2024 12:12
@nddq nddq added the area/infra Test, Release, or CI Infrastructure label Aug 27, 2024
@nddq
Contributor

nddq commented Aug 27, 2024

Linters are flagging the exec cmds. IMO shelling out to commands is not ideal here. I know that AWS has its own SDK that is able to interact with EKS (https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/eks), so maybe this could be worth looking into as an alternative?
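
For comparison, here is a rough sketch of the SDK route (a DescribeCluster call only; the cluster name is a placeholder). Full cluster create/delete via the SDK needs considerably more input (IAM role ARN, VPC config, separate nodegroup calls), which is part of why eksctl is convenient for a PoC.

```go
// Rough sketch of the aws-sdk-go-v2 route: check an EKS cluster's status.
// The cluster name below is a placeholder, not a real cluster.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/eks"
)

func main() {
	ctx := context.Background()

	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("us-west-2"))
	if err != nil {
		log.Fatal(err)
	}
	client := eks.NewFromConfig(cfg)

	out, err := client.DescribeCluster(ctx, &eks.DescribeClusterInput{
		Name: aws.String("whatnick-e2e-netobs"), // placeholder name
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("cluster status:", out.Cluster.Status)
}
```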

@timraymond
Member

@nddq I believe the goal here is a PoC, and exec'ing is an efficiency trade-off to prove the concept faster. @rbtr curious on your thoughts.

@rbtr rbtr requested review from matmerr, vakalapa and neaggarwMS and removed request for karina-ranadive and spencermckee August 27, 2024 21:59
@rbtr
Collaborator

rbtr commented Aug 27, 2024

Yeah, to do it for real we will want to use the aws-sdk, but shelling out to eksctl is fine while we're just trying to say: hey, Retina E2E could work on EKS.

@rbtr
Collaborator

rbtr commented Aug 27, 2024

@whatnick this is great, thanks for putting it together so fast!
While we review/discuss I do want to set the expectation appropriately that us getting an AWS account provisioned will likely be the slow/hard part of this 😓

@whatnick
Contributor Author

whatnick commented Aug 27, 2024 via email

@matmerr
Member

matmerr commented Aug 28, 2024

good stuff, thanks for taking a look into this @whatnick

@whatnick
Contributor Author

whatnick commented Sep 2, 2024

Linters are flagging the exec cmds. IMO shelling out to commands is not ideal here. I know that AWS has its own SDK that is able to interact with EKS (https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/eks), so maybe this could be worth looking into as an alternative?

It has added a lot of require entries, but I have updated the PoC to consume eksctl as a package and run its Cobra commands. It can be slimmed down to remove fancy things like coloured logging, which are not really relevant for this use case.
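
To illustrate the pattern (not eksctl's actual command constructors, which are not reproduced here), driving a Cobra command in-process boils down to SetArgs plus Execute:

```go
// Generic illustration of running a Cobra command in-process rather than
// exec'ing a binary; the "demo" command below is a stand-in, not eksctl's API.
package main

import (
	"fmt"
	"log"

	"github.com/spf13/cobra"
)

func main() {
	var region string

	cmd := &cobra.Command{
		Use: "demo",
		RunE: func(cmd *cobra.Command, args []string) error {
			fmt.Println("would create a cluster in", region)
			return nil
		},
	}
	cmd.Flags().StringVar(&region, "region", "us-west-2", "AWS region")

	// Feed the arguments programmatically instead of parsing os.Args,
	// then run the command inside the test process.
	cmd.SetArgs([]string{"--region", "us-west-2"})
	if err := cmd.Execute(); err != nil {
		log.Fatal(err)
	}
}
```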

@whatnick
Contributor Author

whatnick commented Sep 14, 2024

More progress by enabling AWS VPC-CNI in Network Policy enforcement mode.

kubectl --kubeconfig=/home/whatnick/dev/retina/test/e2e/test.pem get pods -n kube-system
NAME                       READY   STATUS    RESTARTS   AGE
agnhost-a-0                1/1     Running   0          16s
aws-node-77sfp             2/2     Running   0          2m19s
aws-node-ssjsk             2/2     Running   0          2m15s
aws-node-xs2kv             2/2     Running   0          2m17s
coredns-787cb67946-lrxxh   1/1     Running   0          6m34s
coredns-787cb67946-qr4xx   1/1     Running   0          6m34s
kube-proxy-7h7vk           1/1     Running   0          2m15s
kube-proxy-qxwb7           1/1     Running   0          2m19s
kube-proxy-xcbcf           1/1     Running   0          2m17s
retina-agent-22mcl         1/1     Running   0          34s
retina-agent-cc8b4         1/1     Running   0          34s
retina-agent-hwj5t         1/1     Running   0          34s

Network policy is enabled

 kubectl --kubeconfig=/home/whatnick/dev/retina/test/e2e/test.pem get networkpolicy -n kube-system
NAME       POD-SELECTOR    AGE
deny-all   app=agnhost-a   66s
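
For reference, a deny-all policy like the one above can be created from Go with client-go roughly as follows. The helper name is illustrative, and including Egress in PolicyTypes is an assumption about the actual scenario.

```go
// Sketch of creating the deny-all NetworkPolicy shown above via client-go.
// The helper name is illustrative; denying egress as well as ingress is an
// assumption about the actual test scenario.
package e2e

import (
	"context"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func createDenyAllPolicy(ctx context.Context, clientset kubernetes.Interface) error {
	policy := &networkingv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "deny-all",
			Namespace: "kube-system",
		},
		Spec: networkingv1.NetworkPolicySpec{
			// Select the agnhost test pods and give them no allow rules,
			// which blocks all ingress and egress for those pods.
			PodSelector: metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "agnhost-a"},
			},
			PolicyTypes: []networkingv1.PolicyType{
				networkingv1.PolicyTypeIngress,
				networkingv1.PolicyTypeEgress,
			},
		},
	}
	_, err := clientset.NetworkingV1().NetworkPolicies("kube-system").
		Create(ctx, policy, metav1.CreateOptions{})
	return err
}
```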



This PR will be closed in 7 days due to inactivity.

@github-actions github-actions bot added the meta/waiting-for-author Blocked and waiting on the author label Oct 22, 2024
@whatnick
Contributor Author

Will merge to upstream soon.

@github-actions github-actions bot removed the meta/waiting-for-author Blocked and waiting on the author label Oct 23, 2024
@whatnick
Contributor Author

Windows tests are currently disabled for AWS; they can be enabled once Windows cluster setup via eksctl is tested.


This PR will be closed in 7 days due to inactivity.

@github-actions github-actions bot added the meta/waiting-for-author Blocked and waiting on the author label Dec 18, 2024

Pull request closed due to inactivity.

@github-actions github-actions bot closed this Dec 26, 2024
@nddq nddq reopened this Dec 26, 2024
@nddq nddq removed the meta/waiting-for-author Blocked and waiting on the author label Dec 26, 2024
@whatnick
Contributor Author

Thanks for re-opening this. I've been busy otherwise; I will fix the conflicts and maintain it over the weekend.
