@shraddhabang
Collaborator

This PR implements two key components for the AWS Global Accelerator controller:

  1. Endpoint loaders: dynamically load and resolve endpoints from Kubernetes resources
  2. Resource monitoring: watches referenced resources for changes to trigger reconciliation

Commit 1: [feat aga] Implement endpoint loader with DNS resolution

This commit implements the endpoint loading system for the AGA controller. It provides:

  • Dynamic loading of endpoints from Kubernetes resources (Services, Ingresses, Gateways)
  • DNS resolution to AWS load balancer ARNs
  • Efficient LRU caching with TTL-based invalidation
  • Detailed error reporting and status tracking
  • Comprehensive unit tests

The endpoint loader enables GlobalAccelerator resources to reference Kubernetes objects and automatically resolve them to the appropriate AWS resources.

Commit 2: [feat aga] Implement resource monitoring for referenced resources

This commit implements the resource monitoring system for the AGA controller. It provides:

  • Dynamic watching of referenced Kubernetes resources
  • Event handling to trigger reconciliation when resources change
  • Watches limited to resources that are actively referenced
  • Reference tracking between resources and GlobalAccelerators
  • Efficient watch management with cleanup for unreferenced resources
  • Unit tests for all monitoring components

This monitoring system ensures that when a referenced resource changes (e.g., a Service gets a new load balancer), the GlobalAccelerator is automatically reconciled to use the updated endpoint.
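The reference tracking and watch cleanup described above can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: the `refTracker` type, the string key formats, and the `Update`/`GAsFor` methods are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// refTracker maps each referenced resource to the accelerators that
// reference it, and each accelerator back to its resources (hypothetical type).
type refTracker struct {
	resourceToGAs map[string]map[string]struct{}
	gaToResources map[string]map[string]struct{}
}

func newRefTracker() *refTracker {
	return &refTracker{
		resourceToGAs: make(map[string]map[string]struct{}),
		gaToResources: make(map[string]map[string]struct{}),
	}
}

// Update replaces the reference set of one accelerator and returns the
// resources that no accelerator references anymore, i.e. candidates for
// watch cleanup.
func (t *refTracker) Update(gaKey string, resources []string) []string {
	next := make(map[string]struct{}, len(resources))
	for _, r := range resources {
		next[r] = struct{}{}
		if t.resourceToGAs[r] == nil {
			t.resourceToGAs[r] = make(map[string]struct{})
		}
		t.resourceToGAs[r][gaKey] = struct{}{}
	}
	var orphaned []string
	for r := range t.gaToResources[gaKey] {
		if _, still := next[r]; still {
			continue
		}
		delete(t.resourceToGAs[r], gaKey)
		if len(t.resourceToGAs[r]) == 0 {
			delete(t.resourceToGAs, r)
			orphaned = append(orphaned, r)
		}
	}
	if len(next) == 0 {
		delete(t.gaToResources, gaKey)
	} else {
		t.gaToResources[gaKey] = next
	}
	sort.Strings(orphaned)
	return orphaned
}

// GAsFor returns the accelerators to reconcile when a resource changes.
func (t *refTracker) GAsFor(resource string) []string {
	var out []string
	for ga := range t.resourceToGAs[resource] {
		out = append(out, ga)
	}
	sort.Strings(out)
	return out
}

func main() {
	tracker := newRefTracker()
	tracker.Update("prod/ga-a", []string{"Service/prod/web"})
	tracker.Update("prod/ga-b", []string{"Service/prod/web", "Ingress/prod/api"})
	fmt.Println("reconcile on Service change:", tracker.GAsFor("Service/prod/web"))
	fmt.Println("stop watching:", tracker.Update("prod/ga-b", nil))
}
```

An event handler would call something like `GAsFor` to decide which GlobalAccelerators to enqueue, while the list returned by `Update` tells the watch manager which watches can be torn down.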

Note on temporary limitations for cross-namespace references

We want to allow references to Kubernetes resources (Services, Ingresses, Gateways) that live in a different namespace from the GlobalAccelerator CR itself. This enables more flexible architectural patterns but requires careful security consideration. We will implement it later, since we first need to design a proper cross-namespace reference system with security concerns in mind. In the current implementation, cross-namespace references are detected but only result in a warning. This means:

  • The references won't work (the endpoint is marked with a warning)
  • The GlobalAccelerator CR won't be rejected
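A minimal sketch of the "detect but only warn" behavior, assuming hypothetical `endpointRef` and `endpointStatus` types. The names, the warning text, and the rule that an empty namespace defaults to the accelerator's namespace are illustrative assumptions, not taken from the PR.

```go
package main

import "fmt"

// endpointRef identifies a referenced Kubernetes object (hypothetical type).
type endpointRef struct{ Kind, Namespace, Name string }

// endpointStatus is the per-endpoint result of loading (hypothetical type).
type endpointStatus struct {
	Ref     endpointRef
	Usable  bool
	Warning string
}

// checkNamespace marks cross-namespace references as unusable with a
// warning instead of returning an error, so the accelerator as a whole
// is not rejected.
func checkNamespace(gaNamespace string, ref endpointRef) endpointStatus {
	ns := ref.Namespace
	if ns == "" {
		ns = gaNamespace // assumption: an empty namespace means "same namespace"
	}
	if ns != gaNamespace {
		return endpointStatus{
			Ref:    ref,
			Usable: false,
			Warning: fmt.Sprintf("cross-namespace reference to %s %s/%s is not supported yet",
				ref.Kind, ns, ref.Name),
		}
	}
	return endpointStatus{Ref: ref, Usable: true}
}

func main() {
	refs := []endpointRef{
		{Kind: "Service", Namespace: "prod", Name: "web"},
		{Kind: "Ingress", Namespace: "shared", Name: "edge"},
	}
	for _, ref := range refs {
		st := checkNamespace("prod", ref)
		fmt.Printf("%s/%s usable=%t warning=%q\n", ref.Namespace, ref.Name, st.Usable, st.Warning)
	}
}
```

Returning a status rather than an error keeps the remaining same-namespace endpoints working while the unsupported one is surfaced to the user.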

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@k8s-ci-robot added labels Nov 18, 2025: cncf-cla: yes (the PR's author has signed the CNCF CLA), approved (the PR has been approved by an approver from all required OWNERS files), size/XXL (the PR changes 1000+ lines, ignoring generated files)
@shraddhabang force-pushed the agaresourceloader branch 2 times, most recently from 89591fa to f2d0ac0 November 21, 2025 18:20
@shraddhabang changed the base branch from AGAController to main November 25, 2025 04:28
@k8s-ci-robot added the needs-rebase label (the PR cannot be merged because it has merge conflicts with HEAD) Nov 25, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shraddhabang

@k8s-ci-robot removed the needs-rebase label Nov 25, 2025
return &DNSToLoadBalancerResolver{
	elbv2Client: elbv2Client,
	cache:       cache,
	ttl:         5 * time.Minute, // Default TTL of 5 minutes
Collaborator

IMO, this is too short. DNS names / ARNs do not change that frequently. What do you think about increasing the cache time?

Collaborator Author

@shraddhabang Nov 25, 2025

I kept it small because I wanted stale DNS names to age out of the cache so that it stays up to date. A large TTL would keep old and invalid (deleted) DNS names around for longer. If we encounter an older DNS name because our k8s resources were not updated for some reason, we would resolve it to an invalid ARN. Hence I kept it at 5 minutes so that we get faster failure feedback. What do you think?

	return "", fmt.Errorf("object is not a Service")
}

if svc.Spec.Type != corev1.ServiceTypeLoadBalancer {

"fatal", fatal)

// Log individual endpoints
for i, endpoint := range endpoints {
Collaborator

nit: this will potentially construct a lot of strings. I would be mindful of leaving this in here.

Collaborator Author

Yes, I am aware of that. But I think it is crucial to log this info for troubleshooting, since the endpoints are resolved dynamically, and we need to know which endpoints we filtered out so that we can debug this better. We are not expecting many endpoints per accelerator, so I think verbosity level 1 logs are okay for this.

}
t.resourceMap[resourceKey].Insert(gaKey.String())

t.logger.V(1).Info("Resource referenced by GA",
Collaborator

Same comment about potentially constructing a lot of strings.

Collaborator Author

Same reasoning as above.

@zac-nixon
Collaborator

/lgtm
/approved

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Nov 25, 2025
@k8s-ci-robot merged commit d278541 into kubernetes-sigs:main Nov 25, 2025
9 checks passed