Enhance IP allocation error diagnostics with detailed ENI and fragmentation info #3432

musharafmaqbool · 2025-09-06T06:24:39Z

Summary

Replaces verbose logging with structured Kubernetes events for IP allocation failures, providing actionable diagnostics to users and operators.

What type of PR is this?

enhancement

Changes Made

- Replaced verbose logging with structured Kubernetes events on nodes
- Added detailed diagnostic information including:
  - Specific failure reasons (subnet exhaustion, ENI limits, fragmentation)
  - Available IP counts and subnet details
  - ENI utilization and limits
  - Actionable guidance for operators
- Reduced log verbosity while maintaining essential debug information
- Enhanced user experience: failures are now visible via `kubectl describe node`
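
To make the failure reasons listed above concrete, here is a minimal sketch of how a failure could be classified into a reason and an operator-facing message. It is not the code from this PR: the `allocStats` type, the `classify` function, and the reason strings are all hypothetical names chosen for illustration.

```go
package main

import "fmt"

// allocStats is a hypothetical snapshot of node state at the time of a failure.
type allocStats struct {
	SubnetID         string
	SubnetFreeIPs    int  // free addresses reported for the subnet
	AttachedENIs     int  // ENIs currently attached to the instance
	MaxENIs          int  // ENI limit for the instance type
	PrefixDelegation bool // true when /28 prefixes are allocated instead of single IPs
	FreePrefixBlocks int  // contiguous /28 blocks still free in the subnet
}

// Hypothetical event reasons; the real PR may use different strings.
const (
	reasonSubnetExhausted = "SubnetIPExhausted"
	reasonFragmented      = "SubnetFragmented"
	reasonENILimit        = "ENILimitReached"
	reasonUnknown         = "IPAllocationFailed"
)

// classify maps a state snapshot to a reason and a message with guidance.
func classify(s allocStats) (reason, message string) {
	switch {
	case s.SubnetFreeIPs == 0:
		return reasonSubnetExhausted,
			fmt.Sprintf("subnet %s has no free IPs; add a subnet or release addresses", s.SubnetID)
	case s.PrefixDelegation && s.FreePrefixBlocks == 0:
		return reasonFragmented,
			fmt.Sprintf("subnet %s has %d free IPs but no contiguous /28 block", s.SubnetID, s.SubnetFreeIPs)
	case s.AttachedENIs >= s.MaxENIs:
		return reasonENILimit,
			fmt.Sprintf("%d/%d ENIs attached; use a larger instance type or fewer pods per node", s.AttachedENIs, s.MaxENIs)
	default:
		return reasonUnknown,
			fmt.Sprintf("allocation failed in subnet %s with %d free IPs", s.SubnetID, s.SubnetFreeIPs)
	}
}

func main() {
	reason, msg := classify(allocStats{
		SubnetID: "subnet-example", SubnetFreeIPs: 40,
		AttachedENIs: 2, MaxENIs: 4,
		PrefixDelegation: true, FreePrefixBlocks: 0,
	})
	fmt.Println(reason+":", msg)
}
```

The ordering of the cases is a design choice in this sketch: subnet exhaustion is checked first because it makes the other checks moot.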

Why this change?

Addresses reviewer feedback on PR #3429. Instead of burying diagnostic information in verbose logs, this surfaces the "why" of IP allocation failures through Kubernetes events that users can actually see and act upon.

Testing

- Unit tests for event emission logic
- Manual testing of IP allocation failure scenarios
- Verified events appear correctly in `kubectl describe node` output

Result

- Users get clear, actionable information when pods can't get IPs
- Operators can quickly identify if issues are due to fragmentation, ENI limits, or capacity
- Reduced log noise while improving diagnostics

Closes #3429 (replaces previous implementation)

…ws#3415

Enhanced error messages for IP address allocation failures to provide more detailed diagnostic information as requested in issue aws#3415.

Changes made:
- Added current IP usage statistics (used/available) to error messages
- Included ENI limit information in allocation failure messages
- Enhanced subnet configuration context in error logs
- Provided specific failure reasons to assist with troubleshooting

This improvement will help users quickly identify the root cause of IP assignment failures:
- Whether it's due to subnet IP exhaustion
- ENI limits being reached
- VPC/subnet configuration issues
- IP warming delays

The enhanced error messages follow the format:
"failed to allocate IP on ENI %s: %v. Usage: %d/%d IPs in subnet, ENI limit: %d/%d"

Enhance IP assignment error messages with detailed diagnostics Fixes…

@erezzarum

…tation info

Follow-up to #[previous PR number] addressing feedback from @erezzarum

**Changes made:**
- Added ENI count and limit information to allocation failures
- Enhanced IPv4/IPv6 prefix allocation errors with current usage stats
- Included fragmentation detection in error messages
- Added trunk ENI mode and prefix delegation context

**Addresses the following issues:**
1. Better root cause identification for prefix delegation scenarios
2. ENI allocation status information in error messages
3. More detailed fragmentation context for troubleshooting

**Testing:**
Enhanced error messages now provide actionable diagnostic information for common IP allocation failure scenarios.
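
For readers unfamiliar with fragmentation in the prefix delegation case: an IPv4 prefix is a /28, so allocation needs 16 contiguous, aligned free addresses, and a subnet can have plenty of free IPs without a single such block. The sketch below shows that check over a simplified model. The `used` slice representation and all function names are hypothetical, and reserved subnet addresses are ignored.

```go
package main

import "fmt"

// prefixSize is the number of addresses in a /28 IPv4 prefix.
const prefixSize = 16

// hasFreePrefix reports whether any aligned block of prefixSize addresses is
// entirely free. used[i] is true when the i-th address of the subnet is taken.
func hasFreePrefix(used []bool) bool {
	for start := 0; start+prefixSize <= len(used); start += prefixSize {
		blockFree := true
		for _, taken := range used[start : start+prefixSize] {
			if taken {
				blockFree = false
				break
			}
		}
		if blockFree {
			return true
		}
	}
	return false
}

// countFree returns the number of addresses not in use.
func countFree(used []bool) int {
	n := 0
	for _, taken := range used {
		if !taken {
			n++
		}
	}
	return n
}

// isFragmented is true when enough addresses are free for a prefix in total,
// but no single aligned /28 block is completely free.
func isFragmented(used []bool) bool {
	return countFree(used) >= prefixSize && !hasFreePrefix(used)
}

func main() {
	// A 64-address range with one address taken in each /28 block.
	used := make([]bool, 64)
	for i := range used {
		used[i] = i%prefixSize == 0
	}
	fmt.Println(countFree(used), isFragmented(used)) // prints: 60 true
}
```

Distinguishing this case from plain exhaustion matters because the remedy differs: fragmentation is typically addressed by reserving contiguous space or using a fresh subnet, not by freeing a few scattered addresses.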

Enhance IP allocation error diagnostics with detailed ENI and fragmen…

@jaydeokar

…ailures

@jaydeokar Thank you for the excellent feedback! I've updated the implementation to address your concerns:

🔄 **Changes Made:**
- **Replaced verbose logging** with structured Kubernetes events
- **Events are emitted on nodes** when IP allocation fails - users can see them via `kubectl describe node`
- **Detailed diagnostic information** including:
  - Specific failure reason (subnet exhaustion, ENI limits, fragmentation)
  - Available IP counts and subnet details
  - Actionable guidance for operators
- **Reduced log verbosity** - keeping only essential debug information

🎯 **Result:**
- Users now get clear, actionable information about why their pods can't get IPs
- The "why" is surfaced through Kubernetes events instead of buried in logs
- Operators can quickly identify if it's fragmentation, ENI limits, or genuine capacity issues

This addresses the core issue you raised about providing meaningful diagnostics to users rather than just verbose logs for debugging. The gRPC handler now gets structured error information while users get events they can actually act on.

Ready for re-review! 🚀

musharafmaqbool · 2025-09-21T07:40:38Z

@labria hey, can you please review my PR?

larhauga · 2025-11-03T10:07:14Z

@labria sorry for pinging you here, but we are trying to debug high latency when allocating ENIs and would really like this PR to extend our debugging options. Would you be able to review this?

musharafmaqbool · 2025-11-14T06:39:56Z

@labria sure, I will look into it

yash97 · 2025-11-17T08:14:57Z

pkg/ipamd/ipamd.go

 	rcv1alpha1 "github.com/aws/amazon-vpc-resource-controller-k8s/apis/vpcresources/v1alpha1"
 )

+// Add these type definitions right after the imports, before the package comment:


Can you remove these comments, which look like they were added by an LLM?

cdirubbio · 2025-11-18T18:38:20Z

pkg/ipamd/ipamd.go

+// The package ipamd is a long running daemon which manages a warm pool of available IP addresses.
+// It also monitors the size of the pool, dynamically allocates more ENIs when the pool size goes below
+// the minimum threshold and frees them back when the pool size goes above max threshold.
+


Duplicate comment here

musharafmaqbool added 6 commits August 27, 2025 20:40

Merge pull request #1 from musharafmaqbool/enhance-ip-assignment-errors

c4e97b2

Enhance IP assignment error messages with detailed diagnostics Fixes…

Merge branch 'aws:master' into master

09bd1b8

Merge pull request #2 from musharafmaqbool/musharafmaqbool-patch-1

10bb1ee

Enhance IP allocation error diagnostics with detailed ENI and fragmen…

musharafmaqbool requested a review from a team as a code owner September 6, 2025 06:24

Merge branch 'master' into enhance-ip-diagnostics

9b44128

yash97 reviewed Nov 17, 2025

cdirubbio reviewed Nov 18, 2025

4 participants
