Close per-namespace pod informers on agent termination #2821

Open
vquemener wants to merge 1 commit into jenkinsci:master from vquemener:fix/close-informer-on-terminate
vquemener commented Apr 7, 2026

Summary

registerPodInformer() creates a SharedIndexInformer<Pod> per namespace and stores it in a ConcurrentHashMap, but nothing ever closes or removes these informers. When pods run in ephemeral namespaces (one per build), each build leaks an informer that retries indefinitely after the namespace is deleted, causing thread leaks, log floods (403 Forbidden), and CPU waste.

This PR adds unregisterPodInformer(namespace) in KubernetesCloud and calls it from KubernetesSlave._terminate() when no other pod from the same cloud remains in the namespace.

Fixes #2820

AI disclosure

This patch was developed with the assistance of Claude Opus 4.6 (Anthropic).

The analysis, fix, and tests were produced collaboratively between a human operator and the AI. I am not a fluent Java developer: I maintain the Jenkins instance where this bug was causing real production issues, and this was the best way I could contribute a concrete fix proposal.

I completely understand if this PR is rejected on that basis, or if the approach needs rework by someone more familiar with the codebase. I wanted to at least document the problem and push a starting point for discussion.

Changes

KubernetesCloud.java

  • New method unregisterPodInformer(String namespace): removes the informer from the map and calls informer.close().
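
The remove-then-close pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `InformerRegistry`, its nested `Informer` interface, `register`, and `isRegistered` are hypothetical stand-ins so the example runs without the Kubernetes client on the classpath. Only the `unregisterPodInformer` name and the map-plus-`close()` behavior come from the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the cloud class; NOT the actual KubernetesCloud code.
class InformerRegistry {
    // Stand-in for SharedIndexInformer<Pod>; only close() matters in this sketch.
    interface Informer extends AutoCloseable {
        @Override
        void close();
    }

    // One informer per namespace, as in the PR description.
    private final Map<String, Informer> informers = new ConcurrentHashMap<>();

    // Hypothetical registration helper: keeps the first informer for a namespace.
    void register(String namespace, Informer informer) {
        informers.putIfAbsent(namespace, informer);
    }

    // Removes the informer for the namespace and closes it.
    // Map.remove is atomic, so at most one caller receives the informer
    // and close() runs once; an unknown namespace is a no-op.
    void unregisterPodInformer(String namespace) {
        Informer informer = informers.remove(namespace);
        if (informer != null) {
            informer.close();
        }
    }

    // Hypothetical helper used only to observe the map in this example.
    boolean isRegistered(String namespace) {
        return informers.containsKey(namespace);
    }
}
```

Removing the entry before closing it means a concurrent second call for the same namespace finds nothing in the map and returns without touching the informer.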

KubernetesSlave.java

  • At the end of _terminate(), after pod deletion: checks whether any other KubernetesSlave node from the same cloud still uses the namespace. If not, calls cloud.unregisterPodInformer(ns).
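
The "last pod in the namespace" check can be illustrated with the sketch below. It is an assumption-laden simplification: `NamespaceUsage`, `Agent`, and `isLastInNamespace` are hypothetical names, and the real patch inspects Jenkins `KubernetesSlave` nodes rather than a plain list. The rule it encodes is the one stated above: only agents from the same cloud count, and the terminating agent itself is ignored.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified model of the check; NOT the actual KubernetesSlave code.
class NamespaceUsage {
    // Minimal view of an agent node: its name, owning cloud, and pod namespace.
    static final class Agent {
        final String name;
        final String cloudName;
        final String namespace;

        Agent(String name, String cloudName, String namespace) {
            this.name = name;
            this.cloudName = cloudName;
            this.namespace = namespace;
        }
    }

    // Returns true when no agent other than `terminating`, belonging to the
    // same cloud, still uses the same namespace.
    static boolean isLastInNamespace(Agent terminating, List<Agent> allAgents) {
        return allAgents.stream()
                .filter(a -> !a.name.equals(terminating.name))
                .filter(a -> Objects.equals(a.cloudName, terminating.cloudName))
                .noneMatch(a -> Objects.equals(a.namespace, terminating.namespace));
    }
}
```

In this model, the caller would invoke the unregister step only when `isLastInNamespace` returns true, so agents on a different cloud that happen to share the namespace never keep this cloud's informer alive.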

KubernetesCloudTest.java

  • unregisterPodInformer_closesAndRemoves: verifies the informer is closed and removed from the map.
  • unregisterPodInformer_noopOnUnknownNamespace: verifies no side effects when called with an unknown namespace.
  • informerKeptWhileOtherPodsShareNamespace: two pods in the same namespace => terminating the first must not close the shared informer.
  • informerClosedWhenLastPodInNamespaceTerminates: last pod in namespace terminates => informer must be closed.
  • informerNotAffectedByOtherCloud: pods on different clouds sharing a namespace => only the relevant cloud's informer is closed.

Testing done

  • Unit tests added (see above): 24 tests in KubernetesCloudTest, 0 failures
  • Full plugin unit test suite (89 tests): 0 failures, 0 regressions
  • End-to-end validated on a local K3s v1.34.6+k3s1 cluster with Jenkins 2.541.3 deployed via Helm: created a pod in an ephemeral namespace, confirmed Registered informer on launch and Closed informer on agent termination

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

registerPodInformer() creates a SharedIndexInformer per namespace but
never closes or removes them. With ephemeral namespaces (one per build),
each build leaks an informer that retries indefinitely after the namespace
is deleted, causing thread leaks and log floods (403 Forbidden).

Add unregisterPodInformer(namespace) in KubernetesCloud and call it from
KubernetesSlave._terminate() when no other pod from the same cloud
remains in the namespace.

Tests added:
- unregisterPodInformer closes and removes the informer
- no-op on unknown namespace
- informer kept while other pods share the namespace
- informer closed when last pod in namespace terminates
- informer not affected by other clouds

Co-authored-by: Claude Opus 4.6 (Anthropic) <[email protected]>
vquemener requested a review from a team as a code owner April 7, 2026 09:20

Development

Successfully merging this pull request may close these issues.

registerPodInformer leaks SharedIndexInformers when pods use ephemeral namespaces
