Investigate memory limit 2766#2790
Open
Abhijeet212004 wants to merge 5 commits intojenkinsci:masterfrom
Open
Conversation
Fixes jenkinsci#2783 Partially addresses jenkinsci#2737 The problem: When you create pod templates in pipelines (ephemeral templates), their reference becomes null after Jenkins restarts. The cleanup code was checking 'if (template != null)' before decrementing counters, so it never ran - causing a permanent leak. The fix: Use cloudName and templateId instead of the template object. These fields survive restarts, so counters get cleaned up correctly.
Prevents attempting to decrement counters for nodes that were never registered (e.g., in eviction scenarios). Checks if counters exist before decrementing to avoid spurious warnings.
- Restructure unregisterByIds() to check and decrement counters independently - Improve test to simulate ephemeral template removal with cloud.removeTemplate() - Keep logging simple for consistency with existing patterns
The issue was in PodTemplate.readResolve() - it only migrated deprecated resource limit fields when containers was null (old v0.8 configs), but not when the list was empty (which happens with modern configs). When users set memory limits through the UI, the deprecated setter tries to apply them to the first container. If no containers exist yet, it does nothing and the values just sit in deprecated fields. After restart, readResolve() skipped migration because it only checked for null, not isEmpty(). Now we handle both empty containers and apply deprecated fields to existing containers that don't have limits set yet. Fixes jenkinsci#2766
32a8a13 to
5e3bc02
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix memory limits not persisting after Jenkins restart
Problem
Memory limits configured for Kubernetes agent pods reset to default (1024 MB) after Jenkins restarts, even when explicitly set to higher values like 3072 MB in the cloud configuration.
Root Cause
The issue was in
PodTemplate.readResolve(). When users configure memory limits through the Jenkins UI on a PodTemplate, the deprecated settersetResourceLimitMemory()is called. This setter tries to apply the value to the first container usinggetFirstContainer().ifPresent().If the containers list is empty at that point (which is common in modern configs),
ifPresent()does nothing and the value only gets stored in the deprecatedPodTemplate.resourceLimitMemoryfield.After Jenkins restarts,
readResolve()runs to migrate old configs. However, it only checked forcontainers == null(legacy v0.8 configs), notcontainers.isEmpty(). So these deprecated field values were never migrated to actual container resource limits, and pods ended up with default memory limits.Solution
Extended the
readResolve()migration logic to handle two additional cases:This maintains backward compatibility with v0.8 configs while fixing the modern usage pattern.
Testing done
Unit Tests:
Added comprehensive test coverage in
ResourceLimitPersistenceTest.java:testContainerTemplateResourceLimitMemoryPersistsAfterXStreamSerialization- Verifies XStream serialization works correctlytestPodTemplateWithContainerResourceLimitsPersists- Verifies container-level limits persisttestPodTemplateLegacyFieldsMigrateToContainer- Reproduces and verifies the bug fix - simulates the exact scenario from issue [JENKINS-76163] Kubernetes plugin doesnt save memory limits #2766 with deprecated fields and empty containers listAll tests pass:
#2787 - Fixed counter leak with ephemeral templates after restart
#2785 - Fixed Reaper not detecting ImagePullBackOff state
Submitter checklist
Fixes #2766