HIVE-28954: CI fails intermittently due to ephemeral-storage exhaustion #5840


Open · wants to merge 1 commit into master

Conversation

@zabetak (Member) commented Jun 3, 2025

What changes were proposed in this pull request?

Introduce ephemeral storage request and limit for HDB container based on current usage and cluster capacity.

Why are the changes needed?

Based on recent runs, the HDB container, which executes the tests, consumes 10Gi to 15Gi of ephemeral storage. To ensure that pods are scheduled onto GKE nodes with the necessary capacity, we should add an explicit resource request.

Moreover, to prevent malfunctioning PRs/pods from affecting the overall health of the cluster, we set the resource limit to 20Gi, which is high enough for precommits to run normally while also guarding against accidental changes that may cause disk spikes.
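The change described above corresponds to a Kubernetes `resources` stanza along these lines. This is a sketch, not the actual patch: the container name, image, and exact request value are assumptions (the text only states observed usage of 10Gi-15Gi and a 20Gi limit), and the real change would live in the project's CI pod template rather than a standalone Pod manifest.

```yaml
# Hypothetical pod template fragment. Container name and image are
# illustrative placeholders; the resource values follow the description above.
apiVersion: v1
kind: Pod
metadata:
  name: hive-precommit
spec:
  containers:
    - name: hdb
      image: example/hive-dev-box   # placeholder image
      resources:
        requests:
          ephemeral-storage: "15Gi"  # assumed value, sized to observed 10Gi-15Gi usage
        limits:
          ephemeral-storage: "20Gi"  # cap from the description above
```

The request lets the scheduler place the pod only on nodes with enough free ephemeral storage; if the container exceeds the limit, the kubelet evicts the pod, which is what isolates a runaway PR from other workloads on the same node.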

Does this PR introduce any user-facing change?

No

How was this patch tested?

Monitored requests/limits through the GKE console.


sonarqubecloud bot commented Jun 5, 2025

@kokila-19 (Contributor) commented:

LGTM +1
There is one test failure, which is flaky; re-triggering should pass the tests.

3 participants