Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking improvement of estimated space necessary for inflight compactions #648

Open
wants to merge 3 commits into
base: palantir-cassandra-2.2.18
Choose a base branch
from

Conversation

lyubent
Copy link
Contributor

@lyubent lyubent commented Mar 29, 2025

Before

The estimation of bytes needed to compact is aggregated for all in-flight compactions but even as the compactions progress, the estimation never drops. This has negative impact on large compactions that build up a lot of temporary files since we end up reserving the space for the estimate but do not account for the temporary files on disk reducing the total available disk space with the outcome being that new compactions cannot be scheduled due to no disk space being available even though there is plenty of disk.

Worst case scenario, for example for a 2T compaction - is that it will be double counted near its completion where the compaction is at 99%. The estimated needed space which will be reserved will sit at 2T and the temporary files will sit at 2T thus reducing the available space for compacting by 4T.

After

The estimated space is changed to subtract the current space used by in-flight compactions giving a more reliable estimate. Since the estimation isn't always correct we take whatever is larger of:

  1. estimated space - actual live space used by inflight compactions
  2. actual live space used by inflight compactions

The outcome is that the reserved disk space needed for compactions becomes closer to what is realistically needed.

if (!checkAvailableDiskSpace(estimatedSSTables,
expectedTotalWriteSize,
expectedSpaceUsedByCompactions,
CompactionMetrics.getCompactions()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the overall idea. Just two concerns from me:

  1. It feels weird to make decisions within the Compaction Manager based on metrics — it removes isolation between the two. Though the compaction throttler does this too, that was our addition and this feels more invasive. I'd prefer if there was a "concurrent compactions tracker" that both the compaction manager and metrics emitter read from. Curious to hear your thoughts here.
  2. How reliable is the value from compactionHolder.getCompactionInfo().getCompleted()? I don't think we emit this as a metric anywhere right now. Can we sanity check either via logging or (preferably) a dashboard and soak for a couple days to verify it looks reasonable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 - Sounds like we want to add an abstraction for the sake of abstracting which is fine. I was going for the most simple solution but if we want to add complexity to ensure isolation that makes sense. Building a singleton to track the compactions and then using it to update metrics should be straight forward.
2 - From testing observations, much more accurate than the estimation used to reserve space. Validating this by observing actual telemetry sounds perfectly valid, I'll set something up.

*
* @param estimatedSSTables The estimated number of SSTables expected to be generated as a result of the compaction.
* @param expectedTotalWriteSize The total estimated disk space required for all ongoing and pending compactions, in bytes.
* @param expectedSpaceUsedByCompactions The estimated disk space, in bytes, needed specifically for the current compaction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you flipped this and the one before it

long writeSize = expectedTotalWriteSize / estimatedSSTables;
long totalAvailable = 0L;
long totalSpace = 0L;
long spaceNeededForInProgressCompactions = Math.max(liveSpaceUsedByInProgressCompactions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confused why this is a max and not just expectedSpaceUsedByCompactions - liveSpaceUsedByInProgressCompactions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the estimation for how big the compaction will be is an estimation. In most cases space used by tmp tables takes up more space than the initial estimation, with this setup we will never end up with a negative number meaning we are underestimating the needed space.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants