Tracking improvement of estimated space necessary for inflight compactions #648

Open · wants to merge 3 commits into base: palantir-cassandra-2.2.18
36 changes: 29 additions & 7 deletions src/java/org/apache/cassandra/db/Directories.java
@@ -50,6 +50,7 @@
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.ImmutableSet.Builder;
import com.google.common.collect.Iterables;
import org.apache.cassandra.metrics.CompactionMetrics;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -122,8 +123,8 @@ public class Directories

public static final DataDirectory[] dataDirectories;

//needed for dealing with race condition when compactions run in parallel, to reflect the actual available space
//see https://github.com/palantir/cassandra/issues/198
// needed for dealing with race condition when compactions run in parallel, to reflect the actual available space
// see https://github.com/palantir/cassandra/issues/198
private static final Object COMPACTION_LOCK = new Object();
private static long expectedSpaceUsedByCompactions = 0;
static
@@ -496,37 +497,58 @@ static void sortWriteableCandidates(List<DataDirectoryCandidate> candidates, lon

public Boolean checkAvailableDiskSpaceWithoutConsideringConcurrentCompactions(long estimatedSSTables, long expectedTotalWriteSize)
{
return checkAvailableDiskSpace(estimatedSSTables, expectedTotalWriteSize, 0);
return checkAvailableDiskSpace(estimatedSSTables, expectedTotalWriteSize, 0, 0);
}

public Boolean checkAvailableDiskSpaceConsideringConcurrentCompactions(long estimatedSSTables, long expectedTotalWriteSize)
{
synchronized (COMPACTION_LOCK)
{
if (!checkAvailableDiskSpace(estimatedSSTables, expectedTotalWriteSize, expectedSpaceUsedByCompactions))
if (!checkAvailableDiskSpace(estimatedSSTables,
expectedTotalWriteSize,
expectedSpaceUsedByCompactions,
CompactionMetrics.getCompactions()
Contributor
Agree with the overall idea. Just two concerns from me:

  1. It feels weird to make decisions within the Compaction Manager based on metrics — it removes isolation between the two. Though the compaction throttler does this too, that was our addition and this feels more invasive. I'd prefer if there was a "concurrent compactions tracker" that both the compaction manager and metrics emitter read from. Curious to hear your thoughts here.
  2. How reliable is the value from compactionHolder.getCompactionInfo().getCompleted()? I don't think we emit this as a metric anywhere right now. Can we sanity check either via logging or (preferably) a dashboard and soak for a couple days to verify it looks reasonable?

Contributor Author

1 - Sounds like we want to add an abstraction for the sake of abstracting, which is fine. I was going for the simplest solution, but if we want to add complexity to ensure isolation, that makes sense. Building a singleton to track the compactions and then using it to update metrics should be straightforward.
2 - From testing observations, it is much more accurate than the estimation used to reserve space. Validating this by observing actual telemetry sounds perfectly valid; I'll set something up.

.stream()
.mapToLong(compactionHolder -> compactionHolder.getCompactionInfo().getCompleted())
.sum()))
return false;
expectedSpaceUsedByCompactions += expectedTotalWriteSize;
return true;
}
}
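
The first review thread above suggests a dedicated "concurrent compactions tracker" that both the compaction manager and the metrics emitter read from, rather than the compaction path reaching into CompactionMetrics directly. Below is a minimal, hypothetical sketch of what such a tracker could look like; the class name ConcurrentCompactionsTracker and its methods are illustrative assumptions only and do not exist in the codebase or in this PR.

```java
// Hypothetical sketch only -- not part of this PR or of Cassandra.
// A shared tracker that both CompactionManager and the metrics emitter could read.
package org.apache.cassandra.db.compaction;

import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public final class ConcurrentCompactionsTracker
{
    public static final ConcurrentCompactionsTracker instance = new ConcurrentCompactionsTracker();

    // bytes already written by each in-flight compaction, keyed by a per-compaction id
    private final ConcurrentHashMap<UUID, Long> completedBytes = new ConcurrentHashMap<>();

    private ConcurrentCompactionsTracker() {}

    // called when a compaction starts
    public void register(UUID compactionId)
    {
        completedBytes.put(compactionId, 0L);
    }

    // called periodically by the compaction writer as it makes progress
    public void updateCompleted(UUID compactionId, long bytesWritten)
    {
        completedBytes.put(compactionId, bytesWritten);
    }

    // called when a compaction finishes or aborts
    public void unregister(UUID compactionId)
    {
        completedBytes.remove(compactionId);
    }

    // read by Directories.checkAvailableDiskSpace(...) and by the metrics emitter
    public long liveSpaceUsedByInProgressCompactions()
    {
        long sum = 0L;
        for (long bytes : completedBytes.values())
            sum += bytes;
        return sum;
    }
}
```

Under that assumption, the CompactionMetrics.getCompactions() stream in the hunk above could be replaced by a call to ConcurrentCompactionsTracker.instance.liveSpaceUsedByInProgressCompactions(), and the metrics emitter would report the same value, keeping the two consumers isolated from each other.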

private boolean checkAvailableDiskSpace(long estimatedSSTables, long expectedTotalWriteSize, long expectedSpaceUsedByCompactions) {
/**
* Determines if there is sufficient disk space available to perform a compaction.
*
* @param estimatedSSTables The estimated number of SSTables expected to be generated as a result of the compaction.
* @param expectedTotalWriteSize The total estimated disk space required for all ongoing and pending compactions, in bytes.
* @param expectedSpaceUsedByCompactions The estimated disk space, in bytes, needed specifically for the current compaction.
Contributor
I think you flipped this and the one before it.

* @param liveSpaceUsedByInProgressCompactions The disk space, in bytes, currently used by temporary SSTables.
* @return boolean indicating whether there is enough disk space available to proceed with the compaction.
*/
private boolean checkAvailableDiskSpace(long estimatedSSTables,
long expectedTotalWriteSize,
long expectedSpaceUsedByCompactions,
long liveSpaceUsedByInProgressCompactions)
{
long writeSize = expectedTotalWriteSize / estimatedSSTables;
long totalAvailable = 0L;
long totalSpace = 0L;
long spaceNeededForInProgressCompactions = Math.max(liveSpaceUsedByInProgressCompactions,
Contributor
confused why this is a max and not just expectedSpaceUsedByCompactions - liveSpaceUsedByInProgressCompactions

Contributor Author
Because the figure for how big the compaction will be is only an estimate. In most cases the space used by tmp tables exceeds the initial estimate; with this setup we never end up with a negative number, which would mean we were underestimating the needed space.

expectedSpaceUsedByCompactions - liveSpaceUsedByInProgressCompactions);

for (DataDirectory dataDir : dataDirectories)
{
if (DisallowedDirectories.isUnwritable(getLocationForDisk(dataDir)))
continue;
DataDirectoryCandidate candidate = new DataDirectoryCandidate(dataDir);
// exclude directory if its total writeSize does not fit to data directory
if (insufficientDiskSpaceForWriteSize(candidate.availableSpace - expectedSpaceUsedByCompactions, candidate.totalSpace, writeSize))
if (insufficientDiskSpaceForWriteSize(candidate.availableSpace - spaceNeededForInProgressCompactions, candidate.totalSpace, writeSize))
continue;
totalAvailable += candidate.availableSpace;
totalSpace += candidate.totalSpace;
}
if (insufficientDiskSpaceForWriteSize(totalAvailable - expectedSpaceUsedByCompactions, totalSpace, expectedTotalWriteSize))
if (insufficientDiskSpaceForWriteSize(totalAvailable - spaceNeededForInProgressCompactions, totalSpace, expectedTotalWriteSize))
{
logger.warn("Insufficient space for compaction - total available space found: {}MB for compaction with"
+ " expected size {}MB, with total disk space {}MB and max disk usage by compaction at {}%",
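
For readers following the Math.max discussion above, here is a small worked example with made-up numbers (not from the PR) showing what max(live, expected - live) reserves in the two interesting cases:

```java
// Illustrative numbers only, showing why the reservation uses Math.max
// rather than plain (expected - live), per the review thread above.
public class ReservationExample
{
    public static void main(String[] args)
    {
        long expected = 100L * 1024 * 1024; // space reserved when the compactions were admitted
        long live = 140L * 1024 * 1024;     // tmp SSTables already on disk: the estimate was too low

        // expected - live is -40 MB here; subtracting a negative reservation would *add* headroom
        // and make the node look like it has more free space than it actually does.
        long naive = expected - live;

        // max(live, expected - live) is 140 MB: never negative, never below what is already on disk.
        long reserved = Math.max(live, expected - live);

        // If the estimate was generous instead (expected 100 MB, live 30 MB),
        // Math.max(30 MB, 70 MB) = 70 MB, roughly the writes still expected to land.
        System.out.printf("naive=%d MB, reserved=%d MB%n", naive >> 20, reserved >> 20);
    }
}
```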