Skip to content

Commit

Permalink
[fix](catalog) Fix mark handling for failed tasks to maintain getLeft…
Browse files Browse the repository at this point in the history
…Marks (#46205)

Many pieces of code use `getLeftMarks` to check whether a task is
finished. Therefore, when a task fails, its mark should not be deleted
directly. For example, when creating a tablet, the left marks are used
to track how many replicas were successfully created. If the number of
successful replicas exceeds the quorum, the tablet is considered
successfully created. However, if the mark is deleted due to failure, it
could lead to a situation where the majority of replicas fail to be
created, but the tablet is still mistakenly considered as successfully
created.

This PR addresses the issue by saving the mark of the failed task in a
separate map while maintaining the method's idempotency.
  • Loading branch information
w41ter authored Dec 31, 2024
1 parent 8781666 commit 22eb777
Show file tree
Hide file tree
Showing 3 changed files with 11 additions and 3 deletions.
2 changes: 1 addition & 1 deletion be/src/olap/data_dir.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -497,7 +497,7 @@ Status DataDir::load() {
}
if (rowset_partition_id_eq_0_num > config::ignore_invalid_partition_id_rowset_num) {
throw Exception(Status::FatalError(
"roswet partition id eq 0 is {} bigger than config {}, be exit, plz check be.INFO",
"rowset partition id eq 0 is {} bigger than config {}, be exit, plz check be.INFO",
rowset_partition_id_eq_0_num, config::ignore_invalid_partition_id_rowset_num));
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -28,11 +28,13 @@
public class MarkedCountDownLatch<K, V> extends CountDownLatch {

private Multimap<K, V> marks;
private Multimap<K, V> failedMarks;
private Status st = Status.OK;

public MarkedCountDownLatch(int count) {
super(count);
marks = HashMultimap.create();
failedMarks = HashMultimap.create();
}

public synchronized void addMark(K key, V value) {
Expand All @@ -54,7 +56,13 @@ public synchronized boolean markedCountDownWithStatus(K key, V value, Status sta
st = status;
}

if (marks.remove(key, value)) {
// Since marks are used to determine whether a task is completed, we should not remove
// a mark if the task has failed rather than finished. To maintain the idempotency of
// this method, we store failed marks in a separate map.
//
// Search `getLeftMarks` for details.
if (!failedMarks.containsEntry(key, value)) {
failedMarks.put(key, value);
super.countDown();
return true;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -663,7 +663,7 @@ private void finishAlterInvertedIndexTask(AgentTask task, TFinishTaskRequest req
}

AlterInvertedIndexTask alterInvertedIndexTask = (AlterInvertedIndexTask) task;
LOG.info("beigin finish AlterInvertedIndexTask: {}, tablet: {}, toString: {}",
LOG.info("begin finish AlterInvertedIndexTask: {}, tablet: {}, toString: {}",
alterInvertedIndexTask.getSignature(),
alterInvertedIndexTask.getTabletId(),
alterInvertedIndexTask.toString());
Expand Down

0 comments on commit 22eb777

Please sign in to comment.