Fix a rare CME in LatencyUtils #1984
base: 2.0
Conversation
But WHY does it occur on Folia?
I looked, and I only see this method being called from a single thread, so I don't understand how it's throwing a CME.
I have no clue; all I know how to do is make it stop happening. Quite frankly, I don't really care why it's happening, I just want to fix it and move on.
We've seen this occur on a 1.8 modified Paper server too a few days ago.
If it was from different threads, the locks that are already in the code would prevent it. This issue has to be the same thread calling to add a transaction in some codepath while iterating through them.
If you trace all possible iterations of this code, this is impossible.
addRealTimeTask() executes runnables outside of the synchronized block. It's possible for another thread to call addRealTimeTask() and run a runnable on another thread that modifies the transactionMap, or that finishes executing after the lock has been released in another thread or part of the code. If you get really unlucky and modify transactionMap while it's being iterated over in handleNettySyncTransaction(), you can get a CME.
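The same-thread failure mode described above can be reproduced in isolation. This is a hypothetical minimal repro (not Grim's actual code): a LinkedList's fail-fast iterator throws a ConcurrentModificationException even when the modification comes from a runnable stored in the list itself, on the same thread.

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedList;

public class CmeRepro {
    // Returns true if iterating while a stored runnable mutates the list
    // triggers a ConcurrentModificationException.
    public static boolean reproduce() {
        LinkedList<Runnable> tasks = new LinkedList<>();
        // The task structurally modifies the very list it is stored in,
        // mimicking a runnable that calls back into addRealTimeTask().
        tasks.add(() -> tasks.add(() -> {}));
        try {
            for (Runnable r : tasks) {
                r.run(); // modification happens mid-iteration, same thread
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the fail-fast iterator detected the modification
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce()); // prints true
    }
}
```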
* Future considerations:
* If this becomes a performance bottleneck, we may revisit using a single-pass approach
* on non-Folia servers. We could also explore concurrent data structures or parallel
* execution, but this would lose the guarantee that transactions are processed in order.
An alternative solution, if this is a performance issue, is to do the queuing in the add method instead.
If any of the add-task methods are called while we're iterating, add the task to a list, and once the iteration is done, add everything from that list to the transaction map.
If everything is working ideally, the list will always be empty and be a non-issue; if something breaks, the list may see some use.
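A minimal sketch of that alternative, with hypothetical names (this is not Grim's actual implementation): adds that arrive while the map is being iterated are parked in a pending list and drained once iteration finishes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class PendingAddSketch {
    private final LinkedList<Runnable> transactionMap = new LinkedList<>();
    private final List<Runnable> pendingAdds = new ArrayList<>();
    private boolean iterating = false;

    public synchronized void addTask(Runnable task) {
        if (iterating) {
            pendingAdds.add(task); // defer: the main list is being iterated
        } else {
            transactionMap.add(task);
        }
    }

    public synchronized void runAll() {
        iterating = true;
        try {
            for (Iterator<Runnable> it = transactionMap.iterator(); it.hasNext(); ) {
                Runnable task = it.next();
                task.run(); // may re-enter addTask() without breaking the iterator
                it.remove();
            }
        } finally {
            iterating = false;
            transactionMap.addAll(pendingAdds); // drain deferred adds
            pendingAdds.clear();
        }
    }

    public static void main(String[] args) {
        PendingAddSketch s = new PendingAddSketch();
        int[] count = {0};
        s.addTask(() -> { count[0]++; s.addTask(() -> count[0] += 10); });
        s.runAll(); // runs first task; its nested add is deferred, then drained
        s.runAll(); // runs the deferred task
        System.out.println(count[0]); // prints 11
    }
}
```

In the common case the pending list stays empty, so the only steady-state cost is one boolean check per add.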
Force-pushed from 59eb669 to 96a5e00
Could using a ConcurrentLinkedDeque work here? Patch:
diff --git a/src/main/java/ac/grim/grimac/utils/latency/LatencyUtils.java b/src/main/java/ac/grim/grimac/utils/latency/LatencyUtils.java
--- a/src/main/java/ac/grim/grimac/utils/latency/LatencyUtils.java (revision e329171b113351ef91cfad0312031dbe7417b27f)
+++ b/src/main/java/ac/grim/grimac/utils/latency/LatencyUtils.java (date 1738622032295)
@@ -7,11 +7,10 @@
import ac.grim.grimac.utils.data.Pair;
import com.github.retrooper.packetevents.netty.channel.ChannelHelper;
-import java.util.LinkedList;
-import java.util.ListIterator;
+import java.util.concurrent.ConcurrentLinkedDeque;
public class LatencyUtils {
- private final LinkedList<Pair<Integer, Runnable>> transactionMap = new LinkedList<>();
+ private final ConcurrentLinkedDeque<Pair<Integer, Runnable>> transactionMap = new ConcurrentLinkedDeque<>();
private final GrimPlayer player;
public LatencyUtils(GrimPlayer player) {
@@ -42,9 +41,7 @@
public void handleNettySyncTransaction(int transaction) {
synchronized (this) {
- for (ListIterator<Pair<Integer, Runnable>> iterator = transactionMap.listIterator(); iterator.hasNext(); ) {
- Pair<Integer, Runnable> pair = iterator.next();
-
+ for (Pair<Integer, Runnable> pair : transactionMap) {
// We are at most a tick ahead when running tasks based on transactions, meaning this is too far
if (transaction + 1 < pair.first())
return;
@@ -64,7 +61,7 @@
}
}
// We ran a task, remove it from the list
- iterator.remove();
+ transactionMap.remove(pair);
}
}
}
As I mentioned in my PR description:
ConcurrentLinkedQueue isn't magic, unfortunately; it's possible for iteration of the data structure to not reflect its underlying state, which could lead to subtle bugs. For example, if a runnable deletes an item in the list (not the head or tail), it could lead to iteration skipping some runnables. Even if our own code doesn't do this, we have no guarantee, once we open up Grim with the check API, that other checks won't trigger these issues (and that is something I'm unsure about). CopyOnWriteArrayList or some variant of it might be what we're looking for, but that would be even slower than what I'm already doing. I would rather we fail obviously and hard than have to debug a stupid transaction-processing error 10 months from now that no one can figure out.
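To make the trade-off concrete, here is an illustrative snippet (not Grim code): a weakly consistent iterator such as ConcurrentLinkedDeque's never throws a CME when the collection is modified mid-iteration, which is precisely why a skipped runnable would fail silently instead of loudly.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.ConcurrentLinkedDeque;

public class WeaklyConsistentDemo {
    // Returns true if iterating while removing elements completes with no CME.
    public static boolean noCme() {
        ConcurrentLinkedDeque<Integer> deque = new ConcurrentLinkedDeque<>();
        for (int i = 0; i < 5; i++) deque.add(i);
        try {
            for (Integer v : deque) {
                deque.remove(v); // structural modification mid-iteration
            }
            return true; // weakly consistent iterator: no exception is thrown
        } catch (ConcurrentModificationException e) {
            return false; // never happens per the class's documented contract
        }
    }

    public static void main(String[] args) {
        System.out.println(noCme()); // prints true
    }
}
```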
synchronized (this) {
for (ListIterator<Pair<Integer, Runnable>> iterator = transactionMap.listIterator(); iterator.hasNext(); ) {
tasksToRun.clear();
Why does this happen first, and not last? This is preventing it from being a local variable. Am I missing something?
It could totally be a local variable, but by having it non-local you save on the memory-allocation overhead and just reuse already-allocated memory.
At least that's my assumption of why it's done this way.
It's a very minor optimization that saves you the cost of having to call new
on every invocation (that is, allocating and deallocating the ArrayList). In addition, since it's an ArrayList, the underlying array is never shrunk, meaning fewer O(n) resizes than if we used a local variable. If you log the size of the list you'll see that most of the time it's 1; occasionally it'll spike to 50, and every so often to a few hundred.
In the case where a few hundred entries are added, several resizes are required under the hood, which costs time and allocations that we save by clearing instead.
This was partially explained in my original PR post
Added a class-level ArrayList to store tasks between passes, which is cleared and reused each call to avoid unnecessary allocations.
- Using ArrayList for tasksToRun as it only needs sequential adds and iteration
- Keeping it as a class field to reuse the backing array
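The clear-and-reuse pattern being discussed looks roughly like this (field and method names here are illustrative, not the actual LatencyUtils code): clear() empties the list but keeps the grown backing array, so later calls avoid reallocations and regrowth.

```java
import java.util.ArrayList;

public class ReusedBuffer {
    // Class-level buffer: clear() empties it but preserves the backing
    // array's capacity, so repeated calls avoid new allocations.
    private final ArrayList<Runnable> tasksToRun = new ArrayList<>();

    public int drain(Iterable<Runnable> source) {
        tasksToRun.clear(); // reuse already-allocated capacity
        for (Runnable r : source) tasksToRun.add(r);
        for (Runnable r : tasksToRun) r.run();
        return tasksToRun.size();
    }

    public static void main(String[] args) {
        ReusedBuffer b = new ReusedBuffer();
        int[] ran = {0};
        System.out.println(b.drain(java.util.List.of(() -> ran[0]++, () -> ran[0]++))); // prints 2
    }
}
```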
Problem

The original implementation of handleNettySyncTransaction could throw a ConcurrentModificationException if a runnable task modified the transactionMap (either directly or by calling addRealTimeTask). This was observed on Folia servers but could theoretically occur in any environment.

Solution

Refactored handleNettySyncTransaction to use a two-pass approach: first collect the tasks to run while removing their entries from transactionMap using the iterator, then execute the collected tasks. Added a class-level ArrayList<Runnable> to store tasks between passes, which is cleared and reused each call to avoid unnecessary allocations.

Implementation Details

- Added a private final ArrayList<Runnable> tasksToRun field

Performance Considerations

- Using ArrayList for tasksToRun as it only needs sequential adds and iteration
- Keeping it as a class field to reuse the backing array
- Keeping LinkedList for transactionMap due to its frequent size changes and removal patterns

Testing

This change makes the code more robust by preventing CMEs rather than relying on documentation to prevent them.
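The two-pass shape described above can be sketched as follows (a simplified illustration, not the actual patch: the transaction-id matching of the real method is omitted, and names are illustrative).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;

public class TwoPassSketch {
    private final LinkedList<Runnable> transactionMap = new LinkedList<>();
    private final ArrayList<Runnable> tasksToRun = new ArrayList<>(); // reused each call

    public synchronized void addTask(Runnable task) {
        transactionMap.add(task);
    }

    public synchronized void handleTransaction() {
        tasksToRun.clear(); // reuse the backing array between calls
        // Pass 1: detach the entries we intend to run
        for (Iterator<Runnable> it = transactionMap.iterator(); it.hasNext(); ) {
            tasksToRun.add(it.next());
            it.remove();
        }
        // Pass 2: run the tasks; transactionMap is no longer being iterated,
        // so a task that re-enters addTask() can no longer cause a CME
        for (Runnable r : tasksToRun) r.run();
    }

    public static void main(String[] args) {
        TwoPassSketch s = new TwoPassSketch();
        int[] count = {0};
        s.addTask(() -> { count[0] += 1; s.addTask(() -> count[0] += 10); });
        s.handleTransaction(); // runs the first task, which queues a second one
        s.handleTransaction(); // runs the deferred second task
        System.out.println(count[0]); // prints 11
    }
}
```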