A future created inside of a parallel collection map / foreach sometimes blocks the fork join pool with JDK 11 #12106
does using `blocking` help? cc @viktorklang
@SethTisue
@SethTisue Where should I use `blocking`?
I think this is very likely, as the thread which is waiting is inside of the […]
@OndrejSpanel
I have tested it and it does not help, at least when used as this:
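The snippet referenced above was not preserved in this copy of the thread. For illustration only, here is a hypothetical Java sketch of the mechanism that Scala's `blocking` relies on: `ForkJoinPool.managedBlock`, which tells the pool a worker is about to block so it can compensate with an extra thread. All names here are illustrative; this is not the commenter's original code.

```java
import java.util.concurrent.ForkJoinPool;

public class ManagedBlockDemo {
    public static void main(String[] args) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(2);
        pool.submit(() -> {
            try {
                // managedBlock tells the pool this worker is about to block,
                // so the pool may add a compensation thread to keep its
                // target parallelism; Scala's `blocking` builds on this.
                ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker() {
                    private boolean done = false;
                    public boolean block() throws InterruptedException {
                        Thread.sleep(100); // stand-in for the real blocking call
                        done = true;
                        return true;
                    }
                    public boolean isReleasable() { return done; }
                });
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).get();
        pool.shutdown();
        System.out.println("done");
    }
}
```

Note that compensation helps with threads that *park*; it does not help when the extra work is CPU-bound, which is the situation described below.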
@viktorklang I find such definition unusual. Compare with https://en.wikipedia.org/wiki/Blocking_(computing):
|
@OndrejSpanel Remember that when you execute on a thread pool you have cooperative scheduling for the tasks, not preemptive, so in this case, no matter if you are busy-spinning or sleeping, you are preventing progress for other tasks.
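The cooperative-scheduling point can be illustrated with a minimal sketch (hypothetical example, not from the repro): on a pool with a single worker, a task holds its thread until it finishes, so a short task queued behind it is delayed regardless of whether the long task sleeps or spins.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CooperativeDemo {
    public static void main(String[] args) throws Exception {
        // One worker thread: tasks are scheduled cooperatively, so a task
        // keeps its thread until it finishes (whether it sleeps or spins).
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.submit(() -> {
            try { Thread.sleep(200); } catch (InterruptedException e) { }
        });
        long start = System.nanoTime();
        pool.submit(() -> { }).get(); // short task queued behind the long one
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(waitedMs >= 150 ? "short task was delayed" : "no delay");
        pool.shutdown();
    }
}
```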
"Busy-spinning" is just a simple replacement for a CPU-intensive computation found in the real application where I see the issue. I could use a recursive Fibonacci computation instead, or something like that, but […] When I execute one long task on a pool of 8 threads, I do not expect it to block the whole process. The same code works on Java 8 without any issues and I do not see any reason why it should not. There is some problem with Java 11 scheduling (most likely caused by some FJP implementation changes), which perhaps causes some short task to be scheduled on the thread which is busy with the long task, or something like that. Spawning more threads (which is the result of using `blocking`) […] By posting the issue here I expect to achieve two things:
Fixing the issue either in Scala or in Java would be nice, but I do not expect this to happen soon enough to help me in any way.
Putting the […] Since the same thread pool is used, maybe the surface area of the reproduction can be reduced? Avoid using parallel collections, or maybe even write a Java-only reproducer? @OndrejSpanel do you have time to give this a try? Or even in the current state, it might be worth reporting it to Java concurrency experts; http://altair.cs.oswego.edu/mailman/listinfo/concurrency-interest for example seems to be active. For the record, I can reproduce the issue as described in the ticket on macOS.
I did not see the issue when not using parallel collections. It is possible it could be reproduced by expanding the parallel collections into plain Java executor calls; I am not sure how much work this would be.
I am not sure I want to dedicate much time to this. Given that the issue shows with Java 11, which is available to the general public, I guess the priority for me will be finding a suitable workaround, as I cannot expect the users of my application to avoid this particular runtime.
Understood, thanks. I also looked at it in JMC. Here's the stack trace of the waiting executor thread:
So the wait starts here: https://github.com/scala/scala/blob/v2.12.12/src/library/scala/collection/parallel/Tasks.scala#L174 It seems that the […] I'll try to dig a little further in the next few days.
I am not sure I understand you correctly. If you check the source code of the repro, you will see the first iteration of the parallel collection spawns a future, but this does not mean the future becomes a subtask of the collection - the subtask finishes normally without any further interaction with the spawned future. I repeat the corresponding source lines:

```scala
state.foreach { x =>
  if (triggerIssue && x == 0) {
    Future {
      hotSleep(205)
    }
  }
  hotSleep(2)
}
```
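The `hotSleep` helper is not shown in the excerpt; presumably it busy-waits for the given duration to simulate CPU-bound work. A sketch of such a helper in Java (an assumption; the real implementation may differ):

```java
public class HotSleepDemo {
    // Presumed shape of hotSleep: burn CPU until the deadline instead of
    // parking the thread, so the worker stays busy (unlike Thread.sleep).
    static void hotSleep(long millis) {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            Thread.onSpinWait(); // hint to the CPU; still occupies the thread
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        hotSleep(50);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(elapsedMs >= 50);
    }
}
```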
Yes, the future task should be independent of the parallel collections task, but what seems to happen is that the parallel collections task waits for the future to complete at the code location I pointed to. I'll try to figure out if that's really it, and why.
After looking at the implementation of parallel collections and futures for a while, I managed to isolate this issue entirely from Scala and wrote a reproducer using only Java and its ForkJoinPool: https://gist.github.com/lrytz/a622c541d91336030a8cb788273bdc66 When running on Java 11 or later, this code regularly prints […]
Before taking this outside, people with concurrency experience that are watching this repo (e.g., @viktorklang, @NthPortal, @som-snytt, @retronym, ...) could maybe take a look at the Java source and share what they think.
@lrytz As an aside, you'll want to use System.nanoTime, since currentTimeMillis may have some very "interesting" behaviors.
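The difference matters because `System.currentTimeMillis` follows the wall clock, which can be adjusted (e.g. by NTP), so intervals measured with it can even come out negative, while `System.nanoTime` is monotonic. A small sketch of interval measurement with `nanoTime` (illustrative only):

```java
public class NanoTimeDemo {
    public static void main(String[] args) throws Exception {
        // System.nanoTime is monotonic and intended for measuring elapsed
        // time; System.currentTimeMillis tracks wall-clock time, which can
        // jump backwards when the system clock is adjusted.
        long start = System.nanoTime();
        Thread.sleep(10);
        long elapsedNanos = System.nanoTime() - start;
        System.out.println(elapsedNanos > 0);
    }
}
```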
@lrytz After having a quick look at the reproducer, I think it might be worth raising the issue on concurrency-interest.
Thank you, Viktor.
I ran the Java example with the FJP parallelism set to 4 (that triggers the issue a lot more often than higher parallelism levels) and took a 10-second flight recording on Java 8 and 11. There's an obvious difference: on Java 8 there are a lot of FJP threads (22); on Java 11 there are only 5 threads. I'll write a post on concurrency-interest.
@lrytz I've put it on Backlog since it isn't clear there's something to do on our end, but you could milestone it for a release if you like.
Can something be done about this issue? I have recently tested new Java versions against the code provided by @lrytz in https://gist.github.com/lrytz/a622c541d91336030a8cb788273bdc66 and it seems to me it is happening even more frequently in newer versions. When performing 300 iterations and counting suspicious durations, I get results like this:

Java 8: Suspicious duration encountered 0 times
[…]

(All Oracle JDK; I have also tested Temurin 21 with the same result.)

The fact that short tasks on a fork-join pool may be randomly blocked by a longer task sounds like quite a fundamental issue, and it is hard for me to understand why it does not seem important. Regarding the questions in #12106 (comment):
I am convinced it is a bug in the JDK, but I am unsure what I could do to get some attention to it.
I do not see any link here - was there any follow-up?
@scala/collections
We need a scala/concurrent group, @SethTisue. I took a look at the original sample and updated it for 2.13. I don't see a problem. On my machine, 8 available processors, if it happens that 4 long-runners are running, then the remaining 4 threads are not enough to beat the timer on the wall clock. This does not seem mind-bending to me. Maybe the overhead is surprising. The OP says 50 tasks at 2 ms = 100 ms; it should never take twice that by wall clock. I tried to refresh my memory about scala.concurrent internals, but actually I never learned about the complex machinery. I did just notice that I had backported to parallel collections the optimization to […] The other footnote is that the default pool has async mode on, so jobs are run in order; normally, you would fork and run that computation right away (to join and then continue with whatever algorithm). I think for par operations, I would not use the global pool. Sample debug output, showing that when loop 10 takes too long, it's because 4 threads are working hard and the other threads are running the futures from previous loops; F2 is when they complete:
No "inner" tasks run on the same worker thread as a "future". It occurs to me that (without consulting the code) maybe tasks do wind up on the same worker as the long-runner, until they are taken by a different worker; that could result in added delays and overhead. Perhaps Lukas's example in Java shows the same thing, namely, the fork/join takes longer because threads are occupied by the long-running computations, plus scheduling overhead. At the end of a run, there will be a thread or two draining their remaining task or two, as opposed to four threads completing the last four tasks in parallel. (I didn't refresh my memory about how FJP actually works.) I checked JDK 8, which seems to do better with the degenerate workload, with greater throughput on fewer threads. Here is a future about to squat on thread 153 (with time at left):
INNER2 is after the busy wait, so as each thread finishes a task, it churns through the next one yolo style. Maybe there's documentation for the scheduling improvements that explains why the newer behavior is better.
The parallel collection should always be able to execute its tasks on its own (main) thread. The calling thread can never be used for the long-runners, which are run as futures, using the EC. In the JMC capture in the original post you can see that the main thread is blocked, waiting for worker threads to complete.
Does that show 2.12 behavior? I didn't experiment with 2.12, and I'm not sure if the scala.concurrent changes for 2.13 could make a difference. I assume your main is joining the tasks, which must be the two long bars which are not all green. I don't know (much, or enough) about the internals of FJP to guess why those threads seem to be stuck. I could imagine that is an FJP bug. I'll also experiment on JVM 11/17. Also, I haven't checked for bug reports, and quickly abandoned looking for a release note. On 2.13/JDK 21, pool threads are running the foreach tasks in parallel with the futures, which complete later. But maybe I just haven't hit this even-more-pathological case yet.
I must be wrong because here it waits for futures 0-2 to complete before reporting after the foreach. This is JDK 11. (I removed a sleep I'd added to confuse myself.)
I cannot seem to find an online archive of concurrency-interest, so here's what I have from my email. I got one answer. https://gist.github.com/lrytz/a979bf2b410fc9058be317189856671c
So it's just trying to be helpful by assuming you used the wrong API for a subtask. This is like the optimization I mentioned for Future. I never quite understood how Future was intended to interact meaningfully with the API, so at least forking under the hood was a gesture.
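The behavior described here, the pool treating work submitted from a worker thread as a subtask, can be observed directly. A small illustrative Java snippet (hypothetical, not from the gist):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

public class InsidePoolDemo {
    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(2);
        // Outside any pool, inForkJoinPool() is false. Inside a worker it is
        // true, and submissions made from there are pushed to the worker's
        // local queue, i.e. treated like subtasks rather than external
        // submissions from a client thread.
        boolean outside = ForkJoinTask.inForkJoinPool();
        boolean inside = pool.submit((Callable<Boolean>) ForkJoinTask::inForkJoinPool).join();
        pool.shutdown();
        System.out.println(outside + " " + inside);
    }
}
```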
@som-snytt I have re-read your last comment several times and I am afraid its structure is too complex for me to understand what it means. Can you rewrite it in simpler words, please?
Back then I implemented a hacky workaround: an execution context which does not start any futures while a parallel loop is running. Now I am revisiting this and checking whether I could implement a custom execution context and task support to get a simpler and more robust solution. I have tried using […] As another basic experiment I have tried what @lihaoyi wrote in scala/toolkit#31 (comment) - and voila, it does not show the issue even with Java 11 - 21 and the default fork-join-pool execution context. See https://github.com/OndrejSpanel/ParLoopAndFuture/tree/use-futures I guess this perhaps answers part of the question "why is nobody interested in the issue": it does not show without parallel collections, and maybe nobody is using parallel collections for any serious work?
@OndrejSpanel I was trying to say that Future doesn't let you do fork/join, and its default pool is "async" (event-driven), so it really caters to a certain kind of program. Future used to have a line of code to say that if you try to run something on the "current" fork/join pool, it will try to fork instead of execute (submit from "outside" client). That is similar to what the pool is doing with the Future in your example. But I haven't looked yet at the internals. Parallel collections still has that behavior. Thanks for the update, I'll try out your branch. I don't know the answer to your last question, but the lesson from the issue is clearly not to mix future and par. By "clearly", I mean, "I assume so."
I am afraid this means avoiding par completely - I can hardly imagine writing any async Scala code avoiding futures. I like the par simplicity, but issues like this make me think the sentiment expressed in scala/toolkit#31 (comment) may be substantiated.
If you write the equivalent code with Java streams, do you have the same problem? One quick way to get into Java Stream land is to convert any Scala collections to arrays with […] Another option to get quickly into Java-stream-land is by using […] I don't really see why this should be different, since Java's streams use FJP too. But if for some reason it does work, it might be a not-excruciating workaround for […]
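For reference, a minimal Java parallel-stream version of a map over an array might look like this (illustrative only; whether it actually avoids the scheduling issue would need to be measured):

```java
import java.util.Arrays;

public class ParallelStreamDemo {
    public static void main(String[] args) {
        int[] input = {1, 2, 3, 4, 5};
        // Parallel streams also run on the common ForkJoinPool, so this
        // mainly probes whether the problem is specific to Scala's parallel
        // collections or inherent to the pool itself.
        int[] squared = Arrays.stream(input).parallel().map(x -> x * x).toArray();
        System.out.println(Arrays.toString(squared)); // encounter order is preserved
    }
}
```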
I think futures use FJP too, as this is what […] At the moment I am experimenting with a simplistic future-based replacement of par (I need only […]):

```scala
val source = Array.ofDim[Any](coll.size)
val result = Array.ofDim[Any](coll.size)
val taskCount = coll.size
val taskIndex = new AtomicInteger(0)
val tasksDone = new AtomicInteger(0)
coll.copyToArray(source)

@tailrec
def worker(): Boolean = {
  val index = taskIndex.getAndIncrement()
  if (index < taskCount) {
    result(index) = f(source(index).asInstanceOf[T])
    tasksDone.getAndIncrement()
    worker()
  } else false
}

val maxConcurrency = numberOfProcessors - 1 // calling thread is doing work as well
val numberOfWorkers = (taskCount - 1) min maxConcurrency
var i = 0
while (i < numberOfWorkers) {
  Future(worker())
  i += 1
}
// perform tasks on the main thread while waiting for futures to complete
worker()
// busy-wait for the rest of the futures
while (tasksDone.get() < taskCount) {
  Thread.onSpinWait()
}
result.asInstanceOf[Array[S]]
```

See also https://github.com/OndrejSpanel/ParLoopAndFuture/tree/custom-par
I use that kind of thing quite often, and it's handy as long as you use it with keen awareness of when and what you're doing, and I think you also need a barrier there to ensure that the array updates are all made before the array is passed off to the caller--honestly, I'd just stuff the arrays into a class which accesses them with […] If you use Java streams, or another battle-tested library, this stuff has been thought through already; getting around one flaw in one of those libraries is good, but you might stumble into others.
Barrier should be done by […] As for "battle-tested libraries", I am afraid my trust in them is seriously shattered after this experience. I will give Java streams a try, though - I would prefer something standard over self-made.
I thought […]
Let me check: atomic variable access functions have memory fence guarantees; this is what the JavaDoc in the JDK 8 sources says about […]
Note: this is what the sources contain and the IDE shows, but the web docs stay surprisingly silent on the topic. The link says:
JLS 17.4.4. Synchronization Order
Where hb is happens-before |
The latest commit has a familiar face. The package had the doc on memory effects. I wonder if they'd accept a PR changing […] Thanks for the interesting thread.
I see now. The information is a bit more hidden than in the JDK 11 sources, but it is there. The relevant docs are referenced in the common header as […]
That page says later:
Newer versions are more verbose in this respect: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/concurrent/atomic/AtomicInteger.html - it is just that Google shows the JDK 8 docs as the most prominent hit.
But that's happens-before only for the variable. You need happens-before for the array, which might or might not be a side-effect of how the happens-before for the variable is implemented.
Happens-before is not defined for variables, rather for actions. It is also defined to be transitive. That should be enough.
Yes, you're right, looks like that's enough!
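A standalone sketch of the publication pattern discussed above (hypothetical example, not from the repro): a plain array write followed by an atomic increment, with a reader that spins on the atomic. The read of the atomic synchronizes with the increment, and transitivity of happens-before makes the earlier array write visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PublicationDemo {
    static final int[] data = new int[1];
    static final AtomicInteger done = new AtomicInteger(0);

    public static void main(String[] args) throws Exception {
        Thread writer = new Thread(() -> {
            data[0] = 42;           // plain write to the array
            done.getAndIncrement(); // full-fence read-modify-write publishes it
        });
        writer.start();
        // The read of the atomic synchronizes with the increment; by
        // transitivity of happens-before, the earlier array write is visible.
        while (done.get() == 0) {
            Thread.onSpinWait();
        }
        System.out.println(data[0]);
        writer.join();
    }
}
```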
reproduction steps
(using Scala 2.12.12 or 2.13.3)
problem

Observe the message "Suspicious background duration ..." being printed multiple times. This message shows that a problem with parallel `foreach` scheduling has been detected, namely that a long-duration future created from the parallel `foreach` has blocked the `foreach` execution. This happens both with Scala 2.12.12 and Scala 2.13.3 running on JDK 11.0.4. It does not happen when running on JDK / JRE 8. It seems to happen intermittently, probably depending on the exact timing of the `Future` call relative to the parallel collection tasks' execution.

The `foreach` should normally terminate in much less than 200 ms, as 50 tasks are executed, each of them taking 2 ms. Once it takes more than 200 ms, it shows there is a scheduling problem with the slow future blocking the parallel collection tasks.

The following JFR screenshot shows the thread scheduling once the issue is seen (you can also see normal iterations without the issue):

My configuration is Windows 10 x64 build 2004, Intel i7, 4 cores.
The code demonstrating the issue is https://github.com/OndrejSpanel/ParLoopAndFuture/blob/c2f4d1b6b155217d8f8111f58e6a7c9337f36619/src/main/scala/com/github/ondrejspanel/parFuture/Main.scala#L1-L45