Parallel doesn't work on Windows (mclapply) #176
Comments
Thanks for the comment. Note to self: mclapply is called on line 137 of mboot.R. We have just updated the bootstrap code, and it may get rid of this problem, but I haven't tested it. Another note to self: we only actually run in parallel if the number of observations/clusters is large enough, so I think this is the explanation for the difference in behavior across applications. I am going to keep this open for now.
I am also facing this issue with a data set that has tens of millions of observations on a Windows machine. I am getting the same error when I try to run with multiple cores.
Here are two possible solutions:
Option 1: base tools only
The base R function for implementing a psock cluster is
if (n > 2500 && pl == TRUE && cores > 1) {
  if (.Platform$OS.type == "windows") {
    # Windows cannot fork, so start a socket (PSOCK) cluster instead
    cl_cores <- parallel::makeCluster(cores)
    on.exit(parallel::stopCluster(cl_cores))
    # Note: parLapply's function argument is lowercase `fun`, unlike mclapply's `FUN`
    results <- parallel::parLapply(cl = cl_cores, chunks, fun = parallel.function)
  } else {
    # Unix-alikes can fork, so mclapply works without extra setup
    results <- parallel::mclapply(chunks, FUN = parallel.function, mc.cores = cores)
  }
  results <- do.call(rbind, results)
}
(FWIW this kind of conditional strategy is super common. It's what a bunch of popular bootstrapping packages and functions use, e.g.
Option 2:
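For anyone who wants to try the Option 1 dispatch in isolation, here is a minimal standalone sketch. The helper name `run_chunks` and the toy data are hypothetical and are not taken from the package; the sketch only assumes the standard `parallel` functions `makeCluster`, `parLapply`, `stopCluster`, and `mclapply`. Wrapping the logic in a function also means `on.exit()` stops the cluster as soon as the helper returns.

```r
# Hypothetical helper (not part of the package): apply `fun` to every element
# of `chunks`, picking the parallel backend according to the platform.
run_chunks <- function(chunks, fun, cores = 2L) {
  if (cores <= 1L) {
    return(lapply(chunks, fun))          # serial fallback
  }
  if (.Platform$OS.type == "windows") {
    # No forking on Windows: use a socket cluster and clean it up on exit
    cl <- parallel::makeCluster(cores)
    on.exit(parallel::stopCluster(cl), add = TRUE)
    parallel::parLapply(cl = cl, X = chunks, fun = fun)
  } else {
    # Forking is available on Unix-alikes
    parallel::mclapply(chunks, FUN = fun, mc.cores = cores)
  }
}

# Toy usage: two chunks of five integers each, summarised per chunk
chunks <- split(1:10, rep(1:2, each = 5))
res <- run_chunks(
  chunks,
  function(x) data.frame(n = length(x), total = sum(x)),
  cores = 2L
)
out <- do.call(rbind, res)
```

The worker function here relies only on base R, so nothing needs to be exported to the socket workers. A function that depends on other objects or packages would need extra setup on the Windows branch.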
When I attempt to run some specifications in parallel, I get a warning that mclapply can't operate on Windows. However, in other cases the parallelization seems to work fine (or at least doesn't throw an error).
Example of a specification that doesn't work:
I haven't been able to figure out why some specifications produce this error and others (such as the "real data" example in your vignette) seem to work fine. Maybe some later commenter will be able to figure out what triggers the error. But if the code calls mclapply at least in some cases, those cases won't run on Windows.
There may be a way to avoid this error by replacing calls to mclapply (which relies on forking, which Windows doesn't support) with parLapply, which works on Windows but takes more setup because objects have to be passed to the workers explicitly. Alternatively, at a cost in memory efficiency and speed, you could use the parallelsugar package to overwrite mclapply on Windows machines:
https://www.r-bloggers.com/2014/07/implementing-mclapply-on-windows-a-primer-on-embarrassingly-parallel-computation-on-multicore-systems-with-r/
https://www.r-bloggers.com/2015/10/parallelsugar-an-implementation-of-mclapply-for-windows/
https://www.r-bloggers.com/2019/06/parallel-r-socket-or-fork/
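To illustrate the extra setup that parLapply needs, here is a small sketch of exporting an object to socket workers. The names `scale_factor` and `scale_chunk` are made up for the example and have nothing to do with the package's internals. With forking, workers inherit the parent's objects automatically; with a socket cluster, each worker starts as a fresh R session, so a global that the function depends on has to be sent over first.

```r
library(parallel)

scale_factor <- 10                              # object the worker function depends on
scale_chunk <- function(x) x * scale_factor     # hypothetical worker function

cl <- makeCluster(2, type = "PSOCK")            # socket cluster, works on Windows

# Send scale_factor to every worker. When scale_chunk lives in the global
# environment, skipping this step would typically end in an
# "object not found" error on the workers.
clusterExport(cl, varlist = "scale_factor", envir = environment())

res <- parLapply(cl, list(1:3, 4:6), scale_chunk)
stopCluster(cl)

scaled <- unlist(res)
```

If the worker function also needs a package, something like `clusterEvalQ(cl, library(somePackage))` (with the real package name substituted) loads it on each worker before the call to parLapply.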
Thanks for providing this great package!!