performance issues (pcntl_fork overhead?) #34
I know it's not really a solution, but running multiple workers checking the same queues might be beneficial for you to get higher throughput. Forking is by its very nature "slow", so I'm not too shocked to hear the performance isn't where you need it to be. Forking every X jobs might be a nice idea and something that could be built in to the core. In terms of communication between the processes, you could use [...]. Let me know what you're thinking.
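For context, this is roughly the fork-per-job model being discussed - a minimal sketch, not php-resque's actual worker code; `reserveJob()` and `performJob()` are hypothetical stand-ins for the real `Resque_Worker` logic:

```php
<?php
// Simplified fork-per-job loop, roughly the structure php-resque uses.
// reserveJob() and performJob() are hypothetical stand-ins; the point
// is the one-fork-per-job shape, where every iteration pays the fork cost.
while (true) {
    $job = reserveJob();          // pop the next job from Redis
    if ($job === false) {
        usleep(500000);           // nothing queued; back off briefly
        continue;
    }

    $pid = pcntl_fork();          // tens of ms of overhead, per job
    if ($pid === -1) {
        die("unable to fork\n");
    }

    if ($pid === 0) {
        // Child: perform exactly one job in full isolation, then exit.
        performJob($job);
        exit(0);
    }

    // Parent: block until the child finishes; a non-zero exit status
    // means the job died and should be marked as failed.
    pcntl_wait($status);
    if (pcntl_wexitstatus($status) !== 0) {
        // mark $job as failed here
    }
}
```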
The problem with per-job forking is that it takes about 50ms per job, which caps each worker at roughly 20 jobs per second. To handle our queue we would need 50-100 workers, which is INSANE. Because of this, I'm leaning towards a fix that won't fork on every job. I noticed that Redis keeps a record of what the worker is working on. I'm working on a fix that will catch failures in a child that processes multiple items, and use the Redis record to re-create the job object and fail it. This would be an option, which means it could play nicely with the current implementation. If this doesn't work out so well, I'll see what I can do with [...]
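A sketch of what that recovery could look like - assuming php-resque's convention of recording the in-flight job as JSON under the worker's `worker:<id>` Redis key (the key layout and field names here are assumptions based on `Resque_Worker::workingOn()`):

```php
<?php
// Hedged sketch: recover the in-flight job from Redis after a child
// dies mid-batch, and mark it failed. Key layout and payload fields
// are assumptions modeled on Resque_Worker::workingOn().
function failCurrentJob($workerId)
{
    $data = Resque::redis()->get('worker:' . $workerId);
    if (!$data) {
        return; // child died between jobs; nothing was in flight
    }

    $record = json_decode($data, true);
    $job = new Resque_Job($record['queue'], $record['payload']);
    $job->worker = $workerId;

    // Resque_Job_DirtyExitException is php-resque's marker for a job
    // whose child exited abnormally.
    $job->fail(new Resque_Job_DirtyExitException(
        'Child processing a batch of jobs exited unexpectedly'
    ));
}
```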
See pull request here: #35
Any news on this or my pull request above?
I'd be interested in hearing more about the pros, and especially the cons, of this.
Cons include the loss of process-level isolation of jobs. In the current model (one fork per job), each job operates in complete isolation (memory-wise) from every other job. This is very good for failure tolerance - especially when any given job could spontaneously encounter a fatal error, or even segfault. It is possible to figure out some kind of failure-detection mechanism and work around this, but the options are each more hackish and/or unreliable than the last - and you still have to replace the now-dead worker and figure out what job(s) to hand it, a process which is likely to take longer than the original fork itself. I also want to point out, albeit rather belatedly, that [...]. Something else that will help is having a PHP binary that doesn't contain (and a config that doesn't load) any features you aren't going to use in your workers. So will anything else that limits the amount of memory consumed per process. Each fork copies the parent process's memory space (copy-on-write on modern kernels, but page tables and any written-to pages still cost time), so the less you load, the less the OS needs to copy, and the less time it takes to do so. I'm guessing you already knew most of that, though, so I hope you take the bits you knew as advice for others who didn't. :)
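To put a number on the fork cost for your own setup, here's a minimal benchmark (it assumes only that the pcntl extension is loaded; absolute numbers will vary with the parent's memory footprint, which is exactly the point of trimming the binary and config):

```php
<?php
// Rough microbenchmark of the pcntl_fork() + exit + wait cycle.
// The child does nothing, so this isolates the process-management
// overhead from any actual job work.
$iterations = 200;
$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("unable to fork\n");
    }
    if ($pid === 0) {
        exit(0); // child exits immediately
    }
    pcntl_wait($status);
}

$elapsed = microtime(true) - $start;
printf("%.2f ms per fork cycle\n", ($elapsed / $iterations) * 1000);
```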
I know that your initial post is about 2 years old, but since it's still open... I'm curious what you came up with since then, @kballenegger. Another possibility I'm considering is to not fork at all and do some alternate process control (like you can with Supervisord). That removes a tremendous amount of overhead. I have jobs of around 100ms; that is just too slow. Shaving 50ms off would save a couple of servers.
@Dynom I kind of gave up; making nice things in PHP is too hard / impossible. These days I build everything in Ruby or C.
I agree with @Dynom, as far as Resque itself goes, since it was written for Ruby first. I am going to try to run my daemon and jobs in Ruby, but leave the app code itself in PHP, since we would never get approval to port all of our apps to Ruby.
An alternative approach might be a solution like this: https://github.com/salimane/php-resque-jobs-per-fork. This idea introduces (at least) the following cons: [...] and pros: [...]
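A minimal sketch of the jobs-per-fork idea referenced above; `reserveJob()` and `performJob()` are hypothetical stand-ins for the worker's real reserve/perform logic, and the `JOBS_PER_FORK` variable name follows Salimane's fork (treat both as assumptions):

```php
<?php
// Jobs-per-fork sketch: fork once per batch of N jobs instead of once
// per job, trading per-job memory isolation for throughput.
$jobsPerFork = (int) (getenv('JOBS_PER_FORK') ?: 100);

while (true) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("unable to fork\n");
    }

    if ($pid === 0) {
        // Child: work through up to N jobs, then exit so the parent
        // can fork a fresh process (bounding leaks and state buildup).
        for ($i = 0; $i < $jobsPerFork; $i++) {
            $job = reserveJob();
            if ($job === false) {
                break;           // queue drained; recycle the child
            }
            performJob($job);
        }
        exit(0);
    }

    // Parent: wait for the batch. If the child died mid-batch, the
    // in-flight job must be recovered (e.g. via the Redis worker key
    // described earlier) and marked failed.
    pcntl_wait($status);
}
```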
I've done some tests and I have very promising results. Heavy jobs that used to finish in 200ms now finish in 30ms, and other jobs show very similar results. As soon as I find some time I will make a PR, and I hope that @chrisboulton will merge it in ASAP.
I've done some tests and I've seen some promising results: large jobs taking 200ms now take 35ms. I've created PR #130, which is an up-to-date implementation of Salimane's work here: https://github.com/salimane/php-resque-jobs-per-fork. This PR introduces an environment variable to control how many jobs are processed per fork.
Hi @kballenegger, sorry I did not see your PR. Is it still running successfully in production?
@Dynom - We don't use Resque in our production systems any longer; we moved to more durable queuing & event stream systems as we grew even further in scale (combination of SQS, Rabbit, & Kafka, nowadays). From what I recall, however, the fork worked great. We ran it with quite a bit of traffic for a while.
Ok. I actually think that this won't hold for long and I'm already looking at alternatives. But we need some improvements now, so I'll switch to this fix using either PR and create some time to investigate alternatives. SQS is not an option and I'm unsure about Rabbit. I liked the Resque approach because it gives me great reliability in the queue, instead of having SPOFs (single points of failure) with mindless brokers and the like. How do you handle that?
So I've started playing with a deployment of this in production and seem to be having performance issues with pcntl_fork. Processing an empty job (that just contains an `error_log` statement) takes over 50ms, and our queue needs to be able to process more than 1000 items per second. I'm thinking I might have to fork the project and change the behavior so that instead of forking on every job, it forks on every X jobs (where X > 100). My concern is that I haven't figured out a way to communicate between processes in PHP that would let me pass back the job object about to be processed from the child to the parent. Ideas?
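On the IPC question: one option is `stream_socket_pair()`, which creates a connected pair of UNIX-domain sockets that both sides keep across `pcntl_fork()`. A hedged sketch (this is one possible scheme, not anything php-resque ships; the payload fields are hypothetical):

```php
<?php
// Sketch: child reports which job it is about to process over a
// socketpair, so the parent always knows the in-flight job even if
// the child later dies.
list($parentSock, $childSock) = stream_socket_pair(
    STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP
);

$pid = pcntl_fork();
if ($pid === -1) {
    die("unable to fork\n");
}

if ($pid === 0) {
    fclose($parentSock);
    // Child: tell the parent which job it is about to process.
    $payload = array('queue' => 'default', 'id' => 'abc123'); // hypothetical
    fwrite($childSock, json_encode($payload) . "\n");
    // ... perform the job here ...
    fclose($childSock);
    exit(0);
}

fclose($childSock);
// Parent: record the job the child reported, then wait for it. On an
// abnormal exit, the parent knows exactly which job to fail.
$inFlight = json_decode(fgets($parentSock), true);
pcntl_wait($status);
fclose($parentSock);
```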