Remove process limit in parallel_bundle_adjust #449
AndrewAnnex wants to merge 1 commit into NeoGeographyToolkit:master
Conversation
…ng run on each node when resources allow. Otherwise you have to allocate many nodes with few cores to scale.
Okay, I am now seeing some indication that num_instances actually corresponds to the total number of jobs that will be run, as opposed to the number of jobs to run simultaneously. Investigating further.
After reviewing the source more closely, I still believe I am correct about the --max-procs point above, but the Python code change I suggested isn't actually needed (or desirable). A new commit will correct the documentation and revert the change, and possibly rename num_instances to something clearer.
The code that you were suggesting to remove was there on purpose. If bundle_adjust only needs to run 4 jobs to do all its work, but the user asks for 20 processes, spinning up so many processes with nothing to do adds nothing. Note that there was a fix recently in the latest build. Now, if there are 500 match files to be computed, 500 jobs will be started, one per match file, though not all at the same time. This has the advantage of maximizing the use of the nodes and improving load balancing. There is overhead, though, with so many jobs, as each goes through the motions of loading cameras and statistics. I haven't yet figured out a good way of keeping the nodes busy while eliminating such preliminary work.
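The capping behavior described above can be expressed as a minimal sketch; `effective_processes` is a hypothetical name for illustration, not a function in the ASP codebase:

```python
# Sketch (hypothetical, not actual ASP code) of the capping logic described
# above: the number of worker processes is clamped to the number of jobs, so
# asking for 20 processes when only 4 jobs exist does not spin up 16 idle
# workers.
def effective_processes(requested_processes, num_jobs):
    # Never launch more workers than there are jobs to run.
    return min(requested_processes, num_jobs)

print(effective_processes(20, 4))   # only 4 workers are needed
print(effective_processes(8, 500))  # 8 workers work through 500 jobs in turn
```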
@oleg-alexandrov not sure if you saw, but I worked that out in this comment: #449 (comment). I need to update the issue title to reflect my new understanding. As far as I am aware now, setting --processes and --threads to match the available resources is all that should be necessary; the docs just didn't make that clear to me as currently written. If each job independently loads the cameras and statistics only for the cameras it cares about, that shouldn't create a bottleneck. But if there is inter-job dependence (I don't see why there would be), that's something I could look at if you point out the files/functions/lines to inspect.
Yes, I saw your latest comment, but I thought it was still worth mentioning why things are the way they are now.
Unfortunately, all cameras and statistics are loaded each time. I don't think this is a big bottleneck with CSM. Maybe in the future something could be done about this.
Do you mean that each job loads the stats and cameras for every camera/image, even those not used by that particular matching job? For example, say I have 10 images and 50 pair-wise matching jobs: does each job, which uses only 2 images, load all 10 cameras and all 10 stat files? I would expect it to load just 2 cameras and 2 stat files.
Yes, all cameras and stats are loaded, even the ones that are not needed. Customizing this based on the desired match file would require quite a lot of book-keeping changes, so I am trying to avoid that unless necessary. We now have a single bundle_adjust tool that is used in many ways, and the more special cases we have, the harder it becomes to maintain. Normally the operating system is smart enough to keep recently used data in RAM, so the disk I/O should not be bad even if a large sequence of processes on the same node keeps asking for the same thing. This was more me talking to myself about the limitations of the approach, hoping it is fine.
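The trade-off in the example above (10 images, 50 pair-wise jobs) can be made concrete with a small back-of-the-envelope sketch; the function is illustrative only, not ASP code:

```python
# Illustrative count of camera loads under the two strategies discussed:
# the current behavior loads every camera in every job, while a selective
# scheme would load only the two cameras a pair-wise matching job uses.
def cameras_loaded_per_job(num_images, selective):
    return 2 if selective else num_images

num_jobs = 50
print(num_jobs * cameras_loaded_per_job(10, selective=False))  # 500 loads
print(num_jobs * cameras_loaded_per_job(10, selective=True))   # 100 loads
```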
Removes a limit on processes that I suspect prevents multiple jobs from being run on each node when resources allow. Otherwise you have to allocate many nodes with few cores to scale.
Description
The --max-procs option for GNU parallel limits the number of jobs that can be run simultaneously across all machines available, not per node ("Run up to num jobs in parallel"). I believe that if the intent was to specify the number of jobs to run per node (machine), then the --jobs or -j parameter should be used instead. The documentation around --max-procs is thin and unspecific, but given other tutorials and example usages I've seen, it is far more common to use the --jobs parameter. The current behavior limits that number to the number of instances, so if 8 instances are available to parallel, only 8 jobs can run simultaneously, even if many cores are allocated. This contradicts the description of --processes <integer>, which states: "Alternatively, one can allocate many nodes, and limit the core count per node, but on some clusters that may not be as easy to schedule as jobs that use a smaller number of entire nodes."
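For illustration, a hedged sketch of how such a GNU parallel command might be assembled; the function and argument names here are hypothetical, not the actual parallel_bundle_adjust code:

```python
# Hypothetical sketch of assembling a GNU parallel invocation. When used with
# --sshloginfile, GNU parallel's -j/--jobs value caps concurrent jobs per
# remote host rather than globally, which is the per-node behavior the
# description above argues for.
def build_parallel_cmd(nodes_file, jobs_per_node, num_instances):
    cmd = ["parallel", "--sshloginfile", nodes_file,
           "--jobs", str(jobs_per_node)]
    # One argument per instance; each becomes a separate job.
    return cmd + [":::"] + [str(i) for i in range(num_instances)]

print(build_parallel_cmd("nodes.txt", 16, 3))
```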
The current fix I have is just to trust the user and delete the check.
I haven't tested this yet, but I expect this to be the problem.
Opting to use runInGnuParallel() here is likely the better next step, but I haven't explored that yet. If this fix is generally acceptable, I'll update the documentation prior to merging this PR.
Motivation and Context
How Has This Been Tested?
Not yet, in progress
Types of changes
Checklist:
Licensing:
This project is released under the LICENSE.