
Remove process limit in parallel_bundle_adjust #449

Closed
AndrewAnnex wants to merge 1 commit into NeoGeographyToolkit:master from AndrewAnnex:fix_parallelism_parallel_bundle_adjust

Conversation

@AndrewAnnex
Contributor

Removes a limit on the number of processes that I suspect prevents multiple jobs from running on each node when resources allow. Otherwise you have to allocate many nodes with few cores each in order to scale.

Description

The '--max-procs' option for GNU parallel limits the number of jobs that can run simultaneously across all available machines, not per node ("Run up to num jobs in parallel"). I believe that if the intent was to specify the number of jobs to run per node (machine), then the --jobs or -j parameter should be used instead. The documentation around --max-procs is thin and unspecific, but in the tutorials and example usages I've seen, the --jobs parameter is far more common.

The current behavior limits that number to the number of instances, so if 8 instances are available to parallel, only 8 jobs can run simultaneously, even if many more cores are allocated. This contradicts the description of --processes <integer>, which states:

The number of processes to use per node. The default is to use as many processes as cores.

Alternatively, one can allocate many nodes and limit the core count per node, but on some clusters that may not be as easy to schedule as jobs that use a smaller number of entire nodes.
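To make the two interpretations concrete, here is a toy sketch of the difference between capping simultaneous jobs at the node count versus allowing nodes × processes-per-node. The function name and numbers are hypothetical, not taken from the ASP code:

```python
# Toy illustration of the two interpretations discussed above.
# The function name and numbers are hypothetical, not ASP code.
def total_simultaneous_jobs(num_nodes, procs_per_node, cap_to_node_count=False):
    """How many jobs can run at once across the cluster.

    cap_to_node_count=True mimics the behavior this PR questions:
    the cap equals the number of nodes, regardless of cores per node.
    """
    if cap_to_node_count:
        # min() makes the node-count cap explicit; it is effectively num_nodes.
        return min(num_nodes, num_nodes * procs_per_node)
    return num_nodes * procs_per_node

# With 8 nodes and 16 processes each:
print(total_simultaneous_jobs(8, 16, cap_to_node_count=True))  # 8
print(total_simultaneous_jobs(8, 16))                          # 128
```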

The current fix I have is simply to trust the user and delete the check.
I haven't tested this yet, but I expect this to be the problem.

Opting to use runInGnuParallel() here is likely the better next step, but I haven't explored that yet.

If this fix is generally acceptable, then I'll update the documentation prior to the merging of this PR.

Motivation and Context

How Has This Been Tested?

Not yet, in progress

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Licensing:

This project is released under the LICENSE.

  • I dedicate any and all copyright interest in my contributions in this pull request to the public domain. I make this dedication for the benefit of the public at large and to the detriment of my heirs and successors. I intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this contribution under copyright law.

@AndrewAnnex
Contributor Author

Okay, I am now seeing some indication that num_instances actually corresponds to the total number of jobs that will be run, as opposed to the number of jobs to run simultaneously. Investigating further.

@AndrewAnnex
Contributor Author

AndrewAnnex commented Mar 10, 2025

After reviewing the source more closely, I still believe I am correct about the --max-procs issue above, but the Python code change I suggested isn't actually needed (or desirable). A new commit will correct the documentation and revert the change, and possibly rename num_instances to something clearer.

@oleg-alexandrov
Member

oleg-alexandrov commented Mar 10, 2025

The code that you were suggesting to remove was there on purpose. If bundle_adjust only needs to run 4 jobs to do all the work, but the user says to use 20 processes, spinning up so many processes with nothing to do does not add anything.

Note that in the latest build there was a recent fix. Now, if there are 500 match files to be computed, then 500 jobs will be started, one per match file, though not all at the same time. This has the advantage of maximizing the use of the nodes and of better load balancing.
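As a minimal local analogue of the scheduling described above (all names here are illustrative placeholders, not the real ASP functions): many small jobs, one per match file, are fed to a fixed pool of worker slots, so the slots stay busy and the load balances naturally:

```python
# Local sketch of "many jobs, few slots" load balancing.
# compute_match_file is a placeholder, not a real ASP function.
from concurrent.futures import ThreadPoolExecutor

def compute_match_file(pair):
    # Stand-in for the per-pair matching work.
    return f"match_{pair[0]}_{pair[1]}"

# 10 images give 45 unique pairs, i.e. 45 jobs for 8 worker slots.
pairs = [(i, j) for i in range(10) for j in range(i + 1, 10)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(compute_match_file, pairs))
print(len(results))  # 45
```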

There is overhead, though, with so many jobs, as each goes through the motions of loading cameras and computing statistics. I haven't yet figured out a good way of keeping the nodes busy while eliminating such preliminary work.

@AndrewAnnex
Contributor Author

@oleg-alexandrov not sure if you saw, but I worked that out in this comment: #449 (comment).

I need to update the issue title to reflect my new understanding. As far as I am aware now, setting --processes and --threads to match the available resources is all that should be necessary; the docs just didn't make that clear to me as currently written.

If each job independently loads the cameras and statistics only for the cameras it cares about, that shouldn't create a bottleneck. But if there is inter-job dependence (I don't see why there would be), then that's something I could look at if you point out the files/functions/lines to inspect.

@oleg-alexandrov
Member

oleg-alexandrov commented Mar 10, 2025

Yes, I saw your latest comment, but I thought it was still worth mentioning why things are the way they are now.

if each job independently loads the cameras and statistics for the cameras it cares about that shouldn't create a bottleneck.

Unfortunately, all cameras and statistics are loaded each time. I don't think this is a big bottleneck with CSM. Maybe in the future something can be done about this.

@AndrewAnnex
Contributor Author

AndrewAnnex commented Mar 10, 2025

Do you mean that each job loads the stats and cameras for every camera/image, even those not used by a particular matching job? For example, say I have 10 images and 50 pair-wise matching jobs; does each job that uses 2 images load all 10 cameras and all 10 stat files? I would think it would load just 2 cameras and 2 stat files.

@oleg-alexandrov
Member

oleg-alexandrov commented Mar 10, 2025

Yes, all cameras and stats are loaded, even the ones that are not needed. Customizing this depending on the desired match file would require quite a lot of book-keeping changes, so I am trying to avoid that unless necessary. We now have a single bundle_adjust tool that is used in many ways, and the more special cases we have, the harder it will be to maintain.
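For illustration only, a sketch of what the selective loading discussed above might look like in principle. All names here are hypothetical; the real book-keeping in bundle_adjust would be far more involved:

```python
# Hypothetical sketch: a job looks up only the cameras its image pair
# needs instead of loading the full set. Names are illustrative, not ASP code.
def cameras_for_job(all_camera_files, image_pair):
    # Keep just the entries for the indices this job's pair uses.
    return {idx: all_camera_files[idx] for idx in image_pair}

camera_files = {i: f"camera_{i}.json" for i in range(10)}
needed = cameras_for_job(camera_files, (2, 7))
print(sorted(needed))  # [2, 7]
```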

Normally the operating system is smart enough to keep recently used data in RAM, so the disk I/O should not be bad even if a large sequence of processes on the same node keeps asking for the same thing. This was more me talking to myself about the limitations of the approach, hoping it is fine.

