threadpoolexecutor for distributing MPI by sahiljhawar · Pull Request #230 · nuclear-multimessenger-astronomy/nmma

sahiljhawar · 2023-09-10T15:12:12Z

This PR more closely follows the idea given in #215, repeatedly calling light_curve_analysis within the script. This parallelisation avoids the issues encountered in #229. Several modes are provided so users can choose how the parallelisation is applied.
List of commands with explanation:

multi_config_analysis --config config.yaml --parallel --process 20: This runs all the configurations in parallel, dividing the 20 processes equally among the configs. Everything runs in parallel.
multi_config_analysis --config config.yaml --process 20: This runs the configurations serially, one after another, with each configuration using all 20 processes.
multi_config_analysis --config config.yaml --parallel: If --process-per-config is given in the YAML (it must then be given individually for every config), each configuration is assigned that many processes. Everything runs in parallel.

Things to note:

--process is strictly required if --process-per-config is not given
--process-per-config takes precedence over --process

injection.log now works as expected; logs from concurrent runs no longer leak into each other.
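A minimal sketch of how the process-allocation rules above could be resolved. It assumes a hypothetical helper `resolve_processes` and assumes the YAML values for --process-per-config arrive as a list with one entry per configuration; neither is taken from the PR's actual code.

```python
from typing import List, Optional


def resolve_processes(
    n_configs: int,
    parallel: bool,
    process: Optional[int],
    process_per_config: Optional[List[int]],
) -> List[int]:
    """Hypothetical helper: number of MPI processes for each configuration."""
    if n_configs < 1:
        raise ValueError("need at least one configuration")
    if process_per_config is not None:
        # --process-per-config takes precedence and must cover every config
        if len(process_per_config) != n_configs:
            raise ValueError("--process-per-config must be given for every config")
        return list(process_per_config)
    if process is None:
        # --process is strictly required when --process-per-config is absent
        raise ValueError("--process is required if --process-per-config is not given")
    if parallel:
        # Split the total evenly across configs running at the same time.
        # Integer division gives 0 when there are more configs than processes.
        return [process // n_configs] * n_configs
    # Serial mode: each config gets the full allocation, one after another
    return [process] * n_configs
```

For example, `resolve_processes(2, True, 20, None)` gives `[10, 10]`, while `resolve_processes(30, True, 20, None)` gives a list of zeros, which is the edge case raised in the discussion below.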

tylerbarna · 2023-09-11T20:55:23Z

If one runs multi_config_analysis --config config.yaml --parallel --process 20 for a config file that has like 30 different fits, will it start with 20 fits and then queue up the remaining 10 to start once the others have finished?

sahiljhawar · 2023-09-11T22:06:08Z

@tylerbarna That's a good question, and I didn't have this case in mind. When --parallel is set, 20//30 evaluates to 0 in this case, so the configs run with mpiexec -np 0 ... Even though the count is 0, each run starts with 1 process. From what I've read online (and from ChatGPT), the behaviour seems to depend on the MPI implementation. In my case, -np 0 does not fail and starts 1 process.

tylerbarna · 2023-09-12T19:29:50Z


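Looking through some Stack Overflow pages, concurrent.futures might already handle this (see here), but I've seen some conflicting info. I've also seen some things along the following lines that explicitly use queue.Queue():

import concurrent.futures
import queue

def task_function(task):
    # Placeholder for the real per-task work
    return task

# Create a thread pool executor with 4 threads
executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

# Create a queue (concurrent.futures has no Queue class; use queue.Queue)
task_queue = queue.Queue()

# Add tasks to the queue (placeholder task names)
task_queue.put("task1")
task_queue.put("task2")
task_queue.put("task3")

# A Queue is not iterable, so drain it into a list before mapping
tasks = [task_queue.get() for _ in range(task_queue.qsize())]

# Start the executor; tasks beyond max_workers wait in its internal queue
results = list(executor.map(task_function, tasks))

# Wait for the executor to finish
executor.shutdown(wait=True)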

but I haven't experimented much with it.

There's also multiprocessing.Queue.

sahiljhawar · 2023-09-12T19:42:20Z

@tylerbarna I have indeed used concurrent.futures :') But what conflicting info have you seen?

tylerbarna · 2023-09-12T19:45:53Z


Conflicting info on whether submitting more jobs than the maximum number of workers is queued automatically.
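A small experiment sketching the queueing behaviour: `ThreadPoolExecutor` runs at most `max_workers` calls at once and holds further submissions in an internal queue until a worker frees up. The `job` function, the sleep, and the counts are illustrative stand-ins for real fits.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4
N_JOBS = 10  # deliberately more jobs than workers

lock = threading.Lock()
running = 0  # jobs currently executing
peak = 0     # highest number of jobs seen executing at once


def job(i: int) -> int:
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.05)  # stand-in for a real fit
    with lock:
        running -= 1
    return i


with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
    # All jobs are submitted immediately; the extras wait in the executor's queue
    futures = [executor.submit(job, i) for i in range(N_JOBS)]
    results = [f.result() for f in futures]

print(f"completed {len(results)} jobs, peak concurrency {peak}")
```

Every job completes and the observed peak never exceeds `MAX_WORKERS`, so by the same logic 30 submissions to a 20-worker pool would leave 10 waiting rather than being dropped.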

sahiljhawar · 2023-09-12T19:53:23Z

@tylerbarna Okay. We can try to implement something along these lines. One option is defaulting to 1 process per configuration if the number of configs exceeds the number of processes given. Another is making it a strict policy that the number of processes be an integer multiple of the number of configurations when the analysis runs in parallel. We could also try to implement queue-based execution, as you asked about here #230 (comment)

@mcoughlin @tsunhopang Any comments regarding these?

tylerbarna · 2023-09-12T19:55:52Z

@tylerbarna Okay. We can try to implement along these lines. Maybe, defaulting to 1 process per configuration if no. of config > no. of process mentioned. Or making it strict policy for the no of processes to be an integer multiple of the no of configurations, if analysis needs to be in parallel.

Yeah, the former option seems best. There can be situations where we want to fit more jobs than the number of cores we can allocate to one job on a cluster, and it's still good to be able to parallelize across the cores we do have.
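A minimal sketch of the fallback favoured here, using a hypothetical `plan_allocation` helper (not from the PR): give each configuration `max(1, total // n_configs)` processes, and cap how many configurations run concurrently so the combined process count stays within the allocation. The remaining configurations then wait in the executor's queue.

```python
from typing import Tuple


def plan_allocation(total: int, n_configs: int) -> Tuple[int, int]:
    """Hypothetical helper: return (processes per config, concurrent configs)."""
    if total < 1 or n_configs < 1:
        raise ValueError("total and n_configs must both be at least 1")
    # Fall back to 1 process per config when there are more configs than processes
    per_config = max(1, total // n_configs)
    # Run only as many configs at once as the allocation can hold;
    # the rest would wait in the executor's queue
    workers = min(n_configs, total // per_config)
    return per_config, workers


per_config, workers = plan_allocation(20, 30)
print(per_config, workers)
```

With `--process 20` and 30 configurations this yields 1 process per config and 20 running at a time, which could then be passed on as, e.g., `ThreadPoolExecutor(max_workers=workers)`.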

mcoughlin · 2023-09-12T20:36:20Z

@sahiljhawar I have no particularly strong preference, whatever works for folks.

sahiljhawar · 2023-10-16T16:52:43Z

Parallelised*
ThreadPoolExecutor or ProcessPoolExecutor, to avoid unforeseen issues with the GIL when all configurations are running in parallel.
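One reason a thread pool can be enough here: if each worker thread only launches an external process (for example an `mpiexec` run) and waits for it, the heavy computation happens in the child process, and the waiting thread releases the GIL while blocked. The sketch below works under that assumption and also gives each configuration its own log file so concurrent output cannot interleave. The `run_config` helper, the config names, and the `python -c` commands standing in for real `mpiexec` calls are all hypothetical.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_config(name, cmd, log_dir):
    """Hypothetical helper: run one config's command, logging to its own file."""
    log_path = Path(log_dir) / f"{name}.log"
    with open(log_path, "w") as log:
        # The thread blocks here while the child process does the work;
        # stdout and stderr both go to this config's log file only
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return name, proc.returncode, log_path


# Stand-ins for per-config commands; a real run might look like
# ["mpiexec", "-np", "10", ...]
configs = {
    name: [sys.executable, "-c", f"print('running {name}')"]
    for name in ("config_a", "config_b", "config_c")
}

with tempfile.TemporaryDirectory() as log_dir:
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(run_config, n, c, log_dir) for n, c in configs.items()]
        outcomes = [f.result() for f in futures]
    # Read the logs before the temporary directory is removed
    contents = {name: path.read_text().strip() for name, _, path in outcomes}
    return_codes = {name: rc for name, rc, _ in outcomes}

print(contents)
```

If the per-config work ran inside the Python process itself instead of in a subprocess, ProcessPoolExecutor would be the safer choice for CPU-bound code.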

sahiljhawar · 2024-01-25T17:25:46Z

@tsunhopang

tsunhopang · 2024-01-25T19:35:28Z

Could you add a small test for this executable? E.g. run on AT2017gfo with two models.

sahiljhawar · 2024-01-26T15:45:38Z

@tsunhopang Yeah, I am trying to do that, but there seem to be some issues with the config file Path.


sahiljhawar · 2024-10-17T19:12:04Z

@tsunhopang This is working now. Can you review?

mcoughlin

Looks good to me

sahiljhawar · 2024-10-18T09:32:21Z

The test and code do work, but the job is sometimes cancelled or exits early because of the GitHub runners' limited resources.

mcoughlin requested review from tsunhopang and tylerbarna September 10, 2023 15:19

sahiljhawar mentioned this pull request Sep 25, 2023

multi config analysis #229

Closed

sahiljhawar marked this pull request as draft October 16, 2023 16:42

multi config analysis

ca98d7d

sahiljhawar force-pushed the multi_model_threadpool_exec branch from b4ac52a to ca98d7d January 25, 2024 16:26

sahiljhawar marked this pull request as ready for review January 25, 2024 16:27

sahiljhawar and others added 11 commits January 26, 2024 18:16

add tests

c633bcc

tests once again

783467a

Update continous_integration.yml

e71275d

Update multi_config_analysis.py

8ee8905

Rename testing function

bac3fd1

change to relative import

fcfe09c

add logic for checking is args is None

06c6195

working multi config

2150a37

Merge branch 'nuclear-multimessenger-astronomy:main' into multi_model…

70e4768

…_threadpool_exec

working multi config

90dbfc8

only 2 cores

89bbf0d

sahiljhawar added 3 commits October 17, 2024 00:33

Update continous_integration.yml

1f7401c

Update continous_integration.yml

215880a

Update continous_integration.yml

e7ad0a2

Update continous_integration.yml

09af789

sahiljhawar requested a review from mcoughlin October 17, 2024 19:13

mcoughlin approved these changes Oct 17, 2024


tylerbarna reviewed Oct 17, 2024


Comment thread .github/workflows/continous_integration.yml

sahiljhawar and others added 2 commits October 17, 2024 21:28

Update continous_integration.yml

975200d

remove file deletion

ad62332

sahiljhawar merged commit 3b0b3b0 into nuclear-multimessenger-astronomy:main Oct 21, 2024


4 participants
