
Conversation

blester125
Copy link
Collaborator

This PR adds an `asyncify` function to try to turn sync code into async code. Still testing if this speeds things up.
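For reference, a minimal sketch of what such an `asyncify` wrapper could look like, using the stdlib event loop's default thread pool (this is an illustration, not the PR's actual implementation; `slow_hash` is a hypothetical stand-in for the numba hash):

```python
import asyncio
import functools

def asyncify(fn):
    """Sketch: run a blocking function in the event loop's default
    thread pool so other async tasks can run in the meantime."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None, functools.partial(fn, *args, **kwargs)
        )
    return wrapper

@asyncify
def slow_hash(data: bytes) -> int:
    # Hypothetical stand-in for the blocking (numba-accelerated) LSH hash.
    return sum(data) % 257

async def main():
    # The hash calls now run in worker threads, so the loop stays free
    # to service other awaitables while they compute.
    return await asyncio.gather(*(slow_hash(b"tensor") for _ in range(4)))

results = asyncio.run(main())
```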
@blester125
Copy link
Collaborator Author

Currently there seem to be issues with the threading library numba uses. On my Linux box the `omp` module is used, but it isn't available on OSX, so the default workqueue layer has issues there. There is also an Intel TBB library, but there are issues getting that installed (I had to install it system-wide on Linux, and on OSX the suggested `pip install tbb` doesn't work).

@blester125
Copy link
Collaborator Author

blester125 commented Apr 21, 2023

Some clarifications:

  • The default threading library numba uses can't handle a threaded numba function (one using `prange`, etc.) being called from different threads. This is what happens when we asyncify the hash using the thread executor.
  • TBB was just the first threading library I tried, as it was supposed to be pip-installable. Using the OpenMP library is another option; it is just harder to install.
    • Having to install prerequisites beyond pip can induce a lot of friction, which we already have some of from needing git-lfs.
  • We want to async the hash because it is currently serial: for the whole runtime from the last await, through hashing, to the next await, no other async task can run (i.e. we can't start shoving a tensor through a pipe to git-lfs or run async interactions with tensorstore).
  • We talked about using multiprocessing; the main issue with that is we would be working in m = # of processes chunks, and long IO tasks like piping tensors to git-lfs would not have concurrency beyond that m.
    • Scaling m > `mp.cpu_count()` could help on the IO-bound parts but would probably cause contention in the compute-bound parts.
    • Additionally, we would need to be careful that the parameter tree is not loaded before the `mp.Pool` is created, otherwise each process would have a copy of the parameters.
    • This would also add an extra IPC cost, as parameters would need to be passed from the main process to each worker, or we would need to use a shared memory array and some sort of numpy translation layer (and ensure it doesn't result in copies).
      • I did some research on shared memory (we could use the `RawArray` class, as we don't need to lock it; the work is embarrassingly parallel). It seems like we could use this in a smudge (as we know the shapes of the parameters from the metadata), but during clean we would need to use IPC (if we load the model to get the shapes, the model parameters would be copied in the fork and each process would have a full copy of the model :()
  • We want to test whether the git-lfs filter process is able to handle concurrent requests.
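For concreteness, here is a zero-copy sketch of the `RawArray` idea from the shared-memory bullet, assuming the parameter shape is known from metadata as it would be during smudge (the shape and names here are illustrative):

```python
import numpy as np
from multiprocessing.sharedctypes import RawArray

# During smudge the parameter shape is known from the stored metadata,
# so the shared buffer can be allocated up front. RawArray carries no
# lock, which is fine here: the workload is embarrassingly parallel.
shape = (4, 8)
buf = RawArray('f', shape[0] * shape[1])

# Zero-copy numpy view over the shared buffer: writes through `params`
# land directly in memory that fork-inherited workers would also see.
params = np.frombuffer(buf, dtype=np.float32).reshape(shape)
params[:] = 1.0
```

Note that a `RawArray` can't be pickled and sent to pool workers as an argument; workers would need to inherit it via fork (or receive it through a pool initializer), which is part of why the clean path, where shapes aren't known up front, is awkward.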

@blester125 blester125 linked an issue Apr 21, 2023 that may be closed by this pull request
@blester125 blester125 marked this pull request as draft April 21, 2023 20:06
@craffel
Copy link
Contributor

craffel commented Apr 25, 2023

The default threading library numba uses can't handle a threaded numba function (one using `prange`, etc.) being called from different threads. This is what happens when we asyncify the hash using the thread executor.

...

We want to async the hash because it is currently serial: for the whole runtime from the last await, through hashing, to the next await, no other async task can run (i.e. we can't start shoving a tensor through a pipe to git-lfs or run async interactions with tensorstore).

The latter is caused by the former, right? I wonder if we can use a different library than numba for acceleration that might better support the threading we want to do. If we are only using numba for accelerating a slow for loop or something, we could use some other library (maybe Cython or something, no idea which ones support what we want). Or is the issue that numba is already doing multithreading, so we can't do additional threading during the numba call?

@blester125
Copy link
Collaborator Author

Currently our LSH hash is implemented in numba and is already using multithreading (to parallelize a for loop; this also avoids materializing a large intermediate array), but that threading is all encapsulated within numba.

From the perspective of the caller, this hash call is blocking: it uses threads under the hood, but the caller needs to sit there until it is done. Normally in async code, the concurrency comes from waiting (on things like IO, or on this hash to finish), but the hash never says "ok, I'm waiting, you can run other things."

The default way to "asyncify" something like that is to run it in another thread. This gives us back a wrapper that will say "ok, I'm waiting" until the computation is done in that thread, so we can await on that in async code and everyone is happy.

The issue is that this means we are now launching multithreaded numba code from multiple threads, which numba's default threading layer can't deal with. Different threading layers enable different safety guarantees (see the numba threading-layer docs). For example, the OpenMP layer on Linux can handle multithreaded numba being called from multiple threads, but it couldn't handle multithreaded numba being called from multiple processes.
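For what it's worth, numba's documented `NUMBA_THREADING_LAYER` environment variable lets you request a specific layer up front; it has to be set before numba is imported, and whether `omp` is actually available still depends on the platform, per the discussion above:

```python
import os

# Request a thread-safe layer before numba is imported. Per the numba
# docs, the GNU OpenMP ("omp") and TBB ("tbb") layers are thread-safe,
# while the default fallback ("workqueue") is not; of these, only TBB
# is also fork-safe, matching the behavior described above.
os.environ["NUMBA_THREADING_LAYER"] = "omp"
```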

@craffel
Copy link
Contributor

craffel commented Apr 26, 2023

Hm, I see. While there may be a solution, this seems like something that we can punt on for now.

Development

Successfully merging this pull request may close these issues.

Ability to Async blocking computations