Open
Description
Currently the only way to limit concurrency is to specify the number of parameters to process at once, which is a pretty coarse control.
It should be possible to automatically limit concurrency based on things like memory usage (see https://github.com/google-research/t5x/blob/6b02d25cd67a397c6cfffe90ad2cca4b343535ae/t5x/checkpoints.py#L461), compute used, number of open files, etc. Automatic limiting could even help avoid write contention on disk.
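For discussion, here is a minimal sketch of what a memory-based limit could look like with asyncio: each task reserves its estimated size from a shared byte budget and waits until enough is free. `ByteBudget`, `reserve`, and `write_all` are hypothetical names made up for this example, not existing APIs in this project or in t5x, and the `asyncio.sleep(0)` is a stand-in for the real read/write.

```python
import asyncio
from contextlib import asynccontextmanager


class ByteBudget:
    """Hypothetical limiter: caps the total bytes held by running tasks."""

    def __init__(self, max_bytes: int):
        if max_bytes <= 0:
            raise ValueError("max_bytes must be positive")
        self._max_bytes = max_bytes
        self._available = max_bytes
        self._cond = asyncio.Condition()

    @asynccontextmanager
    async def reserve(self, nbytes: int):
        # Assumption: a request larger than the whole budget is clamped,
        # so it runs alone instead of waiting forever.
        nbytes = min(nbytes, self._max_bytes)
        async with self._cond:
            await self._cond.wait_for(lambda: self._available >= nbytes)
            self._available -= nbytes
        try:
            yield
        finally:
            # Return the bytes and wake every waiter so each can re-check.
            async with self._cond:
                self._available += nbytes
                self._cond.notify_all()


async def write_all(sizes, max_bytes):
    """Run one fake write per size; return the peak bytes in flight."""
    budget = ByteBudget(max_bytes)  # created inside the running event loop
    in_flight = 0
    peak = 0

    async def write_one(nbytes):
        nonlocal in_flight, peak
        async with budget.reserve(nbytes):
            in_flight += nbytes
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for the real I/O
            in_flight -= nbytes

    await asyncio.gather(*(write_one(n) for n in sizes))
    return peak


if __name__ == "__main__":
    print(asyncio.run(write_all([40, 40, 40, 10, 70], max_bytes=100)))
```

The same shape would probably work for other resources (open files, compute) by swapping the byte count for a different cost estimate, though how to estimate that cost is left open here.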
https://death.andgravity.com/limit-concurrency includes ways to implement concurrency limits. (Note: I don't think their issue with our Semaphore solution is really relevant to us.)
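For reference, a fixed-count limit built on `asyncio.Semaphore` can be sketched roughly as follows. This is only an illustration of the general technique, not a copy of our code or of the linked article's; `map_limited` is a hypothetical helper name.

```python
import asyncio


async def map_limited(func, items, limit):
    """Apply the coroutine function `func` to each item, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)  # created inside the running event loop

    async def run_one(item):
        # All tasks are created up front; the semaphore only gates
        # how many are past this point at once.
        async with sem:
            return await func(item)

    # gather() returns results in the same order as `items`.
    return await asyncio.gather(*(run_one(item) for item in items))
```

The limit here is a plain count, so it says nothing about how much memory or I/O each item costs, which is the coarseness this issue is about.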