fix(wip): use MPS when available except for sparse computations #5
Selecting MPS as the backend when CUDA is unavailable does not work out of the box: the code uses sparse tensors, and the SparseMPS backend is not implemented in PyTorch. This commit changes the code so that, when MPS is in use, sparse tensors are created on the CPU instead and then moved back to the same device as the rest of the computation. Since Apple Silicon has unified memory, this "movement" should be an accounting change only, with no actual data movement.
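For illustration, here is a minimal sketch of the fallback pattern described above. It is not the code from this PR: the helper names (`pick_device`, `sparse_safe_device`, `sparse_matvec`) are hypothetical, and it assumes "moved back" means the dense result of the sparse operation is returned to the original device. It also makes no assumption about whether that transfer is free.

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def sparse_safe_device(device: torch.device) -> torch.device:
    """Return a device on which sparse tensors can be built.

    Hypothetical helper: MPS falls back to CPU, everything else is unchanged.
    """
    return torch.device("cpu") if device.type == "mps" else device


def sparse_matvec(indices, values, size, dense_vec):
    """Hypothetical helper: compute (sparse matrix) @ (dense vector).

    The sparse tensor is built on a device that supports sparse layouts,
    and the dense result is moved back to the device of `dense_vec`.
    """
    target = dense_vec.device
    work = sparse_safe_device(target)
    sparse = torch.sparse_coo_tensor(
        indices.to(work), values.to(work), size, device=work
    )
    # torch.sparse.mm expects a 2-D dense operand, so use a column vector.
    out = torch.sparse.mm(sparse, dense_vec.to(work).unsqueeze(1)).squeeze(1)
    return out.to(target)
```

A caller would pick the device once with `pick_device()`, keep dense tensors there, and route only the sparse step through `sparse_matvec`, so the rest of the computation still runs on MPS.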
I haven't fully tested this yet, but it provides a significant performance improvement on an M3 MBA, cutting the time to execute all cells in circuit_tracing_tutorial.ipynb from ~19 min to <1 min.
Note: this stacks on top of #4; cf. #1.