This is great work! The speed implications are insane.
I'd be interested in building an SHMT framework implementation for Apple silicon! Other than this project, which targets the hardware specified here, do you know of any active SHMT development for Apple silicon?
The non-Nvidia portion should run with Docker, but it definitely won't run on iOS without extra work.