GPU support - Design feedback requested #310
Thank you! Sure, let's take the time we need. We also need to consider everything from the user-experience perspective, as you mentioned. If we can ensure those checks add zero overhead and we document them correctly, it should not be a problem to have the method fork into a GPU-specific routine.
Sounds good. Yes, the core tenets of my approach are a transparent GPU acceleration layer, minimal changes to the code base, zero API changes, and zero changes required to developer code. Existing devs will just enable the new "gpu" feature, run their existing code, and get a performance boost; optionally they can set a threshold if they want, nothing else. Overhead for the checks is nearly non-existent: checking the number of rows/samples against the threshold is O(1), and the TryFrom<> conversion is slightly more, but not much. I'm not worried about that part; I'm more worried about the architecture of the GPU layer in and of itself. Anyway, I'm back to my NLU engine and will swing back to this in a couple of weeks. If I don't hear from you before then, I'll drop a message here to restart the discussion before I jump back in, to make sure I'm on the right track. Question: is this feature beneficial and worthwhile for the crate? Essentially what happened is I had three choices: burn out my laptop's CPU, bring Python into my Rust pipeline, or add GPU acceleration to smartcore and keep everything in Rust. I went with the last option. Cheers,
Hi Lorenzo, Sorry for the delay, but I'm back and ready to finish the GPU acceleration update. I think it would be a great bump for the classic ML ecosystem in Rust. I'm quite confident the design and architecture I have in place are solid, but again, I wouldn't mind extra eyes on it. If you're happy with it, let me know and I'll finish the upgrade. I'll make you a deal. This upgrade has already taken me way more time than I initially thought. I'll put in the hours to finish it, but in exchange, you promote my latest piece to your audience.
Text: https://cicero.sh/r/hows-the-ai-revolution
Audio: https://youtu.be/xmSSmpvFFaI
It's George Carlin-inspired, but it's kind of my way of checking in with the universe, a "hello, is there anyone out there? can I get a pulse?" kind of thing. Share that with your followers, and I'll put in the hours to finish the transparent GPU acceleration layer. Deal? Cheers,
Hi Lorenzo, Thanks for adding me to the repo, I appreciate it. I still need a reply to my message, though, before I finish the update. Again, I'll make you a deal. I have two pieces, same message, different delivery:
Serious manifesto: https://cicero.sh/r/manifesto
Satire comedy, George Carlin style: https://cicero.sh/r/hows-the-ai-revolution
Both deliver the same message, just in different styles; the comedy piece was meant to cut through the noise out there. Promote either of those pieces to your audience/followers, and in exchange I'll put in the hours to finish the update. smartcore will get a nice, transparent GPU acceleration layer, and it will be a nice boost for the classic ML Rust ecosystem. That's a fair exchange, right? You've already seen my code, so you know I can do the work. I wouldn't mind getting it verified, but I'm confident my design and architecture are correct. If you need my WhatsApp, just e-mail me at [email protected] and I'll send it over. Take care,
Hello. PS: I will delete these comments in due time, as they don't seem strictly pertinent to the software contribution.
Sorry, but that's just not how life works. I was asking a fellow human if they could do a small favor for me while I'm busting my ass to help them out. I'm a Buddhist; karma is real. You get what you give. You can go ahead and delete my access to the repo, I don't need it anymore. Matt
I think you have answered yourself: karma is unconditional giving to the world; what you proposed is do ut des (I give so that you may give), which may sound wrong in some contexts. No problem. See you next time.
Hi Lorenzo,
OK, I've submitted the PR as requested, but again, this code is nowhere close to being ready to merge. However, I could use some design guidance if possible. I'm quite confident I have this right, but not 100% certain. It turns out GPU acceleration is rather intricate, with various design considerations and trade-offs.
To walk through how a typical run will work: first, as you can see in /src/gpu/models/logistic_regression/gradient_descent.rs, there's the TryFrom<> trait implementation.
Then within the fit() function of LogisticRegression I will add a small, feature-gated block.
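Something along these lines, as a simplified sketch (`gpu_threshold`, `GpuGradientDescent`, and `GpuWorker::run` are placeholder names here, not the final API):

```rust
// Sketch of the feature-gated fork inside LogisticRegression::fit().
// Placeholder names throughout; the real wiring in the PR may differ.
#[cfg(feature = "gpu")]
{
    // O(1) gate: only consider the GPU path when there are enough samples.
    if num_samples >= parameters.gpu_threshold {
        // TryFrom<> decides whether this model/solver pair has a GPU routine;
        // on Err we simply fall through to the existing CPU code below.
        if let Ok(task) = Box::<GpuGradientDescent>::try_from((x, y, &parameters)) {
            return GpuWorker::run(task);
        }
    }
}
// ...existing CPU implementation continues unchanged...
```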
There will be similar small code blocks for each supported algorithm, and those are really the only modifications to the actual smartcore code base. The API is going to remain exactly the same.
As long as the TryFrom<> implementation converts whatever we throw at it into a Box (indicating the algorithm supports GPU acceleration), and the number of rows/samples is large enough, a GpuWorker struct is instantiated; it handles everything and returns exactly the result the smartcore crate expects.
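To make that concrete, here's a self-contained sketch of the conversion pattern (all type names are placeholders, not the code in the PR):

```rust
/// Placeholder for the CPU-side training inputs (design matrix + targets).
struct FitInput<'a> {
    x: &'a [f64],
    y: &'a [f64],
    n_rows: usize,
}

/// Placeholder for the GPU-ready representation of the same training job.
struct GpuGradientDescent {
    n_rows: usize,
    // ...GPU buffers, shader handles, etc. would live here...
}

/// Placeholder error; in practice it would say why the GPU path was refused.
enum GpuError {
    NoDevice,
}

impl<'a> TryFrom<FitInput<'a>> for Box<GpuGradientDescent> {
    type Error = GpuError;

    fn try_from(input: FitInput<'a>) -> Result<Self, Self::Error> {
        // Refuse early if no compatible GPU device is available, so the
        // caller silently falls back to the existing CPU routine.
        if !gpu_device_available() {
            return Err(GpuError::NoDevice);
        }
        // In the real code this is where host data gets staged into GPU buffers.
        Ok(Box::new(GpuGradientDescent { n_rows: input.n_rows }))
    }
}

fn gpu_device_available() -> bool {
    // Stand-in for a real device probe (wgpu adapter lookup, CUDA init, etc.).
    false
}
```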
I think that should work, right?
Then there's the GpuStation struct, which I think I got right: it's a global static that caches / lazily loads the various GPU resources, which I believe I have grouped together correctly. It only supports one GPU device for now; I figured I'd get one device working first, then add multi-device support later if needed.
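The lazy-global part, in miniature (again just a sketch; `GpuDevice` stands in for the real device/queue handles):

```rust
use std::sync::OnceLock;

/// Stand-in for the real device/queue handles (wgpu adapter + device + queue, etc.).
struct GpuDevice;

/// Global, lazily initialised cache of GPU resources; single device for now.
struct GpuStation {
    device: GpuDevice,
    // ...cached buffers, compiled pipelines, the workgroups map, etc....
}

impl GpuStation {
    /// One global instance, created on first use and reused for every fit() call.
    fn global() -> &'static GpuStation {
        static STATION: OnceLock<GpuStation> = OnceLock::new();
        STATION.get_or_init(|| GpuStation { device: GpuDevice })
    }
}
```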
One somewhat unfortunate thing is that there's no real way to keep the model architectures and solvers separate like in the smartcore design; that separation just isn't going to work for GPU acceleration.
Then there are a bunch of other little design trade-offs throughout. For example, I don't like the GpuAlgorithm enum, as it seems like a dirty, orphaned enum to me, but it's actually kind of required: we need some type of hashable unique identifier for each algorithm, because it's used as the hash-map key of the workgroups property of GpuStation. And when you start trying to put a trait object inside a hash-map key wrapped in a mutex, things get really messy really quickly.
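In isolation, the idea is just this (variant and type names are illustrative):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hashable identifier for each GPU-accelerated algorithm.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum GpuAlgorithm {
    LogisticRegressionGradientDescent,
    // ...one variant per algorithm that gains a GPU routine...
}

/// Placeholder for a compiled compute pipeline / shader module.
struct Workgroup;

// Roughly the shape of the `workgroups` property on GpuStation: keyed by the
// enum, because a trait object (e.g. Box<dyn GpuRoutine>) can't easily be
// Eq + Hash, let alone behind a Mutex.
type Workgroups = Mutex<HashMap<GpuAlgorithm, Workgroup>>;
```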
Anyway, I think I'm on the right track, but extra eyes on this would be greatly appreciated. Absolutely no rush, as again I'm going to pivot off this for a couple of weeks and get the next version of my NLU engine out. I will be back to finish this, though.
Cheers,
Matt