GPU support - Design feedback requested #310

Conversation

cicero-ai

Hi Lorenzo,

OK, I've submitted the PR as requested, but again, this code is nowhere close to being ready to merge. However, I could use some design guidance if possible. I'm quite confident I have this right, but not 100% certain. It turns out GPU acceleration is rather intricate, with various design considerations and trade-offs.

To walk through how a typical run will work: first, as you can see in /src/gpu/models/logistic_regression/gradient_descent.rs, there's the TryFrom<> trait implementation.
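
For reference, here's a rough sketch of the shape of that conversion. The names here (Params, GpuModule, GradientDescentModule, GpuUnsupported) are just illustrative stand-ins, not the exact types in the PR:

// Hedged sketch, not the exact code in the PR: the conversion succeeds only when
// a GPU kernel exists for this parameter / class-count combination; anything else
// returns Err, so fit() silently falls back to the existing CPU path.
struct Params { learning_rate: f64 }   // stand-in for the solver's parameters
struct GradientDescentModule { lr: f64, classes: usize }
impl GradientDescentModule {
    fn new(p: &Params, k: usize) -> Self { Self { lr: p.learning_rate, classes: k } }
}
trait GpuModule { fn name(&self) -> &'static str; }
impl GpuModule for GradientDescentModule { fn name(&self) -> &'static str { "logreg_gd" } }
struct GpuUnsupported;

impl TryFrom<(&Params, usize)> for Box<dyn GpuModule> {
    type Error = GpuUnsupported;
    fn try_from((params, k): (&Params, usize)) -> Result<Self, Self::Error> {
        Ok(Box::new(GradientDescentModule::new(params, k)))
    }
}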

Then, within the fit() function of LogisticRegression, I will add something like:

#[cfg(feature = "gpu")]
if let Ok(gpu_module) = (&params, k).try_into() {
    // Add some cool way to check # rows / samples threshold, and if matrix large enough, run it on GPU
    let mut gpu_worker = GpuWorker::new(gpu_module, X.into(), &Y);
    return gpu_worker.run().into();
}

There will be similar small code blocks for each supported algorithm, and those are really the only modifications to the actual smartcore code base. The API is going to remain exactly the same.

As long as the TryFrom<> implementation converts whatever we throw at it into a Box (which indicates the algorithm supports GPU acceleration), and the number of rows / samples is large enough, a GpuWorker struct is simply instantiated; it handles everything and returns the exact same result the smartcore crate expects.
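
To make that concrete, here's a minimal, hedged sketch of the dispatch side, reusing the hypothetical GpuModule trait from the sketch above (GpuMatrix and FittedModel are placeholders, not the PR's actual types):

// Hedged sketch, hypothetical names: GpuWorker owns the converted module plus the
// training data; run() hides all GPU work (borrow cached resources, upload buffers,
// dispatch compute passes, read back weights) and returns the same fitted-model
// type the CPU path would.
struct GpuMatrix { data: Vec<f64>, rows: usize, cols: usize }
struct FittedModel { weights: Vec<f64> }

struct GpuWorker<'a> {
    module: Box<dyn GpuModule>,
    x: GpuMatrix,
    y: &'a [f64],
}

impl<'a> GpuWorker<'a> {
    fn new(module: Box<dyn GpuModule>, x: GpuMatrix, y: &'a [f64]) -> Self {
        Self { module, x, y }
    }

    fn run(&mut self) -> FittedModel {
        // Real code would iterate GPU compute passes here until convergence.
        FittedModel { weights: vec![0.0; self.x.cols] }
    }
}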

I think that should work, right?

Then there's the GpuStation struct, which I think I got right... it's a global static that caches / lazy-loads the various GPU resources, which I believe I have grouped together correctly. It only supports one GPU device for now; I figured I'd get one device working first, then add multi-device support later if needed.
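
A minimal sketch of that caching pattern, assuming std's OnceLock for lazy initialisation (all names are placeholders; the real struct would also hold the device and queue handles from whatever GPU backend ends up being used):

// Hedged sketch: a lazily initialised global that caches the single device's
// compiled pipelines, keyed by algorithm. The GpuAlgorithm enum is discussed below.
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum GpuAlgorithm { LogisticRegressionGd, LinearRegressionGd }

#[derive(Clone, Copy)]
struct CompiledPipeline { shader_id: u32 }

struct GpuStation {
    // Device / queue handles for the one supported GPU would also live here.
    workgroups: Mutex<HashMap<GpuAlgorithm, CompiledPipeline>>,
}

static GPU_STATION: OnceLock<GpuStation> = OnceLock::new();

fn gpu_station() -> &'static GpuStation {
    GPU_STATION.get_or_init(|| GpuStation {
        workgroups: Mutex::new(HashMap::new()),
    })
}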

One somewhat unfortunate thing is that there's no real way to keep the model architectures and solvers separate like in the smartcore design. That separation just isn't going to work for GPU acceleration.

Then there are just a bunch of other little design trade-offs throughout. For example, I don't like that GpuAlgorithm enum in there, as it seems like a dirty, lost orphan enum to me, but it's actually kind of required. We need some type of hashable unique identifier for each algorithm, because it's used as the hash map key of the workgroups property of GpuStation. And when you start trying to put a trait object inside a hash map key wrapped in a mutex, things get really messy really quickly.
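
Continuing the sketch above, this is roughly how the enum key earns its keep: hash map keys must be Eq + Hash, which a Box<dyn GpuModule> can't provide, whereas a plain Copy enum keeps the cache lookup trivial (again, hypothetical names):

impl GpuStation {
    // Compile each algorithm's pipeline once, then hand out the cached copy.
    fn pipeline_for(&self, alg: GpuAlgorithm) -> CompiledPipeline {
        let mut cache = self.workgroups.lock().expect("GPU pipeline cache poisoned");
        *cache.entry(alg).or_insert_with(|| compile_pipeline(alg))
    }
}

fn compile_pipeline(_alg: GpuAlgorithm) -> CompiledPipeline {
    // Real code would build the shader module and bind group layouts here.
    CompiledPipeline { shader_id: 0 }
}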

Anyway, I think I'm on the right track, but if I could get some extra eyes on this it would be greatly appreciated. Absolutely no rush, as again I'm going to pivot off this for a couple weeks and get the next version of my NLU engine out. I will be back to finish this though.

Cheers,
Matt

@Mec-iS
Collaborator

Mec-iS commented Jun 20, 2025

Thank you!

Sure, let's take the time this needs. We also need to consider everything from the user-experience perspective, as you mentioned.

If we can ensure those checks add zero overhead and we document them correctly, it should not be a problem to have the method fork into a GPU-specific routine.

@cicero-ai
Author

Sounds good. Yes, the core tenets of my approach are a transparent GPU acceleration layer, minimal changes to the code base, zero API changes, and zero changes required to developer code. Existing devs will just have to enable the new "gpu" feature, run their existing code, and get a performance boost. Optionally they can set a threshold if they desire; nothing else.

Overhead for the checks is nearly non-existent. Checking the number of rows / samples against the threshold is O(1), and TryFrom<> costs slightly more, but not much. I'm not worried about that part; I'm more worried about the architecture of the GPU layer in and of itself.
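
For what it's worth, the threshold check I have in mind is nothing more than a comparison like this (the constant and names are placeholders, not settled values):

// Hedged sketch: a single O(1) comparison decides whether the data is large
// enough to justify the GPU transfer overhead; users can override the default.
const DEFAULT_GPU_ROW_THRESHOLD: usize = 50_000;

fn should_use_gpu(num_rows: usize, user_threshold: Option<usize>) -> bool {
    num_rows >= user_threshold.unwrap_or(DEFAULT_GPU_ROW_THRESHOLD)
}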

Anyway, I'm back to my NLU engine and will swing back to this in a couple of weeks. If I don't hear from you before then, I'll drop a message here to hopefully restart the discussion before I jump back in, to make sure I'm on the right track.

Question: is this feature beneficial and worthwhile for the crate? Essentially what happened is I had three choices: burn out my laptop's CPU, bring Python into my Rust pipeline, or just add GPU acceleration to smartcore and keep everything in Rust. I went with the last option.

Cheers,
Matt

@cicero-ai
Author

Hi Lorenzo,

Sorry for the delay, but I'm back and ready to finish the GPU acceleration update. I think it would be a great boost for the classic ML ecosystem in Rust.

I'm quite confident the design and architecture I have in place is solid, but again, I wouldn't mind extra eyes on it. If you're happy with it, let me know, and I'll finish the upgrade.

I'll make you a deal. This upgrade has already taken me way more time than I initially thought. I'll put in the hours to finish it, but in exchange, you promote my latest piece to your audience.

Text: https://cicero.sh/r/hows-the-ai-revolution

Audio: https://youtu.be/xmSSmpvFFaI

It's George Carlin-inspired, but it's kind of my way of checking in with the universe. A "hello, is there anyone out there? can I get a pulse?" kind of thing.

Share that with your followers, and I'll put in the hours to finish the transparent GPU acceleration layer. Deal?

Cheers,
Matt

@cicero-ai
Author

Hi Lorenzo,

Thanks for adding me to the repo, I appreciate it. I still need a reply to my message, though, before I finish the update.

Again, I'll make you a deal. I have two pieces, same message, different delivery:

Serious manifesto: https://cicero.sh/r/manifesto

Satire comedy, George Carlin style: https://cicero.sh/r/hows-the-ai-revolution

Both deliver the same message, just in different styles. That comedy piece was meant to try and cut through the noise that's out there.

Promote either of those pieces to your audience / followers, and in exchange, I'll put in the hours to finish the update. smartcore will get a nice, transparent GPU acceleration layer, and it will be a nice boost for the classic ML Rust ecosystem. That's a fair exchange, right?

You've already seen my code, so you know I can do the work. I wouldn't mind getting it verified, but I'm confident my design and architecture are correct.

If you need my WhatsApp, just e-mail me at [email protected] and I'll send it over.

Take care,
Matt

@Mec-iS
Collaborator

Mec-iS commented Jul 16, 2025

Hello,
anybody is free to contribute according to the contribution guidelines. Open Source work is usually voluntary, unconditional, and free from obligations unless explicitly specified otherwise. Also, it would seem unfair to me with respect to all the others who have contributed unconditionally.
If you want to contribute, you are free to put some of your time into it, or not. It's up to you to decide what to do with your time.
Cheers,
Lorenzo

PS. I will delete these comments in due time, as they don't look strictly pertinent to the software contribution.

@cicero-ai
Author

Sorry, but that's just not how life works. I was asking a fellow human if they could do a small favor for me while I'm busting my ass to help them out.

I'm a Buddhist; karma is real. You get what you give.

You can go ahead and delete my access to the repo. I don't need it anymore.

Matt

@Mec-iS
Collaborator

Mec-iS commented Jul 17, 2025

I think you have answered yourself: karma is unconditional giving to the world; what you proposed is do ut des, which may seem wrong in some contexts.

No problem. See you next time.

@Mec-iS Mec-iS closed this Jul 17, 2025