Raise without unrolling #940
base: main
This is blocked, waiting for a new jll.
This is looking good, based on PRONTOLab/GB-25#79 (comment). I don't have performance numbers, though.
@giordano, if you can confirm that runtime performance is reasonable by comparison, let's merge.
I'll need to do some testing on GPU; profiling on CPU is broken according to @Pangoraw, so we don't have any numbers there.
We can still do a lazy @btime on the outside.
Uhm, this seems to degrade performance a lot: according to profiling information, 3000 iterations on an A100 take 58 s with Reactant v0.2.45 (Reactant_jll v0.0.92), while on this branch (with Reactant_jll v0.0.93) they take 4 min 15 s.
Yeah, without the parallelize pass the raised code will be pretty inefficient.
No description provided.