# Compile / typecheck perf regression in 0.16 from proc macro changes #18103
I've been struggling with this for a while now, but wasn't sure if it was a rust-analyzer problem or a Bevy problem. It's been painfully slow for me (around a minute per cargo check after edits in some cases). I'm not sure how or why it impacts examples, but they also seem to trigger a full re-check after this change: 669d139
Would be interesting to check how the numbers change with … set. That would force proc-macros and build scripts to be built with optimizations on. Given the bisection points to #17330, it might be that whatever that crate is doing is just very slow without optimizations in debug.
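For reference, the profile override being suggested is presumably something like the following in the workspace `Cargo.toml` (a sketch based on the surrounding description; the exact setting was elided above):

```toml
# Build proc-macro crates and build scripts with optimizations,
# even when the main profile is an unoptimized debug build.
[profile.dev.build-override]
opt-level = 3
```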
Removing …
Thanks for the suggestions of stuff to try, and the tentative PR. So far, no luck: I observe the same check/RA durations with or without … FWIW, we don't use the …
Looking into this now. I have some suspicions.
The crux of the issue appears to be: parsing the Cargo manifest is expensive, and so is doing the work required to avoid parsing the manifest every time. Naturally the best approach would be to stop parsing the manifest at all (which we do to determine whether our derive macros should use the … In an ideal world, we could just do …

I've found a solution, but it will involve more per-macro boilerplate (and dirtier module re-exports for the top-level bevy crate). The benefit is that at compile time, we naturally select the derive impl we need without any manifest parsing:

```rust
// in bevy_ecs/src/florp.rs
pub use bevy_ecs_macros::Florp;
pub use bevy_ecs_macros::FlorpBevy;

pub trait Florp {}

// in bevy_ecs/src/lib.rs
pub mod prelude {
    pub use crate::florp::Florp;
}

// in bevy_ecs/macros/lib.rs
#[proc_macro_derive(Florp)]
pub fn derive_florp(input: TokenStream) -> TokenStream {
    derive_florp_internal(input, &bevy_macro_utils::try_parse_str("bevy_ecs").unwrap())
}

#[proc_macro_derive(FlorpBevy)]
pub fn derive_florp_bevy(input: TokenStream) -> TokenStream {
    derive_florp_internal(
        input,
        &bevy_macro_utils::try_parse_str("bevy::ecs").unwrap(),
    )
}

fn derive_florp_internal(input: TokenStream, bevy_ecs_path: &syn::Path) -> TokenStream {
    let ast = parse_macro_input!(input as DeriveInput);
    let name = ast.ident;
    let (impl_generics, ty_generics, where_clauses) = ast.generics.split_for_impl();
    TokenStream::from(quote! {
        impl #impl_generics #bevy_ecs_path::florp::Florp for #name #ty_generics #where_clauses {
        }
    })
}

// in bevy_internal
pub mod ecs {
    pub use bevy_ecs::*;
    pub mod prelude {
        pub use super::florp::FlorpBevy as Florp;
        pub use bevy_ecs::prelude::*;
    }
    pub mod florp {
        pub use bevy_ecs::florp::{FlorpBevy as Florp, *};
    }
}
```
Also note that reverting #17330 would naively re-introduce some significant Rust Analyzer autocomplete breakages (for things like …).
I really like the compile-time behavior of the approach above (and I expect it would provide tangible performance improvements, both relative to current main with the regression and relative to the old approach). But the "cost" is still reasonably high: …
Another path I tried was using a …
Another downside to the proposal here is if a user renames …
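For context, Cargo lets users rename a dependency in their manifest (the local name `bavy` below is an arbitrary example), which is exactly the case that breaks any scheme keyed on the literal package name:

```toml
[dependencies]
# The crate is imported in code as `bavy` (`use bavy::prelude::*;`),
# so detection based on the name "bevy" no longer matches.
bavy = { package = "bevy", version = "0.15" }
```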
Hey @h3r2tic! I've put together a branch that moves this logic back into Bevy proper, switches us to a RwLock, and slims down the logic a bit: https://github.com/cart/bevy/tree/internalize-manifest-handling If you get a chance, can you check the performance on the project you got the numbers above for? Ideally that brings us closer to the original performance. So far I haven't found alternatives other than the static approach listed above (which has enough tradeoffs that I'd prefer not to use it). Hopefully Rust eventually solves this problem upstream. The approach used by …
I unfortunately haven't yet been able to reproduce the regression (using the bevy crate, or my various test projects). I really need to find (or build) a heftier Bevy project to test these things.
Hey, @cart! Awesome work in that branch, thank you for jumping on this! :) Your change completely fixes the performance regression. Here are some numbers:

Before your changes (cca5813), rustc 1.85: …

After your changes (a207767), rustc 1.85: …

I was also curious how the numbers looked in comparison to our pre-upgrade ones, and they do indeed look essentially the same. The only real diff is due to rustc versions. Our mainline is still on 1.78 (the upgrade to 1.85 requires fixing a naughty crate, and upgrading a few others).

0.14, before we did the pre-upgrade, rustc 1.78: …

0.14, before we did the pre-upgrade, rustc 1.85 (a bit broken): …

(For my own reference: the file I'm modifying: …)
Music to my ears. Thank you so much for catching this and for following back up. Very glad this was caught and fixed, as this is an issue close to my heart (and mental well-being).
# Objective

Fixes #18103

#17330 introduced a significant compile-time performance regression (affecting normal builds, clippy, and Rust Analyzer). While it did fix the type-resolution bug (and the general approach there is still our best known solution to the problem that doesn't involve [significant maintenance overhead](#18103 (comment))), the changes had a couple of issues:

1. It used a Mutex, which poses a significant threat to parallelization.
2. It externalized existing, relatively simple, performance-critical Bevy code to a crate outside of our control. I am not comfortable doing that for cases like this. Going forward, @bevyengine/maintainer-team should be much stricter about this.
3. There were a number of other areas that introduced complexity and overhead that I consider unnecessary for our use case. On a case-by-case basis, if we encounter a need for more capabilities, we can add them (and weigh them against the cost of doing so).

## Solution

1. I moved us back to our original code as a baseline.
2. I selectively ported over the minimal changes required to fix the type-resolution bug.
3. I swapped `Mutex<BTreeMap<PathBuf, &'static Mutex<CargoManifest>>>` for `RwLock<BTreeMap<PathBuf, CargoManifest>>`. Note that I used the `parking_lot` RwLock because it has a mapping API that enables us to return mapped guards.
Hey @cart! I just noticed this issue. Sorry for causing such a ruckus!

**Optional story time**

**Plot summary**

When I first started work on the … Well, no, of course not. Resolving proc macro paths is much more complicated than that. Soon complaints arose stemming from the fact that users had bevy in their …

Now I read this issue and felt kind of dumb for not benchmarking my build times. In my tiny bevy projects I did not notice the slowdown. I have now written a lot of benchmarking infrastructure and played around with flamecharts to track and fix the big slowdowns. Addressed in: …

When I benchmark bevy with …

**Regarding supply chain attacks**

Supply chain attacks are sadly a real threat. I do not want to publicly state my real name in a manner that is quick and easy to look up for just anyone.

**Conclusion**

Contributing to bevy has been a great honor and pleasure so far! I would never have spotted the double dependency bug without the brave bevy main branch testers. I understand the decision to not use my crate and was already expecting its removal. @h3r2tic, you seem to have quite a big bevy project to test performance on.
Thanks for the overview @raldone01! Your work is very appreciated (and most of the key ideas are still present in the bevy codebase). I'm still biased toward keeping this code internalized though. Trust is certainly one concern, but it isn't the only motivator. Dependencies regularly cause issues for us (ex: keeping dependencies up to date / in sync, unintended semver violations breaking us unexpectedly, changes made that haven't gone through our vetting process, etc). I'd like to further rein things in, so please don't feel singled out here :)
**Bevy version**

**Relevant system information**

**What's performing poorly?**
Howdy! I've upgraded Tiny Glade's Bevy deps to 0.16.0-dev to play around with auto-registration of reflection info, and I've noticed Rust Analyzer being sluggish. The Rust Analyzer slowdown is also reflected in running build/check/clippy. I've bisected it down to #17330.

The test I'm doing is touching/modifying a single .rs file in our chonkiest crate. The numbers are the best out of a few runs.

Using bevy at commit 669d139:

- Rust Analyzer (clippy) after a ctrl+s: 6.53s
- cargo clippy: 5.36s
- cargo check: 1.98s
- cargo build (touch one .rs file, no change): 3.03s
- cargo build (modify one function): 4.39s

Using bevy at commit 1b7db89:

- Rust Analyzer (clippy) after a ctrl+s: 12.40s
- cargo clippy: 11.19s
- cargo check: 7.83s
- cargo build (touch one .rs file, no change): 8.78s
- cargo build (modify one function): 10.18s

**Before and After Traces**

Traces for touching a single .rs file and running cargo build --timings
**Additional information**

Here are our Bevy dependencies: …