Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compile / typecheck perf regression in 0.16 from proc macro changes #18103

Closed
h3r2tic opened this issue Mar 1, 2025 · 16 comments · Fixed by #18263
Closed

Compile / typecheck perf regression in 0.16 from proc macro changes #18103

h3r2tic opened this issue Mar 1, 2025 · 16 comments · Fixed by #18263
Labels
A-Cross-Cutting Impacts the entire engine C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times P-Regression Functionality that used to work but no longer does. Add a test for this!
Milestone

Comments

@h3r2tic
Copy link

h3r2tic commented Mar 1, 2025

Bevy version

Relevant system information

cargo --version
cargo 1.85.0 (d73d2caf9 2024-12-31)
uname -a
Linux h3-arch 6.12.10-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 18 Jan 2025 02:26:57 +0000 x86_64 GNU/Linux

What's performing poorly?

Howdy! I've upgraded Tiny Glade's Bevy deps to 0.16.0-dev to play around with auto-registration of reflection info, and I've noticed Rust Analyzer being sluggish. The Rust Analyzer slowdown is also reflected in running build/check/clippy. I've bisected it down to #17330

The test I'm doing is touching/modifying a single .rs file in our chonkiest crate The numbers are best out of a few runs.

Using bevy at commit 669d139:

  • Rust Analyzer build (running clippy) after a ctrl+s: 6.53s
  • cargo clippy: 5.36s
  • cargo check: 1.98s
  • cargo build (touch one .rs file, no change): 3.03s
  • cargo build: (modify one function) 4.39s

Using bevy at commit 1b7db89:

  • Rust Analyzer build (running clippy) after a ctrl+s: 12.40s
  • cargo clippy: 11.19s
  • cargo check: 7.83s
  • cargo build (touch one .rs file, no change): 8.78s
  • cargo build (modify one function): 10.18s

Before and After Traces

Traces for touching a single .rs file and running cargo build --timings

Additional information

Here's our Bevy dependencies:

bevy_app = "0.16.0-dev"
bevy_ecs = { version = "0.16.0-dev", features = ["multi_threaded", "serialize"] }
bevy_input = { version = "0.16.0-dev", features = ["serialize"] }
bevy_log = "0.16.0-dev"
bevy_reflect = { version = "0.16.0-dev" }
bevy_state = { version = "0.16.0-dev", default-features = false, features = ["bevy_app", "bevy_reflect"] }
bevy_tasks = "0.16.0-dev"
bevy_time = { version = "0.16.0-dev", default-features = false }
@h3r2tic h3r2tic added C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times P-Regression Functionality that used to work but no longer does. Add a test for this! S-Needs-Triage This issue needs to be labelled labels Mar 1, 2025
@aloucks
Copy link
Contributor

aloucks commented Mar 1, 2025

I've been struggling with this for a while now, but wasn't sure if it was a rust-analzyer problem or a bevy problem. It's been painfully slow for me (around a minute per cargo check after edits in some cases). I'm not sure how or why it impacts examples but they also seem to trigger full re-check after this change.

669d139
crates/bevy_ecs/lib.rs # add whitespace change

cargo check
Finished `dev` profile [unoptimized + debuginfo] target(s) in 14.26s

main
crates/bevy_ecs/lib.rs # add whitespace change

cargo check
Finished `dev` profile [unoptimized + debuginfo] target(s) in 48.51s

@Veykril
Copy link
Contributor

Veykril commented Mar 1, 2025

Would be interesting to check how the numbers change with

[profile.dev.build-override]
opt-level = 3

set. That would force proc-macros and build scripts to be built with optimizations on. Given the bisection points to #17330 it might be that whatever that crate is doing is just very slow without in debug.

@aloucks
Copy link
Contributor

aloucks commented Mar 1, 2025

Removing default-features = false from the bevy_* [dev-dependencies] made a huge improvement for me both from the command line and Rust-Analyzer. R-A is still doing something strange, but it's better overall.

$ git diff Cargo.toml
diff --git a/Cargo.toml b/Cargo.toml
index 7bd99fc3a..08ccaf6e3 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -504,14 +504,14 @@ flate2 = "1.0"
 serde = { version = "1", features = ["derive"] }
 serde_json = "1"
 bytemuck = "1.7"
-bevy_render = { path = "crates/bevy_render", version = "0.16.0-dev", default-features = false }
+bevy_render = { path = "crates/bevy_render", version = "0.16.0-dev" }
 # The following explicit dependencies are needed for proc macros to work inside of examples as they are part of the bevy crate itself.
-bevy_ecs = { path = "crates/bevy_ecs", version = "0.16.0-dev", default-features = false }
-bevy_state = { path = "crates/bevy_state", version = "0.16.0-dev", default-features = false }
-bevy_asset = { path = "crates/bevy_asset", version = "0.16.0-dev", default-features = false }
-bevy_reflect = { path = "crates/bevy_reflect", version = "0.16.0-dev", default-features = false }
-bevy_image = { path = "crates/bevy_image", version = "0.16.0-dev", default-features = false }
-bevy_gizmos = { path = "crates/bevy_gizmos", version = "0.16.0-dev", default-features = false }
+bevy_ecs = { path = "crates/bevy_ecs", version = "0.16.0-dev" }
+bevy_state = { path = "crates/bevy_state", version = "0.16.0-dev" }
+bevy_asset = { path = "crates/bevy_asset", version = "0.16.0-dev" }
+bevy_reflect = { path = "crates/bevy_reflect", version = "0.16.0-dev" }
+bevy_image = { path = "crates/bevy_image", version = "0.16.0-dev" }
+bevy_gizmos = { path = "crates/bevy_gizmos", version = "0.16.0-dev" }

@h3r2tic
Copy link
Author

h3r2tic commented Mar 1, 2025

Thanks for the suggestions of stuff to try, and the tentative PR. So far, no luck - I observe the same check/RA durations with or without [profile.dev.build-override] opt-level = 3, and similarly with and without default-features = false.

FWIW, we don't use the bevy crate directly - I've listed our exact dependencies as an edit to the issue.

@cart
Copy link
Member

cart commented Mar 5, 2025

Looking into this now. I have some suspicions.

@cart
Copy link
Member

cart commented Mar 6, 2025

The crux of the issue appears to be: parsing the cargo manifest is expensive, and so is doing the work required to not parse the manifest every time. Naturally the best approach would be to stop parsing the manifest at all (which we do to determine whether our derive macros should use the bevy::ecs path or the bevy_ecs path).

In an ideal world, we could just do if cfg!(dependency = "bevy") {}, but that is not possible (at least, not yet).

I've found a solution, but it will involve more per-macro boilerplate (and dirtier module re-exports for the top-level bevy crate). The benefit is that at compile time, we naturally select the derive impl we need without any manifest parsing:

// in bevy_ecs/src/florp.rs
pub use bevy_ecs_macros::Florp;
pub use bevy_ecs_macros::FlorpBevy;
pub trait Florp {}

// in bevy_ecs/src/lib.rs
pub mod prelude {
    pub use crate::{
        florp::Florp
    }
}

// in bevy_ecs/macros/lib.rs
#[proc_macro_derive(Florp)]
pub fn derive_florp(input: TokenStream) -> TokenStream {
    derive_florp_internal(input, &bevy_macro_utils::try_parse_str("bevy_ecs").unwrap())
}

#[proc_macro_derive(FlorpBevy)]
pub fn derive_florp_bevy(input: TokenStream) -> TokenStream {
    derive_florp_internal(
        input,
        &bevy_macro_utils::try_parse_str("bevy::ecs").unwrap(),
    )
}

fn derive_florp_internal(input: TokenStream, bevy_ecs_path: &syn::Path) -> TokenStream {
    let ast = parse_macro_input!(input as DeriveInput);
    let name = ast.ident;
    let (impl_generics, ty_generics, where_clauses) = ast.generics.split_for_impl();

    TokenStream::from(quote! {
        impl #impl_generics #bevy_ecs_path::florp::Florp for #name #ty_generics #where_clauses {
        }
    })
}


// in bevy_internal
pub mod ecs {
    pub use bevy_ecs::*;
    pub mod prelude {
        pub use super::florp::FlorpBevy as Florp;
        pub use bevy_ecs::prelude::*;
    }

    pub mod florp {
        pub use bevy_ecs::florp::{FlorpBevy as Florp, *};
    }
}

@cart
Copy link
Member

cart commented Mar 6, 2025

Also note that reverting #17330 would naively re-introduce some significant Rust Analyzer autocomplete breakages (for things like fn system(value: Res<MyType>) { value.AUTOCOMPLETE_BREAKS }). I'm reasonably sure that the methods used to fix that issue inherently cause the regression (likely the environment variable lookup,the "last edited" lookup in the filesystem, or the mutex lock added to the relatively hot BevyManifest::shared()).

@cart
Copy link
Member

cart commented Mar 6, 2025

I really like the compile-time behavior of the approach above (and I expect it would provide tangible performance improvements, both relative to current main with the regression, and relative to the old approach). But the "cost" is still reasonably high:

  1. We lose the module-level docs that come from direct module re-exports
  2. Writing macros becomes much harder. If you forget to "cast the spell", bevy-context derives will break
  3. Maintaining code becomes harder, as there will be more of it.

@cart
Copy link
Member

cart commented Mar 6, 2025

Another path I tried was using a bevy_crate cargo feature (set on each crate, such as bevy_ecs, in bevy_internal) to do the selection (removing the need for the module redeclaration weirdness), but this does not work because crates like bevy_render that internally use things like derive(Component) will still break in the context of a bevy crate build (because as we know, cargo features are flattened and used to produce a single version of a crate).

@bushrat011899
Copy link
Contributor

Another downside to the proposal here is if a user renames bevy_ecs within their Cargo.toml macros will break. Whereas I believe checking the manifest path is resilient to that.

@alice-i-cecile alice-i-cecile added A-Cross-Cutting Impacts the entire engine and removed S-Needs-Triage This issue needs to be labelled labels Mar 6, 2025
@cart
Copy link
Member

cart commented Mar 7, 2025

Hey @h3r2tic!

I've put together a branch that moves this logic back into Bevy proper, switches us to a RwLock, and slims down the logic a bit: https://github.com/cart/bevy/tree/internalize-manifest-handling

If you get a chance, can you check the performance on the project you got the numbers above for?

Ideally that brings us closer to the original performance. So far I haven't found alternatives other than the static approach listed above (which has enough tradeoffs that I'd prefer not to).

Hopefully eventually Rust solves this problem upstream. The approach used by proc-macro-crate (which is fundamentally what we are doing as well) is a nasty hack, not something the Rust community should be satisfied with.

@cart
Copy link
Member

cart commented Mar 7, 2025

I unfortunately haven't yet been able to reproduce the regression (using the bevy crate, or my various test projects). I really need to find (or build) a heftier Bevy project to test these things.

@h3r2tic
Copy link
Author

h3r2tic commented Mar 11, 2025

Hey, @cart! Awesome work in that branch, thank you for jumping on this! :) Your change completely fixes the performance regression. Here's some numbers:

before your changes (cca5813), rustc 1.85

  • cargo check (touch one file): 8.11s
  • cargo check (modify one fn): 9.13s
  • cargo clippy: (touch one file) 11.35s
  • cargo clippy (modify one fn): 12.23s
  • cargo build (touch one file): 9.02s
  • cargo build (modify one fn): 10.22s
  • wait for Rust Analyzer: (touch one file) 12.52s

after your changes (a207767), rustc 1.85

  • cargo check (touch one file): 1.89s
  • cargo check (modify one fn): 2.81s
  • cargo clippy (touch one file): 5.07s
  • cargo clippy (modify one fn): 5.96s
  • cargo build (touch one file): 2.90s
  • cargo build (modify one fn): 4.13s
  • wait for Rust Analyzer (touch one file): 6.16s

I was also curious how the numbers looked in comparison to our pre-upgrade ones, and they do indeed look essentially the same. The only real diff is due to rustc versions. Our mainline is still on 1.78 (the upgrade to 1.85 requires fixing a naughty crate, and upgrading a few others).

0.14, before we did the pre-upgrade, rustc 1.78

  • cargo check (touch one file): 1.88s
  • cargo check (modify one fn): 2.80s
  • cargo clippy (touch one file): 7.16s
  • cargo clippy (modify one fn): 8.02s
  • cargo build (touch one file): 4.09s
  • cargo build (modify one fn): 5.20s
  • wait for Rust Analyzer (touch one file): 7.85s

0.14, before we did the pre-upgrade, rustc 1.85 (a bit borken)

  • cargo check (touch one file): 1.83s
  • cargo check (modify one fn): 2.81s
  • cargo clippy (touch one file): 4.96s
  • cargo clippy (modify one fn): 5.95s
  • cargo build (touch one file): N/A (does not compile)
  • cargo build (modify one fn): N/A (does not compile)
  • wait for Rust Analyzer (touch one file): 6.16s

(For my own reference: the file I'm modifying: crates/country-core/src/systems/garden/record_garden_history.rs)

@cart
Copy link
Member

cart commented Mar 11, 2025

Music to my ears. Thank you so much for catching this and for following back up. Very glad this was caught and fixed as this is an issue close to my heart (and mental well being).

github-merge-queue bot pushed a commit that referenced this issue Mar 12, 2025
# Objective

Fixes #18103

#17330 introduced a significant compile time performance regression
(affects normal builds, clippy, and Rust Analyzer). While it did fix the
type-resolution bug (and the general approach there is still our best
known solution to the problem that doesn't involve [significant
maintenance
overhead](#18103 (comment))),
the changes had a couple of issues:

1. It used a Mutex, which poses a significant threat to parallelization.
2. It externalized existing, relatively simple, performance critical
Bevy code to a crate outside of our control. I am not comfortable doing
that for cases like this. Going forward @bevyengine/maintainer-team
should be much stricter about this.
3. There were a number of other areas that introduced complexity and
overhead that I consider unnecessary for our use case. On a case by case
basis, if we encounter a need for more capabilities we can add them (and
weigh them against the cost of doing so).

## Solution

1. I moved us back to our original code as a baseline
2. I selectively ported over the minimal changes required to fix the
type resolution bug
3. I swapped `Mutex<BTreeMap<PathBuf, &'static Mutex<CargoManifest>>>`
for `RwLock<BTreeMap<PathBuf, CargoManifest>>`. Note that I used the
`parking_lot` RwLock because it has a mapping API that enables us to
return mapped guards.
@raldone01
Copy link
Contributor

raldone01 commented Mar 26, 2025

Hey @cart!

I just noticed this issue. Sorry for causing such a ruckus!

Optional story time

Plot summary

When I first started work on the BevyManifest struct, it started out innocently enough.
Just a few changes to only create a single instance and share it. #16766
Should work out fine right?

Well no of course not. Resolving proc macro paths is much more complicated than that.
When I got back lots of complaints about breakage, I took a deep dive into how to correctly resolve proc macro paths in all cases.
I started writing a quite complicated testing infrastructure to verify my code actually worked.
That's when I noticed that it was a lot of complicated code that had very little to do with bevy, and I wanted to use this functionality outside of bevy for my other crates.
So I extracted it out into its own crate. #17330

Soon complaints arose stemming from the fact that users had bevy in their dependencies and dev-dependencies which I had not accounted for in my tests. Simple fix #17330.
I also added integrations tests which I am a little conflicted about.
I now believe a single integration test is what is really needed just to make sure the macros for sure work in user crates.

Now I read this issue and felt kind of dumb for not benchmarking my build times. In my tiny bevy projects I did not notice the slowdown.

I now wrote a lot of benchmarking infrastructure and played around with flamechart to track and fix the big slowdowns. Addressed in: cargo-manifest-proc-macros = "0.4.3"

When I benchmark bevy with https://github.com/raldone01/bevy-rs/commit/97128a76f0b6d655109302a5b17f19c86fac917b I get very similar results to current main.
Even when resolving workspace manifests and all the other complicated jazz properly.

Regarding supply chain attacks

Supply chain attacks are sadly a real threat.
XZ was a big and unfortunate eye opener.

I do not want to publicly state my real name in a manner that is quick and easy to look up for just anyone.
My username raldone01 can be easily linked to my real name using some advanced search techniques.
Should you ever want to include a crate from me in the future, I am willing to hop on a call and disclose my identity to trusted bevy members.

Conclusion

Contributing to bevy has been a great honor and pleasure so far!
It has been the most fun I have had with rust so far.
I am especially glad that my extern crate self as bevy_asset; still seems to have been a good improvement for proc macro usage inside of bevy.

I would have never spotted the double dependency bug without the brave bevy main branch testers.
Also optimizing the performance was very fun and exciting!

I understand the decision to not use my crate and was already expecting its removal.
However should you desire feature complete performant macro resolution in the feel free to add my crate again. I will do my best to tend to bevy's needs.

@h3r2tic you seem to have quite a big bevy project to test performance on.
Would you be so kind and repeat your performance measurements for raldone01@97128a7?
I made some major performance improvements and would just like to see the fruits of my labor. :D

@cart
Copy link
Member

cart commented Mar 26, 2025

Thanks for the overview @raldone01! Your work is very appreciated (and most of the key ideas are still present in the bevy codebase). I'm still biased toward keeping this code internalized though. Trust is certainly one concern, but it isn't the only motivator. Dependencies regularly cause issues for us (ex: keeping dependencies up to date / in sync, unintended semver violations breaking us unexpectedly, changes made that haven't gone through our vetting process, etc). I'd like to further reign things in, so please don't feel singled out here :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Cross-Cutting Impacts the entire engine C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times P-Regression Functionality that used to work but no longer does. Add a test for this!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants