-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experimental: Add Derive Proc-Macro Caching #129102
base: master
Are you sure you want to change the base?
Conversation
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
…ng, r=<try> Experimental: Add Derive Proc-Macro Caching # On-Disk Caching For Derive Proc-Macro Invocations This PR adds on-disk caching for derive proc-macro invocations using rustc's query system to speed up incremental compilation. The implementation is (intentionally) a bit rough/incomplete, as I wanted to see whether this helps with performance before fully implementing it/RFCing etc. I did some ad-hoc performance testing. ## Rough, Preliminary Eval Results: Using a version built through `DEPLOY=1 src/ci/docker/run.sh dist-x86_64-linux` (which I got from [here](https://rustc-dev-guide.rust-lang.org/building/optimized-build.html#profile-guided-optimization)). ### [Some Small Personal Project](https://github.com/futile/ultra-game): ```console # with -Zthreads=0 as well $ touch src/main.rs && cargo +dist check ``` Caused a re-check of 1 crate (the only one). Result: | Configuration | Time (avg. ~5 runs) | |--------|--------| | Uncached | ~0.54s | | Cached | ~0.54s | No visible difference. ### [Bevy](https://github.com/bevyengine/bevy): ```console $ touch crates/bevy_ecs/src/lib.rs && cargo +dist check ``` Caused a re-check of 29 crates. Result: | Configuration | Time (avg. ~5 runs) | |--------|--------| | Uncached | ~6.4s | | Cached | ~5.3s | Roughly 1s, or ~17% speedup. ### [Polkadot-Sdk](https://github.com/paritytech/polkadot-sdk): Basically this script (not mine): https://github.com/coderemotedotdev/rustc-profiles/blob/d61ad38c496459d82e35d8bdb0a154fbb83de903/scripts/benchmark_incremental_builds_polkadot_sdk.sh TL;DR: Two full `cargo check` runs to fill the incremental caches (for cached & uncached). Then 10 repetitions of `touch $some_file && cargo +uncached check && cargo +cached check`. ```console $ cargo update # `time` didn't build because compiler too new/dep too old $ ./benchmark_incremental_builds_polkadot_sdk.sh # see above ``` _Huge_ workspace with ~190 crates. Not sure how many were re-built/re-checkd on each invocation. Result: | Configuration | Time (avg. 10 runs) | |--------|--------| | Uncached | 99.4s | | Cached | 67.5s | Very visible speedup of 31.9s or ~32%. --- **-> Based on these results I think it makes sense to do a rustc-perf run and see what that reports.** --- ## Current Limitations/TODOs I left some `FIXME(pr-time)`s in the code for things I wanted to bring up/draw attention to in this PR. Usually when I wasn't sure if I found a (good) solution or when I knew that there might be a better way to do something; See the diff for these. ### High-Level Overview of What's Missing For "Real" Usage: * [ ] Add caching for `Bang`- and `Attr`-proc macros (currently only `Derive`). * Not a big change, I just focused on `derive`-proc macros for now, since I felt like these should be most cacheable and are used very often in practice. * [ ] Allow marking specific macros as "do not cache" (currently only all-or-nothing). * Extend the unstable option to support, e.g., `-Z cache-derive-macros=some_pm_crate::some_derive_macro_fn` for easy testing using the nightly compiler. * After Testing: Add a `#[proc_macro_cacheable]` annotation to allow proc-macro authors to "opt-in" to caching (or sth. similar). Would probably need an RFC? * Might make sense to try to combine this with rust-lang#99515, so that external dependencies can be picked up and be taken into account as well. --- So, just since you were in the loop on the attempt to cache declarative macro expansions: r? `@petrochenkov` Please feel free to re-/unassign! Finally: I hope this isn't too big a PR, I'll also show up in Zulip since I read that that is usually appreciated. Thanks a lot for taking a look! :) (Kind of related/very similar approach, old declarative macro caching PR: rust-lang#128747)
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (6d8226b): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary -0.3%, secondary -1.8%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -3.9%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 753.65s -> 756.621s (0.39%) |
Ok, I think the performance results look pretty great! Here are the results for InstructionsSummary
Primary benchmarks
Secondary benchmarks
ConclusionSo overall, lots of The secondary benchmark results look weird, because Also, I think the benchmarks don't contain a lot of "big" projects (like bevy etc.), which might see the biggest speedup from this caching. Next StepsGiven these results, what's the opinion here? I would think it makes sense to review + clean up the implementation, extend it to all proc-macro types, and to add an unstable option that allows not caching certain proc-macros (e.g., Then after that, as mentioned before, it might need a new attribute like Anyway, it would be awesome if someone could take a look at/review the diff so far, and to also get some feedback on possible next steps. Thanks a lot already! :) P.S.: Big kudos to this blog post which found these possible gains in the first place! Also to @SparrowLii, whose initial implementation for declarative macro caching provided me the much-needed basic structure for this implementation as well! |
As far as I've seen, the default case has been to assume that caching can happen (and cases such as stateful proc-macros have not been supported), even if some proc-macros haven't treated it this way. Examples issues of this are #44034 (comment) and #63804 (comment), and #44034 (comment) (and the following comments) notes the IDE case, where language servers such as rust-analyzer and RustRover may cache proc-macro expansions already. Not a reference, but another example of how an IDE has treated them https://blog.jetbrains.com/rust/2022/12/05/what-every-rust-developer-should-know-about-macro-support-in-ides/#what-every-rust-macro-implementor-should-take-into-account
I'm not arguing for or against a flag/attribute, nor am I a reviewer, but saw this wonderful PR and thought to add some context I've seen to help make a decision. It's possible there have been further discussions elsewhere that I have missed. Thank you for exploring an implementation of caching proc-macro expansions! |
Thank you for bringing that up, I wasn't aware of it, definitely helps! :) The reason I wanted to not just "break" existing libraries is to not make life unnecessarily harder for libraries like sqlx, which rely on this behavior to a certain point. I think it makes sense to coordinate with #99515, which would give proc macro authors something clear to rely on, and would at the same time help to reduce the "fallout" from starting to cache proc macro invocations. But this probably means that a new opt-in attribute/etc. will not be necessary, so that's really nice, thanks! :)
Thanks, very appreciated! :) |
64058f4
to
abb1ba2
Compare
Updated with some cleanup. Got rid of some unnecessary code, and split out a small change to error handling in the query system into #129271. Otherwise same as before, ready for review :) |
…-panic, r=michaelwoerister Prevent double panic in query system, improve diagnostics I stumbled upon a double-panic in the query system while working on something else (rust-lang#129102), which hid the real error cause for what I was debugging. This PR remedies that, so unwinding should be able to present more errors. It shouldn't really be relevant for code that doesn't ICE.
Rollup merge of rust-lang#129271 - futile:query-system/prevent-double-panic, r=michaelwoerister Prevent double panic in query system, improve diagnostics I stumbled upon a double-panic in the query system while working on something else (rust-lang#129102), which hid the real error cause for what I was debugging. This PR remedies that, so unwinding should be able to present more errors. It shouldn't really be relevant for code that doesn't ICE.
These commits modify the If this was unintentional then you should revert the changes before this PR is merged. |
// This test tests that derive-macro execution is cached. | ||
// HOWEVER, this test can currently only be checked manually, | ||
// by running it (through compiletest) with `-- --nocapture --verbose`. | ||
// The proc-macro (for `Nothing`) prints a message to stderr when invoked, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically printing to stderr is a side effect and it makes the macro non-cacheable.
(But it's fine for testing, and if printing is not mandatory for the macro to work.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's exactly the idea to test the proc-macro 😅 Use a side-effect that's not tracked, and check that it is only executed on first compilation, and not when the cached result is used.
// by running it (through compiletest) with `-- --nocapture --verbose`. | ||
// The proc-macro (for `Nothing`) prints a message to stderr when invoked, | ||
// and this message should only be present during the second invocation | ||
// (which has `cfail2` set via cfg). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why second?
In the first run we actually execute the macro and print to stderr, in the second one we take the result from cache and do not print, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, left over from a previous version of this test, will remove it. Yes exactly, the second run should use the cached result and not print the messages 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
// The proc-macro (for `Nothing`) prints a message to stderr when invoked, | ||
// and this message should only be present during the second invocation | ||
// (which has `cfail2` set via cfg). | ||
// FIXME(pr-time): Properly have the test check this, but how? UI-test that tests for `.stderr`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why do you want to test stderr output, incremental tests have directives for testing invalidation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's what the "Properly have the test check this, but how?" was for, I simply didn't see how I can check this. There are directives, but I didn't figure out a way to make them work for this case. Because the directive needs to be attached to the proc macro output I think, but that was weird somehow. I will try again and see what exactly the problem was with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried again, but couldn't figure out something that works, i.e., that succeeds with proc-macro caching enabled, but fails with proc-macro caching disabled.
|
||
//@ aux-build:derive_nothing.rs | ||
//@ revisions:cfail1 cfail2 | ||
//@ compile-flags: -Z query-dep-graph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's required for rustc_partition_codegened
and rustc_partition_reused
, iirc. Though I'm not sure if they are useful/a good thing to apply in this test.
@@ -1637,6 +1637,8 @@ options! { | |||
"emit noalias metadata for box (default: yes)"), | |||
branch_protection: Option<BranchProtection> = (None, parse_branch_protection, [TRACKED], | |||
"set options for branch target identification and pointer authentication on AArch64"), | |||
cache_all_derive_macros: bool = (true, parse_bool, [UNTRACKED], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache_all_derive_macros: bool = (true, parse_bool, [UNTRACKED], | |
cache_proc_macros: bool = (true, parse_bool, [UNTRACKED], |
For compatibility with future extensions (bang and attr macro caching, and explicit opt-ins/opt-outs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
}, | ||
); | ||
let proc_macro_backtrace = ecx.ecfg.proc_macro_backtrace; | ||
let strategy = crate::proc_macro::exec_strategy(ecx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need ecx
, only session.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
); | ||
let proc_macro_backtrace = ecx.ecfg.proc_macro_backtrace; | ||
let strategy = crate::proc_macro::exec_strategy(ecx); | ||
let server = crate::proc_macro_server::Rustc::new(ecx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, Rustc
does needs ecx
and uses it in nontrivial ways, which may break macro caching.
This is basically a side channel through which data can flow to and from the macro without query system noticing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the logic in Rustc::new
doesn't need ecx
though, except for storing ecx
itself into the field of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is why I didn't bother replacing the other uses of ecx
before, because I saw that this is tied pretty strongly to ecx
. But I can probably take care of the replaceable uses in another commit, since it should be mostly independent I'd think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually didn't touch Rustc::new
itself, I didn't see how to replace the uses of ecx
with tcx
instead (and they weren't too many).
if tcx.sess.opts.unstable_opts.cache_all_derive_macros { | ||
tcx.derive_macro_expansion(key).cloned() | ||
} else { | ||
crate::derive_macro_expansion::provide_derive_macro_expansion(tcx, key).cloned() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to setup any global context in this case, we could just pass arguments, the context only needs to be setup around the query call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -0,0 +1,125 @@ | |||
use std::cell::Cell; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably better to merge this file into compiler/rustc_expand/src/proc_macro.rs
and put the new content in the bottom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done; I think with_context
and enter_context
could maybe use more descriptive names soon, since they are now in a bigger file. But I'd wait until attr- and bang-proc-macros are also cacheable, to see if it's reused for those.
@@ -103,6 +104,13 @@ pub use plumbing::{IntoQueryParam, TyCtxtAt, TyCtxtEnsure, TyCtxtEnsureWithValue | |||
// Queries marked with `fatal_cycle` do not need the latter implementation, | |||
// as they will raise an fatal error on query cycles instead. | |||
rustc_queries! { | |||
query derive_macro_expansion(key: (LocalExpnId, Svh, &'tcx TokenStream)) -> Result<&'tcx TokenStream, ()> { | |||
// eval_always |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that eval_always
may indeed be needed here because the query may access untracked data, but I'm not sure, this whole PR will need another review from some query system / incremental compilation expert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so I understand correctly, eval_always
would prevent caching completely, no? Or will it always execute the query, but, if the output stays the same, not mark dependent nodes as "require recomputation"?
This will clearly need an opt-in opt out attribute (or better a modifier for |
Thank you for the review! I'm currently a bit busy, but hope I'll get to addressing all the feedback in the code soon! :)
Yep, that was also my initial thought. I guess it might make sense to also ask on, e.g., internals.r-l.o, or on Zulip, to gather opinions on this? But I'll do that when we get that far. Regarding the modifier: I scouted that out at first as well, but I think one or two of the proc-macro types already use modifiers (I'm thinking of
Note that, just as #129102 (comment) mentions, this is already the case. If a proc macro depends on some file/external state, but nothing else about the code has changed, That doesn't mean I'm against being careful about adding caching for proc macros, on the contrary, I think it makes sense to spread information about this proactively before anything is merged. Just that so far these guarantees have strictly not been there either, also rust-analyzer is already not re-executing proc-macros all the time. I think coordinating with #99515 makes a lot of sense, because that would add a way to properly track external dependencies on files and env-vars, which would actually give proc-macro authors a reliable option in general.
Would you have someone in mind? For now I'll fix this round of feedback first, so got a bit to do anyway. Also wanted to add caching for the other proc-macro types as well (does that make sense already?). Otherwise, what would be the correct way to find somebody? I can definitely ask on Zulip, if that is a good way to go about this. Again, thanks for the review! |
…ot appear in 2nd output
Also add manual test.
e7cbfd7
to
bb3bddf
Compare
441c4ab
to
abcc4a2
Compare
This comment has been minimized.
This comment has been minimized.
@rustbot ready Happy holidays if you're celebrating! Took some time, please let me know if I missed any of the feedback :) |
On-Disk Caching For Derive Proc-Macro Invocations
This PR adds on-disk caching for derive proc-macro invocations using rustc's query system to speed up incremental compilation.
The implementation is (intentionally) a bit rough/incomplete, as I wanted to see whether this helps with performance before fully implementing it/RFCing etc.
I did some ad-hoc performance testing.
Rough, Preliminary Eval Results:
Using a version built through
DEPLOY=1 src/ci/docker/run.sh dist-x86_64-linux
(which I got from here).Some Small Personal Project:
Caused a re-check of 1 crate (the only one).
Result:
No visible difference.
Bevy:
$ touch crates/bevy_ecs/src/lib.rs && cargo +dist check
Caused a re-check of 29 crates.
Result:
Roughly 1s, or ~17% speedup.
Polkadot-Sdk:
Basically this script (not mine): https://github.com/coderemotedotdev/rustc-profiles/blob/d61ad38c496459d82e35d8bdb0a154fbb83de903/scripts/benchmark_incremental_builds_polkadot_sdk.sh
TL;DR: Two full
cargo check
runs to fill the incremental caches (for cached & uncached). Then 10 repetitions oftouch $some_file && cargo +uncached check && cargo +cached check
.Huge workspace with ~190 crates. Not sure how many were re-built/re-checkd on each invocation.
Result:
Very visible speedup of 31.9s or ~32%.
-> Based on these results I think it makes sense to do a rustc-perf run and see what that reports.
Current Limitations/TODOs
I left some
FIXME(pr-time)
s in the code for things I wanted to bring up/draw attention to in this PR. Usually when I wasn't sure if I found a (good) solution or when I knew that there might be a better way to do something; See the diff for these.High-Level Overview of What's Missing For "Real" Usage:
Bang
- andAttr
-proc macros (currently onlyDerive
).derive
-proc macros for now, since I felt like these should be most cacheable and are used very often in practice.-Z cache-derive-macros=some_pm_crate::some_derive_macro_fn
for easy testing using the nightly compiler.#[proc_macro_cacheable]
annotation to allow proc-macro authors to "opt-in" to caching (or sth. similar). Would probably need an RFC?TokenStream
s?TokenStream
s aren't hashable/stable-hashable/encodable(?) for now. We have to a) detect that, which might require catching a panic (forHash::hash()
I believe), or b) otherwise establishingTryHash
etc. traits that are more failure resistant.So, just since you were in the loop on the attempt to cache declarative macro expansions:
r? @petrochenkov
Please feel free to re-/unassign!
Finally: I hope this isn't too big a PR, I'll also show up in Zulip since I read that that is usually appreciated. Thanks a lot for taking a look! :)
(Kind of related/very similar approach, old declarative macro caching PR: #128747)