Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subninja support and multithreaded loading #112

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

Colecf
Copy link
Contributor

@Colecf Colecf commented Feb 28, 2024

Fixes #109
Fixes #108

This PR ports over most of the enhancements that are in Android's fork of ninja to n2. N2 goes from taking ~16 seconds to parse the AOSP ninja files, to ~1 second. (at least on my work computer where available_parallelism returns 128) It still seems to be about ~0.1 seconds slower than the android fork of ninja, but it's at least in the range where it's a little difficult to measure.

A general list of the changes in this PR:

  • Add support for subninja, with scoped variables.
  • mmap input files instead of reading them into memory.
  • Remove the FileIds and the Vec of files, replacing them with Arc. Having to maintain a separate vec mapping FileIds to files is too expensive.
  • Use the DashMap crate to map from filename to File object, so that id_from_canonical can be called in parallel.
  • Use rayon for multithreading.
  • Split input files into chunks and parse the chunks in parallel
  • Parse subninjas in parallel. Subninja support was implemented because regular includes cannot be processed in parallel.
  • Revamp the evaluation of variable assignments. Now, every statement in the ninja file gets a "scope position". When evaluating global variables, they are evaluated relative to a certain scope position, so you can evaluate the variable at a particular point in the file, even if it is reassigned later.
  • Introduce "clumps" of parse results, as returning all the parsed statements in a flat vector causes too many large vector concatenations.
  • Box Build objects, as they are quite large, and Vecs of unboxed builds are too large and slow to concatenate.
  • Don't evaluate all the build bindings at load time. Only evaluate the input/output files, and defer evaluation of other bindings until the build is actually run.
  • Evalstrings are now just regular strings instead of Vecs of strings. This means when they're evaluated they're re-parsed. Doing it this way saves of the Vec allocations, and in the case of owned evalstrings, does one big string allocation instead of a bunch of little ones. There are a lot more owned evalstrings now due to deferred evaluation of build bindings, so it makes a bit of a difference.

One thing that the android fork has that I didn't include was precomputed hashes. The android fork has a HashedString and HashedStrView type that it uses to ensure hashes aren't recalculated. This is somewhat difficult to do in rust, you have to either make concessions about being able to look up map values by their borrowed representation, use the raw entry map apis on nightly rust, or use hashbrown's RawTable. And even when I was experimenting with RawTable it didn't seem to provide a noticeable speedup.

Some of these optimizations are memory optimizations. This turned out to be necessary because apparently the exit_group syscall that quits the process spends time cleaning up memory. The exit_group syscall was taking over a second to run before some of these optimizations.

I've also bumped the rust version needed to compile n2 from 1.70 to 1.73, to be able to use usize::next_multiple_of. But if we want to keep the rust version low I can just manually write a next_multiple_of.

@Colecf Colecf force-pushed the multithreaded_loading branch 3 times, most recently from aa7d8f3 to 60055f0 Compare February 28, 2024 18:24
@Colecf
Copy link
Contributor Author

Colecf commented Feb 28, 2024

CI is passing now.

@evmar
Copy link
Owner

evmar commented Feb 28, 2024

Wow, impressive! I was just wondering this weekend whether you were still tinkering in this area...

Some minor thoughts I had while skimming this:

  • I wonder if an arena would help with the all the alloc/dealloc traffic, in particular parsing creates a zillion Builds and we never want to free any except all at once
  • re scope positions, I don't understand what you've done yet but in ninja we had each scope contain a pointer to its immutable parent scope, so in a series like a = ...; { b = $a; } a = ... the nested bit in the middle would see only the first value of a
  • have you looked at the perf of how quickly it starts on build tasks? I imagine your 10% is just parse-time perf, and I'm curious whether n2's hash-based dirty checking impacts you

@Colecf
Copy link
Contributor Author

Colecf commented Feb 29, 2024

  • I did briefly look into bumpalo, but didn't use it because dashmap and stdlib hashmaps don't support custom allocators. Also bumpalo doesn't support multithreading directly, you have to create one arena per thread (bumpalo_herd). However the hashbrown crate supports them, so maybe that's worth trying... I did use std::mem::forget on the graph at the end of the program so we don't spend time deallocating it.
  • Yes, I think that's essentially the same thing. Though now with the multithreading every variable assignment, rule, build, default, and subninja gets a scope position so all these things can be evaluated in parallel, not just subninjas.
  • I haven't profiled build task start time, that might be worth doing but it should be much smaller than the parse time. I'm not sure what 10% you're referring to though.

By the way, did you ever get the copy of the android ninja files I shared with you after you requested them in #94? You should have a shared google drive zip.

@evmar
Copy link
Owner

evmar commented Mar 5, 2024

By the way, did you ever get the copy of the android ninja files I shared with you after you requested them in #94? You should have a shared google drive zip.

Thanks! I grabbed a copy onto my Google Drive.

I haven't looked into the rest of this PR yet, sorry. :( It seems pretty fancy though, do you think you'll use it for Android going forward?

Colecf and others added 22 commits March 5, 2024 11:31
This removes the need for maintaining 2 maps from ids to files, and
strings to ids.
These turn out to be faster than spawning individual tasks, even if
it requires more single-threaded parts of the code.
Using Arc<String> allows us to not copy the string into both the hashmap
key and the File object.
Keep the statements around as a Vec<Vec<Statement>>
even after parsing, because copying all the builds
into a new vector takes too long and causes more
memory usage.
Another attempt to reduce copies / memory usage
Using the final graph::Build object is faster due to the need for
fewer copies between different vector types. It will also allow us
to only evaluate the bindings of the Build objects we actually use.
This saves ~0.3 seconds of merging the vecs of builds.
This allows us to not spend time evaluating all the build bindings,
but we still need to evaluate all the global variables if we want to
not keep the file in memory.
We only need these within the lifetime of 'text, so we can make them
references instead.
Surprisingly this seems to have a measurable ~0.1s performance increase.
Also removed some unwraps in the VariableAssignment's evaluation, but
those didn't seem to help as much.
unmapping files is slow.
Colecf added 7 commits March 5, 2024 11:31
So that we don't have to allocate memory for their parsed
representations.
This should also improve performance of evaluation in general slightly.
Windows doesn't have support for mmap (at least not with the same apis)
Needed for usize::next_multiple_of().
@Colecf Colecf force-pushed the multithreaded_loading branch from 60055f0 to 9cc13a8 Compare March 5, 2024 19:37
@Colecf
Copy link
Contributor Author

Colecf commented Mar 5, 2024

I haven't looked into the rest of this PR yet, sorry. :( It seems pretty fancy though, do you think you'll use it for Android going forward?

Do you mean this PR or n2 in general? For the PR, yes, we definitely need some kind of load speed improvement, 16 seconds to start ninja would cause riots.

For n2 in general, I've started the process of forking it into android, but there are still a number of features that need to be added. I'm hoping to switch to it soon because there are some other teams who either want to add more ninja features, or who have already made their own forks with new features and are now contemplating merging their code back into the android fork. (e.g. abfs, a filesystem that communicates the files read by actions back to the build system) I also want to implement action sandboxing. I think it would be easier to scale to all these new features if they were written in rust as opposed to C++.

@evmar
Copy link
Owner

evmar commented Mar 5, 2024

I am glad it is working out! I agree that working from Rust will hopefully make it easier to extend. One of my goals with n2 was to make it separable enough that it's hopefully easier to reason about.

BTW I think you'll need #50 if you want to use n2 in production. Or maybe even consider using a real database, I dunno -- the load time is in the critical path so I'm kind of fond of my tricky hack.

In any case I'd be happy to give advice on any pieces you end up needing to interact with, feel free to email about them or file bugs. (In particular I have some ideas on how n2 might implement content hashing for files, which is useful both for dirty checking but also for interacting with cloud build systems, and how that can nicely fit into n2's model.)

@Colecf
Copy link
Contributor Author

Colecf commented Mar 5, 2024

BTW I think you'll need #50 if you want to use n2 in production. Or maybe even consider using a real database, I dunno -- the load time is in the critical path so I'm kind of fond of my tricky hack.

Yeah I'm aware of that issue. Though what "tricky hack" are you talking about?

In any case I'd be happy to give advice on any pieces you end up needing to interact with, feel free to email about them or file bugs.

Thanks! Though I think I have a good idea about what needs to be done. What would be most helpful would be if you could review changes like this one faster, but I understand that's a lot of work to ask of you. So I'm going to try to get a fork going so we can get the fork up to speed faster.

In particular I have some ideas on how n2 might implement content hashing for files

Yeah I'd like to add that feature as well.

@evmar
Copy link
Owner

evmar commented Mar 6, 2024

Though what "tricky hack" are you talking about?

The custom "database" defined in db.rs. I think Ninja has some extra goop in it for the databasey corner cases, where like each row is suffixed with a parity check to catch partial writes to it.

hubot pushed a commit to aosp-mirror/platform_manifest that referenced this pull request Mar 28, 2024
dashmap is currently unused, but will be required after merging in
evmar/n2#112

Bug: 318434287
Bug: 328273370
Test: added n2 to build-prebuilts.sh, and ran OUT_DIR=out prebuilts/build-tools/build-prebuilts.sh
Change-Id: Ied5af144b8b2c21b385521be430f9d0313c00d93
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mmap input files rather than reading Add multithreaded parsing
2 participants