-
Notifications
You must be signed in to change notification settings - Fork 372
perf: ignore instead of wax #4578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: ignore instead of wax #4578
Conversation
|
Now on I do think this is still too much for something that runs on every |
|
And thats in release right @ruben-arts? |
|
Yes that was in release mode. |
|
On my linux machine I get the following:
|
|
I made a small comparison repository, to see if I got things right. Maybe @baszalmstra can check if he sees any mistakes. https://github.com/tdejager/wax-vs-ignore |
| @@ -0,0 +1,437 @@ | |||
| //! Plan the effective glob walk root for a set of patterns that may contain relative components. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nichmor please check this :)
| fn drop(&mut self) { | ||
| let mut sink = self.sink.lock(); | ||
| sink.get_or_insert_with(Vec::new).append(&mut self.local); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where we append all results from the walk
| .git_ignore(true) | ||
| .git_exclude(true) | ||
| .hidden(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've set some defaults here.
| uv-pypi-types = { git = "https://github.com/astral-sh/uv", tag = "0.8.5" } | ||
| uv-requirements-txt = { git = "https://github.com/astral-sh/uv", tag = "0.8.5" } | ||
|
|
||
| wax = "0.6.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gone with ye, for now!
|
Could you fix the tests @tdejager? I'll be glad to review after that. |
|
@remimimimimi Yeah sorry about that, because of ignore we were using gitignore rules instead of unix glob, so had to change some stuff here and there. Edit: |
| /// A list of globs that should be ignored when calculating any input hash. | ||
| /// These are typically used for build artifacts that should not be included in | ||
| /// the input hash. | ||
| pub const DEFAULT_BUILD_IGNORE_GLOBS: &[&str] = &["!.pixi/**"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dont need this we now exclude hidden by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of feels wrong to me, but I can't articulate why.
We for sure should document this though 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a switch I can flip. But I feel you also don't want to hover something like the .git folder by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I couldn't think of any case where you'd want a folder or file starting with . to be included.
Let's just document this for now :)
| @@ -0,0 +1,216 @@ | |||
| //! Convenience wrapper around `ignore` that emulates the glob semantics pixi expects. | |||
| //! | |||
| //! Notable behavioural tweaks compared to vanilla gitignore parsing, so that it behaves more like unix globbing with special rules: | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was all a bit hairy to get it to behave like expected. I've made codex explain that part here.
|
Because we are using custom special globbing semantics, it might start to make sense to explain this in the docs explicitly in a follow up PR. |
|
Hmm will put it into draft until I figure out the pixi build test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested it and checkout the code - looks very good!
some small ideas - maybe we can rename glob_walk_root into walk_root as it's already in the glob_set module.
Apart from that I like it
Experiment with going back to ignore, seems quite promising. Got it to 1ms for most minimal folder structure for glob comparison in debug as opposed to 1s worst-case with wax on my machine. The exact problem that it was parsing too many directories now seems to happen with wax as opposed to ignore, I have an external test that I verified with @ruben-arts where I saw that happening. I'll link that in a bit.
Update 18-09:
Why?
We had a lot of slowness in glob traversal when testing out pixi build, seems that wax in the released version was iterating into the
.pixifolder when it should not need to. We've switched to ignore, which seemed to have worse performance charateristics in the past. But this has changed.Description
This is now in a state that it can be merged, this completely moves away from wax for now, as it seems in the use-cases that we are interested in I cannot get it to perform faster. I've also made a very naive implementation that does a BFS with fastglob (together with @remimimimimi) in: tdejager/wax-vs-ignore and it seems to be to always be faster or equally fast.
What changed?
High-level overview of the things I did:
GlobSetwith ignore implementation.GlobWalkRootthat determines where to start walking and allows rebasing of globs. This was needed because we did multiple walks with wax depending on the glob which could contain seperators like... This means you need to start walking from a location that is offset from the searchpath. Because I want to ideally do everything in one walk. I inverted this problem that we rebase/reset the globs that do not have any..relative seperators. This allows us to use all globs in a single traversal. You can read the moduleglob_walk_root.rsfor an overview, and I also added a lot of tests to verify the correctness for this behavior.walk.rsfile. We use a feature of the ignore library to be able to walk over directories in a parallel. We use thevisitmethod with a custom visitor that collects all values when the visitors are dropped. This makes sure we only need to synchronize in the end, I think the mutex here is not needed per se because the joins are serial. However, this makes sure it does not break in the future when things do change. Using this approached saves like 10ms for a big project like opencv on my machine.How this was tested.
The runtimes vary a bit though, but its a far cry from the 400ms it could sometimes take in the best case.
Review
Assigning to @nichmor and @remimimimimi. @nichmor seeings as you made the semantic glob adjustments in wax, would be great to closely check
glob_walk_root.rs.