Skip to content

Conversation

@tdejager
Copy link
Contributor

@tdejager tdejager commented Sep 12, 2025

Experiment with going back to ignore, seems quite promising. Got it to 1ms for most minimal folder structure for glob comparison in debug as opposed to 1s worst-case with wax on my machine. The exact problem that it was parsing too many directories now seems to happen with wax as opposed to ignore, I have an external test that I verified with @ruben-arts where I saw that happening. I'll link that in a bit.

Update 18-09:

Why?

We had a lot of slowness in glob traversal when testing out pixi build, seems that wax in the released version was iterating into the .pixi folder when it should not need to. We've switched to ignore, which seemed to have worse performance charateristics in the past. But this has changed.

Description

This is now in a state that it can be merged, this completely moves away from wax for now, as it seems in the use-cases that we are interested in I cannot get it to perform faster. I've also made a very naive implementation that does a BFS with fastglob (together with @remimimimimi) in: tdejager/wax-vs-ignore and it seems to be to always be faster or equally fast.

What changed?

High-level overview of the things I did:

  • Completely replace GlobSet with ignore implementation.
  • Add GlobWalkRoot that determines where to start walking and allows rebasing of globs. This was needed because we did multiple walks with wax depending on the glob which could contain seperators like ... This means you need to start walking from a location that is offset from the searchpath. Because I want to ideally do everything in one walk. I inverted this problem that we rebase/reset the globs that do not have any .. relative seperators. This allows us to use all globs in a single traversal. You can read the module glob_walk_root.rs for an overview, and I also added a lot of tests to verify the correctness for this behavior.
  • I essentially split up the actual walking in a walk.rs file. We use a feature of the ignore library to be able to walk over directories in a parallel. We use the visit method with a custom visitor that collects all values when the visitors are dropped. This makes sure we only need to synchronize in the end, I think the mutex here is not needed per se because the joins are serial. However, this makes sure it does not break in the future when things do change. Using this approached saves like 10ms for a big project like opencv on my machine.
  • I've kept the timings in debug so we can check it in the coming releases if reports come in that its slow.

How this was tested.

  • This is it running on the opencv repo by @ruben-arts:
Screenshot 2025-09-18 at 13 54 53

The runtimes vary a bit though, but its a far cry from the 400ms it could sometimes take in the best case.

  • I've also added a significant number of unit tests.

Review

Assigning to @nichmor and @remimimimimi. @nichmor seeings as you made the semantic glob adjustments in wax, would be great to closely check glob_walk_root.rs.

@ruben-arts
Copy link
Contributor

Now on opencv, it takes ~40ms: https://share.firefox.dev/4n3ZTSu
Insane improvement over the latest release with 800ms: https://share.firefox.dev/4nlupXI

I do think this is still too much for something that runs on every pixi run, as in a big mono repo, it could easily grow again.

@tdejager
Copy link
Contributor Author

And thats in release right @ruben-arts?

@ruben-arts
Copy link
Contributor

Yes that was in release mode.

@Hofer-Julian
Copy link
Contributor

On my linux machine I get the following:

@tdejager
Copy link
Contributor Author

I made a small comparison repository, to see if I got things right. Maybe @baszalmstra can check if he sees any mistakes. https://github.com/tdejager/wax-vs-ignore

@lucascolley lucascolley added the performance An issue related to performance label Sep 17, 2025
@tdejager tdejager marked this pull request as ready for review September 18, 2025 11:43
@@ -0,0 +1,437 @@
//! Plan the effective glob walk root for a set of patterns that may contain relative components.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nichmor please check this :)

Comment on lines +29 to +32
fn drop(&mut self) {
let mut sink = self.sink.lock();
sink.get_or_insert_with(Vec::new).append(&mut self.local);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where we append all results from the walk

Comment on lines +89 to +91
.git_ignore(true)
.git_exclude(true)
.hidden(true)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've set some defaults here.

uv-pypi-types = { git = "https://github.com/astral-sh/uv", tag = "0.8.5" }
uv-requirements-txt = { git = "https://github.com/astral-sh/uv", tag = "0.8.5" }

wax = "0.6.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gone with ye, for now!

@remimimimimi
Copy link
Contributor

Could you fix the tests @tdejager? I'll be glad to review after that.

@tdejager
Copy link
Contributor Author

tdejager commented Sep 18, 2025

@remimimimimi Yeah sorry about that, because of ignore we were using gitignore rules instead of unix glob, so had to change some stuff here and there.

Edit:
More failures..
Should fixed the windows failure but pixi build I need to look more closely.

/// A list of globs that should be ignored when calculating any input hash.
/// These are typically used for build artifacts that should not be included in
/// the input hash.
pub const DEFAULT_BUILD_IGNORE_GLOBS: &[&str] = &["!.pixi/**"];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont need this we now exclude hidden by default

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of feels wrong to me, but I can't articulate why.

We for sure should document this though 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a switch I can flip. But I feel you also don't want to hover something like the .git folder by default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I couldn't think of any case where you'd want a folder or file starting with . to be included.
Let's just document this for now :)

@@ -0,0 +1,216 @@
//! Convenience wrapper around `ignore` that emulates the glob semantics pixi expects.
//!
//! Notable behavioural tweaks compared to vanilla gitignore parsing, so that it behaves more like unix globbing with special rules:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was all a bit hairy to get it to behave like expected. I've made codex explain that part here.

@tdejager
Copy link
Contributor Author

tdejager commented Sep 18, 2025

Because we are using custom special globbing semantics, it might start to make sense to explain this in the docs explicitly in a follow up PR.

@tdejager
Copy link
Contributor Author

Hmm will put it into draft until I figure out the pixi build test.

@nichmor nichmor marked this pull request as ready for review September 23, 2025 13:52
Copy link
Contributor

@nichmor nichmor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested it and checkout the code - looks very good!

some small ideas - maybe we can rename glob_walk_root into walk_root as it's already in the glob_set module.

Apart from that I like it

@tdejager tdejager merged commit 43b0cbf into prefix-dev:main Sep 25, 2025
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance An issue related to performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants