Add --incompatible_compact_repo_mapping_manifest #24809

Open
fmeum wants to merge 2 commits into master from 24808-compact-repo-mapping


fmeum
Collaborator

@fmeum fmeum commented Dec 29, 2024

With the flag enabled, <binary>.repo_mapping contains

+deps+*,aaa,_main
+deps+*,dep,+deps+dep1
+deps+*,dep1,+deps+dep1
+deps+*,dep2,+deps+dep2
+deps+*,dep3,+deps+dep3

instead of

+deps+dep1,aaa,_main
+deps+dep1,dep,+deps+dep1
+deps+dep1,dep1,+deps+dep1
+deps+dep1,dep2,+deps+dep2
+deps+dep1,dep3,+deps+dep3
+deps+dep2,aaa,_main
+deps+dep2,dep,+deps+dep1
+deps+dep2,dep1,+deps+dep1
+deps+dep2,dep2,+deps+dep2
+deps+dep2,dep3,+deps+dep3
...

for the deps module extension.

Runfiles libraries have to be updated to find entries using the new format.
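
To make the size difference concrete, here is a minimal Python sketch that produces both forms of the manifest for the `deps` example above. It is not Bazel's implementation. The function names and the list of three source repos are assumptions for illustration (the original listing elides the remaining source repos with `...`), and the `*` suffix is read off the example rather than from a specification.

```python
def verbose_manifest(source_repos, mapping):
    """One line per (source repo, apparent name) pair, as without the flag."""
    return [
        f"{source},{apparent},{target}"
        for source in source_repos
        for apparent, target in sorted(mapping.items())
    ]


def compact_manifest(prefix, mapping):
    """One line per apparent name, with the shared prefix plus `*` as the source."""
    return [
        f"{prefix}*,{apparent},{target}"
        for apparent, target in sorted(mapping.items())
    ]


# Mapping shared by all repos of the `deps` extension, copied from the example.
MAPPING = {
    "aaa": "_main",
    "dep": "+deps+dep1",
    "dep1": "+deps+dep1",
    "dep2": "+deps+dep2",
    "dep3": "+deps+dep3",
}
# Assumption: exactly these three extension repos act as source repos.
SOURCE_REPOS = ["+deps+dep1", "+deps+dep2", "+deps+dep3"]

verbose = verbose_manifest(SOURCE_REPOS, MAPPING)
compact = compact_manifest("+deps+", MAPPING)
print(len(verbose), len(compact))
```

The verbose form grows with (source repos) x (apparent names), while the compact form grows only with the number of apparent names.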

Work towards #24808

@fmeum fmeum force-pushed the 24808-compact-repo-mapping branch 4 times, most recently from 2601e43 to 9b3e9fc Compare December 30, 2024 08:26
@fmeum fmeum marked this pull request as ready for review December 30, 2024 08:29
@fmeum fmeum requested review from a team and lberki as code owners December 30, 2024 08:29
@fmeum fmeum requested review from aranguyen and removed request for a team, lberki and aranguyen December 30, 2024 08:29
@github-actions github-actions bot added the team-Configurability, team-Rules-Python and awaiting-review labels Dec 30, 2024
@fmeum
Collaborator Author

fmeum commented Dec 30, 2024

@Wyverald Should we ask runfiles libraries to perform a linear match on all lines with prefixes? That's O(num extensions), but avoids specifying the separator char or scheme.

@fmeum
Collaborator Author

fmeum commented Jan 30, 2025

@Wyverald Friendly ping

@Wyverald
Member

Before we commit to introducing a new manifest format, could you briefly explain to me why we're recording so many entries in the manifest? I vaguely remember that we try to trim entries down to just the ones that we actually include runfiles for. Does that actually end up being every single repo generated by the extension? (Is it because of Python source files?)

I really wish this was something that could transparently be taken care of by compression, but I guess that's a bit of a pipe dream.

@fmeum
Collaborator Author

fmeum commented Jan 30, 2025

We are trimming down the target repos to those that provide runfiles, but for NPM and Python that's typically every extension repo.

Most of those won't use a runfiles library, so if we tracked that (my original proposal had something like this, but we decided against it for being too complicated), we could potentially trim down the source repos. But if a ruleset for a dynamic language ever adopts repo mapped language imports using the runfiles library (rules_python has been discussing this at some point), even that wouldn't help.

Compression is a good fix for remote execution. I have a change out that lazily streams these files to the executor with BwoB, but that doesn't help for local builds.

@Wyverald
Member

Wyverald commented Feb 3, 2025

I see. It somehow escaped me, but thinking about it again, for a top-level binary that depends on a lot of Python code, there's basically no way to "trim" anything here for any meaningful measure.

We should definitely tread carefully here -- changing the manifest format can be rather disruptive, especially since it's not versioned (so it basically always has to be forwards-compatible).

@Wyverald
Member

Wyverald commented Feb 3, 2025

> @Wyverald Should we ask runfiles libraries to perform a linear match on all lines with prefixes? That's O(num extensions), but avoids specifying the separator char or scheme.

Could you elaborate a bit what "a linear match on all lines with prefixes" means?

@fmeum
Collaborator Author

fmeum commented Feb 3, 2025

> Could you elaborate a bit what "a linear match on all lines with prefixes" means?

Runfiles libraries have essentially two ways to look up mappings in the presence of wildcards. In both cases, first try to look up an exact match for the source repo in some equivalent of a HashMap. If there is no such match, then either:

  1. Iterate over all manifest lines with a wildcard character and check whether they represent a prefix of the source repo. This takes time O(wildcard entries).
  2. Assume that prefixes are cut off at the last +. Then the wildcard entries can be preprocessed into a HashMap in which entries can be looked up directly by cutting off the source repo at the last + and performing a map lookup. This takes constant time, but requires knowledge about how prefixes are constructed.
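
The two options can be sketched in Python as follows. This is an illustration under stated assumptions, not code from any existing runfiles library: the function names are hypothetical, a trailing `*` on the source repo is taken to mark a wildcard entry, and the first manifest line (an exact entry for a source repo with an empty name) is made up to exercise the exact-match path.

```python
MANIFEST = """\
,dep1,+deps+dep1
+deps+*,aaa,_main
+deps+*,dep,+deps+dep1
+deps+*,dep1,+deps+dep1
+deps+*,dep2,+deps+dep2
+deps+*,dep3,+deps+dep3
"""


def parse(manifest_text):
    """Split manifest lines into exact entries and wildcard (prefix) entries."""
    exact, wildcard = {}, {}
    for line in manifest_text.splitlines():
        if not line:
            continue
        source, apparent, target = line.split(",")
        if source.endswith("*"):
            # Store the prefix without the trailing wildcard character.
            wildcard[(source[:-1], apparent)] = target
        else:
            exact[(source, apparent)] = target
    return exact, wildcard


def lookup_linear(exact, wildcard, source, apparent):
    """Option 1: exact lookup, then scan all wildcard entries for a prefix."""
    if (source, apparent) in exact:
        return exact[(source, apparent)]
    for (prefix, name), target in wildcard.items():
        if name == apparent and source.startswith(prefix):
            return target
    return None


def lookup_by_separator(exact, wildcard, source, apparent):
    """Option 2: exact lookup, then cut the source repo after its last `+`."""
    if (source, apparent) in exact:
        return exact[(source, apparent)]
    # Encodes knowledge of the repo name format: the prefix ends at the last `+`.
    prefix = source[: source.rfind("+") + 1]
    return wildcard.get((prefix, apparent))


exact, wildcard = parse(MANIFEST)
print(lookup_linear(exact, wildcard, "+deps+dep2", "dep"))
print(lookup_by_separator(exact, wildcard, "+deps+dep2", "dep"))
```

Only `lookup_by_separator` depends on how canonical repo names are built; `lookup_linear` treats the prefix as an opaque string.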

@Wyverald
Member

Wyverald commented Feb 5, 2025

I see. I would like to avoid encoding any knowledge about the repo name format whatsoever (you'd probably expect that from me at this point :)). On the runfiles library side, some tricks can be done to speed up the lookup (e.g. constructing a trie), so performance should still be good enough.
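
A trie over the wildcard prefixes could look roughly like the Python sketch below. The class and method names are hypothetical, and the rule that the longest matching prefix wins is an assumption; the discussion does not say how overlapping prefixes should be resolved. The `+other+` entry is made up for the usage example.

```python
class PrefixTrie:
    """Character trie over wildcard prefixes; nodes hold apparent -> target maps."""

    def __init__(self):
        self.children = {}
        self.mappings = {}

    def insert(self, prefix, apparent, target):
        node = self
        for ch in prefix:
            node = node.children.setdefault(ch, PrefixTrie())
        node.mappings[apparent] = target

    def lookup(self, source, apparent):
        """Return the target stored at the longest prefix of `source`, or None."""
        node = self
        best = node.mappings.get(apparent)
        for ch in source:
            node = node.children.get(ch)
            if node is None:
                break
            # A deeper node is a longer prefix, so it overrides earlier matches.
            best = node.mappings.get(apparent, best)
        return best


trie = PrefixTrie()
trie.insert("+deps+", "dep", "+deps+dep1")
trie.insert("+deps+", "aaa", "_main")
trie.insert("+other+", "dep", "+other+dep")  # hypothetical second extension

print(trie.lookup("+deps+dep2", "dep"))
```

The lookup cost is bounded by the length of the source repo name rather than by the number of wildcard entries, and the prefix is never split on a particular character.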

@fmeum fmeum changed the title Add --incompatible_compact_repo_mapping Add --incompatible_compact_repo_mapping_manifest Feb 18, 2025
@fmeum fmeum force-pushed the 24808-compact-repo-mapping branch from 9b3e9fc to 5e9644b Compare February 18, 2025 12:17
@fmeum fmeum requested a review from Wyverald February 18, 2025 12:17
@fmeum
Collaborator Author

fmeum commented Feb 18, 2025

The equivalent of a TreeMap is probably good enough. I will follow up with PRs for the main runfiles libraries.
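
The comment does not spell out how a sorted map would be used. One possible reading, sketched in Python with a sorted list and `bisect` standing in for a TreeMap's floor lookup, is shown below. The class name, the longest-prefix-wins rule, and the `+deps+nested+` and `+other+` entries are assumptions for illustration.

```python
import bisect
from os.path import commonprefix


class SortedPrefixMap:
    """Longest-prefix lookup over sorted wildcard prefixes."""

    def __init__(self, entries):
        # entries: dict mapping prefix -> {apparent name: target repo}
        self._entries = entries
        self._prefixes = sorted(entries)

    def longest_prefix(self, source):
        key = source
        while True:
            # Floor lookup: greatest stored prefix that sorts <= key.
            i = bisect.bisect_right(self._prefixes, key)
            if i == 0:
                return None
            candidate = self._prefixes[i - 1]
            if source.startswith(candidate):
                return candidate
            # Any stored prefix of `source` below `candidate` must also be a
            # prefix of what `candidate` and `key` share, so retry from there.
            key = commonprefix([candidate, key])

    def lookup(self, source, apparent):
        prefix = self.longest_prefix(source)
        if prefix is None:
            return None
        return self._entries[prefix].get(apparent)


prefix_map = SortedPrefixMap({
    "+deps+": {"dep": "+deps+dep1", "aaa": "_main"},
    "+deps+nested+": {"dep": "+deps+nested+dep"},  # hypothetical nested prefix
    "+other+": {"dep": "+other+dep"},  # hypothetical second extension
})

print(prefix_map.lookup("+deps+dep2", "dep"))
```

If wildcard prefixes are never prefixes of one another, the loop runs at most once and a single floor lookup followed by a `startswith` check is enough.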

@aignas

aignas commented Feb 27, 2025

Regarding comment from @fmeum

> language imports using the runfiles library (rules_python has been discussing this at some point)

Right now we are also researching ways to lay out the files in a way that would not require this (i.e. create a virtual env and put the files in a way that is natural to Python). Right now no one is pursuing reading the runfiles manifest to implement an importlib alternative.

The virtual env approach is being researched here: bazel-contrib/rules_python#2617

@fmeum fmeum force-pushed the 24808-compact-repo-mapping branch from 5e9644b to f6bf303 Compare March 7, 2025 07:59
@fmeum
Collaborator Author

fmeum commented Mar 7, 2025

@Wyverald I resolved the conflicts, this should be good for another review.

fmeum added 2 commits March 19, 2025 09:13
# Conflicts:
#	src/test/java/com/google/devtools/build/lib/analysis/RunfilesRepoMappingManifestTest.java

# Conflicts:
#	src/main/java/com/google/devtools/build/lib/rules/python/PyBuiltins.java
#	src/test/java/com/google/devtools/build/lib/analysis/BUILD

# Conflicts:
#	src/main/java/com/google/devtools/build/lib/rules/python/BUILD
@fmeum fmeum force-pushed the 24808-compact-repo-mapping branch from 3185548 to 83f4865 Compare March 19, 2025 08:13
@fmeum
Collaborator Author

fmeum commented Mar 19, 2025

@Wyverald Friendly ping, let's merge this so that runfiles library work can start :-)

Labels
awaiting-review: PR is awaiting review from an assigned reviewer
team-Configurability: platforms, toolchains, cquery, select(), config transitions
team-Rules-Python: Native rules for Python
3 participants