Add support for decompressing .7z files #27269

willstranton · 2025-10-15T08:01:46Z

This implementation follows the implementation of ZipDecompressor, with the exception that symbolic links and file permissions are not handled for .7z files. Commit 9c98120, where .ar/.deb support was added, was used as a reference for which additional files to change.

Closes #27231

fmeum · 2025-10-15T08:45:56Z

...java/com/google/devtools/build/lib/bazel/repository/decompressor/SevenZDecompressorTest.java

+
+/** Tests .7z decompression. */
+@RunWith(JUnit4.class)
+public class SevenZDecompressorTest {


Could you also add a test case for Unicode filename handling? You can search for examples of tests containing äöü.

OK, see the added test testUnicodeFilename, which checks that ünïcödëFïlë.txt is extracted.

This test is failing internally; I'm getting:

another_folder/ünïcödëFïlë.txt (No such file or directory)

Any idea how to fix?

Could you modify the test you run internally so that it prints the entries in that directory and possibly also their UTF-8 byte representations?
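One way to produce that debugging output is sketched below. It uses plain `java.nio.file` rather than Bazel's vfs `Path.readdir()`, and the names `DumpEntryBytesSketch`, `utf8Hex` and `dump` are hypothetical. Printing the UTF-8 bytes makes double encoding visible: a correctly decoded `ü` shows up as `c3 bc`, while a name whose bytes were decoded as Latin-1 and then re-encoded shows up as `c3 83 c2 bc`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical debugging helper, not part of Bazel. */
final class DumpEntryBytesSketch {
  private DumpEntryBytesSketch() {}

  /** Formats the UTF-8 encoding of a name as space-separated hex bytes, e.g. "c3 bc" for "ü". */
  static String utf8Hex(String name) {
    StringBuilder sb = new StringBuilder();
    for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  /** Prints each entry of dir together with the UTF-8 bytes of its name as decoded by the JVM. */
  static void dump(Path dir) throws IOException {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path entry : entries) {
        String name = entry.getFileName().toString();
        System.out.println(name + " -> " + utf8Hex(name));
      }
    }
  }

  public static void main(String[] args) throws IOException {
    dump(Paths.get(args.length > 0 ? args[0] : "."));
  }
}
```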

This test is failing internally.

It's hard for me to help here when I don't know how internal testing differs from external testing. Your help would be appreciated here!

Any idea how to fix?

I did find one weird thing when I was originally writing the test: I had to call

StringEncoding.unicodeToInternal(UNICODE_FILENAME) to get the right filename in the filesystem.

This changed ünïcödëFïlë.txt -> Ã¼nÃ¯cÃ¶dÃ«FÃ¯lÃ«.txt

I wonder if that has something to do with it?

In the updated test, I now run StringEncoding::internalToUnicode when reading filenames from the filesystem, and the opposite, StringEncoding.unicodeToInternal(UNICODE_FILENAME), when trying to match/find the internal Unicode filename.
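The mojibake above is consistent with an internal string form in which each UTF-8 byte of a name is stored as one Latin-1 `char`. The sketch below assumes exactly that. It is a hypothetical stand-in written for illustration, not Bazel's `StringEncoding` class, whose real behavior may differ by platform. It shows why the test converts in both directions: Truth compares ordinary Java Unicode strings, while (per this thread) paths are looked up with the internal form.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical re-implementation for illustration only; not Bazel's StringEncoding.
 * Assumes the internal form stores each UTF-8 byte of a name as one Latin-1 char.
 */
final class InternalEncodingSketch {
  private InternalEncodingSketch() {}

  /** Unicode Java string -> internal form: UTF-8 bytes reinterpreted as Latin-1 chars. */
  static String unicodeToInternal(String unicode) {
    return new String(unicode.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
  }

  /** Internal form -> Unicode Java string: Latin-1 chars turned back into bytes, decoded as UTF-8. */
  static String internalToUnicode(String internal) {
    return new String(internal.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String name = "\u00fcn\u00efc\u00f6d\u00ebF\u00efl\u00eb.txt"; // ünïcödëFïlë.txt
    String internal = unicodeToInternal(name); // Ã¼nÃ¯cÃ¶dÃ«FÃ¯lÃ«.txt
    // Each of the six non-ASCII letters takes two chars in the internal form.
    System.out.println(internal.length() + " chars internally vs " + name.length() + " in Unicode");
    // Converting back recovers the original Unicode name.
    System.out.println(internalToUnicode(internal).equals(name));
  }
}
```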

Could you modify the test you run internally so that it prints the entries in that directory

I did some test refactoring - see the latest two commits. Now the test does a directory listing and shows what's there and the filename we expected. Hope that helps meteorcloudy@ with more info from the internal testing.

let's remove this test case or make it Bazel only.

I ended up removing the unicode file test case. Just curious - how does one make a test "Bazel only"?

The existing decompression tests for Unicode functionality are shell integration tests, not Java tests.

I couldn't find existing integration tests which tested decompression under src/test/shell/integration

I saw you added some to bazel_workspaces_test.sh in

10169bb#diff-c8e054134e31f6f0307f7dccc79dc185f43d5ce865d9c895232e388cf6992ab9

But that file and those tests ended up getting deleted in 37e0d23 when the workspace->bzlmod transition happened.

how does one make a test "Bazel only"?

In Java tests, we can check https://cs.opensource.google/bazel/bazel/+/master:src/test/java/com/google/devtools/build/lib/testutil/TestConstants.java;l=31

But I realize this isn't easy since the archive contains the unicode file name.

Tests are passing internally now.

But that file and those tests ended up getting deleted in 37e0d23 when the workspace->bzlmod transition happened.

Nice find, will add them back

fmeum · 2025-10-15T10:11:03Z

Not sure why, but ci says that you need to run "bazel run //src/test/tools/bzlmod:update_default_lock_file".

willstranton · 2025-10-15T10:26:30Z

Not sure why, but ci says that you need to run "bazel run //src/test/tools/bzlmod:update_default_lock_file".

OK, I ran it, and it updated the bzlTransitiveDigest in src/test/tools/bzlmod/MODULE.bazel.lock, which I assume is because I edited a comment in tools/build_defs/repo/http.bzl.

Thanks for pointing that out!

meteorcloudy · 2025-10-15T10:31:48Z

...ain/java/com/google/devtools/build/lib/bazel/repository/decompressor/SevenZDecompressor.java

+          String.format(
+              "Failed to extract %s, 7-zipped paths cannot be absolute", strippedRelativePath));
+    }
+    Path outputPath = destinationDirectory.getRelative(strippedRelativePath);


Is it possible for strippedRelativePath to contain ../ so that outputPath actually ends up outside destinationDirectory? I think this might already be possible for other decompressors, but it would be best if we can prevent it for new ones.

When a PathFragment is created, it should normalize the path:

e.g. a/b/../c -> a/c

I guess multiple uplevel references could potentially escape the destination directory:

e.g. a/b/../../../../root -> ../../root

I added a check to make sure there can't be any uplevel references (..) when we create the outputPath
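A minimal sketch of that kind of check, assuming Unix-style entry names and using `java.nio.file` instead of Bazel's `PathFragment`: normalize the entry name, reject absolute paths and any `..` segment that survives normalization, and, as an extra safeguard, verify that the resolved path still lies under the destination directory. `SafeExtractPathSketch` and `resolveEntry` are hypothetical names, and only the "cannot be absolute" message comes from the diff above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative analogue of the uplevel-reference check; not Bazel's implementation. */
final class SafeExtractPathSketch {
  private SafeExtractPathSketch() {}

  /** Resolves an archive entry name under destinationDirectory or throws if it could escape. */
  static Path resolveEntry(Path destinationDirectory, String entryName) {
    Path relative = Paths.get(entryName);
    if (relative.isAbsolute()) {
      throw new IllegalArgumentException(
          "Failed to extract " + entryName + ", 7-zipped paths cannot be absolute");
    }
    // normalize() collapses "a/b/../c" to "a/c" but keeps leading "..", e.g. "../../root".
    Path normalized = relative.normalize();
    for (Path segment : normalized) {
      if (segment.toString().equals("..")) {
        // Hypothetical message wording.
        throw new IllegalArgumentException(
            "Failed to extract " + entryName + ", path contains uplevel references");
      }
    }
    Path root = destinationDirectory.normalize();
    Path output = root.resolve(normalized).normalize();
    // Extra safeguard: the final path must still be inside the destination directory.
    if (!output.startsWith(root)) {
      throw new IllegalArgumentException(
          "Failed to extract " + entryName + ", path escapes destination directory");
    }
    return output;
  }
}
```

With this sketch, `resolveEntry(Paths.get("/tmp/out"), "a/b/../c")` returns `/tmp/out/a/c`, while `a/b/../../../../root` normalizes to `../../root` and is rejected.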

Awesome, thank you!


meteorcloudy · 2025-10-16T08:54:22Z

Awesome, thank you!

The test was using the same decompressed output directory that other tests were using, a generic "out" directory, so its results were verifying/depending on the output of those other tests. Now the test uses /* outDirName= */ this.getClass().getSimpleName().

Fixing that revealed that the expected decompressed file was named wrong:
wrong: root_folder/another_folder/regular_file
right: root_folder/another_folder/regularFile

Instead of using `archiveDescriptor.assertOutputFiles`, which relied on the output of other tests, we now inline the exact assertions we need. Previously, the assertions checked that a file's `exists()` method returned true. This didn't produce the most friendly test output. For example:
> value of: exists()
> expected to be false

Now we get a directory listing of entries and check for the file in that list:
> expected to contain: myfile
> but was : [regularFile, renamedFile]

For the Unicode test, we now use `StringEncoding::internalToUnicode` when reading directory entries and `StringEncoding::unicodeToInternal` when checking for a file's existence. This is because meteorcloudy@ reported that internal tests failed; my educated guess is that it has something to do with the string encoding.
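The "list the directory and check membership" style of assertion can be sketched with only the JDK as below. The real test uses Bazel's vfs `readdir` and Truth's `contains`; `DirectoryListingAssertSketch` is a hypothetical helper that imitates the friendlier failure message by including the full, sorted listing.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical JDK-only stand-in for the readdir + Truth contains(...) assertion. */
final class DirectoryListingAssertSketch {
  private DirectoryListingAssertSketch() {}

  /** Returns the names of all entries in dir, sorted for stable failure messages. */
  static List<String> listEntryNames(Path dir) throws IOException {
    List<String> names = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
      for (Path entry : stream) {
        names.add(entry.getFileName().toString());
      }
    }
    Collections.sort(names);
    return names;
  }

  /** Fails with the whole listing, rather than a bare exists() == false, if expected is missing. */
  static void assertContainsEntry(Path dir, String expected) throws IOException {
    List<String> names = listEntryNames(dir);
    if (!names.contains(expected)) {
      throw new AssertionError("expected to contain: " + expected + "\nbut was : " + names);
    }
  }
}
```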

Each test was using the same output directory for the decompressed files (previously this.getClass().getSimpleName()). This caused problems, as the decompression of files in one test would affect other tests. Now each test creates its own TestArchiveDescriptor with the output directory set to:
> this.getClass().getSimpleName() + "_" + name.getMethodName()

As an example, for `testDecompressWithRenamedFiles` the directory would be:
> SevenZDecompressorTest_testDecompressWithRenamedFiles
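A sketch of the per-test output directory naming, assuming the method name comes from a JUnit 4 `TestName` rule (`name.getMethodName()`, as in the commit message). To keep it runnable with only the JDK, the hypothetical helper below takes the test class and method name as parameters and creates the directory under a given base directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; not part of Bazel's TestArchiveDescriptor. */
final class PerTestOutputDirSketch {
  private PerTestOutputDirSketch() {}

  /** Creates (if needed) and returns base/ClassName_methodName for one test method. */
  static Path perTestOutputDir(Path base, Class<?> testClass, String methodName)
      throws IOException {
    // Mirrors this.getClass().getSimpleName() + "_" + name.getMethodName().
    return Files.createDirectories(base.resolve(testClass.getSimpleName() + "_" + methodName));
  }
}
```

Because the method name is part of the directory name, two tests in the same class no longer see each other's extracted files.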

…formToInternal

I have no idea what I'm doing here. In this case, it seems like using the platform variant is better, since it checks whether encoding/re-encoding is needed on the platform at all before doing it. That seems safer than the previous calls. This is all being done because internal testing is failing on this change, and I have no clue what is different internally since I don't work at Google.

fmeum · 2025-10-18T04:20:07Z

...java/com/google/devtools/build/lib/bazel/repository/decompressor/SevenZDecompressorTest.java

+    List<String> filenames =
+        path.readdir(Symlinks.NOFOLLOW).stream()
+            .map((Dirent::getName))
+            .map(StringEncoding::internalToPlatform)


Suggested change

.map(StringEncoding::internalToPlatform)

.map(StringEncoding::internalToUnicode)

since you are passing the result to the Truth assertion libraries, which expect regular Java Unicode strings

ok, reverted this change.

fmeum · 2025-10-18T04:20:29Z

...java/com/google/devtools/build/lib/bazel/repository/decompressor/SevenZDecompressorTest.java

+            .collect(Collectors.toList());
+    assertThat(filenames).contains(UNICODE_FILENAME);
+
+    Path unicodeFile = path.getRelative(StringEncoding.platformToInternal(UNICODE_FILENAME));


Suggested change

Path unicodeFile = path.getRelative(StringEncoding.platformToInternal(UNICODE_FILENAME));

Path unicodeFile = path.getRelative(StringEncoding.unicodeToInternal(UNICODE_FILENAME));

ok, reverted this change.

…orm/platformToInternal" This reverts commit 4950332. Per fmeum: since you are passing the result to the Truth assertion libraries, which expect regular Java Unicode strings

See the discussion in bazelbuild#27269: there was some difference in internal testing for Unicode files that we weren't able to figure out. It works on my local Mac but did not work inside Google for some reason. Since this is the only decompression test that tried to test Unicode files, we decided to drop it. The thinking is that this would also occur in the other decompression tests if they also had a Unicode file. My own integration testing shows that it does work.

github-actions bot added team-ExternalDeps External dependency handling, remote repositories, WORKSPACE file. awaiting-review PR is awaiting review from an assigned reviewer labels Oct 15, 2025

fmeum reviewed Oct 15, 2025

willstranton force-pushed the master branch from f942d36 to ee6f6c2 on October 15, 2025 09:54

fmeum requested a review from meteorcloudy October 15, 2025 10:11

willstranton force-pushed the master branch from ee6f6c2 to 754c626 on October 15, 2025 10:24

meteorcloudy reviewed Oct 15, 2025

willstranton force-pushed the master branch from 754c626 to fa0051d on October 15, 2025 12:01

meteorcloudy approved these changes Oct 16, 2025

meteorcloudy added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merged internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Oct 16, 2025

willstranton added 2 commits October 18, 2025 03:26

willstranton force-pushed the master branch from e117cc4 to 7f9c155 on October 18, 2025 03:26

fmeum reviewed Oct 18, 2025

willstranton added 2 commits October 20, 2025 06:26

Revert "Change internalToUnicode/unicodeToInternal -> internalToPlatf…

8e812e8

willstranton force-pushed the master branch from 389f207 to 8603c6e on October 22, 2025 02:33

copybara-service bot closed this in 93cd97e Oct 23, 2025

github-actions bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merged internally label Oct 23, 2025


meteorcloudy Oct 22, 2025 • edited


3 participants
