Skip to content

Conversation

@willstranton
Copy link
Contributor

This implementation follows the implementation of ZipDecompressor with the exception that .7z files do not handle symbolic links or preserving file permissions. The commit 9c98120 where .ar/.deb support was added, was used as a reference for what additional files to change.

Closes #27231

@github-actions github-actions bot added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. awaiting-review PR is awaiting review from an assigned reviewer labels Oct 15, 2025

/** Tests .7z decompression. */
@RunWith(JUnit4.class)
public class SevenZDecompressorTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a test case for Unicode filename handling? You can search for examples of tests containing äöü.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, see the added test testUnicodeFilename which checks that ünïcödëFïlë.txt is extracted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing internally, I'm getting

another_folder/ünïcödëFïlë.txt (No such file or directory)

Any idea how to fix?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you modify the test you run internally so that it prints the entries in that directory and possibly also their UTF-8 byte representations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing internally.

It's hard for me to help here when I don't know how internal testing differs from external testing. Your help would be appreciated here!

Any idea how to fix?

I did find one thing weird when I was originally writing the test. I had to call

StringEncoding.unicodeToInternal(UNICODE_FILENAME) to get the right filename in the filesystem.

this changed ünïcödëFïlë.txt -> ünïcödëFïlë.txt

I wonder if that has something to do with it?

In the updated test, I now run StringEncoding::internalToUnicode when reading filenames from the filesystem. And when trying to match/find the internal unicode filename, I run the opposite: StringEncoding.unicodeToInternal(UNICODE_FILENAME)

Could you modify the test you run internally so that it prints the entries in that directory

I did some test refactoring - see the latest two commits. Now the test does a directory listing and shows what's there and the filename we expected. Hope that helps meteorcloudy@ with more info from the internal testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove this test case or make it Bazel only.

I ended up removing the unicode file test case. Just curious - how does one make a test "Bazel only"?

The existing decompression tests for Unicode functionality are shell integration tests, not Java tests.

I couldn't find existing integration tests which tested decompression under src/test/shell/integration

I saw you added some to bazel_workspaces_test.sh in

10169bb#diff-c8e054134e31f6f0307f7dccc79dc185f43d5ce865d9c895232e388cf6992ab9

But that file and those tests ended up getting deleted in 37e0d23 when the workspace->bzlmod transition happened.

Copy link
Member

@meteorcloudy meteorcloudy Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does one make a test "Bazel only"?

In Java tests, we can check https://cs.opensource.google/bazel/bazel/+/master:src/test/java/com/google/devtools/build/lib/testutil/TestConstants.java;l=31

But I realize this isn't easy since the archive contains the unicode file name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests passing internally now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that file and those tests ended up getting deleted in 37e0d23 when the workspace->bzlmod transition happened.

Nice find, will add them back

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmeum
Copy link
Collaborator

fmeum commented Oct 15, 2025

Not sure why, but ci says that you need to run "bazel run //src/test/tools/bzlmod:update_default_lock_file".

@willstranton
Copy link
Contributor Author

Not sure why, but ci says that you need to run "bazel run //src/test/tools/bzlmod:update_default_lock_file".

Ok, I ran it, and it changed src/test/tools/bzlmod/MODULE.bazel.lock to update thebzlTransitiveDigest which I assume is because I made edits in tools/build_defs/repo/http.bzl to update the comment.

Thanks for pointing that out!

String.format(
"Failed to extract %s, 7-zipped paths cannot be absolute", strippedRelativePath));
}
Path outputPath = destinationDirectory.getRelative(strippedRelativePath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for strippedRelativePath to contain ../ and outputPath will actually be out of destinationDirectory? I think this might already be possible for other decompressors, but it'll be best if we can prevent it for new ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a PathFragment is created, it should normalize the path:

eg. a/b/../c -> a/c

I guess multiple uplevel references could potentially escape the destination directory:

eg. a/b/../../../../root -> ../../root

I added a check to make sure there can't be any uplevel references (..) when we create the outputPath

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you!

This implementation follows the implementation of ZipDecompressor with the
exception that .7z files do not handle symbolic links or preserving file
permissions.  The commit 9c98120 where
.ar/.deb support was added, was used as a reference for what additional files
to change.

Closes bazelbuild#27231
String.format(
"Failed to extract %s, 7-zipped paths cannot be absolute", strippedRelativePath));
}
Path outputPath = destinationDirectory.getRelative(strippedRelativePath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you!

@meteorcloudy meteorcloudy added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Oct 16, 2025
The test was using the same decompressed output directory that other tests were
using, a generic "out" directory. So test results were verifying/depending on
the output of those other tests.  Now the test uses /* outDirName= */
this.getClass().getSimpleName()

Fixing that revealed that the expected decompressed file was named wrong:

wrong: root_folder/another_folder/regular_file
right: root_folder/another_folder/regularFile

Instead of using `archiveDescriptor.assertOutputFiles`, which relied on the
output of other tests, we now inline the exact assertions we need.

Previously, the assertions would check that a file's `exists()` method was
true. This didn't produce the most friendly test output.  For example:

> value of: exists()
> expected to be false

Now we get a directory listing of entries and check for the file in that list:

> expected to contain: myfile
> but was            : [regularFile, renamedFile]

For the unicode test, we now make use of `StringEncoding::internalToUnicode`
when reading directory entries and `StringEncoding::unicodeToInternal` when
checking for a files existence. This is because meteorcloudy@ reported that
internal tests failed, which I am making an educated guess that it has
something to do with the string encoding.
Each test was using the same output directory for the decompressed files (was this.getClass().getSimpleName()).
This would cause problems as the decompression of files in one test would affect other tests.

Now each test creates their own TestArchiveDescriptor with the output directory being set to:

> this.getClass().getSimpleName() + "_" + name.getMethodName()

As an example, for `testDecompressWithRenamedFiles` the directory would be:

> SevenZDecompressorTest_testDecompressWithRenamedFiles
…formToInternal

I have no idea what I'm doing - in this case, it seems like using the platform
variant is better since it checks if encoding/reencoding on a platform is
needed at all before doing the encoding.  Seems safer than the previous calls.

This is all being done because internal testing is failing on this change and I
have no clue what is different internally since I don't work at Google.
List<String> filenames =
path.readdir(Symlinks.NOFOLLOW).stream()
.map((Dirent::getName))
.map(StringEncoding::internalToPlatform)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.map(StringEncoding::internalToPlatform)
.map(StringEncoding::internalToUnicode)

since you are passing the result to the Truth assertion libraries, which expect regular Java Unicode strings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, reverted this change.

.collect(Collectors.toList());
assertThat(filenames).contains(UNICODE_FILENAME);

Path unicodeFile = path.getRelative(StringEncoding.platformToInternal(UNICODE_FILENAME));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Path unicodeFile = path.getRelative(StringEncoding.platformToInternal(UNICODE_FILENAME));
Path unicodeFile = path.getRelative(StringEncoding.unicodeToInternal(UNICODE_FILENAME));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, reverted this change.

…orm/platformToInternal"

This reverts commit 4950332.

Per fmeum: since you are passing the result to the Truth assertion libraries,
which expect regular Java Unicode strings
See discussion in bazelbuild#27269 - there was some internal testing difference for
unicode files that we weren't able to figure out.  It works locally on my local
mac computer but did not work inside Google for some reason.  Since this is the
only decompression test that tried to test unicode files, we decided to drop
it.  The thinking is that this would also occur in the other decompression
tests if they also had a unicode file.  My own integration testing shows that
it does work.
@github-actions github-actions bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Oct 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add 7z/7zip file extraction support to Bazel

3 participants