-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Add support for decompressing .7z files #27269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| /** Tests .7z decompression. */ | ||
| @RunWith(JUnit4.class) | ||
| public class SevenZDecompressorTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add a test case for Unicode filename handling? You can search for examples of tests containing äöü.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, see the added test testUnicodeFilename which checks that ünïcödëFïlë.txt is extracted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is failing internally, I'm getting
another_folder/ünïcödëFïlë.txt (No such file or directory)
Any idea how to fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you modify the test you run internally so that it prints the entries in that directory and possibly also their UTF-8 byte representations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is failing internally.
It's hard for me to help here when I don't know how internal testing differs from external testing. Your help would be appreciated here!
Any idea how to fix?
I did find one thing weird when I was originally writing the test. I had to call
StringEncoding.unicodeToInternal(UNICODE_FILENAME) to get the right filename in the filesystem.
this changed ünïcödëFïlë.txt -> ünïcödëFïlë.txt
I wonder if that has something to do with it?
In the updated test, I now run StringEncoding::internalToUnicode when reading filenames from the filesystem. And when trying to match/find the internal unicode filename, I run the opposite: StringEncoding.unicodeToInternal(UNICODE_FILENAME)
Could you modify the test you run internally so that it prints the entries in that directory
I did some test refactoring - see the latest two commits. Now the test does a directory listing and shows what's there and the filename we expected. Hope that helps meteorcloudy@ with more info from the internal testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's remove this test case or make it Bazel only.
I ended up removing the unicode file test case. Just curious - how does one make a test "Bazel only"?
The existing decompression tests for Unicode functionality are shell integration tests, not Java tests.
I couldn't find existing integration tests which tested decompression under src/test/shell/integration
I saw you added some to bazel_workspaces_test.sh in
10169bb#diff-c8e054134e31f6f0307f7dccc79dc185f43d5ce865d9c895232e388cf6992ab9
But that file and those tests ended up getting deleted in 37e0d23 when the workspace->bzlmod transition happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does one make a test "Bazel only"?
In Java tests, we can check https://cs.opensource.google/bazel/bazel/+/master:src/test/java/com/google/devtools/build/lib/testutil/TestConstants.java;l=31
But I realize this isn't easy since the archive contains the unicode file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests passing internally now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that file and those tests ended up getting deleted in 37e0d23 when the workspace->bzlmod transition happened.
Nice find, will add them back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Not sure why, but ci says that you need to run "bazel run //src/test/tools/bzlmod:update_default_lock_file". |
Ok, I ran it, and it changed Thanks for pointing that out! |
| String.format( | ||
| "Failed to extract %s, 7-zipped paths cannot be absolute", strippedRelativePath)); | ||
| } | ||
| Path outputPath = destinationDirectory.getRelative(strippedRelativePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for strippedRelativePath to contain ../ and outputPath will actually be out of destinationDirectory? I think this might already be possible for other decompressors, but it'll be best if we can prevent it for new ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a PathFragment is created, it should normalize the path:
eg. a/b/../c -> a/c
I guess multiple uplevel references could potentially escape the destination directory:
eg. a/b/../../../../root -> ../../root
I added a check to make sure there can't be any uplevel references (..) when we create the outputPath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thank you!
This implementation follows the implementation of ZipDecompressor with the exception that .7z files do not handle symbolic links or preserving file permissions. The commit 9c98120 where .ar/.deb support was added, was used as a reference for what additional files to change. Closes bazelbuild#27231
| String.format( | ||
| "Failed to extract %s, 7-zipped paths cannot be absolute", strippedRelativePath)); | ||
| } | ||
| Path outputPath = destinationDirectory.getRelative(strippedRelativePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thank you!
The test was using the same decompressed output directory that other tests were using, a generic "out" directory. So test results were verifying/depending on the output of those other tests. Now the test uses /* outDirName= */ this.getClass().getSimpleName() Fixing that revealed that the expected decompressed file was named wrong: wrong: root_folder/another_folder/regular_file right: root_folder/another_folder/regularFile Instead of using `archiveDescriptor.assertOutputFiles`, which relied on the output of other tests, we now inline the exact assertions we need. Previously, the assertions would check that a file's `exists()` method was true. This didn't produce the most friendly test output. For example: > value of: exists() > expected to be false Now we get a directory listing of entries and check for the file in that list: > expected to contain: myfile > but was : [regularFile, renamedFile] For the unicode test, we now make use of `StringEncoding::internalToUnicode` when reading directory entries and `StringEncoding::unicodeToInternal` when checking for a files existence. This is because meteorcloudy@ reported that internal tests failed, which I am making an educated guess that it has something to do with the string encoding.
Each test was using the same output directory for the decompressed files (was this.getClass().getSimpleName()). This would cause problems as the decompression of files in one test would affect other tests. Now each test creates their own TestArchiveDescriptor with the output directory being set to: > this.getClass().getSimpleName() + "_" + name.getMethodName() As an example, for `testDecompressWithRenamedFiles` the directory would be: > SevenZDecompressorTest_testDecompressWithRenamedFiles
…formToInternal I have no idea what I'm doing - in this case, it seems like using the platform variant is better since it checks if encoding/reencoding on a platform is needed at all before doing the encoding. Seems safer than the previous calls. This is all being done because internal testing is failing on this change and I have no clue what is different internally since I don't work at Google.
| List<String> filenames = | ||
| path.readdir(Symlinks.NOFOLLOW).stream() | ||
| .map((Dirent::getName)) | ||
| .map(StringEncoding::internalToPlatform) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .map(StringEncoding::internalToPlatform) | |
| .map(StringEncoding::internalToUnicode) |
since you are passing the result to the Truth assertion libraries, which expect regular Java Unicode strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, reverted this change.
| .collect(Collectors.toList()); | ||
| assertThat(filenames).contains(UNICODE_FILENAME); | ||
|
|
||
| Path unicodeFile = path.getRelative(StringEncoding.platformToInternal(UNICODE_FILENAME)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Path unicodeFile = path.getRelative(StringEncoding.platformToInternal(UNICODE_FILENAME)); | |
| Path unicodeFile = path.getRelative(StringEncoding.unicodeToInternal(UNICODE_FILENAME)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, reverted this change.
…orm/platformToInternal" This reverts commit 4950332. Per fmeum: since you are passing the result to the Truth assertion libraries, which expect regular Java Unicode strings
See discussion in bazelbuild#27269 - there was some internal testing difference for unicode files that we weren't able to figure out. It works locally on my local mac computer but did not work inside Google for some reason. Since this is the only decompression test that tried to test unicode files, we decided to drop it. The thinking is that this would also occur in the other decompression tests if they also had a unicode file. My own integration testing shows that it does work.
This implementation follows the implementation of ZipDecompressor with the exception that .7z files do not handle symbolic links or preserving file permissions. The commit 9c98120 where .ar/.deb support was added, was used as a reference for what additional files to change.
Closes #27231