Closed
Changes from 4 commits
Original file line number Diff line number Diff line change
@@ -112,11 +112,13 @@ static Decompressor getDecompressor(Path archivePath) throws RepositoryFunctionE
return TarBz2Function.INSTANCE;
} else if (baseName.endsWith(".ar") || baseName.endsWith(".deb")) {
return ArFunction.INSTANCE;
} else if (baseName.endsWith(".7z")) {
return SevenZDecompressor.INSTANCE;
} else {
throw new RepositoryFunctionException(
Starlark.errorf(
"Expected a file with a .zip, .jar, .war, .aar, .nupkg, .whl, .tar, .tar.gz, .tgz,"
+ " .tar.xz, , .tar.zst, .tzst, .tar.bz2, .tbz, .ar or .deb suffix (got %s)",
+ " .tar.xz, .tar.zst, .tzst, .tar.bz2, .tbz, .ar, .deb or .7z suffix (got %s)",
archivePath),
Transience.PERSISTENT);
}
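The suffix dispatch above can be modeled as a first-match table lookup. The sketch below is hypothetical: the class name `SuffixDispatchSketch`, the string labels it returns, and the subset of suffixes are illustrative. The real `DecompressorValue.getDecompressor` uses an if/else chain that returns `Decompressor` instances and throws `RepositoryFunctionException`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of suffix-based decompressor selection; returns labels, not Decompressors. */
public class SuffixDispatchSketch {
  // LinkedHashMap preserves insertion order, so the first matching suffix wins.
  private static final Map<String, String> SUFFIXES = new LinkedHashMap<>();

  static {
    SUFFIXES.put(".zip", "zip");
    SUFFIXES.put(".tar.gz", "tar.gz");
    SUFFIXES.put(".tgz", "tar.gz");
    SUFFIXES.put(".tar.bz2", "tar.bz2");
    SUFFIXES.put(".ar", "ar");
    SUFFIXES.put(".deb", "ar"); // .deb files are ar archives (ArFunction in the PR)
    SUFFIXES.put(".7z", "7z");
  }

  static String pickDecompressor(String baseName) {
    for (Map.Entry<String, String> entry : SUFFIXES.entrySet()) {
      if (baseName.endsWith(entry.getKey())) {
        return entry.getValue();
      }
    }
    // The real code reports this as a persistent (non-retryable) error.
    throw new IllegalArgumentException(
        String.format("Expected a file with a %s suffix (got %s)", SUFFIXES.keySet(), baseName));
  }

  public static void main(String[] args) {
    System.out.println(pickDecompressor("bar.baz.7z"));
    System.out.println(pickDecompressor("bar.baz.deb"));
  }
}
```

A table keeps the accepted suffixes and the error message in one place, so adding `.7z` would touch a single line instead of both the branch and the message string.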
@@ -0,0 +1,105 @@
package com.google.devtools.build.lib.bazel.repository.decompressor;

import static java.nio.charset.StandardCharsets.UTF_8;

import com.google.common.io.ByteStreams;
import com.google.devtools.build.lib.bazel.repository.RepositoryFunctionException;
import com.google.devtools.build.lib.bazel.repository.decompressor.DecompressorValue.Decompressor;
import com.google.devtools.build.lib.vfs.Path;
import com.google.devtools.build.lib.vfs.PathFragment;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import javax.annotation.Nullable;
import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;

/**
* Creates a repository by decompressing a 7-zip file. This implementation generally follows the
* logic from {@link ZipDecompressor} with the exception that the 7z format does not support file
* permissions or symbolic links.
*/
public class SevenZDecompressor implements Decompressor {
public static final Decompressor INSTANCE = new SevenZDecompressor();

/** Decompresses the file to directory {@link DecompressorDescriptor#destinationPath()}. */
@Override
@Nullable
public Path decompress(DecompressorDescriptor descriptor)
throws IOException, RepositoryFunctionException, InterruptedException {
Path destinationDirectory = descriptor.destinationPath();
Optional<String> prefix = descriptor.prefix();
Map<String, String> renameFiles = descriptor.renameFiles();
boolean foundPrefix = false;

try (SevenZFile sevenZFile =
SevenZFile.builder().setFile(descriptor.archivePath().getPathFile()).get()) {
Iterable<SevenZArchiveEntry> entries = sevenZFile.getEntries();
for (SevenZArchiveEntry entry : entries) {
String entryName = entry.getName();
entryName = renameFiles.getOrDefault(entryName, entryName);
StripPrefixedPath entryPath =
StripPrefixedPath.maybeDeprefix(entryName.getBytes(UTF_8), prefix);
foundPrefix = foundPrefix || entryPath.foundPrefix();
if (entryPath.skip()) {
continue;
}
extract7zEntry(sevenZFile, entry, destinationDirectory, entryPath.getPathFragment());
}

if (prefix.isPresent() && !foundPrefix) {
Set<String> prefixes = new HashSet<>();
for (SevenZArchiveEntry entry : entries) {
StripPrefixedPath entryPath =
StripPrefixedPath.maybeDeprefix(entry.getName().getBytes(UTF_8), Optional.empty());
CouldNotFindPrefixException.maybeMakePrefixSuggestion(entryPath.getPathFragment())
.ifPresent(prefixes::add);
}
throw new CouldNotFindPrefixException(prefix.get(), prefixes);
}
}
return destinationDirectory;
}
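The loop above renames each entry before stripping the prefix, and builds prefix suggestions only when no entry matched. A hypothetical string-based model of that flow follows. It does not reuse Bazel's `StripPrefixedPath` or `CouldNotFindPrefixException`; the class name `PrefixStripSketch` and its suggestion rule (top-level directory names) are our simplifications.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical model of the rename -> strip prefix -> suggest flow, on plain strings. */
public class PrefixStripSketch {

  static List<String> plan(
      List<String> entryNames, Optional<String> prefix, Map<String, String> renameFiles) {
    List<String> extracted = new ArrayList<>();
    boolean foundPrefix = false;
    for (String original : entryNames) {
      // Renaming is applied first, so renameFiles is keyed by the names stored in the archive.
      String name = renameFiles.getOrDefault(original, original);
      if (!prefix.isPresent()) {
        extracted.add(name);
        continue;
      }
      String dir = prefix.get() + "/";
      if (name.equals(prefix.get())) {
        foundPrefix = true; // the prefix directory itself: nothing left to extract
      } else if (name.startsWith(dir)) {
        foundPrefix = true;
        extracted.add(name.substring(dir.length()));
      }
      // In this sketch, entries outside the prefix are silently skipped.
    }
    if (prefix.isPresent() && !foundPrefix) {
      // Suggest top-level names (computed from the original, un-renamed entry names).
      Set<String> suggestions = new LinkedHashSet<>();
      for (String original : entryNames) {
        int slash = original.indexOf('/');
        suggestions.add(slash > 0 ? original.substring(0, slash) : original);
      }
      throw new IllegalArgumentException(
          "Prefix \"" + prefix.get() + "\" was not found; candidates: " + suggestions);
    }
    return extracted;
  }
}
```

Because renaming happens before stripping, a `renameFiles` key must use the full in-archive path (including the prefix), which is what `testDecompressWithRenamedFilesAndPrefix` exercises.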

private static void extract7zEntry(
SevenZFile sevenZFile,
SevenZArchiveEntry entry,
Path destinationDirectory,
PathFragment strippedRelativePath)
throws IOException, InterruptedException {
if (strippedRelativePath.isAbsolute()) {
throw new IOException(
String.format(
"Failed to extract %s, 7-zipped paths cannot be absolute", strippedRelativePath));
}
// Sanity/security check - at this point, uplevel references (..) should be resolved.
// There shouldn't be any remaining uplevel references, otherwise, the extracted file could
// "escape" the destination directory.
if (strippedRelativePath.containsUplevelReferences()) {
throw new IOException(
String.format(
"Failed to extract %s, 7-zipped entry contains uplevel references (..)",
strippedRelativePath));
}
Path outputPath = destinationDirectory.getRelative(strippedRelativePath);
Member

Is it possible for strippedRelativePath to contain ../ so that outputPath actually ends up outside destinationDirectory? I think this might already be possible with other decompressors, but it would be best to prevent it for new ones.

Contributor Author

When a PathFragment is created, it should normalize the path, e.g. a/b/../c -> a/c.

However, multiple uplevel references could still escape the destination directory, e.g. a/b/../../../../root -> ../../root.

I added a check to make sure there can't be any uplevel references (..) when we create the outputPath
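The normalization behavior described above can be reproduced with java.nio.file, used here as a stand-in for Bazel's PathFragment (an assumption; it normalizes these example inputs the same way the comment describes). The sketch rejects absolute paths and leftover `..` segments like extract7zEntry does, then adds a final `startsWith` containment check as defense in depth. That last check and the class name `UplevelCheckSketch` are our additions, not part of the PR.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Shows why normalization alone is not enough: leading ".." segments survive it. */
public class UplevelCheckSketch {

  /** Rough equivalent of PathFragment.containsUplevelReferences() for a normalized path. */
  static boolean containsUplevelReferences(Path normalized) {
    for (Path segment : normalized) {
      if (segment.toString().equals("..")) {
        return true;
      }
    }
    return false;
  }

  /** Resolves entryName under destination, rejecting absolute paths and escapes. */
  static Path safeResolve(Path destination, String entryName) {
    Path relative = Paths.get(entryName).normalize();
    if (relative.isAbsolute()) {
      throw new IllegalArgumentException("absolute path: " + entryName);
    }
    if (containsUplevelReferences(relative)) {
      throw new IllegalArgumentException("uplevel reference: " + entryName);
    }
    Path resolved = destination.resolve(relative).normalize();
    // Defense in depth: the resolved path must still live under the destination.
    if (!resolved.startsWith(destination.normalize())) {
      throw new IllegalArgumentException("escapes destination: " + entryName);
    }
    return resolved;
  }

  public static void main(String[] args) {
    // On a Unix-like system these print a/c and ../../root.
    System.out.println(Paths.get("a/b/../c").normalize());
    System.out.println(Paths.get("a/b/../../../../root").normalize());
  }
}
```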

Member

Awesome, thank you!

outputPath.getParentDirectory().createDirectoryAndParents();
boolean isDirectory = entry.isDirectory();
if (isDirectory) {
outputPath.createDirectoryAndParents();
} else {
try (InputStream input = sevenZFile.getInputStream(entry);
OutputStream output = outputPath.getOutputStream()) {
ByteStreams.copy(input, output);
if (Thread.interrupted()) {
throw new InterruptedException();
}
}
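// 7z entries may omit the modification time; getLastModifiedTime() throws in that case.
if (entry.getHasLastModifiedDate()) {
outputPath.setLastModifiedTime(entry.getLastModifiedTime().toMillis());
}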
}
}
}
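The PR copies each entry with Guava's ByteStreams.copy and checks Thread.interrupted() only after the whole entry is written. A hypothetical variant, sketched below with a plain buffer loop in place of ByteStreams.copy, polls the interrupt flag after every buffer so a very large entry can be abandoned mid-copy. The class name `InterruptibleCopySketch` and the 8 KiB buffer size are illustrative choices.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Copies a stream, checking for thread interruption between buffers. */
public class InterruptibleCopySketch {

  static long copy(InputStream in, OutputStream out) throws IOException, InterruptedException {
    byte[] buffer = new byte[8192];
    long total = 0;
    int n;
    while ((n = in.read(buffer)) != -1) {
      out.write(buffer, 0, n);
      total += n;
      // Thread.interrupted() clears the flag, matching the check in extract7zEntry.
      if (Thread.interrupted()) {
        throw new InterruptedException();
      }
    }
    return total;
  }
}
```

The trade-off is one extra flag check per buffer, which is negligible next to the I/O itself.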
@@ -906,7 +906,7 @@ public Object download(
determined from the file extension of the URL. If the file has no \
extension, you can explicitly specify either "zip", "jar", "war", \
"aar", "nupkg", "whl", "tar", "tar.gz", "tgz", "tar.xz", "txz", ".tar.zst", \
".tzst", "tar.bz2", ".tbz", ".ar", or ".deb" here.
".tzst", "tar.bz2", ".tbz", ".ar", ".deb", or ".7z" here.
"""),
@Param(
name = "strip_prefix",
@@ -17,6 +17,7 @@ java_library(
name = "DecompressorTests_lib",
srcs = glob(["*.java"]),
data = [
"test_decompress_archive.7z",
"test_decompress_archive.tar.gz",
"test_decompress_archive.zip",
"test_files.ar",
@@ -27,6 +28,7 @@ java_library(
"//src/main/java/com/google/devtools/build/lib/clock",
"//src/main/java/com/google/devtools/build/lib/unix",
"//src/main/java/com/google/devtools/build/lib/util:os",
"//src/main/java/com/google/devtools/build/lib/util:string_encoding",
"//src/main/java/com/google/devtools/build/lib/vfs",
"//src/main/java/com/google/devtools/build/lib/vfs:pathfragment",
"//src/main/java/com/google/devtools/build/lib/vfs/inmemoryfs",
@@ -67,6 +67,8 @@ public void testKnownFileExtensionsDoNotThrow() throws Exception {
unused = DecompressorValue.getDecompressor(path);
path = fs.getPath("/foo/.external-repositories/some-repo/bar.baz.deb");
unused = DecompressorValue.getDecompressor(path);
path = fs.getPath("/foo/.external-repositories/some-repo/bar.baz.7z");
unused = DecompressorValue.getDecompressor(path);
}

@Test
@@ -0,0 +1,142 @@
package com.google.devtools.build.lib.bazel.repository.decompressor;

import static com.google.common.truth.Truth.assertThat;
import static com.google.devtools.build.lib.bazel.repository.decompressor.TestArchiveDescriptor.INNER_FOLDER_NAME;
import static com.google.devtools.build.lib.bazel.repository.decompressor.TestArchiveDescriptor.ROOT_FOLDER_NAME;

import com.google.devtools.build.lib.util.StringEncoding;
import com.google.devtools.build.lib.vfs.Dirent;
import com.google.devtools.build.lib.vfs.Path;
import com.google.devtools.build.lib.vfs.Symlinks;
import java.util.HashMap;
import java.util.List;
import java.util.stream.Collectors;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestName;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;

/** Tests .7z decompression. */
@RunWith(JUnit4.class)
public class SevenZDecompressorTest {
Collaborator

Could you also add a test case for Unicode filename handling? You can search for examples of tests containing äöü.

Contributor Author

OK, see the added test testUnicodeFilename, which checks that ünïcödëFïlë.txt is extracted.

Member

This test is failing internally; I'm getting:

another_folder/ünïcödëFïlë.txt (No such file or directory)

Any idea how to fix?

Collaborator

Could you modify the test you run internally so that it prints the entries in that directory and possibly also their UTF-8 byte representations?
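One way to produce the requested listing, sketched with java.nio.file instead of Bazel's vfs Path.readdir (an assumption; the internal test would use Bazel's own filesystem API). Note that java.nio decodes file names with the JVM's platform file-name encoding, so the hex shows the UTF-8 bytes of the name after that decoding, not necessarily the raw bytes on disk. The class name `DirListingSketch` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Debugging helper: lists a directory and shows each entry name's UTF-8 bytes in hex. */
public class DirListingSketch {

  static List<String> describeEntries(Path dir) throws IOException {
    List<Path> children;
    try (Stream<Path> stream = Files.list(dir)) {
      children = stream.sorted().collect(Collectors.toList());
    }
    List<String> lines = new ArrayList<>();
    for (Path child : children) {
      String name = child.getFileName().toString();
      StringBuilder hex = new StringBuilder();
      for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
        if (hex.length() > 0) {
          hex.append(' ');
        }
        hex.append(String.format("%02x", b & 0xff));
      }
      lines.add(name + " -> " + hex);
    }
    return lines;
  }
}
```

Comparing the printed bytes against the expected name's UTF-8 encoding shows whether the mismatch comes from the bytes on disk or from how the name is decoded.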

Contributor Author

> This test is failing internally.

It's hard for me to help here when I don't know how internal testing differs from external testing. Your help would be appreciated here!

> Any idea how to fix?

I did find one weird thing when I was originally writing the test. I had to call StringEncoding.unicodeToInternal(UNICODE_FILENAME) to get the right filename in the filesystem.

this changed ünïcödëFïlë.txt -> ünïcödëFïlë.txt

I wonder if that has something to do with it?

In the updated test, I now run StringEncoding::internalToUnicode when reading filenames from the filesystem. When trying to match/find the internal Unicode filename, I run the opposite: StringEncoding.unicodeToInternal(UNICODE_FILENAME).

> Could you modify the test you run internally so that it prints the entries in that directory

I did some test refactoring - see the latest two commits. Now the test does a directory listing and shows what's there and the filename we expected. Hope that helps meteorcloudy@ with more info from the internal testing.
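A rough model of what the conversions in this thread do, assuming Bazel's "internal" strings hold a file name's raw bytes one byte per char (Latin-1) and that those bytes are UTF-8 on Linux. This is our reading of StringEncoding, not its source; the class `InternalEncodingSketch` and its helpers only mirror the names discussed above. Under that assumption, an internal name and the plain Java string do not compare equal, which is why the test must convert one side before matching.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical model of converting between Java Unicode strings and "internal" byte strings. */
public class InternalEncodingSketch {

  /** UTF-8 bytes of the name, stored one byte per char. */
  static String unicodeToInternal(String s) {
    return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
  }

  /** Inverse of unicodeToInternal: recover the bytes, then decode them as UTF-8. */
  static String internalToUnicode(String s) {
    return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Escaped form of the precomposed test file name, so source encoding does not matter.
    String name = "\u00fcn\u00efc\u00f6d\u00ebF\u00efl\u00eb.txt";
    String internal = unicodeToInternal(name);
    System.out.println(name.length()); // 15
    System.out.println(internal.length()); // 21: six accented letters take 2 UTF-8 bytes each
    System.out.println(internalToUnicode(internal).equals(name)); // true
  }
}
```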

Contributor Author

> let's remove this test case or make it Bazel only.

I ended up removing the Unicode file test case. Just curious: how does one make a test "Bazel only"?

> The existing decompression tests for Unicode functionality are shell integration tests, not Java tests.

I couldn't find existing integration tests that test decompression under src/test/shell/integration.

I saw you added some to bazel_workspaces_test.sh in 10169bb#diff-c8e054134e31f6f0307f7dccc79dc185f43d5ce865d9c895232e388cf6992ab9

But that file and those tests ended up getting deleted in 37e0d23 when the workspace->bzlmod transition happened.

Member (@meteorcloudy, Oct 22, 2025)

> how does one make a test "Bazel only"?

In Java tests, we can check https://cs.opensource.google/bazel/bazel/+/master:src/test/java/com/google/devtools/build/lib/testutil/TestConstants.java;l=31

But I realize this isn't easy since the archive contains the unicode file name.

Member

Tests passing internally now

Collaborator

> But that file and those tests ended up getting deleted in 37e0d23 when the workspace->bzlmod transition happened.

Nice find, will add them back


@Rule public TestName name = new TestName();

/**
* .7z file, created with two files:
*
* <ul>
* <li>root_folder/another_folder/regularFile
* <li>root_folder/another_folder/ünïcödëFïlë.txt
* </ul>
*
* Compressed with command "7zz a test_decompress_archive.7z root_folder"
*/
private static final String ARCHIVE_NAME = "test_decompress_archive.7z";

private static final String REGULAR_FILENAME = "regularFile";
private static final String UNICODE_FILENAME = "ünïcödëFïlë.txt";

/** Provides a test filesystem descriptor for a test. NOTE: unique per individual test ONLY. */
private TestArchiveDescriptor archiveDescriptor() throws Exception {
return new TestArchiveDescriptor(
ARCHIVE_NAME,
/* outDirName= */ this.getClass().getSimpleName() + "_" + name.getMethodName(),
/* withHardLinks= */ false);
}

/** Test decompressing a .7z file without stripping a prefix. */
@Test
public void testDecompressWithoutPrefix() throws Exception {
Path outputDir = decompress(archiveDescriptor().createDescriptorBuilder().build());

Path fileDir = outputDir.getRelative(ROOT_FOLDER_NAME).getRelative(INNER_FOLDER_NAME);
List<String> files =
fileDir.readdir(Symlinks.NOFOLLOW).stream()
.map(Dirent::getName)
.collect(Collectors.toList());
assertThat(files).contains(REGULAR_FILENAME);
assertThat(fileDir.getRelative(REGULAR_FILENAME).getFileSize()).isNotEqualTo(0);
}

/** Test decompressing a .7z file and stripping a prefix. */
@Test
public void testDecompressWithPrefix() throws Exception {
DecompressorDescriptor.Builder descriptorBuilder =
archiveDescriptor().createDescriptorBuilder().setPrefix(ROOT_FOLDER_NAME);
Path outputDir = decompress(descriptorBuilder.build());
Path fileDir = outputDir.getRelative(INNER_FOLDER_NAME);

List<String> files =
fileDir.readdir(Symlinks.NOFOLLOW).stream()
.map(Dirent::getName)
.collect(Collectors.toList());
assertThat(files).contains(REGULAR_FILENAME);
}

/** Test decompressing a .7z with entries being renamed during the extraction process. */
@Test
public void testDecompressWithRenamedFiles() throws Exception {
String innerDirName = ROOT_FOLDER_NAME + "/" + INNER_FOLDER_NAME;

HashMap<String, String> renameFiles = new HashMap<>();
renameFiles.put(innerDirName + "/" + REGULAR_FILENAME, innerDirName + "/renamedFile");
DecompressorDescriptor.Builder descriptorBuilder =
archiveDescriptor().createDescriptorBuilder().setRenameFiles(renameFiles);
Path outputDir = decompress(descriptorBuilder.build());

Path fileDir = outputDir.getRelative(ROOT_FOLDER_NAME).getRelative(INNER_FOLDER_NAME);
List<String> files =
fileDir.readdir(Symlinks.NOFOLLOW).stream()
.map(Dirent::getName)
.collect(Collectors.toList());
assertThat(files).contains("renamedFile");
assertThat(fileDir.getRelative("renamedFile").getFileSize()).isNotEqualTo(0);
}

/** Test that entry renaming is applied prior to prefix stripping. */
@Test
public void testDecompressWithRenamedFilesAndPrefix() throws Exception {
String innerDirName = ROOT_FOLDER_NAME + "/" + INNER_FOLDER_NAME;

HashMap<String, String> renameFiles = new HashMap<>();
renameFiles.put(innerDirName + "/" + REGULAR_FILENAME, innerDirName + "/renamedFile");
DecompressorDescriptor.Builder descriptorBuilder =
archiveDescriptor()
.createDescriptorBuilder()
.setPrefix(ROOT_FOLDER_NAME)
.setRenameFiles(renameFiles);
Path outputDir = decompress(descriptorBuilder.build());

Path fileDir = outputDir.getRelative(INNER_FOLDER_NAME);
List<String> files =
fileDir.readdir(Symlinks.NOFOLLOW).stream()
.map(Dirent::getName)
.collect(Collectors.toList());
assertThat(files).contains("renamedFile");
assertThat(fileDir.getRelative("renamedFile").getFileSize()).isNotEqualTo(0);
}

/** Test that Unicode filenames are handled. */
@Test
public void testUnicodeFilename() throws Exception {
Path outputDir = decompress(archiveDescriptor().createDescriptorBuilder().build());

Path path = outputDir.getRelative(ROOT_FOLDER_NAME).getRelative(INNER_FOLDER_NAME);
List<String> filenames =
path.readdir(Symlinks.NOFOLLOW).stream()
.map(Dirent::getName)
.map(StringEncoding::internalToPlatform)
Collaborator

Suggested change
.map(StringEncoding::internalToPlatform)
.map(StringEncoding::internalToUnicode)

since you are passing the result to the Truth assertion libraries, which expect regular Java Unicode strings

Contributor Author

ok, reverted this change.

.collect(Collectors.toList());
assertThat(filenames).contains(UNICODE_FILENAME);

Path unicodeFile = path.getRelative(StringEncoding.platformToInternal(UNICODE_FILENAME));
Collaborator

Suggested change
Path unicodeFile = path.getRelative(StringEncoding.platformToInternal(UNICODE_FILENAME));
Path unicodeFile = path.getRelative(StringEncoding.unicodeToInternal(UNICODE_FILENAME));

Contributor Author

ok, reverted this change.

assertThat(unicodeFile.exists()).isTrue();
assertThat(unicodeFile.getFileSize()).isNotEqualTo(0);
}

private Path decompress(DecompressorDescriptor descriptor) throws Exception {
return new SevenZDecompressor().decompress(descriptor);
}
}
Binary file not shown.
4 changes: 2 additions & 2 deletions src/test/tools/bzlmod/MODULE.bazel.lock

Some generated files are not rendered by default.

5 changes: 3 additions & 2 deletions tools/build_defs/repo/http.bzl
@@ -337,7 +337,8 @@ repository. Files are symlinked after remote files are downloaded and patches (`
By default, the archive type is determined from the file extension of the
URL. If the file has no extension, you can explicitly specify one of the
following: `"zip"`, `"jar"`, `"war"`, `"aar"`, `"tar"`, `"tar.gz"`, `"tgz"`,
`"tar.xz"`, `"txz"`, `"tar.zst"`, `"tzst"`, `"tar.bz2"`, `"ar"`, or `"deb"`.""",
`"tar.xz"`, `"txz"`, `"tar.zst"`, `"tzst"`, `"tar.bz2"`, `"ar"`, `"deb"`, or
`"7z"`.""",
),
"patches": attr.label_list(
default = [],
@@ -450,7 +451,7 @@ and makes its targets available for binding.

It supports the following file extensions: `"zip"`, `"jar"`, `"war"`, `"aar"`, `"tar"`,
`"tar.gz"`, `"tgz"`, `"tar.xz"`, `"txz"`, `"tar.zst"`, `"tzst"`, `"tar.bz2"`, `"ar"`,
or `"deb"`.
`"deb"`, or `"7z"`.

Examples:
Suppose the current repository contains the source code for a chat program,