Skip to content

Correctly detect final path when extracting multiple tarballs #4922

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: develop
Choose a base branch
from

Conversation

Flamefire
Copy link
Contributor

@Flamefire Flamefire commented Jun 13, 2025

The current implementation of extract_file detected folders/files from the first tarball when extracting the second.
Due to the definition of find_base_dir it will then return the parent path (usually builddir)
which is used as the finalpath for sources.
This leads issues requiring workarounds in e.g. the Bundle easyblock
where sources from all components are extracted into the same folder.

Fix by storing the old state of the target folder and detect of
extraction resulted in a single (top-level) folder.

As part of this work I enhanced the documentation and test of find_base_dir and added tests for the expected and (previous) error cases of extract_file

guess_start_dir detects the fixed behavior already: If there is a prefix/<startdir>/<startdir> which doesn't exist prefix/<startdir> is used instead

E.g. GTK3 has start_dir set manually for components due to this bug which would now lead to builddir/foo/foo
IMO trivial to detect/fix

Remark: It is possible to have find_base_dir NOT change into the resulting directory which means we can do this explicitely when requested instead of undoing it. Not sure if anyone outside the main repo relies on that.

Copy link
Member

@boegel boegel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Flamefire Rather than adding (more) binary blobs to the repo, can we create multi-0.0.tar.gz and multi-1.0.tar.gz dynamically in the test itself?

That probably even helps with exposing what's being tested, since the structure within these tarballs is now not visible.

This request is fueled by a recent example of binary blobs being abused.

I'm aware we have other binary blobs in here already, we should look into getting rid of those too.

Create an empty target folder so the extraction can change into the
extracted folder.
The current implementation of `extract_file` detected folders/files from the first tarball when extracting the second.
Due to the definition of `find_base_dir` it will then return the parent path (usually `builddir`)
which is used as the `finalpath` for sources.
This leads issues requiring workarounds in e.g. the Bundle easyblock
where sources from all components are extracted into the same folder.

Fix by storing the old state of the target folder and detect of
extraction resulted in a single (top-level) folder.
Adapt for `extract_dir` now returning the subfolder which it previously
did not because the downloaded tar file is in the same folder.
@Flamefire
Copy link
Contributor Author

@boegel I replaced the added binaries by creating them in the test (replacing the commit to remove them from the history)

I also replaced all uses of other binaries in that test. The test-method I added should be easily usable for other tests.

What I noticed: Our filetools.make_archive is lacking a way to create an archive from a list of files or contents of a folder, i.e. exclude the parent folder from the hierarchy in the archive. See the new method I added.

@boegel boegel modified the milestones: 5.1.1, release after 5.1.1 Jul 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants