Skip to content

Conversation

dkjsone
Copy link
Contributor

@dkjsone dkjsone commented Aug 20, 2025

Fixes #13537

Add the _check_link_targetfunction to validate the file pointed to by a symlink. If path traversal is detected, it should raise an InstallationErrorexception, similar to is_within_directory.

Add test cases for symlink:

  • Test Case 1​​: Tests the extraction of a normal symlink.
  • ​​Test Case 2​​: Tests the extraction of a malicious symlink pointing to an absolute path.
  • Test Case 3​​: Tests the extraction of a malicious symlink pointing to a relative path.

@dkjsone
Copy link
Contributor Author

dkjsone commented Aug 21, 2025

Revise the description of the test case.

  • Add a new test case for a symbolic link pointing to a file in the parent directory (still within the tar archive).

Copy link
Member

@notatallshaw notatallshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, the underlying issue should no longer happen on Python 3.14+?

If so could you check for less than Python 3.14 and make sure the relevant tests still pass.

This will allow us to clearly identify it as code that can removed once we drop support for Python 3.14.

​​Add deprecation note for _untar_without_filter for future removal​
@dkjsone
Copy link
Contributor Author

dkjsone commented Aug 27, 2025

As I understand it, the underlying issue should no longer happen on Python 3.14+?

If so could you check for less than Python 3.14 and make sure the relevant tests still pass.

This will allow us to clearly identify it as code that can removed once we drop support for Python 3.14.

This issue should not happen in Python 3.12. I've added a NOTE to the _untar_without_filter function indicating it can be removed once pip's minimum supported Python version is 3.12.

@dkjsone dkjsone requested a review from notatallshaw August 30, 2025 14:35
@@ -0,0 +1,2 @@
Pip will now raise an installation error for a source distribution when it includes a symlink that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know "pip" is lowercased usually, is this also the case for news? We also need to note that this is only an issue for Python versions that don't implement PEP 706 (maybe including the version ranges starting at the minimum version pip supports).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t believe there is a formal policy that pip should be lowercase. I certainly don’t do so. I think some maintainers take that view but there’s no actual consensus. So I’m fine with leaving this as it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I've updated the NEWS entry to highlight that these changes are specifically for Python versions that don't support PEP 706.
Also, there's no need to dwell on the capitalization of "pip" anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's how I normally avoid conflicts over this :-)

with pytest.raises(InstallationError) as e:
untar_file(tar_filepath, extract_path)

assert "trying to install outside target directory" in str(e.value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we know the full paths, can we not template to do an exact match of the complete message in both negative cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The modifications have been adjusted based on your suggestions.

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had one nit, but otherwise all my comments are resolved.

linkname = os.path.join(os.path.dirname(tarinfo.name), tarinfo.linkname)

linkname = os.path.normpath(linkname)
if "\\" in linkname:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory this does a full-scan of linkname already, which .replace() would also need to do? Is this any faster than calling linkname.replace() unconditionally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very detailed question. Redundant conditional checks have been removed.

Copy link
Member

@notatallshaw notatallshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkjsone thanks for your work here and responding to reviews, the code is in a significantly higher quality state.

I'll leave this open for a couple more weeks to give anyone else time to review or object (it seems most maintainers have been busy with other stuff this summer), otherwise I will merge.

@notatallshaw notatallshaw added this to the 25.3 milestone Sep 11, 2025
@dkjsone
Copy link
Contributor Author

dkjsone commented Sep 11, 2025

Okay, thank you all for the guidance.

@notatallshaw notatallshaw merged commit f2b9231 into pypa:main Sep 24, 2025
28 checks passed
@notatallshaw
Copy link
Member

@dkjsone thanks for your contribution to pip.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow tarfile.extractfile() to handle symlinks, even without data_filter support
4 participants