Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't replace underscores with hyphens in distribution name #921

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog/901.bugfix.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fixed replacing underscores by hyphens in package names.
4 changes: 2 additions & 2 deletions tests/test_package.py
Original file line number Diff line number Diff line change
Expand Up @@ -119,8 +119,8 @@ def test_package_signed_name_is_correct():
[
(string.ascii_letters, string.ascii_letters),
(string.digits, string.digits),
(string.punctuation, "-.-"),
("mosaik.SimConfig", "mosaik.SimConfig"),
(string.punctuation, "-.-_-"),
("mosaik._SimConfig", "mosaik._SimConfig"),
("mosaik$$$$.SimConfig", "mosaik-.SimConfig"),
],
)
Expand Down
4 changes: 2 additions & 2 deletions twine/package.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,12 +51,12 @@
def _safe_name(name: str) -> str:
"""Convert an arbitrary string to a standard distribution name.

Any runs of non-alphanumeric/. characters are replaced with a single '-'.
Any runs of non-alphanumeric/./_ characters are replaced with a single '-'.

Copied from pkg_resources.safe_name for compatibility with warehouse.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More that this comes from elsewhere and is intended to match behavior with PyPI (the server we care most about). This ostensibly breaks that compatibility

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. Pls check https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#name

Comparison of project names is case insensitive and treats arbitrarily-long runs of underscores, hyphens, and/or periods as equal. For example, if you register a project named cool-stuff, users will be able to download it or declare a dependency on it using any of the following spellings:

Cool-Stuff
cool.stuff
COOL_STUFF
CoOl__-.-__sTuFF

So underscores and hyphens shouldn't matter in PyPI and it should be backward compatible.
As well this PR is just correct implementation of PEP 508.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there has been some recent work on this, and possibly some changes. I want to dig into that before reviewing. That said, the linked guide is outdated; https://setuptools.pypa.io/en/stable/ is the authoritative documentation for setuptools.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some initial research, I don't know what's the right thing to do, and it seems like there's ongoing discussion about package/distribution name normalization:

(cc @pfmoore @dstufft @ewdurbin as contributors to the above discussions)

As well this PR is just correct implementation of PEP 508.

It seems like _safe_name is attempting to implement normalization defined in PEP 503.

For additional context, the issue that led to this PR states:

This behaviour is causing problems e.g. if you have defined in Artifactory mandatory distribution name prefix containing underscores. E.g. org_name_prefix. So if you have package named org_name_prefix_my_package it is converted by twine to distribution name org-name-prefix-my-package and then Artifactory refuses to publish such package.

I think the important question here is: if Twine doesn't convert underscores to hyphens, will that be compatible with PyPI?

Additionally, it makes me a little nervous that Twine has its own normalization function. Has this logic been centralized anywhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the important question here is: if Twine doesn't convert underscores to hyphens, will that be compatible with PyPI?

I can't answer that, I'm afraid. But I do think that PyPI and twine should agree on either allowing unmodified distribution names (as defined in the project metadata) or using the same normalisation (and IMO PEP 503 is the main contender for a "universal" normalisation rule these days). On the other hand, PyPI should have access to the un-normalised name, so it can respect the project author's wishes in the UI and similar places. I don't have context for this issue, but if the field we're discussing here is the only place that the project name is passed to PyPI, I'd say that means that it should be passed unmodified.

My impression is that PyPI has much more difficult compatibility considerations, though, so for practical reasons it might be necessary to "follow what PyPI allows" for the short term. I'd view that as a temporary measure, though.

Has this logic been centralized anywhere?

Packaging has packaging.utils.canonicalize_name which (as far as I know) implements PEP 503 normalisation. But the PEP 503 rule is designed to be easy to copy and paste, so I'd consider replicating it to avoid a dependency on packaging as perfectly acceptable.

Copy link
Contributor

@bhrutledge bhrutledge Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I agree that Twine shouldn't normalize filenames (which it doesn't currently do).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see, I missed it switched back to effectively using safe_name().

So, the problem is still roughly the same, just being caused by the fact that safe_name normalizes _ to -.

You should not have to do any normalization of the name for PyPI.

Where PyPI wants things to be normalized it normalizes it itself, where it doesn't, it doesn't.

I think there was a time when that wasn't the case, and you had to do some normalization, unfortunately #70 and the linked issue #47 don't provide enough information for me to remember specifics.

If it were me, I'd be inclined to try removing all normalization from the name field and see if PyPI complains. I'm pretty sure it won't 1, but if it does then that would be a PyPI bug IMO.

Footnotes

  1. Assuming that the unnormalized name of the project matches the filename.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree that twine should upload the file with the name as given (i.e., not normalise) I'm a bit concerned about the idea that neither twine nor the index server is validating that the files being uploaded are correct - where "correct" implies "normalised". But I guess twine can take a "trust the user" attitude, and servers might be prepared to accept unnormalised names. I'm not sure how the rest of the ecosystem would handle unnormalised wheel names (sdist names are still a bit of a mess so "legacy" forgives a lot there). I'm going to say that's not my problem though 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Difference that's splitting hairs but twine takes a trust the build-system approach followed by a trust the repository server to error with a semi-helpful error response we can provide to the user.

Genuinely, I'm not certain twine needs to exist. It provides very little benefit at this point. Most of what it does for upload can now be achieved with the standard library (historically not the case) on most (if not all of the supported versions). Most of the checks it has are simplistic and could be done by a build system to verify its output during a finalisation stage (where it would provide more value and immediate feedback to the user closer to where the issue is).

There was a dream of twine becoming a build-then-upload tool and way of hiding every build backend but neither Brian nor I have made that happen. Speaking for myself, I certainly don't have the time or desire for that. With the number of build backends, it would likely be a nightmare to paper over all their APIs (assuming they're supporting an API for us to use) and we'd have to work with whatever existing config file they're using, section, etc. Just doesn't seem worth the effort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a standard API for it, so you wouldn't need to paper over anything, just use the standard APIs... that being said, there's already a project doing that (https://github.com/pypa/build) now so adding another one seems silly.

See https://github.com/pypa/twine/issues/743.
"""
return re.sub("[^A-Za-z0-9.]+", "-", name)
return re.sub("[^A-Za-z0-9._]+", "-", name)


class PackageFile:
Expand Down