-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't replace underscores with hyphens in distribution name #921
base: main
Are you sure you want to change the base?
Don't replace underscores with hyphens in distribution name #921
Conversation
@@ -51,12 +51,12 @@ | |||
def _safe_name(name: str) -> str: | |||
"""Convert an arbitrary string to a standard distribution name. | |||
|
|||
Any runs of non-alphanumeric/. characters are replaced with a single '-'. | |||
Any runs of non-alphanumeric/./_ characters are replaced with a single '-'. | |||
|
|||
Copied from pkg_resources.safe_name for compatibility with warehouse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More that this comes from elsewhere and is intended to match behavior with PyPI (the server we care most about). This ostensibly breaks that compatibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. Pls check https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#name
Comparison of project names is case insensitive and treats arbitrarily-long runs of underscores, hyphens, and/or periods as equal. For example, if you register a project named cool-stuff, users will be able to download it or declare a dependency on it using any of the following spellings:
Cool-Stuff
cool.stuff
COOL_STUFF
CoOl__-.-__sTuFF
So underscores and hyphens shouldn't matter in PyPI and it should be backward compatible.
As well this PR is just correct implementation of PEP 508.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there has been some recent work on this, and possibly some changes. I want to dig into that before reviewing. That said, the linked guide is outdated; https://setuptools.pypa.io/en/stable/ is the authoritative documentation for setuptools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some initial research, I don't know what's the right thing to do, and it seems like there's ongoing discussion about package/distribution name normalization:
- A sub-thread in Amending PEP 427 (and PEP 625) on package normalization rules on the Packaging forum
- The issue that led to the addition of
_safe_name
: Namespace packages with dots in them fail to upload to PyPI
(cc @pfmoore @dstufft @ewdurbin as contributors to the above discussions)
As well this PR is just correct implementation of PEP 508.
It seems like _safe_name
is attempting to implement normalization defined in PEP 503.
For additional context, the issue that led to this PR states:
This behaviour is causing problems e.g. if you have defined in Artifactory mandatory distribution name prefix containing underscores. E.g.
org_name_prefix
. So if you have package namedorg_name_prefix_my_package
it is converted by twine to distribution nameorg-name-prefix-my-package
and then Artifactory refuses to publish such package.
I think the important question here is: if Twine doesn't convert underscores to hyphens, will that be compatible with PyPI?
Additionally, it makes me a little nervous that Twine has its own normalization function. Has this logic been centralized anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the important question here is: if Twine doesn't convert underscores to hyphens, will that be compatible with PyPI?
I can't answer that, I'm afraid. But I do think that PyPI and twine should agree on either allowing unmodified distribution names (as defined in the project metadata) or using the same normalisation (and IMO PEP 503 is the main contender for a "universal" normalisation rule these days). On the other hand, PyPI should have access to the un-normalised name, so it can respect the project author's wishes in the UI and similar places. I don't have context for this issue, but if the field we're discussing here is the only place that the project name is passed to PyPI, I'd say that means that it should be passed unmodified.
My impression is that PyPI has much more difficult compatibility considerations, though, so for practical reasons it might be necessary to "follow what PyPI allows" for the short term. I'd view that as a temporary measure, though.
Has this logic been centralized anywhere?
Packaging has packaging.utils.canonicalize_name
which (as far as I know) implements PEP 503 normalisation. But the PEP 503 rule is designed to be easy to copy and paste, so I'd consider replicating it to avoid a dependency on packaging
as perfectly acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I agree that Twine shouldn't normalize filenames (which it doesn't currently do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I see, I missed it switched back to effectively using safe_name()
.
So, the problem is still roughly the same, just being caused by the fact that safe_name
normalizes _
to -
.
You should not have to do any normalization of the name for PyPI.
Where PyPI wants things to be normalized it normalizes it itself, where it doesn't, it doesn't.
I think there was a time when that wasn't the case, and you had to do some normalization, unfortunately #70 and the linked issue #47 don't provide enough information for me to remember specifics.
If it were me, I'd be inclined to try removing all normalization from the name
field and see if PyPI complains. I'm pretty sure it won't 1, but if it does then that would be a PyPI bug IMO.
Footnotes
-
Assuming that the unnormalized name of the project matches the filename. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree that twine should upload the file with the name as given (i.e., not normalise) I'm a bit concerned about the idea that neither twine nor the index server is validating that the files being uploaded are correct - where "correct" implies "normalised". But I guess twine can take a "trust the user" attitude, and servers might be prepared to accept unnormalised names. I'm not sure how the rest of the ecosystem would handle unnormalised wheel names (sdist names are still a bit of a mess so "legacy" forgives a lot there). I'm going to say that's not my problem though 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Difference that's splitting hairs but twine takes a trust the build-system approach followed by a trust the repository server to error with a semi-helpful error response we can provide to the user.
Genuinely, I'm not certain twine needs to exist. It provides very little benefit at this point. Most of what it does for upload can now be achieved with the standard library (historically not the case) on most (if not all of the supported versions). Most of the checks it has are simplistic and could be done by a build system to verify its output during a finalisation stage (where it would provide more value and immediate feedback to the user closer to where the issue is).
There was a dream of twine becoming a build-then-upload tool and way of hiding every build backend but neither Brian nor I have made that happen. Speaking for myself, I certainly don't have the time or desire for that. With the number of build backends, it would likely be a nightmare to paper over all their APIs (assuming they're supporting an API for us to use) and we'd have to work with whatever existing config file they're using, section, etc. Just doesn't seem worth the effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a standard API for it, so you wouldn't need to paper over anything, just use the standard APIs... that being said, there's already a project doing that (https://github.com/pypa/build) now so adding another one seems silly.
As an outsider, I think transforming (via |
I used an underscore in my package directory name (e.g. "my_package"), as well as in the "setup.py" file ( BUT the PyPI site displays the installation command improperly ( See: https://pypi.org/project/gspread-models/ So we should definitely not do the conversion. Right? |
FWIW, PyPI is now parsing names and canonicalising them internally. This means that we can safely revert #745 now and switch to using canonical names when interacting with the PyPI. It might be worth considering what the effects of doing so could be on non-PyPI indexes (eg: devpi). |
What does this latest reply mean, in simpler terms? Having a hard time understanding the message that is trying to be conveyed. |
The canonicalized name as defined in the relevant PEPs and implemented in the packaging library must match the file name if I remember correctly. The canonicalized name is based off of the name provided in the packaging metadata. For example, |
Fixes #920