-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: don't encode ':' or '/' as part of the canonical representation #161
base: master
Are you sure you want to change the base?
Conversation
Note that this matches the current test suite data for But https://github.com/package-url/purl-spec/blob/8040ff0be50f0c5b1986b1a0947bd539f5405fc4/test-suite-data.json#L88-L89 |
The line 112-113 where you say the |
Let me run the full test suite again without the |
OK, the first line has is for |
Checking the latest test suite file, if you don't encode the colon, if fails. The file seems to be inconsistent. It's encoded in one spot and not in the other.
|
I was talking about the one you mentioned above - there doesn't seem to be any encoding. "purl": "pkg:Maven/org.apache.xmlgraphics/[email protected]?classifier=sources&repositorY_url=repo.spring.io/release",
"canonical_purl": "pkg:maven/org.apache.xmlgraphics/[email protected]?classifier=sources&repository_url=repo.spring.io/release", |
There is no encoding in the input. However, the output without the patch is |
And I found the inconsistency in the test file: the colon is encoded when it's in the version (sha256%3A). It's not encoded when it's in a key value (repository_url=https:). This is not right, is it? I am trying to verify if the problem is the code or the test file. |
Sorry, I flipped the examples! L112-113 is for |
@@ -460,7 +460,7 @@ private static String uriEncode(String source, Charset charset) { | |||
} | |||
|
|||
private static boolean isUnreserved(int c) { | |||
return (isAlpha(c) || isDigit(c) || '-' == c || '.' == c || '_' == c || '~' == c); | |||
return (isAlpha(c) || isDigit(c) || '-' == c || '.' == c || '_' == c || '~' == c || ':' == c || '/' == c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be rather a different set for each PURL component:
namespace
does not need to encode neither:
nor/
.name
does not need to encode:
, but it needs to encode/
, which is used to separate it fromnamespace
.qualifiers
, probably does not need to encode:
,/
or?
.subpath
can probably add&
and=
to the mix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a param to the percentEncode
to take a String
of characters to allow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really, there are a lot more characters potentially allowed by the RFC than we list here.
I am of the opinion right now to add just enough in order to pass the test suite.
The '?'
is not allowed unencoded per the purl spec itself.
So, with 8ab171f, this now fixes #122, but not #92. The test suite has Should we just change the test suite back to how to was and assume that there is a mistake in the tests and that the current |
The exact definition of Maybe this PR is precocious and we should wait for the spec to be finalized. |
The author of the Rust version has proposed the set of characters to use https://github.com/phylum-dev/purl/blob/151168733f75a9802556e4b07eb577b9d99f7cea/purl/src/format.rs#L9-L27 I could take them from here. |
That code is based on the WHATWG URL standard, while, unless I am mistaken, PURL will be based on RFC 3986. The living standard seems more liberal, so I would base the code on the more conservative RFC 3986. In particular:
|
I will revisit this later. It may need to be combined with #174. |
e42515b
to
447ba2a
Compare
Now that Spotless is there, there will be some conflicts. This should help:
|
OK, it does apply on build, but if you have to fix the conflicts first, then it doesn't really help. |
Now you just need to run |
7803217
to
b6482d2
Compare
You can:
|
This makes the Java canonical representation match the majority of other implementations. Fixes package-url#122 Fixes package-url#92
This makes the Java canonical representation match the majority of other implementations.
Fixes #122
Fixes #92