verify_url Hotfix #20
…tain proc spec mid-url
Pull Request Test Coverage Report for Build 15618963613

Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
kelockhart left a comment:
I think there's an issue with the regex here.
#This block exists because of an incompatibility between urlparse.urljoin and urllib.parse.urljoin
#This incompatibility results in http(s):// -> http(s):/ if the proc spec occurs in the middle of the url.
try:
    resolver_check = verify_url_regex.match(path)
Will the bibcode+verify_url:https://path... always be at the beginning of the path?
For the particular issue we are handling, yes. We will need to adapt this when we move away from bibcodes, but I would hope that would just mean mimicking the fix in resolver-gateway.
try:
    resolver_check = verify_url_regex.match(path)
    if resolver_check:
        resolver_groups = resolver_check.groups()
Pretty sure this will only ever return one item in the tuple, since there's only one capturing group in your regex.
Yeah... not sure what happened but an older version of the regex was pushed with the newer normalizing code. Should be fixed now.
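To illustrate the point about `groups()`: the tuple it returns has exactly one entry per capturing group in the pattern, so code that unpacks several pieces needs a pattern with several groups. The sketch below uses two hypothetical patterns and a placeholder path; neither is the regex from this PR, which is not shown in the thread.

```python
import re

# Hypothetical patterns for illustration only; the PR's real regex is not shown.
one_group = re.compile(r"^[^/]+/verify_url:(https?:/.*)$")
three_groups = re.compile(r"^([^/]+/verify_url:)(https?):/+(.*)$")

# Placeholder path with the collapsed "https:/" form described in the thread.
path = "BIBCODE/verify_url:https:/example.org/page"

# groups() yields one tuple entry per capturing group in the pattern.
single = one_group.match(path).groups()
multiple = three_groups.match(path).groups()

print(len(single), single)
print(len(multiple), multiple)
```

With the single-group pattern, normalizing code that expects a prefix, a scheme, and a remainder would fail when unpacking, which matches the mismatch described above.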
There was a subtle change in `urljoin` between Python 2 and Python 3 when proc specs are mid-url: URLs that used to be joined with the embedded `http(s)://` intact are now interpreted with `http(s):/`.
To remedy this, we have added a regex that catches those specific urls and handles them differently.
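A minimal sketch of that kind of fix, under stated assumptions: the base URL, the `BIBCODE` placeholder, the path layout, the pattern `VERIFY_URL_RE`, and the helper `normalize_verify_url` are all hypothetical and are not taken from this PR. It first joins a relative path containing an embedded URL with `urllib.parse.urljoin`, then restores the double slash after the embedded scheme when the path starts with the expected prefix.

```python
import re
from urllib.parse import urljoin

# Hypothetical base URL and relative path; not the service's real routes.
base = "https://gateway.example/link_gateway/"
relative = "BIBCODE/verify_url:https://target.example/a/b"

# Python 3's urljoin drops empty path segments in the middle of a relative
# path, which is how an embedded "https://" can end up as "https:/".
joined = urljoin(base, relative)
print(joined)

# Hypothetical pattern: identifier, "/verify_url:", a scheme followed by one
# or two slashes, then the rest. match() anchors at the start of the path.
VERIFY_URL_RE = re.compile(
    r"^(?P<prefix>[^/]+/verify_url:)(?P<scheme>https?):/{1,2}(?P<rest>.*)$"
)

def normalize_verify_url(path):
    """Restore '://' after the embedded scheme; leave other paths unchanged."""
    m = VERIFY_URL_RE.match(path)
    if not m:
        return path
    return "{prefix}{scheme}://{rest}".format(**m.groupdict())

fixed = normalize_verify_url("BIBCODE/verify_url:https:/target.example/a/b")
print(fixed)
```

Because the pattern accepts one or two slashes, the helper is idempotent: a path that already has `://` comes back unchanged, and so does any path that does not begin with the identifier and `verify_url:` prefix.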