-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make all the invalid urls valid #4
Conversation
This does not affect any previous results, it is just like an enhancement to the project. |
giturlparse/parser.py
Outdated
d['pathname'] = self._get_path() | ||
d['protocol'] = m3.group('protocol') | ||
d['user'] = m3.group('user') | ||
d['resource'] = m3.group('site') + '.com' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem ideal. The domain '.com' is hard coded here. It wont match any other domain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, actually all the valid and invalid urls were for the .com
so I hard coded this for .com
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub and GitLab can be self-hosted, and may not be on .com
, and there are other git hosting software.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have changed the regex for all the hosts now. :)
giturlparse/parser.py
Outdated
'((?P<site>[\w\.\-]+)\.com)' | ||
'(:(?P<port>\d+))?' | ||
'(\/(?P<owner>\w+))?' | ||
'(\/(?P<repo>[\w\-]+)(\.git)?)?') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the regexp captures getting named, but not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, can you explain a bit more? I am not getting the point.
giturlparse/parser.py
Outdated
'(\/(?P<owner>\w+))?' | ||
'(\/(?P<repo>[\w\-]+)(\.git)?)?') | ||
|
||
m3 = re.search(regexp, self._url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably solve retr0h/git-url-parse#5 first, so adding more is not more expensive unnecessarily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using named captures?
regexp = (r'(git\+)?'
+ '((?P<protocol>\w+)://)'
+ '((?P<user>\w+)@)?'
+ '((?P<site>[\w\.\-]+))'
+ '(:(?P<port>\d+))?'
+ '(\/(?P<owner>\w+))?'
+ '(\/(?P<repo>[\w\-]+)(\.git)?)?')
For example <protcol>
, <user>
, <site>
, etc... These named captures are even used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR doesn't really address the real issue. It makes the unit tests pass but the unit tests are not necessarily valid since they are hard coded to .com domains.
I have removed the |
giturlparse/parser.py
Outdated
'((?P<resource>[\w\.\-]+))' | ||
'[\:\/]{1,2}' | ||
'(?P<pathname>((?P<owner>\w+)/)?' | ||
'((?P<name>[\w\-]+)\.git))') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a trailing ,
so the next regex added doesnt need to modify this line.
test/conftest.py
Outdated
'resource': 'example.in', | ||
'user': None, | ||
'port': None, | ||
'name': None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is wrong. the repo must have a name. the https git access will expand this to be owner.git
. A repo doesnt need to have an owner; owners are a conceptual addition provided by git hosters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jayvdb, I was thinking about this url https://github.com/coala/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/jayvdb
is an invalid github remote url.
https://gitlab.com/jayvdb
is an invalid gitlab remote url.
etc.
that is because we know those sites host repos under user/org/group prefix.
but the git world is not only github & gitlab. While a library may give special treatment to very common cases, it should not do so at the expense of other git hosting.
https://example.com/jayvdb
is assumed to be a git remote url for a repo called jayvdb.git
hosted by example.com
. that is the only possible way to parse it, and not parsing it that way means that valid URL would be rejected, which would be a bug.
test/conftest.py
Outdated
'pathname': '/repo', | ||
'protocols': ['https'], | ||
'protocol': 'https', | ||
'href': 'https://example.com/owner', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is happening here?? how can this be .../owner
. Something must be wrong with the test suite to allow this value here??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still hasnt been addressed. Bypassing this question doesnt fix it.
test/conftest.py
Outdated
'pathname': '/repo', | ||
'protocols': ['https'], | ||
'protocol': 'https', | ||
'href': 'https://example.in/owner', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, ../owner
where does it come from? is it not validated?
@retr0h, Does it looks good? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is something wrong with the test suite, and that needs to be fixed.
Oh it seems there is no CI, which is why it looked like the test suite was acting strangely. |
@retr0h, can you please take a look why the build is failing? |
It's failing flake8. I suggest reading the docs on how to test this project. |
@retr0h, thanks; tests are passing now ;) |
nice work all |
This makes all the invalid urls
like without '.git' and others valid
with true results.
Closes retr0h/git-url-parse#2 retr0h/git-url-parse#3