Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make all the invalid urls valid #4

Merged
merged 1 commit into from
Apr 7, 2018
Merged

Make all the invalid urls valid #4

merged 1 commit into from
Apr 7, 2018

Conversation

sks444
Copy link
Member

@sks444 sks444 commented Mar 19, 2018

This makes all the invalid urls
like without '.git' and others valid
with true results.

Closes retr0h/git-url-parse#2 retr0h/git-url-parse#3

@sks444
Copy link
Member Author

sks444 commented Mar 19, 2018

This does not affect any previous results, it is just like an enhancement to the project.

d['pathname'] = self._get_path()
d['protocol'] = m3.group('protocol')
d['user'] = m3.group('user')
d['resource'] = m3.group('site') + '.com'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem ideal. The domain '.com' is hard coded here. It wont match any other domain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, actually all the valid and invalid urls were for the .com so I hard coded this for .com

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub and GitLab can be self-hosted, and may not be on .com, and there are other git hosting software.

Copy link
Member Author

@sks444 sks444 Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have changed the regex for all the hosts now. :)

'((?P<site>[\w\.\-]+)\.com)'
'(:(?P<port>\d+))?'
'(\/(?P<owner>\w+))?'
'(\/(?P<repo>[\w\-]+)(\.git)?)?')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the regexp captures getting named, but not used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, can you explain a bit more? I am not getting the point.

'(\/(?P<owner>\w+))?'
'(\/(?P<repo>[\w\-]+)(\.git)?)?')

m3 = re.search(regexp, self._url)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably solve retr0h/git-url-parse#5 first, so adding more is not more expensive unnecessarily

Copy link
Collaborator

@retr0h retr0h Mar 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using named captures?

regexp = (r'(git\+)?'
+                  '((?P<protocol>\w+)://)'
+                  '((?P<user>\w+)@)?'
+                  '((?P<site>[\w\.\-]+))'
+                  '(:(?P<port>\d+))?'
+                  '(\/(?P<owner>\w+))?'
+                  '(\/(?P<repo>[\w\-]+)(\.git)?)?')

For example <protcol>, <user>, <site>, etc... These named captures are even used.

retr0h
retr0h previously requested changes Mar 22, 2018
Copy link
Collaborator

@retr0h retr0h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR doesn't really address the real issue. It makes the unit tests pass but the unit tests are not necessarily valid since they are hard coded to .com domains.

@sks444
Copy link
Member Author

sks444 commented Mar 22, 2018

I have removed the .com from my regex; these regexes are valid for all the domains now. :)

'((?P<resource>[\w\.\-]+))'
'[\:\/]{1,2}'
'(?P<pathname>((?P<owner>\w+)/)?'
'((?P<name>[\w\-]+)\.git))')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a trailing , so the next regex added doesnt need to modify this line.

test/conftest.py Outdated
'resource': 'example.in',
'user': None,
'port': None,
'name': None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is wrong. the repo must have a name. the https git access will expand this to be owner.git . A repo doesnt need to have an owner; owners are a conceptual addition provided by git hosters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jayvdb, I was thinking about this url https://github.com/coala/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/jayvdb is an invalid github remote url.
https://gitlab.com/jayvdb is an invalid gitlab remote url.
etc.

that is because we know those sites host repos under user/org/group prefix.

but the git world is not only github & gitlab. While a library may give special treatment to very common cases, it should not do so at the expense of other git hosting.

https://example.com/jayvdb is assumed to be a git remote url for a repo called jayvdb.git hosted by example.com. that is the only possible way to parse it, and not parsing it that way means that valid URL would be rejected, which would be a bug.

test/conftest.py Outdated
'pathname': '/repo',
'protocols': ['https'],
'protocol': 'https',
'href': 'https://example.com/owner',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is happening here?? how can this be .../owner. Something must be wrong with the test suite to allow this value here??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still hasnt been addressed. Bypassing this question doesnt fix it.

test/conftest.py Outdated
'pathname': '/repo',
'protocols': ['https'],
'protocol': 'https',
'href': 'https://example.in/owner',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, ../owner where does it come from? is it not validated?

@sks444
Copy link
Member Author

sks444 commented Apr 2, 2018

@retr0h, Does it looks good?

Copy link
Member

@jayvdb jayvdb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is something wrong with the test suite, and that needs to be fixed.

@jayvdb
Copy link
Member

jayvdb commented Apr 6, 2018

Oh it seems there is no CI, which is why it looked like the test suite was acting strangely.

@sks444 sks444 mentioned this pull request Apr 6, 2018
@retr0h retr0h dismissed their stale review April 6, 2018 18:19

No longer hard coding .com

@sks444
Copy link
Member Author

sks444 commented Apr 6, 2018

@retr0h, can you please take a look why the build is failing?

@retr0h
Copy link
Collaborator

retr0h commented Apr 6, 2018

@retr0h, can you please take a look why the build is failing?

It's failing flake8. I suggest reading the docs on how to test this project.

@sks444
Copy link
Member Author

sks444 commented Apr 6, 2018

@retr0h, thanks; tests are passing now ;)

@retr0h
Copy link
Collaborator

retr0h commented Apr 7, 2018

nice work all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants