Fix regex pattern for citedby_url extraction #580
m = re.search(r"cites=[\d+,]*", object["citedby_url"])

scholarly/_scholarly.py:312: SyntaxWarning: invalid escape sequence '\d'
  m = re.search("cites=[\d+,]*", object["citedby_url"])

Fixes: the issue described in #569 (4th post)
Description
Summary: Declare the RegEx pattern as a raw string by prepending r.
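A minimal sketch of the change's effect, not taken from the scholarly code base: the `citedby_url` value below is a made-up example, and the non-raw literal is compiled from source text only so the warning can be observed without triggering it in this snippet itself. The warning category depends on the interpreter: `SyntaxWarning` on Python 3.12 and later, `DeprecationWarning` on earlier 3.x releases.

```python
import re
import warnings

# Hypothetical citedby_url value, invented for illustration only.
citedby_url = "/scholar?hl=en&cites=1234567890,9876543210&as_sdt=5"

# Raw string: the backslash in \d reaches the regex engine unchanged.
m = re.search(r"cites=[\d+,]*", citedby_url)
matched = m.group()

# Source text containing the non-raw literal, as in the line before the fix.
non_raw_source = r'pattern = "cites=[\d+,]*"'

# Compile that source and record any warnings the compiler emits.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile(non_raw_source, "<example>", "exec")

warning_names = [w.category.__name__ for w in caught]
print(matched)
print(warning_names)
```

Both spellings produce the same pattern at runtime, since Python leaves the unrecognized `\d` escape in place; the raw string only removes the compile-time warning.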
In Python, a non-raw string literal like "cites=[\d+,]*" makes the parser treat \d as an escape sequence. Because \d is not a recognized escape, the backslash is kept and the pattern still works in the regex engine, but Python itself flags the literal with a SyntaxWarning (as of Python 3.12; earlier versions issue a DeprecationWarning).
By prefixing the string with r (i.e., r"cites=[\d+,]*"), you instruct Python to treat the backslashes literally, which is the best practice for defining regular expressions and cleanly resolves the SyntaxWarning.
Checklist
develop and not main.
If you don't have a premium proxy, some of the tests will be skipped.
The tests that are run should pass without raising MaxTriesExceededException or other exceptions.