-
-
Notifications
You must be signed in to change notification settings - Fork 490
Validate HTTP response codes across the extractor #1322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Validate HTTP response codes across the extractor #1322
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thaank you. Changes LGTM, but conflicts need to be resolved first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this can be merged in the current state:
- There are Todo comments
- The changes in the test will be irrelevant after #1332
- Response codes that are ok are 2XX response codes
- I'm really not sure about another HttpUtils method and a new Exception class
I think the most simple approach would be to have a boolean Response#isOk()
method and then use this method to control the underlying use cases accordingly.
The comment was kind of intentionally left in there in the hopes that whoever reviewed it would provide an explanation and then I would remove it.
Not exactly sure what you mean by this? Do you mean that 2xx is the only "success" code range all network calls throughout the codebase?
Do you mean another as in having two And does this also mean we shouldn't have a My idea was to have a generic method we can use to check for expected response codes instead of just checking it's 2xx or not 2xx, because I was of the assumption that:
The end goal is that if a network call ever fails, we want to know why, using code that is:
If there's only one place in code where non-2xx codes shouldn't be considered an error then we could handle that case explicitly at that call site, but if it's at least 3 times then I think we should have more than a simple Since I am almost certain 1. is true, I am more than happy to trawl through the entire codebase for every http call we make (not jusr for I could update this PR with these changes and/or split it into multiple PRs if it is too big. Thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! We do need to check response codes more often. For example, PeerTube sometimes throws 429 Too Many Requests (TeamNewPipe/NewPipe#2979), but that shouldn't be treated as a recaptcha by the Downloader
(while that's what happens now). It should be the PeerTube service's job to throw a normal ExtractorException for 429s, and instead the YouTube service's job to throw a RecaptchaException (along with a proper URL that users can use to solve the captcha, instead of having the Downloader
guess that URL).
It would be good if you could take the above paragraph into consideration for this PR, and properly validate the HTTP response codes of all requests made by all services in the extractor, so we don't have to change the design once again later.
extractor/src/main/java/org/schabi/newpipe/extractor/utils/HttpUtils.java
Outdated
Show resolved
Hide resolved
...java/org/schabi/newpipe/extractor/services/soundcloud/SoundcloudChannelTabExtractorTest.java
Show resolved
Hide resolved
extractor/src/main/java/org/schabi/newpipe/extractor/exceptions/HttpResponseException.java
Outdated
Show resolved
Hide resolved
|
||
return null; | ||
} | ||
// CHECKSTYLE:OFF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why checkstyle off?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, checkstyle was complaining about line being too long in the comment below, so I had to disable it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really important, but why don't you make the lines shorter then? Here you don't have long URLs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
Make HttpResponseException extend ExtractionException Remove HttpUtils
5a3f458
to
ae4e8fc
Compare
* or when {@code validResponseCodes} is empty and the code is a 4xx or 5xx error. | ||
*/ | ||
// CHECKSTYLE:ON | ||
public static void validateResponseCode(final Response response, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not inline this directly within the non-static Response::validateResponseCode
? This is basically a method on the Response
class since it takes Response
as the first argument, so you might as well make it non-static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was to allow for response.validateResponseCode
and validateResponseCode(response)
depending on the format of the call site.
But in hindsight, I now recall that in C# (my main language) there is EnsureSuccessStatusCode
which can be used on the response itself, rather than a separate method.
So yeah I think you're right, validation can just be done inline like how you have mentioned in your other comment
* or when {@code validResponseCodes} is empty and the code is a 4xx or 5xx error. | ||
*/ | ||
// CHECKSTYLE:ON | ||
public Response validateResponseCode(final int... validResponseCodes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to support the needs of other services (like youtube), I'd add the following methods:
public Response throwIfResponseCode(int errorCode, final Function<Response, HttpResponseException> errorSupplier);
public Response throwIfResponseCodeInRange(final int errorCodeMin, final int errorCodeMax, final Function<Response, HttpResponseException> errorSupplier);
I'd also rename this method to throwIfInvalidResponseCode()
Then e.g. for YouTube we'd be able to do:
response
// "https://www.youtube.com" is the URL to solve the captcha
.throwIfResponseCode(429, r -> new ReCaptchaException(r, "https://www.youtube.com"))
.throwIfInvalidResponseCode()
While in PeerTube
response
.throwIfResponseCode(429, r -> new TooManyRequestsException(r))
.throwIfInvalidResponseCode()
And we'd remove the ReCaptchaException
checks in the Downloader
. Also we'd have ReCaptchaException
and TooManyRequestsException
extend HttpResponseException
.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this.
However, I think throwIfInvalidResponseCode
should be called ensureValidResponseCode
or ensureSuccessResponseCode
if it's going to be the same as validateResponseCode
which takes in a list of valid response codes. Consider these:
throwIfInvalidResponseCode(200, 201)
ensureValidResponseCode(200, 201)
Reading the second, it is easier to understand that 200 and 201 are the valid response codes, but one could presume with the first that 200 and 201 are invalid response codes and it should throw if one of those are returned.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ensureValidResponseCode
is a better name indeed
.../src/main/java/org/schabi/newpipe/extractor/services/soundcloud/SoundcloudParsingHelper.java
Show resolved
Hide resolved
@Stypox Some problems. In Now, Semantically, only Now, what exactly is the difference between an
If we add http response validation throughout the extractor, should the How exactly do we want to do this, so that we can have correct semantic exceptions? |
Validate HTTP response codes for network requests throughout the extractor
EDIT: Repurposing this PR to be the first of a series of PRs that will refactor the extractor codebase to add HTTP response checks for all network requests to improve logging and error handling/reporting.
Almsot all code that uses the Downloader does not do any checking on the response code: therefore errors and exceptions thrown due to error response code give false information (e.g. tests failing due to 404 were giving error about
clientId
when the error was 404).Based on discussions you can read below, I have initially added some methods in Response.java for validating response codes and used it in places for soundcloud extraction. It will throw exception with information about the error code, which will propagate upwards to the call site which will print a more useful stack trace for debugging.
As we discuss more on how we want to design and refactor this, further changes will be made in other places