Skip to content

Conversation

@ppolewicz
Copy link
Collaborator

When a server returns a 5xx error (e.g., 503 Service Unavailable) during file upload, the SDK now properly refreshes upload tokens instead of retrying with stale cached upload URLs a bunch of times first.

Previously, the HTTP layer would catch ServiceError and retry using the same pre-fetched upload URLs from the token pool. This caused repeated failures because the upload URLs may no longer be valid after a 5xx error and it delayed the recovery.

The fix introduces ServiceErrorDuringUpload exception that disables HTTP-level retries (similar to B2RequestTimeoutDuringUpload), forcing the error to propagate to the upload manager. The upload manager then clears cached tokens via clear_bucket_upload_data() and requests fresh upload URLs for retry attempts.

Changes:

  • Added ServiceErrorDuringUpload exception class
  • Convert ServiceError to ServiceErrorDuringUpload in b2http.py
  • Fixed RawSimulator to generate unique upload tokens per URL
  • Added comprehensive test for token refresh on 503 errors

Fixes Backblaze/B2_Command_Line_Tool#1118

When a server returns a 5xx error (e.g., 503 Service Unavailable) during
file upload, the SDK now properly refreshes upload tokens instead of
retrying with stale cached upload URLs a bunch of times first.

Previously, the HTTP layer would catch ServiceError and retry using the
same pre-fetched upload URLs from the token pool. This caused repeated
failures because the upload URLs may no longer be valid after a 5xx error
and it delayed the recovery.

The fix introduces ServiceErrorDuringUpload exception that disables
HTTP-level retries (similar to B2RequestTimeoutDuringUpload), forcing
the error to propagate to the upload manager. The upload manager then
clears cached tokens via clear_bucket_upload_data() and requests fresh
upload URLs for retry attempts.

Changes:
- Added ServiceErrorDuringUpload exception class
- Convert ServiceError to ServiceErrorDuringUpload in b2http.py
- Fixed RawSimulator to generate unique upload tokens per URL
- Added comprehensive test for token refresh on 503 errors

Fixes Backblaze/B2_Command_Line_Tool#1118
@ppolewicz ppolewicz added the bug Something isn't working label Oct 12, 2025
# Convert ServiceError to ServiceErrorDuringUpload for upload operations.
# This disables HTTP-level retries and forces the upload manager to clear
# cached upload tokens and request fresh ones.
raise ServiceErrorDuringUpload(str(e))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method post_content_return_json is being called by _post_json_return_json, and the latter is being used by almost all post requests.

I am afraid this will break retries for all POST requests, not just ones related to uploads.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that would be terrible. Please make sure we have tests for this (I assumed that we do and that the fact that the tests pass after the fix means it's ok, but maybe not?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've redesigned b2http tests in another PR and checked the logic here once again.
The code doesn't actually break the retrying part as per my concern above, but, unfortunately, it doesn't quite work either.
The problem is that the retrying logic happens in the _translate_and_retry block, and the exception is being re-raised later in the post_content_return_json block, once all the retries are finished. That also explains why the retrying part didn't break per se.
So we need a different approach, perhaps with introduction of a new keyword argument.
The test here probably also needs to be rewritten (or modified), cause it didn't quite reveal the problem - the refresh happened after losing time retrying with the old token.
Will see what I can do here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

B2 CLI Hanging and Not Completing

3 participants