Refactor error handling in backup upload and folder creation processes #342

Merged
egalvis27 merged 5 commits into feat/go-fuse-daemon from fix/error-429-uploading-files
May 21, 2026

@egalvis27

What is Changed / Added

Bug fix: Race condition causing FolderNotFoundError on large uploads

When uploading a folder with hundreds of files, some uploads were failing with FolderNotFoundError even though the folder clearly existed on the server. The root cause was a race condition between folder creation and the periodic remote sync.

Here's what was happening: when FUSE calls mkdir, FolderCreator creates the folder via the API and immediately adds it to the in-memory repository. However, the folder is not yet in the SQLite sync store — the incremental sync had already completed its API call before the folder was created. When the next REMOTE_CHANGES_SYNCHED event fires shortly after, FolderRepositorySynchronizer rebuilds the in-memory repo from the SQLite snapshot, notices the new folder is missing, and deletes it. Any files being uploaded into that folder at that moment then fail to find their parent.

The fix adds a second argument to FolderRepositorySynchronizer.run(): allRemoteFolderIds, the full set of folder IDs present in the SQLite store (all statuses, not just EXISTS). A folder is now only evicted from the in-memory repo if it is both absent from the EXISTS remote tree and confirmed in the SQLite store — the logic being that if a folder is not in SQLite at all, it must have been created locally and not yet picked up by the incremental sync, so it should be left alone.
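
The eviction rule described above can be sketched as follows. This is a minimal illustration, assuming folders are identified by plain numeric IDs; `selectFoldersToEvict` and its parameter shapes are hypothetical and are not the actual `FolderRepositorySynchronizer` code.

```typescript
// Hypothetical simplification: folders are represented by their numeric IDs only.
type FolderId = number;

/**
 * Returns the IDs that should be evicted from the in-memory repository.
 * A folder is evicted only when it is absent from the EXISTS remote tree
 * AND present in the SQLite store (any status). Folders unknown to the
 * store are assumed to be freshly created locally and are left alone.
 */
function selectFoldersToEvict(
  inMemoryIds: readonly FolderId[],
  existsRemoteIds: ReadonlySet<FolderId>,
  allRemoteFolderIds: ReadonlySet<FolderId>,
): FolderId[] {
  return inMemoryIds.filter((id) => {
    const missingFromExistsTree = !existsRemoteIds.has(id);
    const knownToSyncStore = allRemoteFolderIds.has(id);
    return missingFromExistsTree && knownToSyncStore;
  });
}

// Folder 1: still EXISTS remotely, so it is kept.
// Folder 2: in the store but no longer EXISTS, so it is evicted.
// Folder 3: created locally and not yet in the store, so it is kept.
const evicted = selectFoldersToEvict(
  [1, 2, 3],
  new Set<FolderId>([1]),
  new Set<FolderId>([1, 2]),
);
// evicted contains only folder 2
```

Before the fix, the equivalent check used only the first condition, which is why folder 3 in this example would have been deleted.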

updateVirtualDriveContainer was updated to fetch the EXISTS-filtered tree and the full RemoteItemsGenerator snapshot in parallel and pass the resulting ID set down to the synchronizer. RemoteItemsGenerator was also changed from .private() to public in the DI container so it can be resolved directly.
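
As a rough sketch of that step, assuming the snapshot exposes a `folders` array whose entries carry an `id`: the interfaces, the status names other than `EXISTS`, and `loadTreeAndFolderIds` are illustrative assumptions, not the real `RemoteTreeBuilder` or `RemoteItemsGenerator` types.

```typescript
// Hypothetical shapes; the real classes return richer objects.
interface SnapshotFolder {
  id: number;
  status: 'EXISTS' | 'TRASHED' | 'DELETED'; // names other than EXISTS are assumed
}
interface SnapshotItems {
  folders: SnapshotFolder[];
}

/**
 * Runs both reads concurrently, then derives the set of every folder ID
 * known to the store, regardless of status.
 */
async function loadTreeAndFolderIds<Tree>(
  buildTree: () => Promise<Tree>,
  getAllItems: () => Promise<SnapshotItems>,
): Promise<{ tree: Tree; allRemoteFolderIds: Set<number> }> {
  // Neither call depends on the other's result, so they can overlap.
  const [tree, allRemoteItems] = await Promise.all([buildTree(), getAllItems()]);
  const allRemoteFolderIds = new Set(allRemoteItems.folders.map((folder) => folder.id));
  return { tree, allRemoteFolderIds };
}
```

Note that `Promise.all` rejects as soon as either call rejects, so a failure in one read surfaces the same way it would if the calls were awaited one after the other.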


Improvement: Retry on transient errors (429 / 5xx) for folder creation

Folder creation was silently failing on any 5xx or 429 response. HttpRemoteFileSystem.persist() had a manual recursive retry only for 400s, and everything else fell through to a generic UNHANDLED error that FolderCreator would simply throw — no retry, no backoff.

To fix this, persist() now maps all HTTP error codes to typed DriveDesktopError causes (INTERNAL_SERVER_ERROR for 5xx, RATE_LIMITED for 429 with the parsed Retry-After value, etc.), aligning it with how SDKRemoteFileSystem handles errors on the file side. FolderCreator then wraps the remote call in retryWithBackoff using the shared createTransientErrorHandler, which applies exponential backoff for RATE_LIMITED and INTERNAL_SERVER_ERROR and fails immediately for anything non-recoverable like BAD_REQUEST or UNKNOWN. The old recursive attempt-based retry inside HttpRemoteFileSystem is removed entirely.
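
The two halves of this change can be illustrated with a small sketch. The cause names come from the description above; everything else (`mapHttpError`, `FolderResult`, `retryWithBackoffSketch`, the option names, and handling only the delta-seconds form of `Retry-After`) is an assumption made for illustration and does not reproduce the project's `retryWithBackoff` or `createTransientErrorHandler`.

```typescript
type ErrorCause = 'BAD_REQUEST' | 'RATE_LIMITED' | 'INTERNAL_SERVER_ERROR' | 'UNKNOWN';

interface TypedError {
  cause: ErrorCause;
  retryAfterMs?: number;
}

interface FolderResult<T> {
  data?: T;
  error?: TypedError;
}

/** Maps an HTTP status (and optional Retry-After header) to a typed cause. */
function mapHttpError(status: number, retryAfterHeader?: string): TypedError {
  if (status === 429) {
    // Assumption: only the delta-seconds form is parsed; HTTP-date values are ignored.
    const seconds = retryAfterHeader ? Number(retryAfterHeader) : NaN;
    return Number.isFinite(seconds) && seconds >= 0
      ? { cause: 'RATE_LIMITED', retryAfterMs: seconds * 1000 }
      : { cause: 'RATE_LIMITED' };
  }
  if (status >= 500 && status <= 599) return { cause: 'INTERNAL_SERVER_ERROR' };
  if (status === 400) return { cause: 'BAD_REQUEST' };
  return { cause: 'UNKNOWN' };
}

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not wait
}

/** Retries transient failures with exponential backoff; other errors return at once. */
async function retryWithBackoffSketch<T>(
  operation: () => Promise<FolderResult<T>>,
  options: RetryOptions,
): Promise<FolderResult<T>> {
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let last: FolderResult<T> = { error: { cause: 'UNKNOWN' } };

  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    last = await operation();
    const error = last.error;
    if (!error) return last;

    const transient = error.cause === 'RATE_LIMITED' || error.cause === 'INTERNAL_SERVER_ERROR';
    const isLastAttempt = attempt === options.maxAttempts - 1;
    if (!transient || isLastAttempt) return last;

    // Prefer the server-provided delay; otherwise double the base delay each attempt.
    await sleep(error.retryAfterMs ?? options.baseDelayMs * 2 ** attempt);
  }
  return last;
}
```

In this sketch a caller would wrap the folder-creation request in `retryWithBackoffSketch` and inspect `error.cause` on the final result, so a `BAD_REQUEST` surfaces after a single attempt while a 429 waits for the advertised delay.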

Comment thread src/backend/features/backup/upload/update-file-to-backup.ts Outdated
@@ -0,0 +1,3 @@
export const INITIAL_RATE_LIMIT_DELAY_MS = 30_000;

You moved this from a module we created to a legacy structure, why?

Author


It has already been moved.

Comment thread src/backend/common/rate-limit/transient-error-handler.ts Outdated
const tree = await container.get(RemoteTreeBuilder).run(user.root_folder_id, user.rootFolderId);
const [tree, allRemoteItems] = await Promise.all([
container.get(RemoteTreeBuilder).run(user.root_folder_id, user.rootFolderId),
container.get(RemoteItemsGenerator).getAll(),

Why use Promise.all instead of waiting for the tree builder to finish?

Author


You don't have to wait for the tree to be ready; both tasks can be done at the same time without interfering with each other.


stopWatching();
if (error) {
throw error;

Why do you throw an error here but not on L45?

Comment on lines +41 to +45
try {
const uploadedContentsId = await uploader();
return { data: uploadedContentsId };
} catch (uploadError) {
return { error: mapEnvironmentUploadError(uploadError as Error & { status?: unknown }) };

This makes the code harder to read. Maybe you could create a helper function, ensure that uploader is not going to throw, or, if necessary, wrap it in the tryCatch utility.

Author


I've reorganized the code to make it easier to read; the current function was too long.
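
For readers unfamiliar with the pattern discussed in this thread, a helper that turns a throwing call into a `{ data, error }` result might look like the sketch below. `tryCatchSketch` and `Outcome` are hypothetical names; this does not reproduce the project's own tryCatch utility or `mapEnvironmentUploadError`.

```typescript
// Hypothetical result shape: exactly one of data or error is set.
interface Outcome<T, E> {
  data?: T;
  error?: E;
}

/**
 * Runs an async function and converts a thrown value into a typed error,
 * so callers can branch on the result instead of nesting try/catch blocks.
 */
async function tryCatchSketch<T, E>(
  fn: () => Promise<T>,
  mapError: (thrown: unknown) => E,
): Promise<Outcome<T, E>> {
  try {
    return { data: await fn() };
  } catch (thrown) {
    return { error: mapError(thrown) };
  }
}
```

With a helper of this kind, the five-line try/catch quoted above collapses to a single call that passes the uploader and the error mapper.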

folderId: fileFolderId,
folderUuid: folder.uuid,
});
const { data: persistedFile, error: persistedError } = await retryWithBackoff(

Why did you wrap TemporalFileUploader but not here?

Author


The responsibilities of the original function have been broken down into smaller ones to improve readability

controller.signal,
);

if (error) throw error;

I would not throw an error here because it is not handled anywhere along this call chain:
renameController -> rename -> handleTemporalFileUploadOnRename -> uploadTemporalFileOnRename -> TemporalFileUploader.run
I assume the same applies to the other callers.

I would handle it as soon as the exception happens, so that we control this behaviour properly and don't leave the exception unhandled. What do you think?

Author


This is an auxiliary function, and the exception it throws is handled in the main run function, which is the one that actually uses this class. The exception is useful for stopping the watcher regardless of whether an error occurs; if we avoided throwing, we would have to stop the watcher separately when an error occurs and again at the end of the function on success.
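
One generic way to express "stop the watcher whether or not the work fails" is a `try`/`finally` wrapper, sketched below. This is an illustration of the idea only, not the code in this PR (which calls `stopWatching()` and then rethrows the returned error); `withWatcher` and its parameters are hypothetical.

```typescript
/**
 * Starts a watcher, runs the work, and always stops the watcher afterwards.
 * Any exception from the work propagates to the caller after cleanup.
 */
async function withWatcher<T>(
  startWatching: () => () => void, // returns the function that stops the watcher
  work: () => Promise<T>,
): Promise<T> {
  const stopWatching = startWatching();
  try {
    return await work();
  } finally {
    // Runs on both success and failure, so cleanup lives in one place.
    stopWatching();
  }
}
```

The trade-off raised in the review still applies: the caller of such a wrapper has to catch the rethrown error, otherwise it goes unhandled further up the chain.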

Comment thread src/context/virtual-drive/files/application/create/FileCreator.ts
FolderCreatedAt.fromString(dto.createdAt),
FolderUpdatedAt.fromString(dto.updatedAt),
);
throw new Error(`Could not create folder ${folderPath.value}: ${error.cause}`);

That's fine because it was there in the previous version.

@egalvis27 egalvis27 force-pushed the fix/error-429-uploading-files branch from d1d2674 to ec6a676 Compare May 20, 2026 14:12

@egalvis27 egalvis27 merged commit f3f487f into feat/go-fuse-daemon May 21, 2026
10 of 11 checks passed
@egalvis27 egalvis27 deleted the fix/error-429-uploading-files branch May 21, 2026 19:28

2 participants