Per-drive sync stays in Aborted after a single 401 on /me/drives, even when subsequent requests succeed #900

@bernardgut

Description

Steps to reproduce

  1. Sign in to a desktop client and let one or more drives sync to "Up to date".
  2. Restart the IdP while the client is running (e.g. a Keycloak pod restart, or a database failover that cascades into Keycloak). The restart is brief; on our cluster the rollout takes about 35 seconds.
  3. One of the periodic /graph/v1.0/me/drives polls returns 401 because the session was mid-refresh when the IdP was unavailable.
  4. Wait for the IdP to recover. Subsequent /me/drives polls return 200 (verified server-side, see log evidence below).

Expected behavior

  • Per-drive sync resumes automatically once 200s come back, optionally with a short "retrying" indicator while the drive re-discovers.

Actual behavior

  • The Personal drive (and any other drive polled during the 401 window) shows "Aborted due to: This space is currently unavailable" indefinitely.
  • The user has to manually Reconnect (gear menu) or Log out / Log in to clear the wedge. Files are not lost since they live on the server, but sync does not resume on its own.
  • Roughly 95% of subsequent calls succeed server-side; the client simply does not re-attempt the affected drive.

Setup

  • Operating system: Linux Mint (separate user reports indicate the same behavior with client 3.0.x on macOS)
  • Client version: OpenCloud Desktop 3.0.3.2073 (AppImage on Linux)
  • Server: OpenCloud with a custom storage driver. The bug is not storage-driver-specific. It is an OIDC-discovery state-machine issue, so any deployment that restarts its IdP during an active session should reproduce it.
  • Log: anonymised server-side log evidence is below. A client-side log can be provided on request.

Server-side log evidence (anonymised)

07:35:54  GET /graph/v1.0/me/drives -> 401 (token expired during IdP restart)
07:35:55  GET /graph/v1.0/me/drives -> 200 (1 second later, same client)
07:36:24  GET /graph/v1.0/me/drives -> 200
07:37:24  GET /graph/v1.0/me/drives -> 200
... continues 200 for hours ...

Throughout this window the client UI continues to show the Personal drive as Aborted. The 401 is correct, spec-compliant behaviour from the IdP. The bug is that the desktop client treats it as a terminal state for the affected drive instead of retrying discovery on a backoff.
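For anyone who wants to run the same check against their own server logs, here is a minimal sketch. It assumes log lines shaped like the anonymised excerpt above (`HH:MM:SS  GET <path> -> <status>`); real OpenCloud or proxy logs will have a different format, so the regex and the helper name `polls_after_last_401` are illustrative only.

```python
import re

# Assumed line shape, taken from the anonymised excerpt in this issue:
# "HH:MM:SS  GET <path> -> <status> (optional note)"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+GET\s+(\S+)\s+->\s+(\d{3})")


def polls_after_last_401(lines, path="/graph/v1.0/me/drives"):
    """Return (time_of_last_401, statuses_seen_after_it) for one path.

    Returns (None, []) if the path never returned 401.
    """
    polls = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(2) == path:
            polls.append((m.group(1), int(m.group(3))))

    # Find the index of the most recent 401 on this path.
    last_401 = None
    for i, (_, status) in enumerate(polls):
        if status == 401:
            last_401 = i
    if last_401 is None:
        return None, []
    return polls[last_401][0], [s for _, s in polls[last_401 + 1:]]


log = """\
07:35:54  GET /graph/v1.0/me/drives -> 401 (token expired during IdP restart)
07:35:55  GET /graph/v1.0/me/drives -> 200 (1 second later, same client)
07:36:24  GET /graph/v1.0/me/drives -> 200
07:37:24  GET /graph/v1.0/me/drives -> 200
""".splitlines()

when, after = polls_after_last_401(log)
print(when, after)
```

If every status after the last 401 is 200 while the drive still shows Aborted, the server side has recovered and the stuck state is on the client.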

Suggested fix

When a drive is marked Aborted because of an auth failure on /me/drives, re-attempt discovery on a backoff schedule (for example 30 s, 1 min, 2 min, 5 min, 10 min) for as long as the account itself is still authenticated. Surface a "retrying" state in the UI rather than a terminal failure. If the account itself is de-authenticated, the existing "Log in" prompt is the correct fallback.
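To make the proposal concrete, here is a rough sketch of the retry logic in Python. It is not the client's actual code (the desktop client is not written in Python), and `DriveRecovery`, `DriveState`, and `on_poll_result` are hypothetical names. Two assumptions go beyond the text above: once the schedule is exhausted the last delay (10 min) repeats, and any non-200 response is treated as retryable while the account is still authenticated.

```python
from dataclasses import dataclass
from enum import Enum

# Delays in seconds, from the example schedule in this issue:
# 30 s, 1 min, 2 min, 5 min, 10 min.
BACKOFF_SCHEDULE_S = (30, 60, 120, 300, 600)


class DriveState(Enum):
    UP_TO_DATE = "up to date"
    RETRYING = "retrying"        # shown in the UI instead of "Aborted"
    NEEDS_LOGIN = "needs login"  # existing "Log in" prompt


@dataclass
class DriveRecovery:
    """Hypothetical per-drive recovery state, for illustration only."""
    state: DriveState = DriveState.UP_TO_DATE
    attempt: int = 0

    def on_poll_result(self, status, account_authenticated):
        """Update state after a /me/drives poll.

        Returns the number of seconds until the next discovery attempt,
        or None if no retry should be scheduled.
        """
        if status == 200:
            # Recovered: clear the retry counter and resume normal sync.
            self.state = DriveState.UP_TO_DATE
            self.attempt = 0
            return None
        if not account_authenticated:
            # The account itself is logged out: fall back to the login prompt.
            self.state = DriveState.NEEDS_LOGIN
            return None
        # Assumption: repeat the last delay once the schedule is exhausted.
        idx = min(self.attempt, len(BACKOFF_SCHEDULE_S) - 1)
        self.state = DriveState.RETRYING
        self.attempt += 1
        return BACKOFF_SCHEDULE_S[idx]


recovery = DriveRecovery()
first = recovery.on_poll_result(401, account_authenticated=True)
second = recovery.on_poll_result(401, account_authenticated=True)
done = recovery.on_poll_result(200, account_authenticated=True)
print(first, second, done, recovery.state)
```

The key property is that a 401 never leaves the drive in a terminal state on its own: it either schedules another attempt or hands over to the account-level login flow.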

This is architecturally adjacent to, but distinct from, the proactive token-refresh request in #763 / opencloud-eu/opencloud#919. Proactively refreshing before expiry would prevent the 401 from happening in the first place. This issue is about graceful recovery once a 401 does happen, which it will, for example whenever the IdP restarts during an active session.

Related observations

The three observations below were validated on 2026-05-10 against v3.0.3.2073 on Linux Mint. All three live in the per-folder / per-drive status pipeline and were surfaced by the same Keycloak-restart exercise. They may share a root cause, or at least a triage owner.

1. No Reconnect action in the gear menu on Linux

The gear menu next to the account only offers Log out. Operator runbooks and community guides that say "click Reconnect" assume a macOS/Windows-only UI. Either Linux is missing the action or the cross-platform documentation is wrong.

2. Slow auto-recovery from "Queued"

Concrete repro:

  1. Sign in, drop a 100 KB file into a synced folder, wait for the green check.
  2. Trigger an IdP rollout while a new file is being uploaded ("0 seconds left, file 0 of 1" or actively transferring).
  3. The client transitions to Queued with the loading animation during the rollout (about 35 s in our measurement).
  4. IdP returns 200 on /me/drives within roughly 1 second of being back up. Verified server-side.
  5. The client stays in Queued for more than 10 minutes despite no further 4xx/5xx from the server. Only Log out / Log in clears it (Reconnect is not available on Linux, see observation 1).

This appears to be the same symptom as the now-closed #661 ("Sync gets stuck in 'Queued' (v3.0.0-beta.5)"). #661 was auto-closed by the stale-bot without a fix, and the bug still reproduces on 3.0.3.2073.

3. "0 seconds left, file 0 of 1" progress widget never dismisses

Every small-file upload reproduces it. Drop a file into a synced folder: the file uploads successfully (verified in the web UI), but the per-folder progress widget continues to show "0 seconds left, file 0 of 1" indefinitely. It persists until a state-refresh event (IdP restart, Log out / Log in). This is probably unrelated to the Aborted bug above, but worth bundling into the same release given the shared per-folder status pipeline.
