feat(relocation) Use a shared bucket for relocations by markstory · Pull Request #116108 · getsentry/sentry

markstory · 2026-05-22T23:51:49Z

We've had a number of problems with relocations timing out or consuming too much memory as export files are moved between control + cells. The RPC calls and file operations would be much simpler if there was only a single shared bucket, instead of one bucket per-cell. Futhermore, I've removed all the of the file copying which should improve performance and reliability.

We've already provisioned a new shared bucket and configuration changes to use that bucket are in getsentry/ops#20737

There is still some cleanup work to be done and I've added some TODOs for what I plan to address in the short term.

Refs INFRENG-318

There were more file moves :| Instead of copying all that data around lets put the upload/export in the right place to start with. Now that we have a shared bucket there is no reason to move files around.

With a shared bucket we don't need ot shuffle files around nearly as much. This should address both the slow RPC durations as well as the overall performance of relocations.

Transfer operations don't touch relocation blobs anymore.

The storage backend caches file contents and this cost me an hour of debugging.

linear-code · 2026-05-22T23:51:53Z

INFRENG-318

Because retries are initialized with the old relocation bucket path, we have to move the file when processing a retry. The file needs to be moved because cloudbuild tooling copies all of `runs/{uuid}/in` which doesn't cover the raw-relocation-data.tar. I thought conditionally copying the file during processing was simpler than having cloudbuild configuration always copy files twice.

markstory · 2026-05-25T18:14:54Z

+        # If the currrent UUID is not in the bucket_path, we are processing
+        # a retry and need to copy the file.
+        if uuid not in raw_relocation_file.bucket_path:
+            new_path = relocation_raw_data_path(UUID(uuid))
+            with relocation_storage.open(raw_relocation_file.bucket_path) as f:
+                contents = f.read()
+            relocation_storage.save(new_path, BytesIO(contents))
+            raw_relocation_file.bucket_path = new_path
+            raw_relocation_file.save()


I don't love this as we end up copying the file again. It did seem better to do this for retries than to always copy the relocation data file twice (once via bucket_path and once with the directory glob).

cursor · 2026-05-25T18:16:00Z

-                file=file,
                kind=RelocationFile.Kind.RAW_USER_DATA.value,
+                bucket_path=relocation_file.bucket_path,
+                file=relocation_file.file,


Retry copies null bucket path

Medium Severity

Failed-relocation retry reuses relocation_file.bucket_path on the new RelocationFile without checking it is set. Failures created before shared-bucket uploads can have a null path while data still lives on the legacy File record, so the retried run cannot open raw data from storage.

Additional Locations (1)

src/sentry/relocation/tasks/process.py#L769-L770

^{Reviewed by Cursor Bugbot for commit 93dc700. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit ba3a35b. Configure here.}

sentry · 2026-05-25T20:56:44Z

+        relocation_storage = get_relocation_storage()
+        if not relocation_storage.exists(raw_relocation_file.bucket_path):
+            retry_task()


Bug: Calling retry_task() for a transient "file not in storage" error incorrectly consumes the relocation's retry budget, leading to a permanent failure after a few attempts.
_{Severity: HIGH}

Suggested Fix

The retry_task() call should not be caught by the retry_task_or_fail_relocation context manager. The task should be retried in a way that does not consume the relocation's fatal error budget. This could be achieved by calling retry_task() outside the context manager or by modifying the context manager to ignore RetryTaskError.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: src/sentry/relocation/tasks/process.py#L515-L517 Potential issue: In the `uploading_complete` task, when a file is not found in storage, `retry_task()` is called to handle what is considered a transient condition. However, this call is wrapped by the `retry_task_or_fail_relocation` context manager, which catches the `RetryTaskError` and decrements a limited retry counter. Once this retry budget is exhausted, the context manager calls `fail_relocation()`, permanently marking the entire relocation as FAILED. This incorrectly treats an expected, transient error as a fatal one, leading to premature and permanent relocation failures.

sentry · 2026-05-25T20:56:44Z

+            relocation_storage = get_relocation_storage()
+            blobsize = relocation_storage.size(relocation_raw_data_path(uuid))
+            # TODO(cells) Remove this once RelocationFile.file is optional.
+            file = File.objects.create(name="stub", type=RELOCATION_FILE_TYPE, size=blobsize)

            # This write ensures that the entire chain triggered by `uploading_start` remains
            # idempotent, since only one (relocation_uuid, relocation_file_kind) pairing can exist


Bug: Catching an IntegrityError leaves the transaction in an aborted state, leading to a TransactionManagementError and an infinite retry loop that repeatedly schedules a Celery task.
_{Severity: CRITICAL}

Suggested Fix

When an IntegrityError is caught, the transaction should be explicitly rolled back to a savepoint created before the failing create() call. This will allow the transaction to be committed successfully, preventing the TransactionManagementError and the subsequent infinite retry loop. The pass statement should be replaced with logic to handle the transaction state correctly.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: src/sentry/relocation/services/relocation_export/impl.py#L114-L120 Potential issue: When handling an `IntegrityError` from `RelocationFile.objects.create()` for idempotency, the exception is caught, but the underlying database transaction is left in an aborted state. The code then proceeds to schedule an `uploading_complete` task via Celery. When the `atomic_transaction` block exits, Django raises a `TransactionManagementError` because the transaction was aborted. The calling function catches this error but fails to delete the `transfer` record, causing an infinite retry loop where the `uploading_complete` task is repeatedly scheduled on each attempt.

markstory added 11 commits May 22, 2026 19:43

Update relocation creation to write to the shared bucket.

c24ebe1

Changed my mind

0362992

Update retry endpoint to use direct bucket storage

dd2dc81

Make encrypted_bytes optional

be24349

Rework file paths again

17e6149

There were more file moves :| Instead of copying all that data around lets put the upload/export in the right place to start with. Now that we have a shared bucket there is no reason to move files around.

Remove logic to shuffle files around

177a6e1

With a shared bucket we don't need ot shuffle files around nearly as much. This should address both the slow RPC durations as well as the overall performance of relocations.

Remove storage bucket operations from transfer tests

021d723

Transfer operations don't touch relocation blobs anymore.

Fix more failing tests

f3cae56

Get EndToEndTest passing

ec388cf

Get UploadingStartTest passing

2975977

Fix up tests

0ececbc

The storage backend caches file contents and this cost me an hour of debugging.

markstory requested a review from a team as a code owner May 22, 2026 23:51

markstory requested a review from a team May 22, 2026 23:51

github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label May 22, 2026

sentry Bot reviewed May 22, 2026

View reviewed changes

Comment thread src/sentry/relocation/services/relocation_export/impl.py Outdated

cursor Bot reviewed May 22, 2026

View reviewed changes

Comment thread src/sentry/relocation/api/endpoints/retry.py

Fix mistake

42b31cf

sentry Bot reviewed May 25, 2026

View reviewed changes

Comment thread src/sentry/relocation/tasks/process.py

markstory added 3 commits May 25, 2026 14:03

Fix hardcoded value

6764a8d

Fix mypy

93dc700

markstory commented May 25, 2026

View reviewed changes

cursor Bot reviewed May 25, 2026

View reviewed changes

sentry Bot reviewed May 25, 2026

View reviewed changes

Comment thread src/sentry/relocation/tasks/process.py Outdated

Address feedback from robots

ba3a35b

cursor Bot reviewed May 25, 2026

View reviewed changes

Comment thread src/sentry/relocation/tasks/process.py Outdated

markstory added 2 commits May 25, 2026 16:44

Fix more failing tests

8234909

Merge branch 'master' into feat-relocation-no-blob-move

144e5e3

vercel Bot deployed to Preview May 25, 2026 20:50 View deployment

sentry Bot reviewed May 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(relocation) Use a shared bucket for relocations#116108

feat(relocation) Use a shared bucket for relocations#116108
markstory wants to merge 18 commits into
masterfrom
feat-relocation-no-blob-move

markstory commented May 22, 2026

Uh oh!

linear-code Bot commented May 22, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

markstory May 25, 2026

Uh oh!

Uh oh!

Uh oh!

cursor Bot May 25, 2026

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

sentry Bot May 25, 2026

Uh oh!

sentry Bot May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

markstory commented May 22, 2026

Uh oh!

linear-code Bot commented May 22, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

markstory May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cursor Bot May 25, 2026

Choose a reason for hiding this comment

Retry copies null bucket path

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sentry Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

sentry Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant