fix(gitlab): prevent duplicate webhook creation via lock+double-check #117347

Draft
wedamija wants to merge 2 commits into master from fix/gitlab-duplicate-webhook-race

@wedamija

Member

Problem

When a manual "Sync now" and the autosync beat (scm_repo_sync_beat, which fires every minute) both run for the same org integration before create_repos_batch has written the new repo to the DB, both syncs see the repo as new and each dispatches a create_repos_batch. Both code paths then reach on_create_repository concurrently. The in-memory guard (repo.config.get('webhook_id')) checks the stale object passed by the caller, and neither task has saved webhook_id yet, so both go on to call create_project_webhook, producing a duplicate webhook in GitLab.

The existing lock on sync_repos_for_org serialises the diff+dispatch phase correctly, but create_repos_batch runs async after the lock is released, so a subsequent beat tick firing in that window still sees the repo as new.

Fix

In on_create_repository, acquire a per-(integration_id, project_id) distributed lock, then re-read the repository from DB inside the lock (double-checked locking). The losing concurrent caller finds webhook_id already present in the fresh read and returns without touching GitLab.
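The double-checked pattern described above can be sketched in isolation. This is a minimal illustration, not the PR's code: the in-memory `db` dict, the fake `create_project_webhook`, and the process-local `threading.Lock` objects are all hypothetical stand-ins for the real DB row, the GitLab client, and the distributed lock.

```python
import threading
from collections import defaultdict

# Hypothetical stand-ins (not Sentry code): an in-memory "DB", a call log
# for the fake GitLab client, and process-local locks keyed per repo.
db = {("integration-1", "project-42"): {"webhook_id": None}}
webhook_calls = []
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def get_lock(key):
    # Guard the lock registry so both callers get the same lock object.
    with _locks_guard:
        return _locks[key]


def create_project_webhook(project_id):
    # Fake GitLab call: record the request and return a new webhook id.
    webhook_calls.append(project_id)
    return f"hook-{len(webhook_calls)}"


def on_create_repository(integration_id, project_id, stale_config):
    # First check: the cheap in-memory guard on the caller's (possibly stale) copy.
    if stale_config.get("webhook_id"):
        return stale_config["webhook_id"]

    key = (integration_id, project_id)
    with get_lock(key):
        # Second check: re-read the row inside the lock.
        fresh = db[key]
        if fresh["webhook_id"]:
            return fresh["webhook_id"]  # the loser returns without calling GitLab
        webhook_id = create_project_webhook(project_id)
        fresh["webhook_id"] = webhook_id  # persist before the lock is released
        return webhook_id


def run_race():
    # Two callers start together, each holding a stale copy with no webhook_id.
    barrier = threading.Barrier(2)
    results = []

    def worker():
        barrier.wait()
        results.append(
            on_create_repository("integration-1", "project-42", {"webhook_id": None})
        )

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, list(webhook_calls)
```

In this sketch, whichever thread takes the lock first creates the webhook and saves its id; the other finds the id on the fresh read and returns it, so the fake GitLab client is called only once. The pattern only holds if the write happens before the lock is released.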

blocking_acquire(initial_delay=1, timeout=30) lets the loser wait briefly rather than fail immediately. UnableToAcquireLock propagates so the Celery task (retry=Retry(times=3, delay=120)) can recover.
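To illustrate the wait-then-propagate behaviour, here is a rough sketch of a blocking acquire with a deadline. It is not Sentry's implementation: `UnableToAcquireLock` below is a local stand-in for the exception named in this PR, the helper works on a plain `threading.Lock`, and the doubling backoff is an assumption made for the example.

```python
import threading
import time


class UnableToAcquireLock(Exception):
    """Local stand-in for the exception named in the PR description."""


def blocking_acquire(lock, initial_delay, timeout):
    # Poll the lock until it is acquired or the deadline passes.
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        if lock.acquire(blocking=False):
            return
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Not caught here: the caller (e.g. a retrying task) decides what to do.
            raise UnableToAcquireLock(f"could not acquire lock within {timeout}s")
        time.sleep(min(delay, remaining))
        delay *= 2  # assumption: simple exponential backoff between attempts
```

A caller that lets `UnableToAcquireLock` escape leaves recovery to whatever retry policy wraps it, which matches the intent described above: the loser waits up to the timeout, and only a genuinely stuck lock surfaces as a task failure.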

What was verified

  • Existing test_create_repository and test_create_repository_verify_payload still cover the normal happy path (unchanged).
  • New test test_on_create_repository_skips_if_webhook_id_already_in_db: stale in-memory object with no webhook_id but DB row has it → no GitLab API call.
  • New test test_on_create_repository_propagates_lock_timeout: UnableToAcquireLock from lock propagates so Celery can retry.

What remains unverified

Full test suite not run (sandbox limitation). The two new tests exercise the core correctness invariants.



github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) on Jun 10, 2026