Skip to content

fix(CPL-290): collapse double-credit race in confirm_payment_and_credit#332

Open
GTC6244 wants to merge 1 commit intonextfrom
feature/cpl-290-crit-stripe-confirm_payment_and_credit-double-credit-race
Open

fix(CPL-290): collapse double-credit race in confirm_payment_and_credit#332
GTC6244 wants to merge 1 commit intonextfrom
feature/cpl-290-crit-stripe-confirm_payment_and_credit-double-credit-race

Conversation

@GTC6244
Copy link
Copy Markdown
Contributor

@GTC6244 GTC6244 commented May 5, 2026

Summary

CPL-290confirm_payment_and_credit performed an unguarded check-then-act on metadata.credited and posted to customers/{cid}/balance_transactions with no Idempotency-Key. Concurrent POSTs to /billing/confirm_payment with the same payment_intent_id all observed metadata.credited != "true" and each created a distinct credit ledger entry, multiplying the credit by the racing concurrency. For a $100 PI at N=10 racers, an attacker netted $900 of free credit per fresh PI.

Two layers of defence, defense-in-depth:

  1. Per-PaymentIntent tokio::sync::Mutex — stored in a moka cache (Cache<String, Arc<Mutex<()>>>, 120s time_to_idle) on StripeState, acquired at function entry. Concurrent racers serialize locally; the second to enter the critical section reads the freshly-set metadata.credited == "true" from the GET and bails with "already credited."

  2. Deterministic Idempotency-Keys on both Stripe POSTs:

    • credit-mark:{payment_intent_id} for the PaymentIntent metadata update
    • credit-tx:{payment_intent_id} for the customers/{cid}/balance_transactions create

    Distinct keys per endpoint avoid Stripe's "same key, different request fingerprint" error. The 24h idempotency window collapses duplicates server-side even if the local mutex is bypassed (cache eviction, multi-process deploys, restart mid-request).

The stripe_post_with_idempotency helper already existed at stripe.rs:176 (used by the per-charge ledger path) — this PR just wires confirm_payment_and_credit's two POSTs through it.

Pre-Landing Review

Self-review focused on the new locking + idempotency layer. No issues found.

  • Lock cache eviction: TTL is 120s time_to_idle, well above the typical confirm-payment latency. Even on eviction, idempotency keys are the authoritative defence — Stripe collapses duplicates regardless of local lock state.
  • get_with vs try_get_with: Used get_with for the lock cache (returns Arc<Mutex>, no fallible result to cache). The mutex is released as soon as the function exits via the _guard RAII binding.
  • Idempotency-key collision: Distinct keys per endpoint (credit-mark: vs credit-tx:) — safe because Stripe scopes idempotency keys per (account, key) and stores the request fingerprint. Reusing one key across two different paths would error on the second call.
  • Replay-guard ordering: Lock → GET → status/credited/customer checks → metadata mark → ledger entry. The metadata mark still happens before the credit creation, preserving the existing crash-safety property (a crash between mark and ledger creates a stuck PI but never a double-credit).
  • Stripe customer lookup inside the guard: get_customer_by_wallet is called while holding the lock. The customer cache is in-process and bounded; under contention the second racer hits a cache hit. No deadlock risk (no nested same-key locks).

Test Coverage

No new automated coverage. The existing test pattern in stripe.rs is unit tests for pure helpers (parse_stripe_response, cents_to_display, cache_key, should_update_balance_cache, unix_to_utc_date, aggregate_report_rows) — there is no HTTP mock infrastructure for stripe_post/stripe_get, so neither confirm_payment_and_credit nor charge have direct unit tests today. Adding one for this fix would require introducing a mock layer beyond the scope of a CRIT race-condition patch.

The fix is amenable to manual verification via the exploit POC in the Linear ticket against a test PaymentIntent — concurrent N×curl will now produce one ledger entry instead of N.

Verification

  • cargo build — clean
  • cargo test --lib (lit-api-server) — 225 passed, 0 failed (23 stripe tests included)
  • cargo clippy --lib --no-deps — clean, no warnings
  • cargo fmt --all -- --check — clean

No rocket route changes; k6 client regen not required.

TODOS

No TODOS.md items completed in this PR.

Test plan

  • cargo build clean
  • cargo test --lib — 225/225 passing
  • cargo clippy --lib --no-deps — clean
  • cargo fmt --all -- --check — clean
  • Manual: replay the exploit POC (for i in 1..5; do curl -X POST .../billing/confirm_payment -d '{"payment_intent_id":"pi_xxx"}' & done; wait) on a test PI and verify exactly one credit ledger entry is created in the Stripe dashboard (was N, should be 1).
  • Manual: confirm legitimate first-time confirm still credits successfully and the metadata.credited flag flips to "true".

🤖 Generated with Claude Code

Concurrent POSTs to /billing/confirm_payment with the same payment_intent_id
all observed metadata.credited != "true" and each created a distinct credit
ledger entry, multiplying the credit by the racing concurrency.

Two layers of defence:
- Per-PaymentIntent tokio Mutex via a moka cache serializes racers locally so
  the second to enter sees metadata.credited == "true" and bails.
- Both Stripe POSTs now use deterministic Idempotency-Keys
  (credit-mark:{pi}, credit-tx:{pi}). Distinct keys per endpoint avoid
  Stripe's same-key/different-fingerprint error; the 24h window collapses
  duplicates server-side even if the local lock is bypassed (eviction,
  multi-process).

Co-Authored-By: Claude Opus 4.7 <[email protected]>
@GTC6244 GTC6244 requested review from a team and Copilot May 5, 2026 23:58
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens Stripe top-up confirmation in lit-api-server by preventing duplicate credits when multiple /billing/confirm_payment requests race on the same PaymentIntent.

Changes:

  • Added a per-PaymentIntent in-process mutex cache to serialize concurrent confirm_payment_and_credit calls.
  • Switched the PaymentIntent metadata update and Stripe balance-transaction creation to deterministic idempotent POSTs.
  • Updated inline documentation to describe the new replay-protection flow.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +588 to +594
let lock = state
.confirm_payment_locks
.get_with(payment_intent_id.to_string(), async {
Arc::new(tokio::sync::Mutex::new(()))
})
.await;
let _guard = lock.lock().await;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants