Skip to content

fix: resolve transaction monitoring race condition causing false failures#1433

Open
danwt wants to merge 1 commit intomainfrom
danwt/claude/fix-tx-monitor
Open

fix: resolve transaction monitoring race condition causing false failures#1433
danwt wants to merge 1 commit intomainfrom
danwt/claude/fix-tx-monitor

Conversation

@danwt
Copy link
Contributor

@danwt danwt commented Dec 18, 2025

Summary

  • Fixed race condition in MonitorTransaction that caused false "tx failed" reports
  • Fixed channel closure detection logic (channels don't become nil when closed)
  • Fixed resource leak from defer inside loop

Context

Fixes #1414 ("roller reports tx failure, while it actually passed")

Users were seeing "failed to register sequencer" errors even though transactions succeeded on-chain. The root cause was a race condition in the transaction monitoring logic.

Root Cause Analysis

Bug 1: Premature Error Exit

// OLD: Exit early if error count reaches threshold
if len(errors) == 2 || (strings.HasPrefix(endpoint, "ws") && len(errors) == 1) {
    return fmt.Errorf("all monitoring methods failed: %v", errors)
}

If WebSocket monitoring failed immediately (e.g., due to wrong URL scheme), we'd exit before API polling had a chance to find the transaction.

Bug 2: Incorrect Channel Closure Check

// OLD: This can NEVER be true - channels don't become nil when closed
if resultChan == nil && errChan == nil {
    break
}

Bug 3: Resource Leak

// OLD: defer inside loop accumulates closes at function exit
for {
    resp, err := client.Get(url)
    defer resp.Body.Close()  // Leaks resources!
}

Solution

  1. Track channel closure with boolean flags instead of nil checks
  2. Wait for both monitoring goroutines to complete before deciding failure
  3. Close HTTP response body immediately after reading, not via defer

Changes

  • utils/tx/tx.go: Fixed MonitorTransaction function

Test plan

  • Test sequencer registration on mainnet
  • Verify transaction monitoring correctly reports success when tx succeeds
  • Verify proper cleanup of HTTP connections during long polling

🤖 Generated with Claude Code

Fixed multiple bugs in MonitorTransaction that caused false tx failure
reports:

1. Race condition: Previously checked error count prematurely (len == 2)
   and could exit before API polling discovered the transaction. Now we
   wait for both goroutines to complete before deciding success/failure.

2. Channel logic bug: The condition `if resultChan == nil && errChan ==
   nil` could never be true because channels don't become nil when
   closed in Go. Now using boolean flags to track channel closure.

3. Resource leak: Fixed defer inside loop that accumulated closes. Now
   closing response body immediately after reading.

Fixes #1414

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@danwt danwt requested a review from a team as a code owner December 18, 2025 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roller reports tx failure, while it actually passed

1 participant