fix(api): adaptive GraphQL page size on 5xx + file changes retry #185

Merged
anderdc merged 5 commits into entrius:test from eureka928:fix/graphql-reduce-page-size-on-5xx
Feb 13, 2026


@eureka928
Contributor

Summary

  • Halve GraphQL page size on 502/503/504 errors so the retry loop self-heals instead of failing 8 times with the same oversized request
  • Add retry logic to get_pull_request_file_changes (3 attempts with exponential backoff). Previously a single transient failure silently returned an empty list, causing the PR to score 0

Problem

The GraphQL PR query (get_github_graphql_query) fetches 20 or more fields per PR, including nested relations (closingIssuesReferences, reviews, repository, headRepository). At 100 PRs per page, this can exceed GitHub's server-side processing timeout for miners with many PRs, and GitHub returns a 502. The existing retry loop retried 8 times at the same page size, and every attempt failed identically.

Separately, get_pull_request_file_changes had zero retry logic. A single transient 502/timeout on this REST endpoint would silently return [], causing the entire PR to score 0 with no warning.

Changes

gittensor/utils/github_api_tools.py

get_github_graphql_query (page size reduction):

  • Move variables dict inside the retry loop so limit can update between attempts
  • On 5xx status codes (502, 503, 504), halve the limit before retrying (floor of 10)
  • Log the reduced page size so operators can see the adaptive behavior
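The adaptive retry loop described above can be sketched as follows. This is a minimal Python sketch, not the module's actual code: `fetch_prs_adaptive`, the injected `post` callable (returning a `(status, body)` pair), and the fixed 5-second delay are hypothetical, and any other non-200 status is assumed to be retried at the unchanged page size. It also assumes the starting limit is at least 10, since `max(limit // 2, 10)` would raise a smaller limit to 10.

```python
import logging
import time
from typing import Callable, Optional, Tuple

logger = logging.getLogger(__name__)

RETRYABLE_STATUSES = {502, 503, 504}  # statuses that trigger a smaller page
MIN_PAGE_SIZE = 10                    # floor for the halved limit


def fetch_prs_adaptive(
    post: Callable[[dict], Tuple[int, Optional[dict]]],  # hypothetical transport
    limit: int = 100,
    max_attempts: int = 8,
    delay_seconds: float = 5.0,  # assumed fixed delay between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Request one page of PRs, halving the page size after 502/503/504."""
    for attempt in range(1, max_attempts + 1):
        # Built inside the loop so a reduced limit applies to the next attempt.
        variables = {"limit": limit}
        status, body = post(variables)
        if status == 200:
            return body
        if status in RETRYABLE_STATUSES:
            limit = max(limit // 2, MIN_PAGE_SIZE)
            logger.warning(
                "GraphQL request for PRs failed with status %d (attempt %d/%d), "
                "page size set to %d",
                status, attempt, max_attempts, limit,
            )
        if attempt < max_attempts:
            sleep(delay_seconds)
    return None  # every attempt failed
```

With a transport that keeps returning 502 until the page holds 25 PRs or fewer, this loop requests 100, 50, then 25 PRs and succeeds on the third attempt instead of repeating the oversized request eight times.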

get_pull_request_file_changes (retry logic):

  • Add retry loop with 3 max attempts and exponential backoff (5s, 10s)
  • Handle both HTTP error responses and RequestException (connection errors, timeouts)
  • Follows the same pattern used by all other API functions in the module
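A minimal sketch of the file-changes retry loop, under stated assumptions: `get_file_changes_with_retry` and its injected `fetch` callable (returning `(status, files)`) are hypothetical, and the built-in `ConnectionError`/`TimeoutError` stand in for `requests.RequestException` so the example runs without network access. The delay doubles from 5s, and the failure is logged once after the loop when every attempt fails.

```python
import logging
import time
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger(__name__)


def get_file_changes_with_retry(
    fetch: Callable[[], Tuple[int, Optional[List[dict]]]],  # hypothetical REST call
    max_attempts: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Fetch a PR's file changes, retrying HTTP errors and connection failures."""
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        try:
            status, files = fetch()
            if status == 200:
                return files or []
            last_error = f"HTTP {status}"
        except (ConnectionError, TimeoutError) as exc:
            # Stand-in for requests.RequestException (connection errors, timeouts).
            last_error = repr(exc)
        if attempt < max_attempts:
            # Exponential backoff: 5s after the first failure, 10s after the second.
            sleep(base_delay * 2 ** (attempt - 1))
    # Logged once, after the loop, so exhaustion is no longer silent.
    logger.error("File changes request failed after %d attempts: %s", max_attempts, last_error)
    return []
```

The guard `attempt < max_attempts` uses 1-based attempt numbers, so no sleep follows the final attempt.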

tests/utils/test_github_api_tools.py

  • 5 new tests for page size reduction (halving, floor at 10, non-5xx no-op, 503/504 parity, small initial limit)
  • 6 new tests for file changes retry (success, 502 recovery, exhaustion, connection errors, backoff timing)
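The page-size rule lends itself to small, table-style tests. The sketch below is hypothetical (the real tests may exercise the full request function rather than a helper): `next_page_size` is an illustrative pure function that applies the `max(limit // 2, 10)` rule only to 502/503/504. The test functions run under pytest or can be called directly.

```python
RETRYABLE_STATUSES = {502, 503, 504}
MIN_PAGE_SIZE = 10


def next_page_size(limit: int, status: int) -> int:
    """Hypothetical helper: the page size to use after a failed request."""
    if status in RETRYABLE_STATUSES:
        return max(limit // 2, MIN_PAGE_SIZE)
    return limit  # other errors leave the page size unchanged


def test_halves_on_502():
    assert next_page_size(100, 502) == 50


def test_floor_at_10():
    assert next_page_size(12, 502) == 10
    assert next_page_size(10, 502) == 10


def test_non_5xx_is_noop():
    for status in (401, 403, 429):
        assert next_page_size(100, status) == 100


def test_503_and_504_match_502():
    for status in (503, 504):
        assert next_page_size(100, status) == next_page_size(100, 502)
```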

API usage impact

These changes are budget-neutral in the happy path and net-positive under failures:

| Scenario | Page size fix | File changes retry |
|---|---|---|
| Normal (no errors) | Zero change | Zero change |
| Transient 5xx | Saves calls (2-3 retries instead of 8 wasted) | Max 2 extra REST calls per failing PR |
| Budget per validator per miner | ~0.6% of hourly limit | Unchanged |
| 20 validators sharing a PAT | ~12% total | Well within 100% budget |

The per-PR scoring calls (file_changes + file_contents) dominate API cost (~97% of calls), but at ~0.6% per validator per miner they stay well within the ~5% per-validator target (100% / 20 validators) needed for 20-validator redundancy.
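The budget figures follow from simple arithmetic. A minimal sketch, using the ~0.6% per-validator-per-miner share from the table and 20 validators sharing one PAT; the helper names are illustrative, not part of the module.

```python
def total_share(per_validator_share: float, validators: int) -> float:
    """Fraction of one PAT's hourly limit used when every validator sharing it runs."""
    return per_validator_share * validators


def per_validator_target(validators: int, budget: float = 1.0) -> float:
    """Largest per-validator share that keeps all validators within the budget."""
    return budget / validators


share = 0.006  # ~0.6% of the hourly limit per validator per miner (from the table)
print(f"20 validators: {total_share(share, 20):.1%} of the limit")  # 20 validators: 12.0% of the limit
print(f"per-validator target: {per_validator_target(20):.1%}")     # per-validator target: 5.0%
```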

Verified

Observed in a production-like run: the 502 warning fires and the query self-heals:

GraphQL request for PRs failed with status 502 (attempt 1/8), reducing page size to 50 and retrying in 5s...

Miner scored 1122.47 at Silver tier after recovery.

Test plan

  • pytest tests/utils/test_github_api_tools.py -v — 23 passed
  • pytest tests/ -v — 210 passed
  • Live validation with check_miner_status.py — 502 recovery confirmed

…e queries

GitHub returns 502 when a GraphQL query is too expensive to process.
Halve the page limit on 502/503/504 responses (floor of 10) before
retrying, so the retry loop can self-heal instead of failing 8 times
with the same oversized request.

This REST endpoint (get_pull_request_file_changes) had no retry logic:
a single transient failure would silently return an empty list,
causing the PR to score 0. Add 3 attempts with exponential backoff
(matching the pattern used by all other API functions in the module).
eureka928 force-pushed the fix/graphql-reduce-page-size-on-5xx branch from 29a0e92 to 464a50f on February 13, 2026 00:29
Address PR entrius#185 review feedback: remove redundant `and limit > 10`
guard since `max(limit // 2, 10)` already floors at 10, and simplify
the warning log from "reducing page size to" to "page size set to".
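The equivalence claimed in this commit can be checked directly. The sketch assumes the earlier condition had the shape `status in (502, 503, 504) and limit > 10`; both function names are hypothetical. The two forms agree for every starting limit of 10 or more, which covers the 100-PR default page size.

```python
RETRYABLE_STATUSES = {502, 503, 504}


def with_guard(limit: int, status: int) -> int:
    # Assumed shape of the earlier condition, including the `and limit > 10` guard.
    if status in RETRYABLE_STATUSES and limit > 10:
        return max(limit // 2, 10)
    return limit


def without_guard(limit: int, status: int) -> int:
    # Simplified condition: the max() floor alone keeps the result at 10 or more.
    if status in RETRYABLE_STATUSES:
        return max(limit // 2, 10)
    return limit


# The two forms agree for every starting limit from 10 up.
assert all(with_guard(n, 502) == without_guard(n, 502) for n in range(10, 1001))
```

Below 10 they differ: `with_guard(5, 502)` keeps 5, while `without_guard(5, 502)` returns 10, so the simplification relies on the limit starting at 10 or above.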
…s retry

Address PR review feedback: change retry guard from
`attempt < (max_attempts - 1)` to `attempt < max_attempts` and move
error logging after the loop for clarity.
@eureka928
Contributor Author

Updated the PR

@anderdc
Collaborator

anderdc commented Feb 13, 2026

lgtm

anderdc merged commit beae22b into entrius:test on Feb 13, 2026
2 checks passed