Problem
Antenna has no API rate limiting. This creates two risks:

- Worker polling storms — multiple ADC workers each poll `/jobs/` every 5 seconds. With N workers × M pipelines, request volume scales linearly. We recently consolidated to a single `pipeline__slug__in` call per poll cycle (PSv2: Improve task fetching & web worker concurrency configuration #1142, feat: fetch jobs for all pipelines in a single API request ami-data-companion#114), but there's no server-side protection if workers misbehave or a bug causes tight retry loops.
- Crawler / bot abuse — the public API is exposed to the internet. Facebook's crawler bots and similar have been observed hitting the API. Without throttling, a bot or a misconfigured client can saturate the server.
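To make the "scales linearly" point concrete, here is a back-of-envelope calculation. The 5-second poll interval comes from the description above; the worker and pipeline counts are illustrative assumptions, not measured numbers.

```python
# Hypothetical load estimate for the polling-storm scenario.
POLL_INTERVAL_S = 5   # from the issue: workers poll every 5 seconds
workers = 10          # illustrative N, not a real deployment figure
pipelines = 8         # illustrative M, not a real deployment figure

# After consolidation: one request per worker per poll cycle.
polls_per_min = workers * 60 // POLL_INTERVAL_S

# Before consolidation: one request per pipeline per poll cycle.
legacy_polls_per_min = polls_per_min * pipelines
```

Even the consolidated pattern produces 120 requests/min from 10 workers in this sketch, which is why a server-side backstop is still worthwhile.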
Proposed Approach
Use DRF's built-in throttling framework (rest_framework.throttling).
Throttle classes to configure
| Scope | Rate | Purpose |
|---|---|---|
| `anon` | 60/min | Unauthenticated requests (bots, crawlers) |
| `user` | 300/min | Authenticated users (UI, normal API usage) |
| `worker` | 30/min | Worker job-polling endpoints specifically |
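For reviewers unfamiliar with how DRF enforces rates like `60/min`: its `SimpleRateThrottle` keeps a per-client history of recent request timestamps and rejects a request once the window is full. The sketch below shows the idea in plain Python; the `SlidingWindow` class and its `allow` method are illustrative names, not DRF's API.

```python
import time


class SlidingWindow:
    """Minimal sketch of a DRF-style sliding-window rate check."""

    def __init__(self, num_requests, duration_s):
        self.num_requests = num_requests      # e.g. 60 for "60/min"
        self.duration_s = duration_s          # e.g. 60.0 for "60/min"
        self.history = []                     # newest-first request timestamps

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.history and self.history[-1] <= now - self.duration_s:
            self.history.pop()
        if len(self.history) >= self.num_requests:
            return False  # over the rate: the view would return HTTP 429
        self.history.insert(0, now)
        return True
```

A `SlidingWindow(60, 60.0)` instance per client identity approximates the `anon` row above; DRF stores the same kind of history in the configured cache backend so it works across processes.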
Implementation sketch
1. Settings (`config/settings/base.py`):

```python
REST_FRAMEWORK = {
    # ...existing DRF settings...
    "DEFAULT_THROTTLE_CLASSES": [
        "rest_framework.throttling.AnonRateThrottle",
        "rest_framework.throttling.UserRateThrottle",
    ],
    "DEFAULT_THROTTLE_RATES": {
        "anon": "60/min",
        "user": "300/min",
        "worker": "30/min",
    },
}
```

2. Worker-specific throttle for job polling endpoints:
```python
from rest_framework.throttling import SimpleRateThrottle


class WorkerPollThrottle(SimpleRateThrottle):
    scope = "worker"

    def get_cache_key(self, request, view):
        # Throttle by auth token or processing_service_name
        if request.user.is_authenticated:
            return self.cache_format % {"scope": self.scope, "ident": request.user.pk}
        return None
```

Apply to `JobViewSet.list` when `ids_only=1` or `incomplete_only=1` query params are present (these are the worker polling patterns).
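The "apply only to the worker polling pattern" condition can be isolated in a small predicate, which also keeps it testable. `is_worker_poll` is a hypothetical helper name, not existing Antenna code; the query-param names come from the issue. In the viewset this would likely be wired up via a `get_throttles()` override that returns `[WorkerPollThrottle()]` only when the predicate matches.

```python
def is_worker_poll(query_params):
    """Hypothetical check for the worker job-polling request pattern.

    Matches requests carrying ids_only=1 or incomplete_only=1,
    the two query params the issue identifies as worker polling.
    """
    return (
        query_params.get("ids_only") == "1"
        or query_params.get("incomplete_only") == "1"
    )
```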
3. Backend: Use Redis (already in the stack) for throttle state via django-redis cache backend.
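DRF's throttles store their request history in Django's default cache, so pointing that at Redis is enough. A sketch of the settings fragment, assuming django-redis; the Redis URL and database index are placeholders, not values from Antenna's actual settings:

```python
# Hypothetical cache config; DRF throttling uses the "default" alias.
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://redis:6379/1",  # placeholder URL
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    },
}
```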
Considerations
- DRF returns `429 Too Many Requests` with a `Retry-After` header — well-behaved clients (including the ADC worker) should respect this
- The ADC worker's `get_jobs()` already has a try/except around the HTTP call, so 429s would be handled gracefully (logged as an error, retried on the next poll cycle)
- Consider whether `ScopedRateThrottle` is more appropriate for fine-grained per-view control
- Burst vs sustained rates: DRF's default uses a simple sliding window; for more sophisticated token-bucket behavior, consider `django-ratelimit` or a reverse proxy (nginx/Caddy) layer
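To illustrate the burst-vs-sustained distinction raised above: a token bucket permits short bursts up to a capacity while still capping the long-run rate, which a plain sliding window does not distinguish. A minimal sketch (illustrative code, not from DRF or django-ratelimit):

```python
class TokenBucket:
    """Allows bursts up to `capacity` at a sustained `rate` tokens/sec."""

    def __init__(self, rate, capacity):
        self.rate = rate            # sustained refill rate (tokens per second)
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start full
        self.last = 0.0             # time of the last refill

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `rate=1.0, capacity=3.0`, a client can fire 3 requests back-to-back but then sustain at most one per second, which matches how nginx's `limit_req` with a `burst=` setting behaves.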
Context
- Observed during PSv2 integration testing (2026-02-20)
- Worker poll consolidation: PSv2: Improve task fetching & web worker concurrency configuration #1142, feat: fetch jobs for all pipelines in a single API request ami-data-companion#114
- Stale Celery retry loops were also observed flooding the worker with retries — rate limiting on the result endpoint would help contain this