Description
Currently, the SCRAPE_WAIT_TIMEOUT environment variable (defaulting to 30s) controls the timeout for both live requests (triggered by the user in real-time) and background scraping jobs.
This creates a dilemma:
For live scraping: We often want a shorter timeout to prevent the UI from hanging for too long if a target is unresponsive.
For background scraping: We often want a longer timeout to maximize the success rate, as speed is less critical than data retrieval completeness.
Using a single variable forces a compromise that hurts either the user experience (waiting too long) or data reliability (timing out too early in the background).
Describe the solution you'd like
I would like to separate this configuration into two distinct environment variables:
LIVE_SCRAPE_WAIT_TIMEOUT: Applied to synchronous/live scrape requests.
BACKGROUND_SCRAPE_WAIT_TIMEOUT: Applied to asynchronous/background jobs.
Ideally, both variables should default to the existing value (30s) to maintain backward compatibility, or fall back to SCRAPE_WAIT_TIMEOUT when the specific variable is not set.
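To make the proposed fallback order concrete, here is a minimal sketch in Python. It is not taken from the project's code: the `resolve_timeout` helper is hypothetical, and it assumes the variables hold a plain integer number of seconds, which may not match how the project actually parses SCRAPE_WAIT_TIMEOUT.

```python
import os

# Assumption: 30 seconds, matching the current default mentioned in this issue.
DEFAULT_TIMEOUT_SECONDS = 30


def resolve_timeout(specific_var, env=None):
    """Return the timeout in seconds for one scrape mode (hypothetical helper).

    Lookup order: the mode-specific variable, then the legacy
    SCRAPE_WAIT_TIMEOUT, then the built-in default.
    """
    env = os.environ if env is None else env
    for name in (specific_var, "SCRAPE_WAIT_TIMEOUT"):
        raw = env.get(name)
        # Treat unset, empty, and whitespace-only values as "not configured".
        if raw is not None and raw.strip():
            return int(raw)
    return DEFAULT_TIMEOUT_SECONDS


live_timeout = resolve_timeout("LIVE_SCRAPE_WAIT_TIMEOUT")
background_timeout = resolve_timeout("BACKGROUND_SCRAPE_WAIT_TIMEOUT")
```

With this order, an existing deployment that only sets SCRAPE_WAIT_TIMEOUT would keep its current behavior for both modes, and setting either new variable would override it for that mode only.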
Describe alternatives you've considered
Status Quo: Keeping a single 30s timeout is workable, but it cannot be raised for slow sites in the background without also degrading the live dashboard experience.
Hardcoding: Applying a fixed multiplier to background tasks (e.g., always 2x the live timeout) would work, but it lacks flexibility.
Additional context
Splitting these allows administrators to set a "fail-fast" policy for the UI (e.g., 10s) while allowing background workers to persist on slower connections or heavy pages (e.g., 60s+).