Feature Request: Split SCRAPE_WAIT_TIMEOUT into separate live and background timeouts #515

@charleeislegend

Description

Currently, the SCRAPE_WAIT_TIMEOUT environment variable (defaulting to 30s) controls the timeout for both live requests (triggered by the user in real-time) and background scraping jobs.

This creates a dilemma:

For live scraping: We often want a shorter timeout to prevent the UI from hanging for too long if a target is unresponsive.

For background scraping: We often want a longer timeout to maximize the success rate, as speed is less critical than data retrieval completeness.

Using a single variable forces a compromise that hurts either the user experience (waiting too long) or data reliability (timing out too early in the background).

Describe the solution you'd like

I would like to separate this configuration into two distinct environment variables:

LIVE_SCRAPE_WAIT_TIMEOUT: Applied to synchronous/live scrape requests.

BACKGROUND_SCRAPE_WAIT_TIMEOUT: Applied to asynchronous/background jobs.

Ideally, both should default to the current value (30s) to maintain backward compatibility, or fall back to SCRAPE_WAIT_TIMEOUT when the specific variable is not set.
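
To make the fallback order concrete, here is a minimal sketch in Python. It is only an illustration, not the project's actual config code: the function names are hypothetical, and it assumes the variables hold a plain number of seconds (the real parsing format may differ).

```python
import os

# Current default according to this issue.
DEFAULT_TIMEOUT_SECONDS = 30.0


def _read_seconds(env, name):
    """Return the variable as a positive float, or None if unset or empty.

    Assumption: values are plain numbers of seconds, e.g. "30".
    """
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return None
    value = float(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {raw!r}")
    return value


def resolve_timeouts(env=None):
    """Return (live_timeout, background_timeout) in seconds.

    Precedence for each: specific variable, then SCRAPE_WAIT_TIMEOUT,
    then the built-in default.
    """
    env = os.environ if env is None else env
    shared = _read_seconds(env, "SCRAPE_WAIT_TIMEOUT")
    fallback = shared if shared is not None else DEFAULT_TIMEOUT_SECONDS

    live = _read_seconds(env, "LIVE_SCRAPE_WAIT_TIMEOUT")
    background = _read_seconds(env, "BACKGROUND_SCRAPE_WAIT_TIMEOUT")
    return (
        live if live is not None else fallback,
        background if background is not None else fallback,
    )
```

With this order, existing deployments that only set SCRAPE_WAIT_TIMEOUT keep their current behavior, and setting just one of the new variables overrides only that path.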

Describe alternatives you've considered

Status Quo: Keeping the timeout at 30s is "okay," but it prevents optimizing for slow sites in the background without ruining the live dashboard experience.

Hardcoding: Apply a fixed multiplier to background tasks (e.g., always 2x the live timeout). This would be simpler, but it lacks flexibility.

Additional context

Splitting these allows administrators to set a "fail-fast" policy for the UI (e.g., 10s) while allowing background workers to persist on slower connections or heavy pages (e.g., 60s+).
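
As a rough illustration of the fail-fast versus persistent behavior, the sketch below applies two different timeouts to the same slow target using `asyncio.wait_for`. The helper names and the simulated target are hypothetical, and the delays are scaled down so the example runs quickly; it does not reflect how the project actually performs scrapes.

```python
import asyncio


async def scrape_with_timeout(scrape, timeout_seconds):
    """Await the coroutine returned by `scrape()`; return None on timeout."""
    try:
        return await asyncio.wait_for(scrape(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return None


async def slow_target(delay_seconds):
    """Stand-in for a slow site (hypothetical; no network access)."""
    await asyncio.sleep(delay_seconds)
    return "page content"


async def demo():
    # The same slow target, hit with a short "live" timeout and a
    # longer "background" timeout (both scaled down for the demo).
    live = await scrape_with_timeout(lambda: slow_target(0.2), 0.05)
    background = await scrape_with_timeout(lambda: slow_target(0.2), 1.0)
    return live, background


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The live call gives up early so the UI can report a failure promptly, while the background call waits long enough to retrieve the data.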
