
Improve scraping on recent releases#472

Open
kekkokk wants to merge 1 commit into g0ldyy:main from kekkokk:feature/dynamic_ttl

Conversation


kekkokk commented Jan 11, 2026

Improve scraping on recent releases by using a more aggressive TTL, and fall back to a live search when no debrid-cached results are returned.

#466

Summary by CodeRabbit

Release Notes

  • New Features
    • Added intelligent cache management that adapts Time-To-Live (TTL) based on content release date for recently released items.
    • Introduced a configurable fallback TTL that triggers more frequent searches when no debrid-cached torrents are found.
    • Enhanced content availability through dynamic cache refresh logic.



coderabbitai bot commented Jan 11, 2026

Walkthrough

This PR adds release-date-aware caching logic for torrent searches. It introduces three new optional configuration parameters (two TTL overrides and a recency threshold) and implements conditional cache-freshness checks based on how recently the content was released and on debrid availability, falling back to a live scrape when needed.
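A minimal sketch of the release-date-aware TTL selection, assuming the recent-release TTL simply replaces the standard TTL when an item falls inside the RECENT_RELEASE_DAYS window. The function name compute_effective_ttl and its parameters are hypothetical, not comet's actual code; release_date is assumed to be a Unix timestamp in seconds, as the review notes further down.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def compute_effective_ttl(
    base_ttl: int,
    release_date: Optional[float],
    recent_release_ttl: Optional[int],
    recent_release_days: Optional[int],
    now: Optional[float] = None,
) -> int:
    """Pick the live-torrent cache TTL (seconds) for one media item (hypothetical helper)."""
    # Feature not configured or release date unknown: keep the standard TTL.
    if release_date is None or recent_release_ttl is None or recent_release_days is None:
        return base_ttl
    now = time.time() if now is None else now
    days_since_release = (now - release_date) / SECONDS_PER_DAY
    # Assumption: anything inside the window (including future-dated items) counts as recent.
    if days_since_release <= recent_release_days:
        return recent_release_ttl
    return base_ttl
```

Passing `now` explicitly keeps the function deterministic for tests; in a request handler it would default to the current time.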

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Configuration & Settings | .env-sample, comet/core/models.py | Added three new optional configuration parameters: LIVE_TORRENT_CACHE_TTL_RECENT_RELEASE (TTL override for recent releases), RECENT_RELEASE_DAYS (threshold for classifying content as recent), and LIVE_TORRENT_CACHE_TTL_NO_DEBRID (fallback TTL when no debrid-cached torrents exist). |
| Stream Endpoint Logic | comet/api/endpoints/stream.py | Implemented release-date-aware TTL computation by fetching release dates from the cache and calculating an effective TTL based on recency. Added fallback logic, triggered when no release date is available and no debrid-cached torrents are found, that uses distributed locking to coordinate concurrent scrape attempts. Includes post-scrape cache re-validation and more detailed logging of scraping decisions. |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/API
    participant Cache as Release<br/>Date Cache
    participant Debrid as Debrid<br/>Service
    participant Lock as Distributed<br/>Lock
    participant Scraper as Live<br/>Scraper
    participant TorrentCache as Torrent<br/>Cache

    Client->>Cache: Fetch release_date for media_id
    Cache-->>Client: Return release_date (if available)
    
    alt Release Date Available
        Client->>Client: Compute effective_cache_ttl<br/>based on RECENT_RELEASE_DAYS
    else No Release Date
        Client->>Client: Use standard LIVE_TORRENT_CACHE_TTL
    end
    
    Client->>TorrentCache: Check cache freshness<br/>using effective_cache_ttl
    TorrentCache-->>Client: Return cached_count
    
    alt Cache Stale & No Debrid Cached Torrents
        Client->>Lock: Acquire distributed lock
        Lock-->>Client: Lock acquired/waiting
        Client->>Scraper: Initiate live scrape
        Scraper->>Debrid: Search for torrents
        Debrid-->>Scraper: Return results
        Scraper->>TorrentCache: Update cache
        TorrentCache-->>Scraper: Confirm
        Scraper-->>Client: Scrape complete
        Client->>Lock: Release lock
        Client->>TorrentCache: Re-check cache<br/>after scrape
        TorrentCache-->>Client: Return updated counts
    else Cache Valid or Debrid Torrents Present
        Client-->>Client: Use existing cache
    end
    
    Client-->>Client: Return results with appropriate TTL
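The two alt branches of the diagram reduce to a single decision: scrape only when the cache is stale and no debrid-cached torrents are present. The sketch below additionally assumes that LIVE_TORRENT_CACHE_TTL_NO_DEBRID, when set, tightens the staleness window in the no-debrid case; needs_fallback_scrape and its parameters are hypothetical, not the PR's code.

```python
from typing import Optional


def needs_fallback_scrape(
    cache_age_seconds: Optional[float],
    effective_ttl: int,
    debrid_cached_count: int,
    no_debrid_ttl: Optional[int] = None,
) -> bool:
    """Return True when the request should trigger a live scrape (hypothetical helper)."""
    if debrid_cached_count > 0:
        return False  # "Debrid Torrents Present" branch: use the existing cache
    # Assumption: with no debrid-cached hits, the shorter no-debrid TTL (if set) applies.
    ttl = effective_ttl if no_debrid_ttl is None else min(effective_ttl, no_debrid_ttl)
    # Nothing cached yet, or the cache is older than the TTL: treat it as stale.
    return cache_age_seconds is None or cache_age_seconds > ttl
```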


🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: introducing release-date-aware TTL logic and a no-debrid fallback scrape to improve scraping for recent releases. |


coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
comet/api/endpoints/stream.py (1)

540-616: Consider extracting the live scrape + debrid re-check pattern into a helper function.

This fallback block duplicates logic from earlier in the function (lock acquisition, scraping, debrid availability check). The added complexity is manageable, but extracting it into a helper would improve maintainability.

Additionally, the cached_count == 0 check on line 572 is redundant since it's already ensured by the outer condition on line 544.

♻️ Suggested simplification for line 572
-                if cached_count == 0 and not needs_scraping and not lock_acquired:
+                if not needs_scraping and not lock_acquired:
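One possible shape for the suggested helper, as a sketch only: asyncio.Lock stands in for comet's DistributedLock (it coordinates a single process, not multiple workers), and the two callables stand in for the scrape and debrid re-check steps. The names scrape_and_recheck and handle_request are hypothetical. The helper re-checks the count after taking the lock so that a request which waited behind another scrape does not scrape again.

```python
import asyncio
from typing import Awaitable, Callable


async def scrape_and_recheck(
    lock: asyncio.Lock,
    count_debrid_cached: Callable[[], Awaitable[int]],
    scrape: Callable[[], Awaitable[None]],
) -> int:
    """Run a live scrape under the lock and return the re-checked debrid-cached count."""
    async with lock:
        # Re-check once the lock is held: a concurrent request may already have scraped.
        cached_count = await count_debrid_cached()
        if cached_count == 0:
            await scrape()
            cached_count = await count_debrid_cached()
    return cached_count


async def handle_request(
    lock: asyncio.Lock,
    count_debrid_cached: Callable[[], Awaitable[int]],
    scrape: Callable[[], Awaitable[None]],
) -> int:
    # Single outer check; the helper owns the lock, scrape, and re-check steps.
    cached_count = await count_debrid_cached()
    if cached_count == 0:
        cached_count = await scrape_and_recheck(lock, count_debrid_cached, scrape)
    return cached_count
```

With both call sites delegating to one helper, the lock/scrape/re-check sequence lives in a single place, which is the maintainability gain the nitpick is after.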
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 73e383e and 1f0ebbb.

📒 Files selected for processing (3)
  • .env-sample
  • comet/api/endpoints/stream.py
  • comet/core/models.py
🧰 Additional context used
🧬 Code graph analysis (1)
comet/api/endpoints/stream.py (4)
comet/core/db_router.py (1)
  • fetch_val (84-87)
comet/services/lock.py (3)
  • DistributedLock (9-113)
  • acquire (25-81)
  • release (83-94)
comet/services/orchestration.py (2)
  • scrape_torrents (59-105)
  • get_cached_torrents (107-156)
comet/services/debrid.py (1)
  • check_existing_availability (78-116)
🔇 Additional comments (4)
.env-sample (1)

74-89: LGTM!

The new environment variables are well-documented with clear explanations of their purpose and interaction with existing TTL settings. The "disabled by default" approach is appropriate for optional features that change caching behavior.

comet/core/models.py (1)

45-47: LGTM!

The new configuration fields are correctly typed as Optional[int] with None defaults, matching the .env-sample documentation. Logical placement alongside related TTL settings.
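To illustrate the "Optional[int], None by default" contract, here is a stdlib-only stand-in that reads the three variables from an environment mapping. DynamicTtlSettings and _optional_int are hypothetical names; comet's real settings live in comet/core/models.py and are not reproduced here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _optional_int(env: Mapping[str, str], name: str) -> Optional[int]:
    # Unset or blank variables map to None, i.e. the feature stays disabled.
    raw = env.get(name, "").strip()
    return int(raw) if raw else None


@dataclass(frozen=True)
class DynamicTtlSettings:
    LIVE_TORRENT_CACHE_TTL_RECENT_RELEASE: Optional[int] = None
    RECENT_RELEASE_DAYS: Optional[int] = None
    LIVE_TORRENT_CACHE_TTL_NO_DEBRID: Optional[int] = None

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "DynamicTtlSettings":
        return cls(
            LIVE_TORRENT_CACHE_TTL_RECENT_RELEASE=_optional_int(
                env, "LIVE_TORRENT_CACHE_TTL_RECENT_RELEASE"
            ),
            RECENT_RELEASE_DAYS=_optional_int(env, "RECENT_RELEASE_DAYS"),
            LIVE_TORRENT_CACHE_TTL_NO_DEBRID=_optional_int(
                env, "LIVE_TORRENT_CACHE_TTL_NO_DEBRID"
            ),
        )
```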

comet/api/endpoints/stream.py (2)

283-300: LGTM!

The effective TTL is correctly integrated into the fresh cache count query. The outer condition properly checks if the TTL feature is enabled (>= 0), while the inner query uses the computed effective_cache_ttl which accounts for recent releases.
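A sketch of the pattern the review describes: the outer guard checks whether the standard TTL feature is enabled, while the cutoff for the freshness query comes from the effective TTL. It uses an in-memory SQLite table with a hypothetical torrents(media_id, updated_at) schema; comet's real table, columns, and fetch_val call are not shown here.

```python
import sqlite3
import time
from typing import Optional


def count_fresh_torrents(
    conn: sqlite3.Connection,
    media_id: str,
    base_ttl: int,
    effective_ttl: int,
    now: Optional[float] = None,
) -> Optional[int]:
    """Count cached torrents updated within the effective TTL (hypothetical schema)."""
    if base_ttl < 0:
        return None  # TTL feature disabled: freshness is not checked at all
    now = time.time() if now is None else now
    cutoff = now - effective_ttl  # recent releases get a shorter window, hence a later cutoff
    row = conn.execute(
        "SELECT COUNT(*) FROM torrents WHERE media_id = ? AND updated_at >= ?",
        (media_id, cutoff),
    ).fetchone()
    return row[0]
```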


249-277: No issue found. The release_date is correctly stored as a Unix timestamp (BIGINT in the database). It is populated via datetime.strptime(release_date_str, "%Y-%m-%d").timestamp() in comet/metadata/filter.py (lines 76-77), which converts date strings to Unix timestamps in seconds. The calculation (time.time() - release_date) / 86400 on line 268 therefore correctly converts the difference to days.
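The conversion chain the review verifies can be reproduced in isolation. The helper names below are hypothetical; the strptime/timestamp call mirrors the expression quoted from comet/metadata/filter.py. Because strptime returns a naive datetime, .timestamp() interprets it as local time; the day difference is unaffected as long as both ends use the same clock.

```python
import time
from datetime import datetime
from typing import Optional

SECONDS_PER_DAY = 86400


def release_date_to_timestamp(release_date_str: str) -> float:
    # Naive datetime: .timestamp() treats it as local time and returns seconds since the epoch.
    return datetime.strptime(release_date_str, "%Y-%m-%d").timestamp()


def days_since_release(release_date: float, now: Optional[float] = None) -> float:
    # Same arithmetic as the reviewed line: elapsed seconds divided by 86400.
    now = time.time() if now is None else now
    return (now - release_date) / SECONDS_PER_DAY
```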
