feat: implement PURGE method for cache invalidation #801

@hartym

Epic: Cache

Summary

TLDR: Implement the HTTP PURGE method for immediate cache invalidation, following Varnish's pattern. Operators can then remove a specific cached response instantly with a PURGE /api/endpoint request, keeping the cache consistent when backend data changes instead of waiting for TTL expiration.

Context

HARP uses hishel 1.0 for HTTP caching within the AsyncCacheTransport layer. Currently, cached responses can only expire via TTL or be cleared entirely. Production deployments need targeted invalidation when:

  • Content updates require immediate propagation
  • Cached error responses need removal
  • Security patches invalidate previous responses
  • Data corrections must reflect immediately

Varnish's PURGE method is the de facto convention (PURGE is not a standardized HTTP method): send PURGE /path to immediately remove that exact URL from the cache. HARP's kernel event architecture allows intercepting PURGE requests at the ASGI layer before they reach the proxy, so http_client can handle cache operations directly.

sequenceDiagram
    participant Client
    participant Kernel as ASGIKernel
    participant PurgeService as CachePurgeListener
    participant Storage as Cache Storage
    
    Client->>Kernel: PURGE /api/users/123
    Kernel->>PurgeService: EVENT_CORE_REQUEST
    PurgeService->>PurgeService: Check IP ACL
    PurgeService->>Storage: delete cache entry
    Storage-->>PurgeService: deleted
    PurgeService-->>Kernel: 200 Purged
    Kernel-->>Client: 200 Purged
    
    Note over Kernel,Storage: Bypasses proxy entirely

Input

PURGE Request Format:

PURGE /api/users/123 HTTP/1.1
Host: api.example.com
X-Forwarded-For: 10.0.0.5
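A rough sketch of what sending such a request could look like from Python, using only the standard library. The local test server (PurgeEchoHandler) is a hypothetical stand-in that answers every PURGE with 200; it is not HARP and performs no cache lookup. The helper names send_purge and demo are also assumptions made for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PurgeEchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: answers any PURGE with 200."""

    def do_PURGE(self):  # http.server dispatches on "do_" + method name
        body = f"Purged {self.path}".encode()
        self.send_response(200, "Purged")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep the example quiet
        pass


def send_purge(host, port, path, host_header, forwarded_for):
    """Send a PURGE request and return (status, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(
            "PURGE",
            path,
            headers={"Host": host_header, "X-Forwarded-For": forwarded_for},
        )
        response = conn.getresponse()
        return response.status, response.read().decode()
    finally:
        conn.close()


def demo():
    """Start the stand-in server on a free port and purge one URL."""
    server = HTTPServer(("127.0.0.1", 0), PurgeEchoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return send_purge(
            "127.0.0.1", server.server_port,
            "/api/users/123", "api.example.com", "10.0.0.5",
        )
    finally:
        server.shutdown()
        server.server_close()
```

Against a real deployment, only send_purge would be needed, pointed at the proxy's address.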

Configuration (config.yml):

http_client:
  cache:
    enabled: true
    purge:
      enabled: true  # Opt-in for security (default: false)
      acl:
        - 127.0.0.1
        - 10.0.0.0/8
        - 172.16.0.0/12
    policy:
      shared: true
      # ... existing cache policy settings

Security: IP-based ACL restricts PURGE to authorized sources (configurable allowlist)
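A minimal sketch of how the allowlist check could work, assuming only Python's standard ipaddress module. The helper names parse_acl and is_allowed are hypothetical, and how the client IP is determined (socket peer versus X-Forwarded-For) is deliberately left out.

```python
import ipaddress


def parse_acl(entries):
    """Turn ACL strings (single IPs or CIDR ranges) into network objects."""
    # A bare address such as "127.0.0.1" becomes the single-host network 127.0.0.1/32.
    return [ipaddress.ip_network(entry) for entry in entries]


def is_allowed(client_ip, acl):
    """Return True if client_ip falls inside any network of the ACL."""
    address = ipaddress.ip_address(client_ip)
    # Membership is always False when the IP versions differ (v4 vs v6).
    return any(address in network for network in acl)


# ACL taken from the configuration example above.
ACL = parse_acl(["127.0.0.1", "10.0.0.0/8", "172.16.0.0/12"])
```

Parsing the ACL once at startup means a malformed entry fails early with a ValueError rather than on the first PURGE request.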

Cache Key Matching: Exact URL + Host header, computed with the injected SpecificationPolicy instance so that keys are guaranteed to match hishel's cache key generation

Output and Testing Scenarios

Expected Responses:

  • 200 Purged - Successfully removed from cache
  • 404 Not in cache - URL was not cached
  • 403 Forbidden - IP not in allowlist
  • 405 Method Not Allowed - PURGE disabled
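The four responses above can be read as a small decision table. The sketch below is one possible ordering of the checks (disabled first, then ACL, then cache lookup); that precedence is an assumption, since the proposal does not fix it, and purge_response is a hypothetical helper name.

```python
def purge_response(purge_enabled, ip_allowed, was_cached):
    """Map the outcome of a PURGE attempt to (status code, reason)."""
    if not purge_enabled:
        return 405, "Method Not Allowed"  # feature switched off
    if not ip_allowed:
        return 403, "Forbidden"  # caller is not in the allowlist
    if not was_cached:
        return 404, "Not in cache"  # nothing stored under that key
    return 200, "Purged"  # entry found and removed
```

Checking the ACL before the cache lookup avoids revealing to unauthorized callers whether a URL is cached.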

Testing Scenarios:

  1. Happy Path: Cache GET /api/users/123, PURGE /api/users/123, verify removal
  2. Security: PURGE from unauthorized IP returns 403
  3. Not Cached: PURGE non-existent URL returns 404
  4. Case Sensitivity: PURGE /Api/Users/123 doesn't match /api/users/123
  5. Host Header: Different Host headers create different cache entries
  6. Bypass Verification: PURGE returns without hitting backend service
  7. Disabled: When cache.purge.enabled=false, PURGE returns 405
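Scenarios 1, 3, 4 and 5 can be illustrated with a toy in-memory cache. This is only a sketch: ToyCache and its (host, path) key are hypothetical stand-ins, whereas the real cache keys would come from hishel through the injected SpecificationPolicy.

```python
class ToyCache:
    """Hypothetical in-memory cache keyed on (Host header, exact URL path)."""

    def __init__(self):
        self._entries = {}

    def store(self, host, path, body):
        self._entries[(host, path)] = body

    def has(self, host, path):
        return (host, path) in self._entries

    def purge(self, host, path):
        """Remove one entry; return True if something was deleted."""
        key = (host, path)
        if key in self._entries:
            del self._entries[key]
            return True
        return False


def run_scenarios():
    """Walk through the exact-match scenarios and collect the purge results."""
    cache = ToyCache()
    cache.store("api.example.com", "/api/users/123", b"alice")
    cache.store("other.example.com", "/api/users/123", b"bob")
    return {
        "case_mismatch": cache.purge("api.example.com", "/Api/Users/123"),
        "happy_path": cache.purge("api.example.com", "/api/users/123"),
        "already_gone": cache.purge("api.example.com", "/api/users/123"),
        "other_host_kept": cache.has("other.example.com", "/api/users/123"),
    }
```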

Possible Implementation

Chosen Approach: Kernel Event Interception via EVENT_CORE_REQUEST

Implement in harp_apps/http_client/ using kernel-level event interception with the subscriber pattern:

1. Settings Structure

File: harp_apps/http_client/settings/cache.py

class CachePurgeSettings(BaseModel):
    """Cache purge configuration."""
    enabled: bool = False  # Opt-in for security
    acl: list[str] = Field(default_factory=lambda: ["127.0.0.1"])

class CacheSettings(BaseModel):
    enabled: bool = True
    purge: CachePurgeSettings = Field(default_factory=CachePurgeSettings)
    transport: Service
    policy: Service     # SpecificationPolicy - reused for cache key generation!
    storage: Service

2. Service Registration

File: harp_apps/http_client/services.yml

- condition: [!cfg "cache.enabled", !!bool "true"]
  services:
    # Existing cache services (options, policy, transport, storage)
    
    # NEW: Purge listener (nested condition)
    - condition: [!cfg "cache.purge.enabled", !!bool "false"]
      services:
        - name: http_client.cache.purge.listener
          class: harp_apps.http_client.contrib.cache.purge_listener.CachePurgeListener
          kwargs:
            settings: !cfg "cache.purge"
            policy: !svc http_client.cache.policy  # Reuse injected SpecificationPolicy
            storage: !svc http_client.cache.storage

3. Listener Implementation

File: harp_apps/http_client/contrib/cache/purge_listener.py

class CachePurgeListener:
    """Listens for PURGE requests and invalidates cache entries."""
    
    def __init__(self, settings: CachePurgeSettings, policy: SpecificationPolicy, storage: AsyncStorage):
        self.settings = settings
        self.policy = policy  # Same instance used by AsyncCacheTransport
        self.storage = storage
    
    async def subscribe(self, dispatcher):
        """Register EVENT_CORE_REQUEST listener."""
        dispatcher.add_listener(EVENT_CORE_REQUEST, self.on_core_request, priority=100)
    
    async def unsubscribe(self, dispatcher):
        """Unregister listener on shutdown."""
        dispatcher.remove_listener(EVENT_CORE_REQUEST, self.on_core_request)
    
    async def on_core_request(self, event: RequestEvent):
        """Intercept PURGE requests and handle cache invalidation."""
        if event.request.method == "PURGE":
            # Check ACL, generate cache key using self.policy, delete from storage
            # Set early response via event.set_controller()
            ...  # Placeholder so the block parses; the steps above are still to be implemented

4. Lifecycle Management

File: harp_apps/http_client/__app__.py

async def on_bind(event: OnBindEvent):
    """Subscribe purge listener when enabled."""
    settings = event.settings.get("http_client", {}).get("cache", {})
    
    if settings.get("enabled") and settings.get("purge", {}).get("enabled"):
        listener = await event.container.resolve("http_client.cache.purge.listener")
        await listener.subscribe(event.container.dispatcher)

async def on_shutdown(event: OnShutdownEvent):
    """Unsubscribe purge listener."""
    settings = event.settings.get("http_client", {}).get("cache", {})
    
    if settings.get("enabled") and settings.get("purge", {}).get("enabled"):
        listener = await event.container.resolve("http_client.cache.purge.listener")
        await listener.unsubscribe(event.container.dispatcher)

application = Application(
    on_bind=on_bind,
    on_shutdown=on_shutdown,
    settings_type=HttpClientSettings,
)

Key Architecture Benefits

  • Lives in http_client (semantic fit - cache operations)
  • Uses EVENT_CORE_REQUEST (earliest kernel interception)
  • Bypasses proxy flow entirely (performance + simplicity)
  • Reuses injected SpecificationPolicy (guaranteed cache key match)
  • Subscriber pattern (clean lifecycle, follows Rules app convention)
  • Nested configuration (cache.purge) - logical hierarchy
  • Secure by default (purge is opt-in; enabled defaults to false)
  • No modifications to proxy or kernel code

Integration Points

  • harp/asgi/events.py - EVENT_CORE_REQUEST definition (line 15)
  • harp/asgi/kernel.py - Event dispatch at request arrival (lines 166-174)
  • harp_apps/http_client/__app__.py - Listener registration in on_bind/on_shutdown hooks
  • harp_apps/http_client/services.yml - Conditional service registration (lines 42-56)
  • harp_apps/storage/types/blob_storage.py - IBlobStorage.delete()

Current Challenges

None - The SpecificationPolicy is already injected via DI and can be reused directly for cache key generation, ensuring exact matches with hishel's algorithm.

Pattern Matching: Initial version supports exact URL only. Wildcard support (PURGE /api/users/*) requires additional metadata storage or blob scanning, deferred to future enhancement.
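To make the deferred wildcard case concrete, here is a sketch of why extra metadata would be needed: matching a pattern requires a list of cached URLs to scan, which exact-key lookup alone does not provide. UrlIndex is a hypothetical side index, not an existing HARP or hishel structure, and fnmatch-style globbing is just one possible pattern syntax.

```python
from fnmatch import fnmatchcase


class UrlIndex:
    """Hypothetical side index remembering which URLs are currently cached."""

    def __init__(self):
        self._urls = set()

    def add(self, url):
        self._urls.add(url)

    def remaining(self):
        return sorted(self._urls)

    def purge_matching(self, pattern):
        """Drop every indexed URL matching the glob; return what was dropped."""
        # fnmatchcase is case-sensitive, consistent with exact-URL matching.
        matched = sorted(url for url in self._urls if fnmatchcase(url, pattern))
        self._urls.difference_update(matched)
        return matched


def demo_wildcard():
    index = UrlIndex()
    for url in ("/api/users/1", "/api/users/2", "/api/orders/9"):
        index.add(url)
    purged = index.purge_matching("/api/users/*")
    return purged, index.remaining()
```

Note that in fnmatch patterns `*` also matches `/`, so /api/users/* would cover nested paths such as /api/users/1/posts as well.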
