Commit c61fb3e

docs: update README and recipes to clarify multi-pass extraction and cache interaction

1 parent 23363f6 commit c61fb3e

File tree

2 files changed: +11 -0 lines changed


README.md

Lines changed: 6 additions & 0 deletions
@@ -369,6 +369,12 @@ The API implements a two-tier caching strategy for extraction cost-control and f
Enabled automatically via `ProviderManager.ensure_cache()`. Every `litellm.completion()` call is cached in Redis, keyed by the full request parameters (prompt, model, temperature). Identical LLM prompts hit the cache directly — no API cost.

> **Multi-pass bypass:** When `passes > 1`, only the first pass (pass 0) is
> served from the LiteLLM cache. Passes ≥ 2 automatically include
> `cache={"no-cache": True}` so each subsequent pass gets a fresh LLM response.
> This is handled transparently by the `langextract-litellm` provider via the
> `pass_num` kwarg that LangExtract threads through the annotation loop.

### Tier 2 — Extraction-Result Cache

An **extraction-result-level** cache that sits above the LLM layer. When a document is extracted with the same text, prompt, examples, model, temperature, and passes, the complete result (entities + metadata) is returned from cache in < 500 ms with zero API cost.
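The per-pass cache bypass described above can be sketched as a small helper. `build_completion_kwargs` is an illustrative name, not part of the `langextract-litellm` API; the only assumption is that LiteLLM accepts a per-call `cache={"no-cache": True}` override, which its caching layer supports:

```python
def build_completion_kwargs(pass_num: int, **base_kwargs) -> dict:
    """Sketch of the provider-side logic: pass 0 may be served from the
    LiteLLM Redis cache; every later pass opts out so it receives a
    genuinely fresh LLM response. Illustrative only."""
    kwargs = dict(base_kwargs)
    if pass_num > 0:
        # LiteLLM per-request cache control: skip the cached response.
        kwargs["cache"] = {"no-cache": True}
    return kwargs

# Pass 0 is cacheable; passes 1+ force a fresh completion.
first = build_completion_kwargs(0, model="gpt-4o-mini", temperature=0.0)
later = build_completion_kwargs(1, model="gpt-4o-mini", temperature=0.0)
```

All other request parameters pass through untouched, so the Tier 1 cache key stays identical for pass 0 across repeated extractions.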

docs/recipes.md

Lines changed: 5 additions & 0 deletions
@@ -100,6 +100,11 @@ curl -X POST http://localhost:8000/api/v1/extract \
Run multiple extraction passes to get a `confidence_score` (0.0–1.0) on every entity. Higher values mean the entity was found consistently across passes.
Early stopping kicks in automatically when consecutive passes yield identical results, so extra passes cost nothing when the model is already stable.

> **Cache interaction:** The first pass may be served from the LiteLLM Redis
> cache (fast, zero cost). Passes ≥ 2 **always bypass** the LLM response cache
> so that each subsequent pass produces a genuinely independent extraction. This
> is handled automatically by the `langextract-litellm` provider.

```bash
curl -X POST http://localhost:8000/api/v1/extract \
  -H "Content-Type: application/json" \
```
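A `confidence_score` of this kind can be read as the fraction of passes in which a given entity was extracted. A minimal sketch of that aggregation, assuming entities are reduced to comparable keys (the real service scores full entity objects, and the function name here is hypothetical):

```python
from collections import Counter

def confidence_scores(pass_results: list[set[str]]) -> dict[str, float]:
    """Score each entity by the fraction of passes that extracted it.
    Illustrative only: real results carry spans and attributes, not
    bare strings."""
    counts = Counter(entity for result in pass_results for entity in result)
    n = len(pass_results)
    return {entity: count / n for entity, count in counts.items()}

# "aspirin" appears in all 3 passes; "ibuprofen" in only 1 of 3.
scores = confidence_scores([{"aspirin", "ibuprofen"}, {"aspirin"}, {"aspirin"}])
```

Under this reading, an entity at 1.0 was found by every pass, which is also the condition that lets early stopping skip further passes.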
