Commit c61fb3e

docs: update README and recipes to clarify multi-pass extraction and cache interaction

1 parent 23363f6 commit c61fb3e

File tree

2 files changed: +11 -0 lines changed


README.md

Lines changed: 6 additions & 0 deletions
@@ -369,6 +369,12 @@ The API implements a two-tier caching strategy for extraction cost-control and f
Enabled automatically via `ProviderManager.ensure_cache()`. Every `litellm.completion()` call is cached in Redis, keyed by the full request parameters (prompt, model, temperature). Identical LLM prompts hit the cache directly — no API cost.

> **Multi-pass bypass:** When `passes > 1`, only the first pass (pass 0) is
> served from the LiteLLM cache. Passes ≥ 2 automatically include
> `cache={"no-cache": True}` so each subsequent pass gets a fresh LLM response.
> This is handled transparently by the `langextract-litellm` provider via the
> `pass_num` kwarg that LangExtract threads through the annotation loop.

### Tier 2 — Extraction-Result Cache

An **extraction-result-level** cache that sits above the LLM layer. When a document is extracted with the same text, prompt, examples, model, temperature, and passes, the complete result (entities + metadata) is returned from cache in < 500 ms with zero API cost.
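The per-pass cache bypass described above can be sketched as a small helper. `build_completion_kwargs` is an illustrative name, not part of the `langextract-litellm` API; the only assumption is that LiteLLM accepts a per-call `cache={"no-cache": True}` override, which its caching layer supports:

```python
def build_completion_kwargs(pass_num: int, **base_kwargs) -> dict:
    """Sketch of the provider-side logic: pass 0 may be served from the
    LiteLLM Redis cache; every later pass opts out so it receives a
    genuinely fresh LLM response. Illustrative only."""
    kwargs = dict(base_kwargs)
    if pass_num > 0:
        # LiteLLM per-request cache control: skip the cached response.
        kwargs["cache"] = {"no-cache": True}
    return kwargs

# Pass 0 is cacheable; passes 1+ force a fresh completion.
first = build_completion_kwargs(0, model="gpt-4o-mini", temperature=0.0)
later = build_completion_kwargs(1, model="gpt-4o-mini", temperature=0.0)
```

All other request parameters pass through untouched, so the Tier 1 cache key stays identical for pass 0 across repeated extractions.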

docs/recipes.md

Lines changed: 5 additions & 0 deletions
@@ -100,6 +100,11 @@ curl -X POST http://localhost:8000/api/v1/extract \
Run multiple extraction passes to get a `confidence_score` (0.0–1.0) on every entity. Higher values mean the entity was found consistently across passes.
Early stopping kicks in automatically when consecutive passes yield identical results, so extra passes cost nothing when the model is already stable.

> **Cache interaction:** The first pass may be served from the LiteLLM Redis
> cache (fast, zero cost). Passes ≥ 2 **always bypass** the LLM response cache
> so that each subsequent pass produces a genuinely independent extraction. This
> is handled automatically by the `langextract-litellm` provider.

```bash
curl -X POST http://localhost:8000/api/v1/extract \
  -H "Content-Type: application/json" \
```
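A `confidence_score` of this kind can be read as the fraction of passes in which a given entity was extracted. A minimal sketch of that aggregation, assuming entities are reduced to comparable keys (the real service scores full entity objects, and the function name here is hypothetical):

```python
from collections import Counter

def confidence_scores(pass_results: list[set[str]]) -> dict[str, float]:
    """Score each entity by the fraction of passes that extracted it.
    Illustrative only: real results carry spans and attributes, not
    bare strings."""
    counts = Counter(entity for result in pass_results for entity in result)
    n = len(pass_results)
    return {entity: count / n for entity, count in counts.items()}

# "aspirin" appears in all 3 passes; "ibuprofen" in only 1 of 3.
scores = confidence_scores([{"aspirin", "ibuprofen"}, {"aspirin"}, {"aspirin"}])
```

Under this reading, an entity at 1.0 was found by every pass, which is also the condition that lets early stopping skip further passes.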
