Long memory content returns 500 instead of a structured 413/422 when embedder context is exceeded

## What I hit

While doing pre-production hardening on my omem instance, I tried to consolidate four multi-part memory chains (via the new `replaces` flow from #14). All four failed:

```
project_backup_rails            4-part → 7046 chars  FAIL
project_ci_runner_architecture  4-part → 7285 chars  FAIL
project_pki_infrastructure      5-part → 7410 chars  FAIL
project_syncdb                  4-part → 7023 chars  FAIL
```

The server returned **HTTP 500** with:

```json
{"error":{"code":"internal_error","message":"embedding error: failed to embed content: embedding error: embedding API returned 400 Bad Request: {\"error\":{\"message\":\"the input length exceeds the context window...\"}}"}}
```

(My embedder is `nomic-embed-text` with a 2048-token context; roughly 7000 chars of technical prose with URLs, code, and IPs tokenizes past that limit.)

## Context

I'm evaluating omem as the shared-knowledge layer for **Synchresis** (the MSP I work at). The intent is to roll it out across the team so that everyone using Claude has a consistent set of org rules, runbooks, and architecture context loaded. For that use case, long memories are the norm — architecture overviews, multi-section runbooks, full project summaries.

The current behavior surfaces three problems that matter more at team scale than at single-user scale:

1. **`500 internal_error` is wrong.** This is a client-input problem, not a server fault. It should be `413 Payload Too Large` (or `422 Unprocessable Entity` with a clear shape).
2. **The real cause is buried** inside a stringified upstream embedder error. Callers have to substring-match `"exceeds the context window"` to recognize the case. No structured fields.
3. **No pre-flight check.** The error only surfaces after a full round-trip to the embedder service. For batch consolidation work or bulk ingest of an existing knowledge base, you eat the latency for every too-long item.

## Where it lives

`api/handlers/memory.rs` calls `state.embed.embed(&[content]).await` and maps any failure to `OmemError::Embedding(...)`, which routes through `api/error.rs` to `500 internal_error` (the catch-all for `Storage/Embedding/Llm/Internal`).

## Options

### A. Structured error mapping (cheapest)

In the embed call site, recognize the upstream "input length exceeds context window" signature and surface it as a distinct error variant (e.g. `OmemError::ContentTooLong { length, hint }`), mapped to `413` in `api/error.rs`. Response body becomes:

```json
{"error":{"code":"content_too_long","message":"content (7410 chars) exceeded the configured embedder's context window","hint":"consider splitting or using a longer-context embedder"}}
```

Pros: tiny diff, no new config surface, no behavior change for any other path.
Cons: still reactive (round-trips to embedder before failing).
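
A minimal sketch of what the mapping in option A could look like, using only the standard library. The enum here is a hypothetical two-variant stand-in for the real `OmemError`, and `classify_embed_failure`, `status`, and `code` are illustrative names, not existing omem functions. It assumes the upstream signature is the substring seen in the response above.

```rust
/// Hypothetical subset of the error enum; the real `OmemError` has more variants.
#[derive(Debug, PartialEq)]
enum OmemError {
    /// Any other embedder failure (still a server-side problem).
    Embedding(String),
    /// The embedder rejected the input as longer than its context window.
    ContentTooLong { length: usize, hint: &'static str },
}

impl OmemError {
    /// HTTP status this error would map to in the error layer.
    fn status(&self) -> u16 {
        match self {
            OmemError::ContentTooLong { .. } => 413,
            OmemError::Embedding(_) => 500,
        }
    }

    /// Machine-readable code for the response body.
    fn code(&self) -> &'static str {
        match self {
            OmemError::ContentTooLong { .. } => "content_too_long",
            OmemError::Embedding(_) => "internal_error",
        }
    }
}

/// Turn an upstream embedder error message into a structured variant.
/// The substring match is an assumption based on the one error observed here.
fn classify_embed_failure(upstream_message: &str, content: &str) -> OmemError {
    if upstream_message.contains("exceeds the context window") {
        OmemError::ContentTooLong {
            // Count characters, not bytes, to match the "7410 chars" wording.
            length: content.chars().count(),
            hint: "consider splitting or using a longer-context embedder",
        }
    } else {
        OmemError::Embedding(upstream_message.to_string())
    }
}
```

The substring match stays in one place on the server, so callers can branch on `code` instead of parsing the message.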

### B. Pre-flight token estimate (cleaner, slightly bigger)

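Add `OMEM_EMBED_MAX_TOKENS` env (or auto-discover from the embedder's metadata if it advertises). Reject in the handler before calling embed when a conservative character-based estimate exceeds the limit. Note that `chars/4` is not conservative for dense technical content: the 7046-char memory above estimates to roughly 1760 tokens, under the 2048 limit, yet the embedder rejected it, so a smaller divisor (e.g. `chars/3`) would be needed to catch these cases. Pair with A for the actual-too-long case the estimate misses.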

Pros: instant failure, no round-trip cost.
Cons: new config, conservative estimate has false positives.
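
A rough sketch of the pre-flight check in option B, assuming a plain characters-per-token divisor. `estimate_tokens`, `preflight`, and `max_tokens_from_env` are hypothetical names; only the `OMEM_EMBED_MAX_TOKENS` variable comes from the proposal above. The divisor is left as a parameter because the right value depends on the content and the tokenizer.

```rust
/// Estimate the token count as ceil(chars / chars_per_token).
fn estimate_tokens(content: &str, chars_per_token: usize) -> usize {
    let per = chars_per_token.max(1); // guard against division by zero
    let chars = content.chars().count();
    (chars + per - 1) / per
}

/// Ok(()) if the estimate fits within `max_tokens`, otherwise Err(estimate).
fn preflight(content: &str, max_tokens: usize, chars_per_token: usize) -> Result<(), usize> {
    let estimate = estimate_tokens(content, chars_per_token);
    if estimate > max_tokens {
        Err(estimate)
    } else {
        Ok(())
    }
}

/// Read the proposed `OMEM_EMBED_MAX_TOKENS` setting; None if unset or not a number.
fn max_tokens_from_env() -> Option<usize> {
    std::env::var("OMEM_EMBED_MAX_TOKENS").ok()?.trim().parse().ok()
}
```

For the 7046-char memory from this report and a 2048-token limit, a divisor of 4 estimates 1762 tokens and lets the request through, while a divisor of 3 estimates 2349 and rejects it. That gap is why the fallback to option A still matters.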

### C. Server-side chunk-on-ingest (most invasive)

Split content into N memories sized to fit, link via `superseded_by` / `replaces` semantics so the "logical memory" is a chain. Changes the contract from "one memory = one record."

Pros: long memories just work.
Cons: changes data model, breaks the simple shape, makes search ranking more complex (which fragment do you score? all? top one?).
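
To make the shape of option C concrete, here is a deliberately naive splitter, assuming a character budget per fragment. `split_into_chunks` is a hypothetical helper, not anything in omem today. It breaks on whitespace and collapses runs of whitespace to single spaces, so a real version would need to preserve formatting (code blocks, newlines) and handle the `replaces` linking, which this sketch leaves out.

```rust
/// Split `content` into chunks of at most `max_chars` characters,
/// breaking only at whitespace. A single word longer than `max_chars`
/// is emitted as its own oversized chunk rather than being cut.
fn split_into_chunks(content: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_len = 0usize; // length of `current` in characters

    for word in content.split_whitespace() {
        let word_len = word.chars().count();
        let separator = if current.is_empty() { 0 } else { 1 };

        // Start a new chunk if adding this word would exceed the budget.
        if !current.is_empty() && current_len + separator + word_len > max_chars {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }

        if !current.is_empty() {
            current.push(' ');
            current_len += 1;
        }
        current.push_str(word);
        current_len += word_len;
    }

    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Even this small version shows the contract change: one submitted memory comes back as several records, and search has to decide how to score them.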

## My lean

**A first** — it's a self-contained 30-line change, no new config, no behavior change for the happy path. **B as a follow-up** if it turns out people hit this often and the round-trip latency matters at their scale. **C feels like a separate, bigger conversation** about whether omem should support "long-form memory" as a first-class concept. *(Worth raising if Synchresis rollout proves long-form is the common case, which it might.)*

Happy to PR A. Wanted to surface the design question first since this is the kind of error shape that's worth getting consistent across the API surface from the start.

Long memory content returns 500 instead of a structured 413/422 when embedder context is exceeded #15
