What I hit
While doing pre-production hardening on my omem instance, I tried to consolidate four multi-part memory chains (via the new replaces flow from #14). All four failed:
project_backup_rails 4-part → 7046 chars FAIL
project_ci_runner_architecture 4-part → 7285 chars FAIL
project_pki_infrastructure 5-part → 7410 chars FAIL
project_syncdb 4-part → 7023 chars FAIL
The server returned HTTP 500 with:
{"error":{"code":"internal_error","message":"embedding error: failed to embed content: embedding error: embedding API returned 400 Bad Request: {\"error\":{\"message\":\"the input length exceeds the context window...\"}}"}}
(My embedder is nomic-embed-text at 2048 ctx; ~7000 chars of technical prose with URLs/code/IPs tokenized past the limit.)
Context
I'm evaluating omem as the shared-knowledge layer for Synchresis (the MSP I work at). The intent is to roll it out across the team so that everyone using Claude has a consistent set of org rules, runbooks, and architecture context loaded. For that use case, long memories are the norm — architecture overviews, multi-section runbooks, full project summaries.
The current behavior surfaces three problems that matter more at team scale than at single-user scale:
- 500 internal_error is wrong. This is a client-input problem, not a server fault. It should be 413 Payload Too Large (or 422 Unprocessable Entity with a clear shape).
- The real cause is buried inside a stringified upstream embedder error. Callers have to substring-match "exceeds the context window" to recognize the case. No structured fields.
- No pre-flight check. The error only surfaces after a full round-trip to the embedder service. For batch consolidation work or bulk ingest of an existing knowledge base, you eat the latency for every too-long item.
Where it lives
api/handlers/memory.rs calls state.embed.embed(&[content]).await and maps any failure to OmemError::Embedding(...), which routes through api/error.rs to 500 internal_error (the catch-all for Storage/Embedding/Llm/Internal).
Options
A. Structured error mapping (cheapest)
In the embed call site, recognize the upstream "input length exceeds context window" signature and surface it as a distinct error variant (e.g. OmemError::ContentTooLong { length, hint }), mapped to 413 in api/error.rs. Response body becomes:
{"error":{"code":"content_too_long","message":"content (7410 chars) exceeded the configured embedder's context window","hint":"consider splitting or using a longer-context embedder"}}
Pros: tiny diff, no new config surface, no behavior change for any other path.
Cons: still reactive (round-trips to embedder before failing).
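A rough sketch of what A could look like, not the actual omem code. The enum here is a cut-down stand-in for OmemError, and classify_embed_error, status, and code are hypothetical names for illustration. The matched substring is taken from the upstream error quoted above; the real call site would presumably match on whatever the embedder client exposes.

```rust
/// Cut-down stand-in for the real error enum (assumption: only two variants shown).
#[derive(Debug, PartialEq)]
pub enum OmemError {
    /// Existing catch-all for embedder failures, currently mapped to 500.
    Embedding(String),
    /// Proposed variant: content exceeded the embedder's context window.
    ContentTooLong { length: usize, hint: &'static str },
}

/// Substring seen in the upstream embedder error quoted in this issue.
const TOO_LONG_SIGNATURE: &str = "exceeds the context window";

/// Hypothetical helper: turn a raw upstream error message into a typed error.
pub fn classify_embed_error(upstream: &str, content: &str) -> OmemError {
    if upstream.contains(TOO_LONG_SIGNATURE) {
        OmemError::ContentTooLong {
            // Report length in characters, matching the proposed message.
            length: content.chars().count(),
            hint: "consider splitting or using a longer-context embedder",
        }
    } else {
        // Anything else keeps today's behavior.
        OmemError::Embedding(upstream.to_string())
    }
}

impl OmemError {
    /// HTTP status the error layer would return for this variant.
    pub fn status(&self) -> u16 {
        match self {
            OmemError::ContentTooLong { .. } => 413,
            OmemError::Embedding(_) => 500,
        }
    }

    /// Machine-readable code for the response body.
    pub fn code(&self) -> &'static str {
        match self {
            OmemError::ContentTooLong { .. } => "content_too_long",
            OmemError::Embedding(_) => "internal_error",
        }
    }
}
```

With this shape, callers can branch on the code field instead of substring-matching the message, and every other embedder failure still lands on the existing 500 path.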
B. Pre-flight token estimate (cleaner, slightly bigger)
Add an OMEM_EMBED_MAX_TOKENS env var (or auto-discover the limit from the embedder's metadata if it advertises one). Reject in the handler, before calling embed, when a conservative chars/4 estimate exceeds the limit. Pair with A for the actual-too-long case the estimate misses.
Pros: instant failure, no round-trip cost.
Cons: new config, conservative estimate has false positives.
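A minimal sketch of the pre-flight check in B, assuming the limit comes from OMEM_EMBED_MAX_TOKENS and that an unset or unparsable value disables the check. estimate_tokens, max_tokens_from_env, preflight, and TooLong are hypothetical names, not existing omem items.

```rust
/// Rough token estimate: one token per 4 characters, rounded up.
pub fn estimate_tokens(content: &str) -> usize {
    (content.chars().count() + 3) / 4
}

/// Read the limit from the environment; None means "no pre-flight check".
pub fn max_tokens_from_env() -> Option<usize> {
    std::env::var("OMEM_EMBED_MAX_TOKENS").ok()?.trim().parse().ok()
}

/// Details a handler could put into a structured 413 response.
#[derive(Debug, PartialEq)]
pub struct TooLong {
    pub estimated_tokens: usize,
    pub max_tokens: usize,
}

/// Reject content whose estimate exceeds the limit, before calling the embedder.
pub fn preflight(content: &str, max_tokens: Option<usize>) -> Result<(), TooLong> {
    match max_tokens {
        Some(max) if estimate_tokens(content) > max => Err(TooLong {
            estimated_tokens: estimate_tokens(content),
            max_tokens: max,
        }),
        // No limit configured, or the estimate fits.
        _ => Ok(()),
    }
}
```

One caveat from the numbers in this report: 7410 chars at chars/4 is about 1853 estimated tokens, which is under the 2048 limit, yet the embedder rejected it. So for dense technical content a plain chars/4 ratio would let these cases through, which is why B still needs A behind it (or a smaller divisor).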
C. Server-side chunk-on-ingest (most invasive)
Split content into N memories sized to fit, link via superseded_by / replaces semantics so the "logical memory" is a chain. Changes the contract from "one memory = one record."
Pros: long memories just work.
Cons: changes data model, breaks the simple shape, makes search ranking more complex (which fragment do you score? all? top one?).
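To make the splitting half of C concrete, here is a minimal sketch that packs whitespace-separated words into chunks under a character budget. chunk_content is a hypothetical helper; it collapses whitespace and ignores paragraph or code-block boundaries, and the linking of fragments into a chain is left out because it depends on omem's schema.

```rust
/// Split `content` into chunks of at most `max_chars` characters,
/// breaking at whitespace where possible. Runs of whitespace collapse
/// to a single space, so original formatting is not preserved.
pub fn chunk_content(content: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut current_len = 0usize; // length of `current` in chars

    for word in content.split_whitespace() {
        let chars: Vec<char> = word.chars().collect();
        // Words longer than the budget (e.g. long URLs) are hard-split.
        for piece in chars.chunks(max_chars) {
            let sep = if current.is_empty() { 0 } else { 1 };
            if current_len + sep + piece.len() > max_chars {
                // Current chunk is full: close it and start a new one.
                chunks.push(std::mem::take(&mut current));
                current_len = 0;
            }
            if !current.is_empty() {
                current.push(' ');
                current_len += 1;
            }
            current.extend(piece.iter());
            current_len += piece.len();
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Even this toy version shows the ranking question: each returned chunk would be embedded and scored on its own, so something has to decide how fragment scores roll up to the logical memory.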
My lean
A first: it's a self-contained change of roughly 30 lines, with no new config and no behavior change for the happy path. B as a follow-up if it turns out people hit this often and the round-trip latency matters at their scale. C feels like a separate, bigger conversation about whether omem should support "long-form memory" as a first-class concept. (Worth raising if the Synchresis rollout proves long-form is the common case, which it might.)
Happy to PR A. Wanted to surface the design question first since this is the kind of error shape that's worth getting consistent across the API surface from the start.