perf: consolidate tool metrics queries (11 → 4 pipelines) by dborysenko · Pull Request #61 · virtUOS/librechat_exporter

dborysenko · 2026-05-07T16:02:00Z

Summary

The previous implementation ran 11 sequential aggregation pipelines, each
starting with $unwind on the full content array. On DocumentDB this
is expensive because the array can contain large blobs (images, long text)
that must be materialised in memory before the $match on content.type
can filter them out — causing noticeably slow collection cycles on
larger installations.
Rewrote _fetch_all_tool_metrics() to use 4 pipelines instead:
- Query A — pre-filters documents with $match {"content.type": "tool_call"}
  (hits the multikey index), then uses $project + $filter to extract a
  tiny _tc array of only the tool-call items (typically 1–3 elements vs
  10–50+), and unwinds that. A single pass accumulates total calls,
  per-tool, per-model, and per-endpoint counts.
- Query B — all-time error counts per tool; uses $indexOfCP instead of
  $regex for error string matching (DocumentDB-compatible, avoids a
  full string scan on every output blob).
- Query C — 5-minute window; time filter is placed first so DocumentDB
  uses the updatedAt index before touching the content array at all.
  Total calls, per-tool counts, and error counts in a single pass.
- Query D — messages with tool calls via count_documents() (index-only,
  no aggregation pipeline needed) + active tool users.

Test plan

Tested on production server — metrics are still showing as previously
No errors in container logs, collection time went from infinite to ~60sec on my 5Gb database

The previous implementation ran 11 sequential MongoDB aggregations, each starting with $unwind on the full content array. On DocumentDB this is very expensive because the content array can contain large blobs (images, long text) that must be materialised before the $match on content.type can filter them out. New approach: - Pre-filter documents at the collection level using $match on content.type (hits the multikey index, skips non-tool messages). - Use $project + $filter to extract a tiny "_tc" array containing only the tool_call items before unwinding — typically 1-3 elements instead of 10-50+. - Query A: single all-time pass that accumulates total calls, per-tool, per-model, and per-endpoint counts in one pipeline. - Query B: all-time error counts; uses $indexOfCP instead of $regex to avoid a full string scan on every output blob (DocumentDB-compatible). - Query C: 5-minute window — total calls, per-tool, and errors in one pipeline; time filter is placed first to hit the updatedAt index. - Query D: messages_with_tools via count_documents (index-only, no aggregation); active_tool_users with a minimal group pipeline. Co-authored-by: Cursor <cursoragent@cursor.com>

Odrec · 2026-05-08T01:17:09Z

Thanks for this. The strategy (prefilter at the document level, $filter the array down before $unwind) is the right shape for DocumentDB, and the inline docs explaining why are great. A couple of things before merge.

🔴 Case-sensitivity regression in error matching

The old code used {"$regex": "Error processing tool", "$options": "i"} — case-insensitive. The new $indexOfCP substring check is case-sensitive, so it won't match outputs that LibreChat actually emits in lowercase.

Verified against the LibreChat source:

api/server/services/ToolService.js:464
  output: `Error processing tool ${currentAction.tool}: …`     // capitalized
api/server/controllers/assistants/errors.js:170
api/server/controllers/assistants/chatV1.js:226
  output: 'error processing tool'                              // lowercase

The lowercase variant from the assistants/v1 paths will silently drop out of errors_per_tool, total_errors, and errors_5m. The "metrics still show as previously" sanity check on prod likely just means that code path isn't exercised on your instance.

Cheapest fix that preserves the perf intent is to OR two prefix checks:

ERROR_PREFIX_VARIANTS = ("Error processing tool", "error processing tool")

def _matches_error(output_expr):
    return {"$or": [
        {"$gte": [{"$indexOfCP": [output_expr, p]}, 0]}
        for p in ERROR_PREFIX_VARIANTS
    ]}

…and use _matches_error({"$ifNull": ["$$c.tool_call.output", ""]}) in both Query B and the $cond inside Query C.

If you'd rather match the old semantics exactly (any case, anywhere in the string), $regexMatch with options: "i" is supported on DocumentDB 4.0+ and is closer to a 1:1 swap.

🟡 Unnamed `tool_call` items now counted as "unknown"

The old per-tool / per-model / per-endpoint / errors_per_tool pipelines all required "content.tool_call.name": {"$exists": True}. The new pipelines drop that and use {"$ifNull": [..., "unknown"]}, so unnamed items now show up under the label unknown. Side effect: sum(per_tool.values()) now equals total_calls — it didn't before.

Probably an improvement (the old breakdowns were inconsistent with the totals), but it's a visible label change for any dashboard filtering on tool name. Either call it out in the PR description, or restore the filter inside the $filter cond if you'd rather preserve the old behavior:

"cond": {"$and": [
    {"$eq": ["$$c.type", "tool_call"]},
    {"$ne": [{"$type": "$$c.tool_call.name"}, "missing"]},
]}

🟡 Defensive cast on `tool_call.output`

$indexOfCP requires a string and will throw if output is ever stored as a non-string (object, number, array). $ifNull only guards null/missing. The old $regex was more tolerant. Swap the $ifNull for:

{"$convert": {"input": "$$c.tool_call.output", "to": "string", "onError": "", "onNull": ""}}

Nice work overall. The perf gain is significant and the consolidation is well thought out.

- Replace $indexOfCP with $regexMatch (options:"i") in Query B and C to match both capitalised and lowercase error strings emitted by LibreChat (ToolService.js vs assistants/chatV1.js). - Replace $ifNull on tool_call.output with $convert (to:string, onError:"", onNull:"") to guard against non-string stored values that would cause $regexMatch to throw. Co-authored-by: Cursor <cursoragent@cursor.com>

- Remove alignment whitespace in Query A and C result loops (E221, E272) - Wrap long $convert expressions across multiple lines (E501) Co-authored-by: Cursor <cursoragent@cursor.com>

dborysenko · 2026-05-08T02:52:39Z

Thanks for the thorough review, @Odrec — all three points are valid and addressed in the latest commit.

🔴 Case-sensitivity / $indexOfCP → $regexMatch
Replaced both occurrences (Query B $filter cond and Query C $cond) with $regexMatch + "options": "i", which matches the old $regex semantics exactly. Also swapped $ifNull for $convert (to: "string", onError: "", onNull: "") to guard against non-string output values.

This turned out to fix a real bug, not just a defensive improvement. We verified that our production database has 779 documents where tool_call.output is stored as a non-string type. With $indexOfCP, DocumentDB was throwing a type-mismatch error inside the $filter condition and silently treating it as "include this item" — so Query B was expanding the full tool-call arrays of all 779 documents instead of only actual error items. This was also corrupting errors_per_tool and errors_5m with false positives. The fix resolves both the performance regression and the incorrect error counts.

🟡 Unnamed tool_call items counted as "unknown"
Keeping the new behaviour. The old pipelines excluded unnamed items entirely via {"$exists": True}, which meant sum(per_tool.values()) < total_calls — the totals were internally inconsistent. The new approach counts them under "unknown", which at least makes the per-tool breakdown add up to the total. The "unknown" label also makes these calls visible in dashboards rather than silently dropping them, which is useful for diagnosing agent transfers or other tool-call variants that don't carry a name.

🟡 $convert defensive cast
Done as part of the error-matching fix above — both spots use $convert now.

Dmytro Borysenko and others added 2 commits May 7, 2026 21:02

fix: resolve flake8 linting errors

d12d36e

- Remove alignment whitespace in Query A and C result loops (E221, E272) - Wrap long $convert expressions across multiple lines (E501) Co-authored-by: Cursor <cursoragent@cursor.com>

Odrec merged commit c23d5b3 into virtUOS:main May 8, 2026
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: consolidate tool metrics queries (11 → 4 pipelines)#61

perf: consolidate tool metrics queries (11 → 4 pipelines)#61
Odrec merged 3 commits into
virtUOS:mainfrom
dborysenko:perf/optimize-tool-metrics-queries

dborysenko commented May 7, 2026

Uh oh!

Odrec commented May 8, 2026

Uh oh!

dborysenko commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dborysenko commented May 7, 2026

Summary

Test plan

Uh oh!

Odrec commented May 8, 2026

🔴 Case-sensitivity regression in error matching

🟡 Unnamed tool_call items now counted as "unknown"

🟡 Defensive cast on tool_call.output

Uh oh!

dborysenko commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

🟡 Unnamed `tool_call` items now counted as "unknown"

🟡 Defensive cast on `tool_call.output`