feat(sites): site-level model probing with real-time logs, latency threshold, and auto-disable #510
- Move post-refresh probe settings from global config to per-site (sites table)
- Add manual 'probe now' button in site editor with real-time SSE log stream
- Use fetch+ReadableStream instead of EventSource to support Authorization header
- Concurrent probe execution (default 10 workers) via worker-pool pattern
- Add stop button to cancel in-flight probe while preserving completed results
- Add latency threshold: auto-disable models exceeding response time limit
- Treat inconclusive probe results as unsupported (auto-disable)
- Show specific failure reason in logs (timeout, no token, model not found, etc.)
- Refresh and display post-probe model status list (green=available, red=disabled)
- Remove probe settings from global Settings page and RuntimeSettingsPayload
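The switch from EventSource to fetch + ReadableStream means the client has to split the SSE text stream into events itself. A minimal sketch of that parsing step, assuming the server sends `data: <json>` frames separated by a blank line; `createSseParser` and the payload shape are hypothetical names for illustration, not the PR's actual code:

```typescript
// Hypothetical helper (not from the PR): incrementally parses SSE text chunks.
// Assumption: each event is one or more `data:` lines followed by a blank line,
// and the joined data is a JSON document.
type SseHandler = (payload: unknown) => void;

function createSseParser(onEvent: SseHandler): (chunk: string) => void {
  let buffer = "";
  return (chunk: string): void => {
    // Normalize CRLF on the whole buffer so a "\r" / "\n" pair split across
    // two chunks is still handled.
    buffer = (buffer + chunk).replace(/\r\n/g, "\n");
    let boundary = buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const frame = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      const data = frame
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).trimStart())
        .join("\n");
      // JSON.parse throws on malformed payloads; a real client may want to catch that.
      if (data) onEvent(JSON.parse(data));
      boundary = buffer.indexOf("\n\n");
    }
  };
}
```

In a browser, the returned function would typically be fed with `TextDecoder`-decoded chunks read from `response.body.getReader()`, where the `fetch` call carries the `Authorization` header that EventSource cannot send.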
📝 Walkthrough
Adds per‑site post‑refresh probing: DB migrations and schema fields, backup import/export propagation, server endpoints (one‑shot + SSE), modelService probing + availability updates, and UI controls/streaming logs for manual probes.
Sequence Diagram
sequenceDiagram
participant UI as "Web UI (Sites)"
participant API as "API Server (sites routes)"
participant ModelSvc as "Model Service"
participant Runtime as "Runtime Models"
participant DB as "Database"
participant TokenRoutes as "Token Routes"
UI->>API: POST /api/sites/:id/probe-now (or open /probe-stream)
API->>ModelSvc: probeSiteModels(siteId, options, onProgress)
ModelSvc->>Runtime: Probe selected models (concurrent)
Runtime-->>ModelSvc: Return probe results (status, latency)
ModelSvc->>DB: Persist availability changes / record disables
DB-->>ModelSvc: Acknowledge persistence
ModelSvc->>TokenRoutes: rebuildTokenRoutesFromAvailability()
TokenRoutes-->>ModelSvc: Rebuild complete
ModelSvc-->>API: Emit progress/final result
API-->>UI: JSON response or SSE events (start/model/action/complete/error)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 9
🧹 Nitpick comments (1)
src/server/services/modelService.ts (1)
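533-548: runPostRefreshProbeIfEnabled probes serially, which can significantly lengthen model refresh time.
This function is awaited at the tail of every successful model-refresh path (e.g. Lines 735-739, 821-825, 921-925, 1007-1011, 1270-1274). When a site sets postRefreshProbeScope='all' and many models are discovered, each model can in the worst case run for the full config.modelAvailabilityProbeTimeoutMs. In a scheduled batch refresh, this serial probe time accumulates across every account's refresh and can stretch refreshModelsAndRebuildRoutes from a few seconds to several minutes.
Meanwhile, probeSiteModels (Lines 442-477) already implements a worker pool with a concurrency cap, and the logic here duplicates it almost line for line. Suggestions:
- Extract the worker pool + latencyThresholdMs + unsupported/inconclusive handling into a shared helper (for example runSiteProbeBatch(site, account, modelsToProbe, options)) reused by both probeSiteModels and runPostRefreshProbeIfEnabled, to reduce divergence.
- In runPostRefreshProbeIfEnabled, use at least the same concurrency as probeSiteModels (default 10) to avoid serial blocking on the refresh path.
- While at it, let the automatic refresh path honor the latency threshold too (see the other comment on Sites.tsx).
🤖 Prompt for AI Agents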
Verify each finding against the current code and only fix it if needed. In `@src/server/services/modelService.ts` around lines 533 - 548, runPostRefreshProbeIfEnabled currently probes models sequentially via probeRuntimeModel, which can hugely delay refresh when postRefreshProbeScope='all' and many models hit config.modelAvailabilityProbeTimeoutMs; extract the worker-pool + latencyThresholdMs + unsupported/inconclusive handling into a shared helper (e.g., runSiteProbeBatch(site, account, modelsToProbe, options)) reusing the logic in probeSiteModels, then modify runPostRefreshProbeIfEnabled to call that helper with the same concurrency limit (default 10) instead of the serial for-loop and ensure the helper enforces latencyThresholdMs and marks unsupported/inconclusive results consistently.
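The worker-pool pattern this comment refers to can be sketched as follows. This is a generic illustration under stated assumptions, not the code in modelService.ts: `runWithConcurrency` is a hypothetical name, and the `task` callback stands in for whatever a single model probe does.

```typescript
// Hypothetical helper (not from the PR): runs `task` over `items` with at most
// `concurrency` tasks in flight, returning results in input order.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // shared cursor; safe because JS runs one callback at a time

  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  };

  // Never start more workers than there are items, and always at least one.
  const workerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With a helper of this shape, both the manual and the post-refresh paths could pass the same concurrency value (the PR's stated default is 10) instead of one of them looping serially.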
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/server/db/schema.ts`:
- Around line 18-20: The three new schema columns (postRefreshProbeEnabled,
postRefreshProbeModel, postRefreshProbeScope) were added to the Drizzle schema
but the migration artifacts and SQL patches were not regenerated; run Drizzle's
migration tooling to generate a new migration that includes these columns,
update the snapshot and journal (e.g., 0024_snapshot.json and _journal.json) and
regenerate the checked-in SQL patches (mysql.bootstrap.sql, mysql.upgrade.sql,
postgres.bootstrap.sql, postgres.upgrade.sql) so the schema, snapshot, and SQL
outputs stay in sync.
In `@src/server/routes/api/sites.ts`:
- Around line 23-25: The SSE helper sseWrite currently swallows write errors and
the route keeps running probeSiteModels even after the client disconnects;
change the route to create an AbortController, listen for request.raw
'close'/'finish'/'error' (or check raw.destroyed/writableEnded) and call
controller.abort() when the stream closes, pass controller.signal into
probeSiteModels (and any downstream service/worker pool APIs) so work cancels
promptly, and update sseWrite to early-return if the response is
closed/destroyed to avoid ignoring disconnects; ensure all callers in this route
accept and forward the AbortSignal rather than letting the route own
cancellation logic.
- Around line 905-908: The endpoint builds options for probeSiteModels but omits
latencyThresholdMs, so update the handler to read latencyThresholdMs from body
(e.g., const latencyThresholdMs = typeof body?.latencyThresholdMs === 'number'
&& Number.isFinite(body.latencyThresholdMs) && body.latencyThresholdMs > 0 ?
body.latencyThresholdMs : undefined) and pass it into probeSiteModels(id, {
scope, modelName, latencyThresholdMs }); ensure you reference the existing
variables body, scope, modelName and the probeSiteModels call and validate the
value (reject NaN/negative) so only a valid number is forwarded.
- Around line 691-694: The current logic in the route uses loose coercion
(Boolean(anyBody.postRefreshProbeEnabled) and forcing postRefreshProbeScope to
'single' for any non-'all' value), which accepts invalid inputs; update the
handler that reads anyBody to explicitly validate inputs: for
postRefreshProbeEnabled accept only true/false booleans or the string
'true'/'false' (reuse the existing boolean normalizer used elsewhere), for
postRefreshProbeModel trim and accept only non-empty strings, and for
postRefreshProbeScope accept only the allowed values 'all' or 'single' (reject
or return a 400 for anything else) before assigning to
updates.postRefreshProbeEnabled, updates.postRefreshProbeModel and
updates.postRefreshProbeScope so invalid values are rejected rather than
silently coerced.
In `@src/server/services/backupService.ts`:
- Around line 761-763: The importAccountsSection() function currently ignores
the postRefreshProbeEnabled, postRefreshProbeModel, and postRefreshProbeScope
fields when inserting into schema.sites, causing loss of probe configuration
when restoring backups. To fix this, update importAccountsSection() to include
these postRefreshProbe* fields from the backup data into the site records during
insertion, ensuring the probe settings persist after import.
In `@src/server/services/modelService.ts`:
- Around line 482-506: The current code treats status === 'inconclusive' the
same as 'unsupported' and auto-disables models (updating
modelAvailability.available=false, inserting into siteDisabledModels, calling
setAccountRuntimeHealth and rebuildTokenRoutesFromAvailability). Change the
logic in the unsupportedModels block so only status === 'unsupported' (and
optionally 'latencyExceeded' if you choose) are auto-disabled and inserted into
schema.siteDisabledModels; for entries with status === 'inconclusive' instead
emit a progress/warning (onProgress?.({ type: 'inconclusive', modelName, reason
})) and optionally update schema.modelAvailability.available=false without
inserting into siteDisabledModels or marking account unhealthy via
setAccountRuntimeHealth; apply the same correction to
runPostRefreshProbeIfEnabled to avoid permanent disabling from
transient/inconclusive probe results.
In `@src/web/api.ts`:
- Around line 789-790: The probeSiteNow API wrapper (probeSiteNow) must accept
and forward a latencyThresholdMs value so one-shot probes can use the same
slow-model auto-disable behavior; update the options type to include
latencyThresholdMs?: number, ensure the call to
request(`/api/sites/${siteId}/probe-now`, ...) includes that property in the
JSON body (i.e., JSON.stringify(options || {}) will contain latencyThresholdMs
when provided), and adjust any callers to pass the threshold as needed.
In `@src/web/pages/Sites.tsx`:
- Around line 558-577: The probe latency threshold isn't persisted or used for
automatic post-refresh probes; update the component and service to persist and
consume a new field (postRefreshProbeLatencyThresholdMs): add that field to the
payload in handleSaveProbeSettings and to the setSites update, initialize/reset
the local probeLatencyThreshold state in openEdit from
site.postRefreshProbeLatencyThresholdMs, and extend runPostRefreshProbeIfEnabled
(modelService.ts) to read/accept postRefreshProbeLatencyThresholdMs from the
site record so automatic probes apply the saved threshold (or alternatively
clearly separate and label a manual-only threshold if you choose the other
approach).
- Around line 586-700: handleProbeNow clears probeLog but doesn't reset
probeCompleted, and the 'error' SSE branch doesn't refresh model lists or set
probeCompleted; update handleProbeNow to call setProbeCompleted(false) when
starting a new probe, and inside the SSE 'error' handling (the branch that
checks type === 'error' in handleProbeNow) perform the same model-list refresh
Promise.all(...) sequence that you use in the 'complete' branch (calling
api.getSiteAvailableModels / setAvailableModels and api.getSiteDisabledModels /
setDisabledModels) and ensure you call setProbeCompleted(true) in finally/after
the refresh (or set it on error path) so the UI no longer shows stale "探测完成"
state.
📒 Files selected for processing (7)
- src/server/db/schema.ts
- src/server/routes/api/settings.ts
- src/server/routes/api/sites.ts
- src/server/services/backupService.ts
- src/server/services/modelService.ts
- src/web/api.ts
- src/web/pages/Sites.tsx
postRefreshProbeEnabled: integer('post_refresh_probe_enabled', { mode: 'boolean' }).default(false),
postRefreshProbeModel: text('post_refresh_probe_model').default(''),
postRefreshProbeScope: text('post_refresh_probe_scope').default('single'),
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf 'All references to the new site probe columns:\n'
rg -n 'post_refresh_probe_(enabled|model|scope)' || true
printf '\nMigration/schema artifact files containing the new columns:\n'
fd -i '(migration|snapshot|journal|schema)' --type f -x sh -c '
file="$1"
if rg -q "post_refresh_probe_(enabled|model|scope)" "$file"; then
printf "\n-- %s --\n" "$file"
rg -n "post_refresh_probe_(enabled|model|scope)" "$file"
fi
' sh {}
Repository: cita-777/metapi
Length of output: 1015
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find migration-related directories
echo "=== Migration and artifact directory structure ==="
fd -i '(migration|migrations|snapshot|artifact)' -type d | head -20
# Check package.json for drift-check script
echo -e "\n=== Checking for drift-check script ==="
grep -A2 -B2 'drift-check' package.json 2>/dev/null || echo "Not found in package.json"
# Look for drizzle migration files
echo -e "\n=== Looking for Drizzle-related files ==="
fd '\.sql$' --type f | head -20
# Check for schema snapshot or journal files
echo -e "\n=== Schema artifacts and snapshots ==="
fd -e '.snapshot' -o -e '.json' -o -e '.sql' --path '*schema*' --path '*artifact*' --type f | head -20
Repository: cita-777/metapi
Length of output: 297
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# List all files in common Drizzle migration locations
echo "=== Checking drizzle/migrations directory ==="
if [ -d "drizzle/migrations" ]; then
ls -la drizzle/migrations/ | head -30
echo "---"
ls -1 drizzle/migrations/*.sql 2>/dev/null || echo "No SQL migration files found"
fi
# Check meta directory
echo -e "\n=== Checking drizzle/meta directory ==="
if [ -d "drizzle/meta" ]; then
ls -la drizzle/meta/ | head -30
echo "---"
find drizzle/meta -type f | head -20
fi
# Check db directory structure
echo -e "\n=== Checking src/server/db directory ==="
find src/server/db -type f -name '*.sql' -o -name '*migration*' -o -name '*snapshot*' 2>/dev/null | head -20
Repository: cita-777/metapi
Length of output: 1700
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check the latest snapshot for the new columns
echo "=== Checking 0024_snapshot.json for new columns ==="
rg 'post_refresh_probe_(enabled|model|scope)' drizzle/meta/0024_snapshot.json || echo "NOT FOUND in 0024_snapshot.json"
# Check the journal
echo -e "\n=== Checking _journal.json ==="
rg 'post_refresh_probe_(enabled|model|scope)' drizzle/meta/_journal.json || echo "NOT FOUND in _journal.json"
# Check generated SQL files
echo -e "\n=== Checking generated SQL files ==="
for file in src/server/db/generated/*.sql; do
echo "--- $file ---"
if rg -q 'post_refresh_probe_(enabled|model|scope)' "$file"; then
echo "FOUND"
rg -n 'post_refresh_probe_(enabled|model|scope)' "$file" | head -5
else
echo "NOT FOUND"
fi
done
Repository: cita-777/metapi
Length of output: 489
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Run drift-check if available
echo "=== Running npm run repo:drift-check ==="
if grep -q 'drift-check' package.json; then
npm run repo:drift-check 2>&1 || echo "Drift check encountered issues"
else
echo "No drift-check script found in package.json"
fi
Repository: cita-777/metapi
Length of output: 1354
Regenerate Drizzle migration artifacts and SQL patches for the three new schema columns.
The schema definition in src/server/db/schema.ts was updated with the three new columns, but the corresponding migration snapshot and generated SQL patches were not regenerated. The new columns are absent from:
- drizzle/meta/0024_snapshot.json (latest snapshot)
- drizzle/meta/_journal.json
- src/server/db/generated/mysql.bootstrap.sql, mysql.upgrade.sql, postgres.bootstrap.sql, postgres.upgrade.sql
Database upgrades will fail without these artifacts. Regenerate them using Drizzle's migration tooling to ensure all three outputs stay synchronized (Drizzle schema + SQLite migration history + checked-in SQL patches).
const anyBody = body as Record<string, unknown>;
if (anyBody.postRefreshProbeEnabled !== undefined) updates.postRefreshProbeEnabled = Boolean(anyBody.postRefreshProbeEnabled);
if (anyBody.postRefreshProbeModel !== undefined) updates.postRefreshProbeModel = String(anyBody.postRefreshProbeModel || '').trim();
if (anyBody.postRefreshProbeScope !== undefined) updates.postRefreshProbeScope = anyBody.postRefreshProbeScope === 'all' ? 'all' : 'single';
Validate probe settings instead of silently coercing them.
Boolean(anyBody.postRefreshProbeEnabled) turns "false" into true, and any invalid postRefreshProbeScope overwrites the setting as 'single'. Reject invalid values or reuse the existing boolean normalizer.
Suggested fix
const anyBody = body as Record<string, unknown>;
- if (anyBody.postRefreshProbeEnabled !== undefined) updates.postRefreshProbeEnabled = Boolean(anyBody.postRefreshProbeEnabled);
+ if (anyBody.postRefreshProbeEnabled !== undefined) {
+ const normalizedProbeEnabled = normalizePinnedFlag(anyBody.postRefreshProbeEnabled);
+ if (normalizedProbeEnabled === null) {
+ return reply.code(400).send({ error: 'Invalid postRefreshProbeEnabled value. Expected boolean.' });
+ }
+ updates.postRefreshProbeEnabled = normalizedProbeEnabled;
+ }
if (anyBody.postRefreshProbeModel !== undefined) updates.postRefreshProbeModel = String(anyBody.postRefreshProbeModel || '').trim();
- if (anyBody.postRefreshProbeScope !== undefined) updates.postRefreshProbeScope = anyBody.postRefreshProbeScope === 'all' ? 'all' : 'single';
+ if (anyBody.postRefreshProbeScope !== undefined) {
+ if (anyBody.postRefreshProbeScope !== 'single' && anyBody.postRefreshProbeScope !== 'all') {
+ return reply.code(400).send({ error: 'Invalid postRefreshProbeScope value. Expected single or all.' });
+ }
+ updates.postRefreshProbeScope = anyBody.postRefreshProbeScope;
+ }
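The coercion pitfall described in this comment is easy to reproduce in isolation. The sketch below uses hypothetical helpers (`parseStrictBoolean`, `parseProbeScope`); the repository's own `normalizePinnedFlag` is referenced in the suggested fix, but its exact behavior is not shown in this thread, so this is only an assumption about what a strict normalizer could look like.

```typescript
// Hypothetical validators (not from the PR) illustrating strict parsing.
type ProbeScope = "single" | "all";

// Accepts only true/false or the strings "true"/"false"; anything else is null.
function parseStrictBoolean(value: unknown): boolean | null {
  if (value === true || value === "true") return true;
  if (value === false || value === "false") return false;
  return null;
}

// Accepts only the two allowed scope values; anything else is null,
// which the route handler can turn into a 400 response.
function parseProbeScope(value: unknown): ProbeScope | null {
  if (value === "single") return "single";
  if (value === "all") return "all";
  return null;
}
```

Returning `null` rather than a default keeps the decision to reject (for example with a 400) in the route handler, instead of silently overwriting the stored setting.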
const unsupportedModels = details.filter((d) => d.status === 'unsupported' || d.status === 'inconclusive').map((d) => d.modelName);
if (unsupportedModels.length > 0) {
  const checkedAt = new Date().toISOString();
  for (const modelName of unsupportedModels) {
    await db.update(schema.modelAvailability)
      .set({ available: false, checkedAt })
      .where(and(
        eq(schema.modelAvailability.accountId, account.id),
        eq(schema.modelAvailability.modelName, modelName),
      ))
      .run();
    await db.insert(schema.siteDisabledModels)
      .values({ siteId, modelName })
      .onConflictDoNothing()
      .run();
    onProgress?.({ type: 'action', modelName, action: 'disabled' });
  }
  const reason = unsupportedModels.length === 1
    ? `手动探测失败:模型 ${unsupportedModels[0]} 不可用`
    : `手动探测失败:${unsupportedModels.length} 个模型不可用(${unsupportedModels.slice(0, 3).join('、')}${unsupportedModels.length > 3 ? '…' : ''})`;
  await setAccountRuntimeHealth(account.id, { state: 'unhealthy', reason, source: 'manual-probe', checkedAt });
  rebuildTokenRoutesFromAvailability().catch((err) => {
    console.warn('[probe-site-now] route rebuild failed', err);
  });
}
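Treating every inconclusive result as unsupported and automatically adding it to the site's disabled list is risky.
According to the implementation in runtimeModelProbe.ts, inconclusive is returned in the following cases, none of which mean the model is unavailable:
- A credential is missing (missing credential for probe);
- No upstream endpoint matches the model (no compatible probe endpoint candidates);
- Endpoint discovery or execution throws (transient errors such as network jitter, proxy failure, or DNS);
- The request times out.
All of these mean "the probe could not reach a conclusion", not "the model is definitely unsupported". Yet the current logic writes them all as modelAvailability.available=false and inserts them into siteDisabledModels, so after the route rebuild the model no longer takes part in scheduling. In the worst case, a single transient timeout or proxy error permanently disables a popular model, and the user has to remove it from the site's disabled list by hand to recover it (the next refresh will not re-enable it, because siteDisabledModels is never cleaned up automatically).
Suggestions:
- Auto-disable only unsupported and (optionally) latencyExceeded. For inconclusive, only emit a warning/progress event, or write modelAvailability.available=false without inserting into siteDisabledModels, so that the next probe can self-heal;
- If the current behavior is kept, at least record the trigger reason for each model and clearly distinguish "unsupported" from "undetermined" in the UI, so users can decide whether to restore a model manually.
runPostRefreshProbeIfEnabled (Line 551) has the same problem and should be fixed at the same time.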
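The separation suggested in this comment can be sketched as a pure partitioning step that runs before any database write. The `ProbeDetail` shape and the `partitionProbeDetails` name are assumptions for illustration; the real `details` entries in modelService.ts may carry other fields.

```typescript
// Hypothetical types and helper (not from the PR).
type ProbeStatus = "supported" | "unsupported" | "inconclusive";

interface ProbeDetail {
  modelName: string;
  status: ProbeStatus;
  reason?: string;
}

// Splits probe results into models to auto-disable and models to only warn about.
function partitionProbeDetails(details: readonly ProbeDetail[]): {
  toDisable: string[];
  toWarn: ProbeDetail[];
} {
  const toDisable: string[] = [];
  const toWarn: ProbeDetail[] = [];
  for (const detail of details) {
    if (detail.status === "unsupported") {
      toDisable.push(detail.modelName); // definite failure: safe to persist
    } else if (detail.status === "inconclusive") {
      toWarn.push(detail); // transient or unknown: report, do not persist
    }
  }
  return { toDisable, toWarn };
}
```

Only `toDisable` would then be written to `siteDisabledModels`, while `toWarn` entries could be surfaced through `onProgress` with their `reason`, leaving room for the next probe to self-heal.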
const handleSaveProbeSettings = async () => {
  if (!editor || editor.mode !== 'edit') return;
  setProbeSaving(true);
  try {
    await api.updateSite(editor.editingSiteId, {
      postRefreshProbeEnabled: probeEnabled,
      postRefreshProbeModel: probeModel.trim(),
      postRefreshProbeScope: probeScope,
    });
    setSites((prev) => prev.map((s) => s.id === editor.editingSiteId
      ? { ...s, postRefreshProbeEnabled: probeEnabled, postRefreshProbeModel: probeModel.trim(), postRefreshProbeScope: probeScope }
      : s,
    ));
    toast.success('刷新后探测设置已保存');
  } catch (e: any) {
    toast.error(e.message || '保存失败');
  } finally {
    setProbeSaving(false);
  }
};
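The latency threshold is not persisted and does not apply to the automatic post-refresh probe.
probeLatencyThreshold lives only in the component's local state, and:
- handleSaveProbeSettings does not write the threshold back to the site on save (the payload contains only the three fields postRefreshProbeEnabled / Model / Scope).
- openEdit (Lines 492-498) does not reset probeLatencyThreshold from site, so when editing across sites the previously set threshold leaks into the next site's UI.
- More importantly, runPostRefreshProbeIfEnabled (modelService.ts Lines 511-591) never reads or receives a latency threshold, so the panel's core selling point, "probe automatically after refresh and auto-disable by latency", does not work at all in the current implementation. The threshold only affects the single session in which "立即探测" ("Probe now") is clicked manually.
The panel copy explicitly says "ms(响应超过该时间则自动禁用,0=不限)" ("ms: auto-disable when the response exceeds this time, 0 = unlimited"), which leads users to believe the threshold will apply after every refresh once saved. Suggestions:
- Either add a postRefreshProbeLatencyThresholdMs field to the schema and wire it into handleSaveProbeSettings (save), openEdit (initialization), and the probe logic of runPostRefreshProbeIfEnabled;
- Or clearly separate the two thresholds (label the UI one "manual probe only") and provide an independent persisted setting for automatic probes.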
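For reference, the threshold semantics described in this thread ("0 = unlimited", and a supported result that is too slow becomes unsupported) can be sketched as two small pure functions. The result shape and both function names are assumptions for illustration, not the PR's implementation; the input parsing mirrors the `Math.max(0, parseInt(..., 10) || 0)` expression used in a later suggested fix.

```typescript
// Hypothetical types and helpers (not from the PR).
interface TimedProbeResult {
  status: "supported" | "unsupported" | "inconclusive";
  latencyMs: number;
  reason?: string;
}

// Turns the free-text threshold input into a non-negative integer; 0 means "no limit".
function parseThresholdInput(raw: string): number {
  return Math.max(0, parseInt(raw, 10) || 0);
}

// Downgrades a supported result to unsupported when it exceeds the threshold.
function applyLatencyThreshold(
  result: TimedProbeResult,
  thresholdMs: number,
): TimedProbeResult {
  if (!Number.isFinite(thresholdMs) || thresholdMs <= 0) return result; // 0 = unlimited
  if (result.status === "supported" && result.latencyMs > thresholdMs) {
    return {
      ...result,
      status: "unsupported",
      reason: `latency ${result.latencyMs}ms exceeded threshold ${thresholdMs}ms`,
    };
  }
  return result;
}
```

Keeping this logic in one shared function is one way to make the manual probe and the automatic post-refresh probe apply the same saved threshold.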
… probe columns
- Add drizzle/0025_site_post_refresh_probe.sql (ALTER TABLE sites ADD 3 columns)
- Update drizzle/meta/_journal.json (idx=24 entry)
- Regenerate mysql/postgres bootstrap+upgrade SQL and schemaContract.json
- Fix SSE abort: propagate AbortSignal to worker pool; close connection
immediately on client disconnect without sending 'complete' event
- Fix Boolean coercion: use strict === true/=== 1 check instead of Boolean()
to avoid Boolean('false') === true foot-gun
- Fix probe-now: parse and forward latencyThresholdMs from request body
- Fix api.ts probeSiteNow: add latencyThresholdMs to options type
- Fix backup import: restore postRefreshProbe* fields when inserting sites
so that backup/restore round-trips preserve per-site probe config
- Fix latency threshold persistence: add postRefreshProbeLatencyThresholdMs
column (migration 0026), save/load it in site editor, pass to
runPostRefreshProbeIfEnabled so auto-probe also respects the threshold
- Fix state management: setProbeCompleted(false) at probe start; error
branch also refreshes model lists and sets probeCompleted to show results
- Regenerate schema artifacts (mysql/postgres bootstrap+upgrade, contract)
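The "propagate AbortSignal to worker pool" fix in this commit can be illustrated with a cancel-aware variant of a worker loop. This is a sketch under stated assumptions: `probeUntilAborted` and `CancelSignal` are hypothetical names, and the only property the loop relies on is a readable `aborted` flag, which a standard `AbortSignal` also exposes.

```typescript
// Hypothetical helper (not from the PR). Any object with an `aborted` flag works,
// including a standard AbortSignal.
interface CancelSignal {
  readonly aborted: boolean;
}

// Runs `task` over `items` with bounded concurrency, stops picking up new items
// once the signal is aborted, and returns whatever completed before that.
async function probeUntilAborted<T, R>(
  items: readonly T[],
  concurrency: number,
  signal: CancelSignal,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const completed: R[] = [];
  let next = 0;

  const worker = async (): Promise<void> => {
    while (next < items.length && !signal.aborted) {
      const item = items[next++];
      // A task that was already started is allowed to finish; its result is kept.
      completed.push(await task(item));
    }
  };

  const workerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return completed;
}
```

In a route handler, the signal would typically come from an `AbortController` that is aborted when the client connection closes, so no further upstream probes are started for a client that is no longer listening.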
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/web/pages/Sites.tsx (1)
732-746: ⚠️ Potential issue | 🟡 Minor
Persist the latency threshold from the primary Save button too.
The footer save persists the other probe settings but omits postRefreshProbeLatencyThresholdMs; changing only the threshold and clicking “保存修改” ("Save changes") silently keeps the old saved threshold unless the separate probe-settings button is used.
Suggested fix
  globalWeight: Number(parsedGlobalWeight.toFixed(3)),
  postRefreshProbeEnabled: probeEnabled,
  postRefreshProbeModel: probeModel.trim(),
  postRefreshProbeScope: probeScope,
+ postRefreshProbeLatencyThresholdMs: Math.max(0, parseInt(probeLatencyThreshold, 10) || 0),
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/web/pages/Sites.tsx` around lines 732 - 746, The payload built in the Save handler omits the postRefreshProbeLatencyThresholdMs field so changes to the threshold aren't persisted; add a postRefreshProbeLatencyThresholdMs property to the payload (similar to globalWeight and postRefreshProbeModel) and source it from the form state or the parsed latency variable (e.g., form.postRefreshProbeLatencyThresholdMs or parsedPostRefreshProbeLatency), converting to the appropriate numeric type before assigning so the value is included when payload is sent.
🧹 Nitpick comments (1)
src/web/pages/Sites.tsx (1)
67-70: Add the latency threshold to SiteRow instead of casting.
Line 495 reads a persisted field through as any, so API/schema drift around postRefreshProbeLatencyThresholdMs will not be caught by TypeScript.
Suggested fix
  postRefreshProbeEnabled?: boolean;
  postRefreshProbeModel?: string | null;
  postRefreshProbeScope?: string | null;
+ postRefreshProbeLatencyThresholdMs?: number | null;

- setProbeLatencyThreshold(String((site as any).postRefreshProbeLatencyThresholdMs ?? 0));
+ setProbeLatencyThreshold(String(site.postRefreshProbeLatencyThresholdMs ?? 0));
Also applies to: 492-495
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/web/pages/Sites.tsx` around lines 67 - 70, The code is reading postRefreshProbeLatencyThresholdMs by casting a persisted site record to any, which hides schema/API drift; add postRefreshProbeLatencyThresholdMs?: number | null to the SiteRow type/interface so TypeScript knows the field exists, then remove the as any cast and access site.postRefreshProbeLatencyThresholdMs directly (update usages where SiteRow is constructed or read, e.g., the code around where postRefreshProbeEnabled/postRefreshProbeModel/postRefreshProbeScope are accessed).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/web/api.ts`:
- Around line 789-790: probeSiteNow currently calls request(...) and inherits
the default 30s timeout, which can abort long "scope: 'all'" one-shot probes;
update probeSiteNow to detect when options?.scope === 'all' and pass a longer
timeout to request (e.g., a probe-specific timeout value or constant) via the
request options so the client will wait long enough for multiple upstream probes
to complete; refer to the probeSiteNow arrow function and the request(...) call
when making this change.
In `@src/web/pages/Sites.tsx`:
- Around line 330-335: The current useEffect tied to [editor] only aborts
probeAbortRef when editor becomes null; add a separate useEffect with an empty
dependency array that returns a cleanup function to abort any active probe
stream on component unmount. In that cleanup call probeAbortRef.current?.abort()
and set probeAbortRef.current = null (same actions as in the existing effect),
so active SSE fetches are aborted if the user navigates away while the editor is
still open.
⛔ Files ignored due to path filters (5)
- src/server/db/generated/mysql.bootstrap.sql is excluded by !**/generated/**
- src/server/db/generated/mysql.upgrade.sql is excluded by !**/generated/**
- src/server/db/generated/postgres.bootstrap.sql is excluded by !**/generated/**
- src/server/db/generated/postgres.upgrade.sql is excluded by !**/generated/**
- src/server/db/generated/schemaContract.json is excluded by !**/generated/**
📒 Files selected for processing (8)
- drizzle/0026_site_probe_latency_threshold.sql
- drizzle/meta/_journal.json
- src/server/db/schema.ts
- src/server/routes/api/sites.ts
- src/server/services/backupService.ts
- src/server/services/modelService.ts
- src/web/api.ts
- src/web/pages/Sites.tsx
✅ Files skipped from review due to trivial changes (3)
- drizzle/0026_site_probe_latency_threshold.sql
- drizzle/meta/_journal.json
- src/server/services/modelService.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/server/routes/api/sites.ts
- src/server/services/backupService.ts
…e SQL
Better-SQLite3 requires --> statement-breakpoint between each statement in a migration file; without it, all statements are passed as a single string to Database.prepare(), which throws RangeError on multi-statement input.
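To make the role of the marker concrete, here is a minimal sketch of how a migration file can be split into single statements before each one is prepared. It assumes the migrator splits on the literal `--> statement-breakpoint` marker; `splitMigrationStatements` is a hypothetical name, and the SQL in the usage note is illustrative rather than the contents of the actual migration file.

```typescript
// Hypothetical helper (not from the PR or from any library).
// Assumption: statements in a migration file are separated by this literal marker.
const STATEMENT_BREAKPOINT = "--> statement-breakpoint";

// Splits a migration file into individual, non-empty SQL statements so each one
// can be prepared and run on its own.
function splitMigrationStatements(sql: string): string[] {
  return sql
    .split(STATEMENT_BREAKPOINT)
    .map((statement) => statement.trim())
    .filter((statement) => statement.length > 0);
}
```

For example, a file containing two `ALTER TABLE sites ADD ...` statements separated by the marker yields two entries, while the same file without the marker yields one multi-statement string, which is the input shape the commit message says fails in `Database.prepare()`.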
MySQL information_schema.COLUMNS.COLUMN_DEFAULT stores DEFAULT '' as an empty string, not NULL. normalizeDefaultValueForColumn was treating any empty rawDefaultValue as 'no default' and returning null, causing the MySQL schema parity test to see defaultValue: null for post_refresh_probe_model while schemaContract.json (generated from TypeScript schema) had "''". Fix: when rawDefaultValue is empty string, return "''" for text/json columns instead of null, so the round-trip through MySQL introspection matches the contract.
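The rule described in this commit message can be sketched as a small pure function. The real `normalizeDefaultValueForColumn` signature is not shown in this thread, so the function name, parameters, and the `LogicalType` union below are assumptions that only capture the empty-string case.

```typescript
// Hypothetical, simplified sketch (not the repository's function).
type LogicalType = "text" | "json" | "integer" | "boolean";

// Maps a raw introspected default to the contract's representation.
// null means "no default"; an empty string on a text/json column means DEFAULT ''.
function normalizeEmptyDefault(
  rawDefaultValue: string | null,
  logicalType: LogicalType,
): string | null {
  if (rawDefaultValue === null) return null;
  if (rawDefaultValue === "") {
    return logicalType === "text" || logicalType === "json" ? "''" : null;
  }
  return rawDefaultValue; // non-empty defaults pass through unchanged in this sketch
}
```

The key point is that `null` and `""` are distinguished before any emptiness check, so a column declared with `.default('')` round-trips through MySQL introspection as `"''"` and matches the generated contract.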
Refresh the PR onto current main and close the remaining probe-setting review gaps: the primary site save now persists the latency threshold, the site row type carries that field without an any-cast, active probe streams are aborted on unmount, and all-model one-shot probes get the longer timeout they need.
Constraint: PR cita-777#510 must preserve the per-site model probe feature while merging cleanly into current main.
Rejected: Leave threshold persistence behind the separate probe-settings button | the primary save path would still silently drop user changes.
Rejected: Keep the SiteRow any-cast | it hides API/schema drift for the new persisted field.
Confidence: high
Scope-risk: narrow
Directive: Keep probe setting fields represented in the typed site save payload and covered by focused web API/editor tests.
Tested: npm run typecheck; npm run repo:drift-check; npx vitest run --root . src/server/db/schemaContract.test.ts src/server/db/schemaArtifactGenerator.test.ts src/server/db/schemaIntrospection.test.ts src/server/db/schemaParity.test.ts src/web/api.test.ts src/web/pages/helpers/sitesEditor.test.ts src/web/pages/sites.disabled-models-save.test.tsx src/web/pages/sites.edit-scroll.test.tsx
Not-tested: Full npm test.
PR
Title
feat(sites): site-level model probing with real-time logs, latency threshold, and auto-disable
Description
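Problem
Some sites' APIs expose a large number of dead models, which bloats the routing table and makes fetching models slow, so this PR adds model probing with auto-disable. Probing can only be triggered passively during a model refresh, and there is no entry point for an immediate manual probe. The probe process is a complete black box: results are only success or failure, with no specific reason and no real-time progress.
Solution
Move the probe feature entirely into the site editor dialog, so that it is configured independently per site, and add a manual probe capability:

Backend:
- POST /api/sites/:id/probe-now: one-shot JSON probe endpoint
- GET /api/sites/:id/probe-stream: SSE streaming probe endpoint that pushes each model's result in real time
- scope (single/all), modelName, and latencyThresholdMs query parameters
- Concurrent execution with 10 workers by default (can be overridden via options.concurrency)
- A supported result whose response time exceeds the threshold is overwritten as unsupported and auto-disabled
- inconclusive results (network errors, timeouts, etc.) are uniformly downgraded to unsupported, triggering auto-disable
- Specific failure reasons (reason field), covering timeout, no token, no permission, model not found, etc.
- Remove the three probe config fields from config.ts and settings.ts
Frontend:
- Consume SSE with fetch + ReadableStream (works around EventSource not supporting custom headers)
- Stop button: AbortController.abort() terminates the request, the log appends 「已手动停止」 ("stopped manually"), and results already received are kept
- Refresh availableModels/disabledModels after the probe and show a model status list below the log (green = available, red = disabled)
Tests
- Type check (tsconfig.web.json + tsconfig.server.json: zero errors)
What Does NOT Change
- modelAvailabilityProbeService is not affected
Changed Files
- src/server/db/schema.ts: add postRefreshProbeEnabled/Model/Scope fields to the sites table
- src/server/routes/api/sites.ts: add the probe-now and probe-stream endpoints; PUT handles the new fields
- src/server/routes/api/settings.ts: remove probe-related settings
- src/server/services/modelService.ts: add the exported probeSiteModels() function (concurrent worker pool, latency threshold, progress callback)
- src/server/services/backupService.ts: fill in default values for the new fields in site object literals
- src/web/api.ts: add probeSiteNow(), remove global probe config fields
- src/web/pages/Sites.tsx: site editor dialog gains probe settings UI, a real-time log panel, and a post-probe model status list
Summary by CodeRabbit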
New Features
UI
API
Model Refresh
Backup/Import
Database