
Trae/solo agent 1 wl ux n #468

Open
chenzhenguo wants to merge 16 commits into cita-777:main from
chenzhenguo:trae/solo-agent-1WLUxN

Conversation

@chenzhenguo commented Apr 10, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Backup import now queues as a background task, returning immediately with status tracking instead of waiting for completion.
    • Added per-site model availability probe frequency limiting (5-minute window) and improved concurrent probe handling.
  • Bug Fixes

    • Improved backup import error handling to collect and report all issues instead of failing immediately.
    • Enhanced support for different backup format versions with better data compatibility.
  • Performance Improvements

    • Optimized database connection pooling and timeout settings across MySQL and PostgreSQL.
    • Batch operations now process large imports efficiently without hitting operation timeouts.
  • Documentation

    • Added comprehensive API and operational documentation including timeout configuration best practices.
    • Included test plans for long-running database operations.

MonkeyCode-AI and others added 16 commits April 9, 2026 03:42
- Add requestTimeout: 300_000 (5min) to handle large backup imports
- Add keepAliveTimeout: 65_000 for connection management
- Fixes 30s timeout issue on /api/settings/backup/import endpoint
Co-authored-by: monkeycode-ai <monkeycode-ai@chaitin.com>
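
The two Fastify options this commit names can be sketched as follows. This is a minimal illustration of the values above, not the repo's actual config; both option names are standard Fastify server options, in milliseconds.

```typescript
// Sketch of the server timeout options described in this commit.
const serverOptions = {
  // Allow requests to run up to 5 minutes, instead of the 30s ceiling
  // that broke /api/settings/backup/import on large payloads.
  requestTimeout: 300_000,
  // Keep idle keep-alive sockets open slightly past a typical 60s
  // proxy/load-balancer idle timeout, to avoid connection-reuse races.
  keepAliveTimeout: 65_000,
};

// Usage (assuming fastify is installed): const app = Fastify(serverOptions)
console.log(serverOptions.requestTimeout / 1000, "seconds");
```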
- Add a batchQueryAll paginated-query utility, loading at most 10 records per page
- Add a batchInsertHelper batched-insert utility, inserting at most 10 records per batch
- Refactor collectCurrentRuntimeStateSnapshot() to load proxyLogs and checkinLogs via pagination
- Refactor importAccountsSection() so inserts for all tables run in batches
- Fixes the 502 caused by memory spikes during large-volume imports
Co-authored-by: monkeycode-ai <monkeycode-ai@chaitin.com>
Co-authored-by: traeagent <traeagent@users.noreply.github.com>
feat: optimize timeout handling and improve backup service performance
Co-authored-by: traeagent <traeagent@users.noreply.github.com>
@github-actions github-actions Bot added labels on Apr 10, 2026: area: db (Database and schema related changes), area: docs (Docs and README changes), area: web (Web UI changes), size: XXL (2000 or more lines changed)

coderabbitai Bot commented Apr 10, 2026

📝 Walkthrough

Walkthrough

Comprehensive optimization initiative addressing timeout configuration across all system layers, implementing database batch operation utilities for large-scale data handling, refactoring backup import logic to improve error handling and data integrity, and enhancing model availability probe with site-level concurrency control and dynamic prompt selection. Includes extensive specifications, test plans, and new test infrastructure.

Changes

Cohort / File(s) Summary
Specification & Implementation Planning
.monkeycode/specs/db-batch-optimization/*, .trae/specs/database-timeout-optimization/*, .trae/specs/import-issue-analysis/*, .trae/specs/model-probe-optimization/*, .trae/specs/timeout-optimization/*
Added comprehensive spec documents (PRD, implementation tasks, verification checklists) for database batch optimization, timeout configuration across database/gateway/server layers, import feature issue resolution, and model probe feature enhancements.
Documentation & Test Plans
CODE_WIKI.md, LONG_RUNNING_OPERATIONS_TEST_PLAN.md, docs/configuration.md, docs/operations.md
Added comprehensive system architecture guide (CODE_WIKI.md with 622 lines covering Fastify/React stack, data flow, model probing, routing engine), long-running operation test strategy, timeout configuration best practices, and operational guidance for timeout-related monitoring.
Database Configuration & Connection Management
src/server/db/index.ts
Extended MySQL connection pool with waitForConnections, connectionLimit: 10, queueLimit: 0, and connectTimeout: 300s; upgraded PostgreSQL pool configuration to include max: 10, idleTimeoutMillis: 300s, and connectionTimeoutMillis: 300s.
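
As a rough sketch, the pool settings listed above map onto mysql2's createPool() options and pg's Pool constructor options as shown below. The values come from this summary, not from reading the actual src/server/db/index.ts, so field values may differ from the file.

```typescript
// Sketch of the MySQL pool options (mysql2 createPool() option names).
const mysqlPoolOptions = {
  waitForConnections: true, // queue callers instead of erroring when busy
  connectionLimit: 10,      // at most 10 open connections
  queueLimit: 0,            // 0 means the wait queue is unbounded
  connectTimeout: 300_000,  // up to 5 min to establish a connection
};

// Sketch of the PostgreSQL pool options (pg Pool option names).
const pgPoolOptions = {
  max: 10,                         // pool size cap
  idleTimeoutMillis: 300_000,      // close idle clients after 5 min
  connectionTimeoutMillis: 300_000 // wait up to 5 min when acquiring
};

console.log(mysqlPoolOptions.connectionLimit, pgPoolOptions.max);
```

Note that an unbounded queue (queueLimit: 0) combined with long acquisition timeouts lets requests pile up under database pressure, which is exactly the trade-off the inline review comment on src/server/db/index.ts raises.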
Server Configuration
src/server/config.ts
Added model availability probe configuration fields (modelAvailabilityProbeTpm, modelAvailabilityProbeMaxPerFiveMinutes, modelAvailabilityProbePrompts) and Fastify server timeout options (requestTimeout: 300s, keepAliveTimeout: 65s).
Backup Service Core Refactoring
src/server/services/backupService.ts
Implemented batchQueryAll() and batchInsertHelper() utilities for batch-based data operations. Refactored importAccountsSection to perform smart upserts instead of full table replacement, with per-table cleanup of obsolete records. Updated importPreferencesSection to return detailed stats. Changed importBackup to collect errors into array rather than fail immediately. Extended BackupImportResult.summary with operation counters (newSites, updatedSites, etc.) and added optional errors field. Increased WebDAV fetch timeout to 300s. Modified section coercion to treat empty arrays as valid and allow object-format preferences.
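
A keyset-pagination helper like the batchQueryAll() described above could look like the sketch below. `fetchPage` stands in for the real Drizzle query (select().from(table).where(gt(table.id, lastId)).orderBy(asc(table.id)).limit(batchSize)); the helper itself only drives the cursor loop, and the names here are reconstructed from the summary, not copied from the file.

```typescript
type Row = { id: number };

// Drain a table in fixed-size batches using an id cursor.
async function batchQueryAll<T extends Row>(
  fetchPage: (lastId: number, batchSize: number) => Promise<T[]>,
  batchSize = 10,
): Promise<T[]> {
  const all: T[] = [];
  let lastId = 0;
  for (;;) {
    const page = await fetchPage(lastId, batchSize);
    all.push(...page);
    if (page.length < batchSize) break; // short page: no more rows
    lastId = page[page.length - 1].id;  // advance the cursor
  }
  return all;
}
```

The cursor is only sound if each page is deterministically ordered by id ascending; without an explicit orderBy, the lastId cursor can skip or duplicate rows, which is what one of the inline comments below flags.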
Backup Import API Route
src/server/routes/api/settings.ts
Converted /api/settings/backup/import from synchronous request handling to asynchronous background task execution, returning HTTP 202 with queued status and jobId instead of immediate success/error responses.
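
The queued-import pattern described above can be sketched as a small pure function: enqueue the job, reply 202 immediately with a job id. `startBackgroundTask` is assumed from this summary; its real signature in the repo may differ.

```typescript
interface QueuedTask { id: string }

// Route-level sketch: delegate the long-running import to a background
// task and return HTTP 202 with a trackable jobId.
function handleBackupImport(
  startBackgroundTask: (job: () => Promise<void>) => QueuedTask,
  runImport: () => Promise<void>,
): { status: number; body: { status: string; jobId: string } } {
  const task = startBackgroundTask(runImport);
  return { status: 202, body: { status: "queued", jobId: task.id } };
}
```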
Model Availability Probe Service
src/server/services/modelAvailabilityProbeService.ts
Added site-level probe frequency limiting using 5-minute sliding windows (checkSiteProbeFrequency(siteId)). Converted to site-aware lease acquisition (tryAcquireProbeAccountLease(accountId, siteId)) with separate site-level tracking. Refactored probe execution to group accounts by site with parallel site processing and sequential per-site model probing, enforcing ~60ms delay between probes for TPM=1 behavior.
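
A 5-minute sliding-window limiter like the per-site throttle described above can be sketched as follows. The names mirror the walkthrough (checkSiteProbeFrequency, a per-site record map) but are illustrative, not the service's actual identifiers.

```typescript
const WINDOW_MS = 5 * 60 * 1000;
const siteProbeTimestamps = new Map<number, number[]>();

// Returns true (and records the probe) if the site is under budget
// for the trailing 5-minute window, false if it should be skipped.
function checkSiteProbeFrequency(
  siteId: number,
  maxPerWindow: number,
  now = Date.now(),
): boolean {
  const recent = (siteProbeTimestamps.get(siteId) ?? []).filter(
    (t) => now - t < WINDOW_MS, // drop timestamps outside the window
  );
  if (recent.length >= maxPerWindow) {
    siteProbeTimestamps.set(siteId, recent);
    return false; // over budget for this window
  }
  recent.push(now);
  siteProbeTimestamps.set(siteId, recent);
  return true;
}
```

Because this variant records the probe inside the check, callers should acquire the account lease first; a non-mutating check-then-record split avoids consuming quota for probes that never run, which is the ordering issue one of the inline comments below raises.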
Runtime Model Probe
src/server/services/runtimeModelProbe.ts
Added dynamic per-site prompt selection mechanism via DEFAULT_PROBE_PROMPTS, PROBE_PROMPTS (from config), sitePromptUsage tracking, and getRandomPrompt(siteId). Updated buildProbeBody to use selected prompt and receive siteId parameter. Strengthened timer unref() handling with runtime type checks for compatibility across different timer implementations.
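
The per-site prompt rotation described above could be sketched like this: pick a random unused prompt per site, and start a new rotation once every prompt has been used. The identifiers mirror the walkthrough (DEFAULT_PROBE_PROMPTS, sitePromptUsage, getRandomPrompt) but are reconstructed for illustration, not copied from the file.

```typescript
const DEFAULT_PROBE_PROMPTS = ["Say hi", "Reply with OK", "Echo: ping"];
const sitePromptUsage = new Map<number, Set<string>>();

// Pick a random prompt the site has not used in the current rotation.
function getRandomPrompt(
  siteId: number,
  prompts: string[] = DEFAULT_PROBE_PROMPTS,
): string {
  let used = sitePromptUsage.get(siteId);
  if (!used || used.size >= prompts.length) {
    used = new Set(); // all prompts consumed: begin a fresh rotation
    sitePromptUsage.set(siteId, used);
  }
  const unused = prompts.filter((p) => !used.has(p));
  const pick = unused[Math.floor(Math.random() * unused.length)];
  used.add(pick);
  return pick;
}
```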
Web API Client
src/web/api.ts
Increased request timeouts for import operations: OAuth import to 300s, backup import to 300s, and WebDAV backup import from 60s to 300s.
Long-Running Operation Tests
src/server/db/longRunningOperations.test.ts, src/server/services/backupService.longRunning.test.ts
Added Vitest tests for long-running scenarios: database operations with 50 batches of 1000-row inserts, large backup import with 100 sites/1000 accounts/5000 tokens, sequential imports with export validation, and post-operation connectivity verification. Tests configured with extended timeouts (10–15 minute range).
Manual Testing Scripts
test-long-running.js, test-long-running-backup.js
Added Node.js test scripts for manual validation of long-running database and backup operations with real-world data scales, progress logging, and connection stability checks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

area: db, area: server, area: web, area: scripts, area: docs, size: XXL

Poem

🐰 Batches and timeouts, oh what a sight,
Long operations now run through the night,
Smart upserts dance, no more full deletes,
Probes pick their prompts like selecting their treats,
Five hundred seconds—no rushing these ops! 🕐✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 13.89%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive: The PR title 'Trae/solo agent 1 wl ux n' is vague and does not clearly convey the main purpose of the changeset, which involves database optimization, timeout configuration, import issue fixes, and model probe improvements. Resolution: replace the vague title with a clear, descriptive summary such as 'Add database batch optimization, timeout configuration, and import/probe improvements', or break into multiple PRs if these represent distinct features.
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.



coderabbitai Bot left a comment


Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/server/services/backupService.ts (3)

1651-1711: ⚠️ Potential issue | 🔴 Critical

The smart sync still keys child rows by imported numeric IDs.

Sites are matched by identity key, but every child table here still persists the backup's raw siteId / accountId / tokenId / routeId and decides update-vs-insert from raw IDs. If the backup was created on another database, the same numeric IDs can refer to different entities or not exist at all. That can overwrite unrelated rows, point children at the wrong parents, or trip FK/unique constraints as soon as IDs diverge. This path needs imported-identity → persisted-ID remapping before inserting or updating children.

Also applies to: 1714-1882

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/services/backupService.ts` around lines 1651 - 1711, The code
updates/inserts sites by identity (buildSiteIdentityKey) but then continues to
persist child rows using the backup's raw numeric IDs (e.g.,
section.siteApiEndpoints.map(... siteId: row.siteId ...)), which can point to
wrong parents; fix by building an importedId→persistedId map as you process each
site (when checking existingSite and after tx.insert or tx.update of
schema.sites capture the actual persisted id), store mapping keyed by the backup
site.id, then before calling batchInsertHelper (and for other child blocks
dealing with accountId/tokenId/routeId) replace row.siteId (and other parent
ids) with the mapped persisted id (or resolve to a newly inserted id) and use
that mapped id for updates/inserts and FK checks so children reference the
correct local records.

1476-1498: ⚠️ Potential issue | 🟠 Major

This coercion now swallows legacy account backups before the converter runs.

detectAccountsSection() calls coerceAccountsSection(data.accounts) before buildAllApiHubV2AccountsSection(). After this change, any nested object with a non-empty accounts array is treated as a native AccountsBackupSection, even when those rows are still legacy site_url / account_info shapes. That bypasses the legacy converter and feeds malformed rows into importAccountsSection().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/services/backupService.ts` around lines 1476 - 1498, The coercion
in coerceAccountsSection() is too permissive and is treating legacy nested
objects with a non-empty accounts array as a v2 AccountsBackupSection, which
bypasses the legacy converter; update the coercion logic (used by
detectAccountsSection(), coerceAccountsSection(), and before
buildAllApiHubV2AccountsSection()/importAccountsSection()) to perform a strict
structural check for v2 account rows (e.g., require v2-specific fields) and
explicitly reject or return undefined for objects that contain legacy-only keys
like site_url or account_info so the legacy converter
(buildAllApiHubV2AccountsSection()) still runs instead of swallowing legacy
data.

1931-2116: ⚠️ Potential issue | 🔴 Critical

These runtime tables are appended back into the live DB instead of being replaced or merged.

runtimeState is captured from the current database, then modelAvailability, tokenModelAvailability, siteAnnouncements, downstreamApiKeys, proxyLogs, and checkinLogs are inserted again without clearing or upserting existing rows. On any import into a populated DB, the log inserts reuse existing ids and will hit PK conflicts; the other tables either duplicate stale state or leave rows that the backup removed. The long-running tests hide this because they start from an empty database.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/services/backupService.ts` around lines 1931 - 2116, The runtime
tables (runtimeState.modelAvailability, tokenAvailability, siteAnnouncements,
downstreamApiKeys, proxyLogs, checkinLogs) are being blindly inserted back into
the live DB causing duplicates and PK conflicts; update the backup import to
either (a) remove existing runtime rows for the affected scope then insert, or
(b) perform upserts/ON CONFLICT merges so existing rows are replaced/merged.
Concretely, in the transaction blocks that write to schema.modelAvailability,
schema.tokenModelAvailability, schema.siteAnnouncements,
schema.downstreamApiKeys, schema.proxyLogs and schema.checkinLogs, replace the
plain batchInsertHelper/tx.insert calls with logic that either deletes matching
rows first (using importedIndexes lookups to limit scope) or uses an upsert API
(onConflict/merge) to handle primary-key conflicts and to ensure manual entries
(isManual flag) are preserved/overwritten as intended; also ensure proxyLogs
preserves unique id handling (do not reuse ids into an occupied DB) by
generating new ids or using upsert semantics.
🧹 Nitpick comments (2)
src/server/services/runtimeModelProbe.ts (1)

81-105: Prune per-site prompt state.

sitePromptUsage is process-global and never evicted, so a long-lived worker keeps one Set per site forever. A small TTL/LRU cache, or a rolling index scoped to the current probe window, would avoid unbounded growth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/services/runtimeModelProbe.ts` around lines 81 - 105, The
sitePromptUsage Map in this file (used by getRandomPrompt and populated with
selected prompts from PROBE_PROMPTS) grows unbounded because entries are never
evicted; replace it with a bounded cache or TTL/LRU (or implement a rolling
index per-site keyed by siteId) to prune old site entries: update references to
sitePromptUsage and getRandomPrompt to use the new cache API (evict
least-recently-used or expire entries after the probe window) and ensure you
still track per-site used prompts set semantics and reset behavior when all
prompts are exhausted.
src/server/routes/api/settings.ts (1)

1839-1861: Extract the backup-import job body out of the route.

Lines 1852-1859 make this handler own the queued workflow plus its side effects (applyImportedSettingToRuntime and reloadBackupWebdavScheduler). Please move that logic into a service/helper and keep the route limited to request parsing and HTTP response mapping.

As per coding guidelines, "Route files in src/server/routes/** are adapters, not owners. They may register Fastify endpoints, parse request context, and delegate but must not own protocol conversion, retry policy, stream lifecycle, billing, or persistence."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/routes/api/settings.ts` around lines 1839 - 1861, The route
currently embeds the background job body and side effects (calling importBackup,
applyImportedSettingToRuntime, and reloadBackupWebdavScheduler) inside the
startBackgroundTask callback; extract that logic into a dedicated service
function (e.g., performBackupImport or createBackupImportJob) that accepts the
parsed backupData and returns the import result, moves the loop that calls
applyImportedSettingToRuntime and the conditional reloadBackupWebdavScheduler
call into that service, and export/import that helper into the route; then
change the startBackgroundTask call in the route to pass the service function
(or a thin wrapper that just invokes it) so the route only handles request
parsing and delegates the queued work to the new service while preserving the
task metadata
(type/title/dedupeKey/notifyOnFailure/successMessage/failureMessage) and return
value behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CODE_WIKI.md`:
- Around line 35-46: Replace all file:///workspace URI links in CODE_WIKI.md
with repository-relative paths so they work on GitHub and in published docs;
update the entries referencing index.ts, config.ts, db/, middleware/, routes/,
services/, transformers/, proxy-core/ and the src/web files (main.tsx, App.tsx)
to use ./src/server/index.ts, ./src/server/config.ts, ./src/server/... and
./src/web/main.tsx, ./src/web/App.tsx (or the appropriate relative path)
everywhere instead of file:///workspace/...; ensure markdown link syntax remains
valid and run a quick search/replace for the file:///workspace prefix to catch
all occurrences.

In `@docs/configuration.md`:
- Line 388: The Docker image reference is inconsistent: "image:
metapi/metapi:latest" appears here while CODE_WIKI.md uses
"1467078763/metapi:latest"; update the occurrence of "image:
metapi/metapi:latest" to the canonical "1467078763/metapi:latest" (or vice versa
if your canonical choice is the other string) so both documents use the exact
same image tag; ensure you change the exact token "image: metapi/metapi:latest"
to match the canonical image reference and run a quick grep to verify no other
mismatches remain.

In `@src/server/db/index.ts`:
- Around line 65-68: The DB pool is using long acquisition timeouts and an
unbounded queue (connectTimeout, connectionTimeoutMillis set to 300000 and
queueLimit: 0), which lets requests pile up during DB pressure; tighten those
acquisition timeouts, set a bounded queueLimit (e.g., non-zero small value), and
add query-level timeouts instead of relying on acquisition timeouts—either by
setting Postgres session statement_timeout or MySQL per-connection query timeout
and/or enforcing timeouts at the call sites where queries are executed.

In `@src/server/db/longRunningOperations.test.ts`:
- Around line 20-35: The test's simulated workload inside
simulateLongRunningOperation (the for-loops inserting into test_long_running and
the intermittent setTimeout delays) doesn't respect the requested seconds
argument, so change simulateLongRunningOperation to measure a real deadline:
record the start time, perform the batched inserts as currently implemented but
after the loop (or between batches) compute remaining = seconds - elapsed and
await a single sleep for Math.max(0, remaining * 1000) before returning; ensure
any per-batch short sleeps remain but do not replace the final wait so the
function reliably runs for at least the requested duration.
- Around line 119-144: These tests call simulateLongRunningOperation (and then
db.execute) for 10–60+ seconds but lack Vitest timeouts, so add explicit
timeouts: either call vi.setTimeout(70000) at the top of this test file/suite or
set per-test timeouts for the three tests (the "should handle operations lasting
more than 60 seconds without timeout", "should handle multiple long-running
operations sequentially", and "should maintain database connection after long
operations") so each allows at least 70_000 ms; ensure the timeout change
surrounds tests that call simulateLongRunningOperation and db.execute so they
won't be aborted.

In `@src/server/routes/api/settings.ts`:
- Around line 1839-1869: The current use of a static dedupeKey ('backup-import')
in startBackgroundTask causes different uploads to reuse an in-flight import
(startBackgroundTask, dedupeKey 'backup-import'), losing the new backupData;
change the logic to either compute a fingerprint of backupData and include it in
the dedupeKey (e.g., 'backup-import:'+fingerprint) or detect when
startBackgroundTask returns reused=true and compare the payload fingerprint to
the running task: if they differ, reject the request with HTTP 409 via
reply.code(409). Ensure the response still returns jobId/task.id when
legitimately reused by identical payloads and only proceed to call
importBackup/reloadBackupWebdavScheduler when a new task is created.
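
The fingerprinted dedupe key this comment suggests can be sketched as below: identical payloads share a key (and thus a running task), while different payloads get distinct keys instead of silently reusing an in-flight import. The function name is hypothetical.

```typescript
import { createHash } from "node:crypto";

// Derive a payload-specific dedupe key for the backup-import task.
function backupImportDedupeKey(backupData: unknown): string {
  const digest = createHash("sha256")
    .update(JSON.stringify(backupData))
    .digest("hex")
    .slice(0, 16); // a short prefix is enough to separate payloads
  return `backup-import:${digest}`;
}
```

Note that JSON.stringify is key-order sensitive, so payloads that differ only in property order hash differently; a canonicalizing serializer would tighten that if needed.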

In `@src/server/services/backupService.longRunning.test.ts`:
- Around line 1-2: The test imports join from the wrong module; replace the
import of join from 'node:fs' with an import from 'node:path' so the symbol used
in this file (join) is imported from the correct module (change the import
statement that currently destructures mkdtempSync, rmSync, join to instead
import mkdtempSync and rmSync from 'node:fs' and import join from 'node:path' or
update the combined import to reference 'node:path' for join).

In `@src/server/services/backupService.ts`:
- Around line 1514-1540: coercePreferencesSection currently returns { settings:
[] } for any record even when settings is missing; update it so it only returns
a PreferencesBackupSection when a valid settings payload exists: (1) check that
input.settings is present and is either an array or an object — if not, return
null; (2) keep the existing array/object parsing and filtering (using
EXCLUDED_SETTING_KEYS and key trimming) but after building settings, if
settings.length === 0 return null; otherwise return { settings }. Reference the
function coercePreferencesSection and the variables settingsRaw,
EXCLUDED_SETTING_KEYS to locate where to add the presence checks and the final
empty-array guard.
- Around line 2121-2124: The code is overwriting the per-import "new" counters
(stats.newSites, stats.newAccounts, stats.newTokens) with section totals;
instead of assigning to those new* fields, store the section totals in separate
fields (e.g., stats.totalSites, stats.totalAccounts, stats.totalTokens) or
remove the overwrite entirely so the incremented new* counters remain unchanged;
update the lines that currently set stats.newSites = section.sites.length,
stats.newAccounts = section.accounts.length, and stats.newTokens =
section.accountTokens.length to use the new total* field names or delete them as
appropriate.
- Around line 24-29: The keyset pagination loop in batchQueryAll()/the while
(hasMore) block builds queries using gt(table.id, lastId) but does not include
deterministic ordering; add an explicit orderBy(asc(table.id)) to the query
constructed in both branches (the initial dbInstance.select().from(table) and
the dbInstance.select().from(table).where(gt(table.id, lastId))) before applying
.limit(batchSize). This ensures rows are consistently ordered by table.id so the
lastId cursor does not skip or duplicate rows during pagination.

In `@src/server/services/modelAvailabilityProbeService.ts`:
- Around line 538-541: The test reset helper
__resetModelAvailabilityProbeExecutionStateForTests currently clears
probeAccountLeases and probeSiteLeases but leaves siteProbeRecords intact,
causing per-site throttle state to leak between tests; update that function to
also clear the siteProbeRecords data structure (reset its entries or call its
clear method) so tests start with a fresh frequency limiter state alongside
probeAccountLeases and probeSiteLeases.
- Around line 345-353: The site quota is being consumed by
checkSiteProbeFrequency(siteId) even when the account lease isn't acquired;
change the flow so the lease is acquired first
(tryAcquireProbeAccountLease(accountId, siteId)) and only then call
checkSiteProbeFrequency(siteId) so that siteProbeRecords mutation happens only
for probes that actually proceed, or alternatively modify
checkSiteProbeFrequency to be non-mutating (or to return a decision without
updating siteProbeRecords) and perform the mutation only after a successful
tryAcquireProbeAccountLease; update the logic around
buildSkippedProbeAccountResult to reflect the correct skip reason when the lease
cannot be acquired.

In `@test-long-running-backup.js`:
- Around line 118-121: The test harness imports TypeScript sources from plain
JavaScript which will fail under Node — either convert the harness to TypeScript
or point imports at compiled JS: rename the file to test-long-running-backup.ts
and run via tsx so imports of migrate, db/index, backupService (importBackup,
exportBackup) and backgroundTaskService (startBackgroundTask) work, or keep the
.js harness and change those import paths to the built JS artifacts (e.g., dist/
versions) so db, schema, importBackup, exportBackup and startBackgroundTask
resolve at runtime.

In `@test-long-running.js`:
- Around line 36-54: The current for-loop based insertion uses only intermittent
1s sleeps and finishes before the requested runtime; change to a deadline-driven
loop using startTime (or compute endTime = startTime + seconds*1000) so inserts
continue until Date.now() >= endTime, or after the existing batch loop add an
await sleep for the remaining time before checking duration >= seconds; update
the insertion block that calls db.execute and the progress logging that uses
startTime (and the intermittent setTimeout sleep) to operate inside a
while(Date.now() < endTime) loop (or equivalent deadline check) and keep the
periodic await new Promise(resolve => setTimeout(resolve, 1000)) between batches
so the script actually runs for the requested duration.
- Around line 84-120: runTests currently only logs PASS/FAIL but never sets a
non-zero exit code; update runTests to track failures (e.g., a boolean like
hasFailure) and set it to true when test1Result is falsy, when any entry in
results is falsy, or when the db.execute check throws; at the end of runTests,
if hasFailure is true call process.exit(1) (or throw an Error) so CI fails;
additionally change the top-level caller (the runTests().catch(...)) to call
process.exit(1) in its catch handler to ensure unexpected exceptions also exit
non-zero. Use the function/variables runTests, test1Result, results, and
db.execute to locate where to add the checks and exit logic.
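
The failure tracking this comment asks for can be sketched as follows: every check flips a failure flag, thrown errors count as failures, and the caller passes the resulting code to process.exit() so CI fails on any regression. Names here are illustrative, not the harness's actual structure.

```typescript
// Run a list of async checks and report an exit code: 0 = all passed.
async function runTests(
  checks: Array<() => Promise<boolean>>,
): Promise<number> {
  let hasFailure = false;
  for (const check of checks) {
    try {
      if (!(await check())) hasFailure = true; // falsy result = FAIL
    } catch {
      hasFailure = true; // thrown errors count as failures too
    }
  }
  return hasFailure ? 1 : 0; // caller: process.exit(await runTests(...))
}
```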


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5dd6408c-a676-4bc5-a98c-d6aaa60fa80d

📥 Commits

Reviewing files that changed from the base of the PR and between c6a28c7 and 0387668.

📒 Files selected for processing (28)
  • .monkeycode/specs/db-batch-optimization/tasklist.md
  • .trae/specs/database-timeout-optimization/checklist.md
  • .trae/specs/database-timeout-optimization/spec.md
  • .trae/specs/database-timeout-optimization/tasks.md
  • .trae/specs/import-issue-analysis/checklist.md
  • .trae/specs/import-issue-analysis/spec.md
  • .trae/specs/import-issue-analysis/tasks.md
  • .trae/specs/model-probe-optimization/checklist.md
  • .trae/specs/model-probe-optimization/spec.md
  • .trae/specs/model-probe-optimization/tasks.md
  • .trae/specs/timeout-optimization/checklist.md
  • .trae/specs/timeout-optimization/spec.md
  • .trae/specs/timeout-optimization/tasks.md
  • CODE_WIKI.md
  • LONG_RUNNING_OPERATIONS_TEST_PLAN.md
  • docs/configuration.md
  • docs/operations.md
  • src/server/config.ts
  • src/server/db/index.ts
  • src/server/db/longRunningOperations.test.ts
  • src/server/routes/api/settings.ts
  • src/server/services/backupService.longRunning.test.ts
  • src/server/services/backupService.ts
  • src/server/services/modelAvailabilityProbeService.ts
  • src/server/services/runtimeModelProbe.ts
  • src/web/api.ts
  • test-long-running-backup.js
  • test-long-running.js

Comment thread CODE_WIKI.md
Comment on lines +35 to +46
- [index.ts](file:///workspace/src/server/index.ts) - Fastify 服务启动和初始化
- [config.ts](file:///workspace/src/server/config.ts) - 环境变量解析和配置
- `db/` - 数据库 Schema、连接和迁移
- `middleware/` - 认证等中间件
- `routes/` - API 路由和代理路由
- `services/` - 业务服务层
- `transformers/` - 协议转换层
- `proxy-core/` - 代理核心逻辑

#### `src/web/` - Web 前端
- [main.tsx](file:///workspace/src/web/main.tsx) - Vite 入口
- [App.tsx](file:///workspace/src/web/App.tsx) - 路由和页面装配

⚠️ Potential issue | 🟡 Minor

Use repository-relative links instead of file:///workspace URIs.

Line 35 and similar entries use local filesystem links, which will break in GitHub and published docs. Please switch to repo-relative paths (e.g., ./src/server/index.ts).

🔧 Suggested documentation fix pattern
-- [index.ts](file:///workspace/src/server/index.ts) - Fastify 服务启动和初始化
-- [config.ts](file:///workspace/src/server/config.ts) - 环境变量解析和配置
+- [index.ts](./src/server/index.ts) - Fastify 服务启动和初始化
+- [config.ts](./src/server/config.ts) - 环境变量解析和配置
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CODE_WIKI.md` around lines 35 - 46, Replace all file:///workspace URI links
in CODE_WIKI.md with repository-relative paths so they work on GitHub and in
published docs; update the entries referencing index.ts, config.ts, db/,
middleware/, routes/, services/, transformers/, proxy-core/ and the src/web
files (main.tsx, App.tsx) to use ./src/server/index.ts, ./src/server/config.ts,
./src/server/... and ./src/web/main.tsx, ./src/web/App.tsx (or the appropriate
relative path) everywhere instead of file:///workspace/...; ensure markdown link
syntax remains valid and run a quick search/replace for the file:///workspace
prefix to catch all occurrences.

Comment thread docs/configuration.md
# docker-compose.yml 示例
services:
  metapi:
    image: metapi/metapi:latest

⚠️ Potential issue | 🟡 Minor

Unify Docker image naming across docs.

Line 388 uses metapi/metapi:latest, but CODE_WIKI.md (Line 391) uses 1467078763/metapi:latest. Please standardize to one canonical image reference across documents to avoid deployment confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/configuration.md` at line 388, The Docker image reference is
inconsistent: "image: metapi/metapi:latest" appears here while CODE_WIKI.md uses
"1467078763/metapi:latest"; update the occurrence of "image:
metapi/metapi:latest" to the canonical "1467078763/metapi:latest" (or vice versa
if your canonical choice is the other string) so both documents use the exact
same image tag; ensure you change the exact token "image: metapi/metapi:latest"
to match the canonical image reference and run a quick grep to verify no other
mismatches remain.

Comment thread src/server/db/index.ts
Comment on lines +65 to +68
waitForConnections: true,
connectionLimit: 10,
queueLimit: 0,
connectTimeout: 300000, // 5 minutes

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether codebase currently relies on connection-acquire timeout knobs
# without corresponding query/operation timeout controls.

rg -n --type=ts -C2 \
  "connectTimeout|connectionTimeoutMillis|waitForConnections|queueLimit|statement_timeout|query_timeout|max_execution_time|AbortSignal|timeout"

Repository: cita-777/metapi

Length of output: 50371


🌐 Web query:

In mysql2 PoolOptions, does connectTimeout limit query execution time or only connection establishment? In node-postgres PoolConfig, what is the difference between connectionTimeoutMillis and query/statement timeout controls?

💡 Result:

mysql2 PoolOptions.connectTimeout

connectTimeout only limits how long the driver will wait for the initial connection/handshake to be established—it is not a cap on how long a SQL query is allowed to run. This is the same semantics as the mysqljs/mysql option MySQL2 is compatible with (“timeout … during the initial connection”).[1]

In pooled usage, note this is separate from “waiting to get a connection from the pool” time; mysqljs/mysql distinguishes pool acquisition timeout (acquireTimeout) from connectTimeout because acquiring a pooled connection may not involve creating a new TCP connection.[1]

node-postgres PoolConfig.connectionTimeoutMillis vs query/statement timeouts

  • connectionTimeoutMillis (PoolConfig): how long the pool will wait when connecting a new client (or, practically, when trying to satisfy pool.connect() by creating a new backend connection) before erroring. It’s about getting a connection, not about executing SQL.[2]
  • Query/statement timeout controls (Client/connection-level):
    • statement_timeout: server-side limit—PostgreSQL cancels a statement that runs too long.[3][4]
    • query_timeout: client-side (node-postgres) limit—how long the client’s query call is allowed to wait before timing out (it’s distinct from statement_timeout).[3]

So: connectionTimeoutMillis protects you from hangs/pressure while obtaining/establishing connections; statement_timeout / query_timeout protect you from long-running queries (server-side vs client-side behavior respectively).[2][3][4]

Sources: [1] [2] [3] [4]


🏁 Script executed:

# Check for query-level timeout configurations in database-related code
rg -n --type=ts "statement_timeout|query_timeout|maxQueryTime|requestTimeout|timeout.*query|query.*timeout" src/server/db/ src/server/routes/api/ --max-count=20

Repository: cita-777/metapi

Length of output: 41


🏁 Script executed:

# Check the actual database operations to see if timeouts are applied at call sites
rg -n --type=ts "db\.(select|insert|update|delete|query|execute)" src/server/routes/ src/server/services/ -A 2 | head -100

Repository: cita-777/metapi

Length of output: 8210


🏁 Script executed:

# Look for timeout patterns in DB query execution and any long-running operation handling
rg -n --type=ts "longRunning|executeQuery|withTimeout" src/server/db/ -A 3 | head -50

Repository: cita-777/metapi

Length of output: 41


🏁 Script executed:

# Check the actual content of the longRunningOperations test that was mentioned
cat -n src/server/db/longRunningOperations.test.ts | head -130

Repository: cita-777/metapi

Length of output: 4946


🏁 Script executed:

# Check the complete db/index.ts file to see all pool configurations
cat -n src/server/db/index.ts | head -100

Repository: cita-777/metapi

Length of output: 4009


🏁 Script executed:

# Search for any statement or query timeout settings applied at the query level
rg -n --type=ts "statement_timeout|queryTimeout|query_timeout|SET.*timeout" src/server/

Repository: cita-777/metapi

Length of output: 41


Connection-acquire timeouts are being used as if they were execution timeouts.

Lines 68 and 84 set connectTimeout (MySQL) and connectionTimeoutMillis (Postgres) to 5 minutes, but these only protect connection establishment/acquisition, not long-running query duration. Combined with queueLimit: 0 (line 67), this allows unbounded queue buildup during database pressure or outages, prolonging request stalls rather than failing fast. No query-level timeout controls (statement_timeout for Postgres, connection-level query timeout for MySQL) are configured.

Consider setting a bounded queue limit and keeping acquisition timeouts tighter, while adding query/operation-level timeout controls at the call site or via database session settings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/db/index.ts` around lines 65 - 68, The DB pool is using long
acquisition timeouts and an unbounded queue (connectTimeout,
connectionTimeoutMillis set to 300000 and queueLimit: 0), which lets requests
pile up during DB pressure; tighten those acquisition timeouts, set a bounded
queueLimit (e.g., non-zero small value), and add query-level timeouts instead of
relying on acquisition timeouts—either by setting Postgres session
statement_timeout or MySQL per-connection query timeout and/or enforcing
timeouts at the call sites where queries are executed.
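
A minimal sketch of the direction suggested above, assuming the pool options live in plain config objects (values are illustrative, not recommendations): `statement_timeout` and `query_timeout` are real node-postgres Client/Pool options, while MySQL needs a session-level `max_execution_time` (which applies to SELECT statements only).

```typescript
// mysql2 pool: keep acquisition fast and bounded; the queue no longer grows
// without limit under DB pressure.
const mysqlPoolOptions = {
  waitForConnections: true,
  connectionLimit: 10,
  queueLimit: 100,        // bounded queue: fail fast instead of piling up
  connectTimeout: 10_000, // handshake only; it does NOT cap query duration
};
// After acquiring a connection (hypothetical call site), cap SELECT runtime:
// await conn.query('SET SESSION max_execution_time = 300000'); // milliseconds

// node-postgres pool: separate acquisition limits from execution limits.
const pgPoolConfig = {
  max: 10,
  connectionTimeoutMillis: 10_000, // time to obtain a client, not to run SQL
  statement_timeout: 300_000,      // server cancels statements after 5 minutes
  query_timeout: 310_000,          // client-side guard, just above the server limit
};

// Sanity check: the client-side guard should fire after the server-side cancel,
// so the server error (not a client timeout) is what surfaces.
const guardsOrdered = pgPoolConfig.query_timeout > pgPoolConfig.statement_timeout;
```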

Comment on lines +20 to +35
// 分批次插入数据,每批次1000条,共插入50000条
for (let i = 0; i < 50; i++) {
const values = [];
for (let j = 0; j < 1000; j++) {
const data = `test_data_${i}_${j}_${Math.random()}`;
values.push(`('${data}')`);
}
await db.execute(`
INSERT INTO test_long_running (data) VALUES ${values.join(', ')}
`);

// 模拟处理时间
if (i % 10 === 0) {
await new Promise(resolve => setTimeout(resolve, 1000));
}
}

⚠️ Potential issue | 🟠 Major

The simulated workload doesn't honor the requested duration.

Each branch only adds about 5 seconds of deliberate delay, so simulateLongRunningOperation(60) mostly measures machine speed instead of the seconds argument and will be flaky or fail on fast CI. Drive the helper from a real deadline, or sleep the remaining target time before returning.

Also applies to: 49-63, 76-90

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/db/longRunningOperations.test.ts` around lines 20 - 35, The test's
simulated workload inside simulateLongRunningOperation (the for-loops inserting
into test_long_running and the intermittent setTimeout delays) doesn't respect
the requested seconds argument, so change simulateLongRunningOperation to
measure a real deadline: record the start time, perform the batched inserts as
currently implemented but after the loop (or between batches) compute remaining
= seconds - elapsed and await a single sleep for Math.max(0, remaining * 1000)
before returning; ensure any per-batch short sleeps remain but do not replace
the final wait so the function reliably runs for at least the requested
duration.
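
A deadline-driven variant of the helper could be sketched as follows. `runForAtLeast` and the `doBatch` callback are hypothetical names standing in for `simulateLongRunningOperation` and its `db.execute` insert batches; the final sleep is what guarantees the requested duration regardless of machine speed.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs doBatch repeatedly until the deadline, then sleeps any remaining time
// so the function reliably takes at least `seconds` seconds.
async function runForAtLeast(
  seconds: number,
  doBatch: () => Promise<void>,
): Promise<boolean> {
  const start = Date.now();
  const deadline = start + seconds * 1000;
  while (Date.now() < deadline) {
    await doBatch();
    // Yield briefly between batches instead of spinning a hot loop.
    await sleep(Math.min(100, Math.max(0, deadline - Date.now())));
  }
  // Guarantee the requested duration even if batches finished early.
  await sleep(Math.max(0, deadline - Date.now()));
  return (Date.now() - start) / 1000 >= seconds;
}
```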

Comment on lines +119 to +144
it('should handle operations lasting more than 60 seconds without timeout', async () => {
// 测试超过60秒的操作
const completed = await simulateLongRunningOperation(60);
expect(completed).toBe(true);
});

it('should handle multiple long-running operations sequentially', async () => {
// 测试多个长时间运行的操作
const results = [];
for (let i = 0; i < 3; i++) {
const completed = await simulateLongRunningOperation(10);
results.push(completed);
}

// 所有操作都应该成功完成
expect(results.every(r => r)).toBe(true);
});

it('should maintain database connection after long operations', async () => {
// 执行长时间操作
await simulateLongRunningOperation(30);

// 验证连接仍然可用
const result = await db.execute('SELECT 1');
expect(result).toBeDefined();
});

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

fd -HI '^(vitest|vite)\.config\.(ts|mts|cts|js|mjs|cjs)$|^package\.json$' | while read -r file; do
  echo "== $file =="
  rg -n -C2 'testTimeout|hookTimeout|vitest|defineConfig' "$file" || true
done

Repository: cita-777/metapi

Length of output: 1776


🏁 Script executed:

cat -n vitest.config.ts

Repository: cita-777/metapi

Length of output: 777


🏁 Script executed:

cat -n src/server/db/longRunningOperations.test.ts

Repository: cita-777/metapi

Length of output: 5389


Add explicit timeouts to these Vitest test cases.

The test suite has no global testTimeout configured in vitest.config.ts, and the three test cases at lines 119-144 have no individual timeout overrides. With Vitest's default 10-second timeout, the tests expecting 10, 30, and 60+ seconds of operation will abort before assertions run. Set a suite or per-test timeout of at least 70 seconds (accounting for test overhead and database operations), or enable a global timeout in the Vitest config.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/db/longRunningOperations.test.ts` around lines 119 - 144, These
tests call simulateLongRunningOperation (and then db.execute) for 10–60+ seconds
but lack Vitest timeouts, so add explicit timeouts: either call
vi.setConfig({ testTimeout: 70_000 }) at the top of this test file/suite (Vitest
has no vi.setTimeout; that API is Jest's) or set per-test timeouts
for the three tests (the "should handle operations lasting more than 60 seconds
without timeout", "should handle multiple long-running operations sequentially",
and "should maintain database connection after long operations") so each allows
at least 70_000 ms; ensure the timeout change surrounds tests that call
simulateLongRunningOperation and db.execute so they won't be aborted.
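
Concretely, the per-test override is Vitest's third argument to `it`, and the global option is `test.testTimeout` in vitest.config.ts. A minimal sketch (the `simulateLongRunningOperation` declaration stands in for the helper defined elsewhere in the test file):

```typescript
// vitest.config.ts (global option, value assumed):
// export default defineConfig({ test: { testTimeout: 70_000 } });

import { it, expect } from 'vitest';

declare function simulateLongRunningOperation(seconds: number): Promise<boolean>;

// Per-test override: the third argument is the timeout in milliseconds.
it('should handle operations lasting more than 60 seconds without timeout', async () => {
  const completed = await simulateLongRunningOperation(60);
  expect(completed).toBe(true);
}, 70_000);
```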

Comment on lines +345 to +353
if (!checkSiteProbeFrequency(siteId)) {
results.push(buildSkippedProbeAccountResult({
accountId,
siteId: context.site.id,
message: 'model availability probe frequency limit exceeded for site',
}));
continue;
}
if (!tryAcquireProbeAccountLease(accountId, siteId)) {

⚠️ Potential issue | 🟠 Major

Only consume the site quota after the lease is acquired.

checkSiteProbeFrequency(siteId) mutates siteProbeRecords, but it runs before tryAcquireProbeAccountLease(). Any skipped attempt caused by an existing account/site probe still burns the site's 5-minute budget, so duplicate/manual triggers can exhaust the quota without actually probing anything.

Suggested fix
-      if (!checkSiteProbeFrequency(siteId)) {
-        results.push(buildSkippedProbeAccountResult({
-          accountId,
-          siteId: context.site.id,
-          message: 'model availability probe frequency limit exceeded for site',
-        }));
-        continue;
-      }
       if (!tryAcquireProbeAccountLease(accountId, siteId)) {
         results.push(buildSkippedProbeAccountResult({
           accountId,
           siteId: context.site.id,
           message: 'model availability probe already running for account or site',
         }));
         continue;
       }
+      if (!checkSiteProbeFrequency(siteId)) {
+        releaseProbeAccountLease(accountId, siteId);
+        results.push(buildSkippedProbeAccountResult({
+          accountId,
+          siteId: context.site.id,
+          message: 'model availability probe frequency limit exceeded for site',
+        }));
+        continue;
+      }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (acquire the lease first, then consume the frequency quota):

      if (!tryAcquireProbeAccountLease(accountId, siteId)) {
        results.push(buildSkippedProbeAccountResult({
          accountId,
          siteId: context.site.id,
          message: 'model availability probe already running for account or site',
        }));
        continue;
      }
      if (!checkSiteProbeFrequency(siteId)) {
        releaseProbeAccountLease(accountId, siteId);
        results.push(buildSkippedProbeAccountResult({
          accountId,
          siteId: context.site.id,
          message: 'model availability probe frequency limit exceeded for site',
        }));
        continue;
      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/services/modelAvailabilityProbeService.ts` around lines 345 - 353,
The site quota is being consumed by checkSiteProbeFrequency(siteId) even when
the account lease isn't acquired; change the flow so the lease is acquired first
(tryAcquireProbeAccountLease(accountId, siteId)) and only then call
checkSiteProbeFrequency(siteId) so that siteProbeRecords mutation happens only
for probes that actually proceed, or alternatively modify
checkSiteProbeFrequency to be non-mutating (or to return a decision without
updating siteProbeRecords) and perform the mutation only after a successful
tryAcquireProbeAccountLease; update the logic around
buildSkippedProbeAccountResult to reflect the correct skip reason when the lease
cannot be acquired.

Comment on lines 538 to 541
export function __resetModelAvailabilityProbeExecutionStateForTests(): void {
probeAccountLeases.clear();
probeSiteLeases.clear();
}

⚠️ Potential issue | 🟡 Minor

Reset the frequency limiter state in tests too.

__resetModelAvailabilityProbeExecutionStateForTests() clears the new lease sets, but siteProbeRecords survives. Tests that call the reset helper can still inherit the previous run's per-site throttle and fail nondeterministically.

Suggested fix
 export function __resetModelAvailabilityProbeExecutionStateForTests(): void {
   probeAccountLeases.clear();
   probeSiteLeases.clear();
+  siteProbeRecords.clear();
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/services/modelAvailabilityProbeService.ts` around lines 538 - 541,
The test reset helper __resetModelAvailabilityProbeExecutionStateForTests
currently clears probeAccountLeases and probeSiteLeases but leaves
siteProbeRecords intact, causing per-site throttle state to leak between tests;
update that function to also clear the siteProbeRecords data structure (reset
its entries or call its clear method) so tests start with a fresh frequency
limiter state alongside probeAccountLeases and probeSiteLeases.

Comment on lines +118 to +121
await import('./src/server/db/migrate.ts');
const { db, schema } = await import('./src/server/db/index.ts');
const { importBackup, exportBackup } = await import('./src/server/services/backupService.ts');
const { startBackgroundTask } = await import('./src/server/services/backgroundTaskService.ts');

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Locate the test harness file
find . -name "test-long-running-backup.js" -type f

Repository: cita-777/metapi

Length of output: 88


🏁 Script executed:

# Check package.json for scripts and test setup
cat package.json | head -100

Repository: cita-777/metapi

Length of output: 4638


🏁 Script executed:

# Look for build outputs or TypeScript loader setup
ls -la | grep -E "(tsconfig|esbuild|webpack|vitest|jest|tsx)"

Repository: cita-777/metapi

Length of output: 485


🏁 Script executed:

# Search for how other test files are structured
fd -e test.ts -e test.js | head -20

Repository: cita-777/metapi

Length of output: 815


🏁 Script executed:

# Read the test harness file to see its full structure and shebang
cat -n test-long-running-backup.js

Repository: cita-777/metapi

Length of output: 10924


🏁 Script executed:

# Check if there's any documentation or runner script
fd -e md -e txt | xargs grep -l "test-long-running-backup" 2>/dev/null | head -5

Repository: cita-777/metapi

Length of output: 41


🏁 Script executed:

# Check build output directory structure
ls -la dist/server 2>/dev/null | head -20 || echo "dist/server not found or empty"

Repository: cita-777/metapi

Length of output: 41


🏁 Script executed:

# Look for any tsconfig or loader configuration that might handle .ts imports in .js files
cat tsconfig.json

Repository: cita-777/metapi

Length of output: 766


🏁 Script executed:

# Check if there's any loader configuration or environment setup
grep -r "loader\|--experimental\|NODE_OPTIONS" . --include="*.json" --include="*.js" --include="*.ts" --include=".env*" 2>/dev/null | head -20

Repository: cita-777/metapi

Length of output: 790


🏁 Script executed:

# Check if build outputs exist in dist/server
ls -la dist/server 2>/dev/null | head -20 || echo "dist/server does not exist or is empty"

Repository: cita-777/metapi

Length of output: 41


🏁 Script executed:

# Verify the actual issue: try to understand the intended execution path
grep -r "test-long-running" . --include="*.json" --include="*.md" --include="*.sh" 2>/dev/null

Repository: cita-777/metapi

Length of output: 41


This harness must either be renamed to .ts and run with tsx, or import built .js artifacts.

The file is plain JavaScript but imports TypeScript sources without a configured loader. Plain Node.js will fail with MODULE_NOT_FOUND at line 118. Either rename the file to test-long-running-backup.ts and run it with tsx test-long-running-backup.ts, or update the imports to point at compiled JavaScript artifacts in dist/.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-long-running-backup.js` around lines 118 - 121, The test harness imports
TypeScript sources from plain JavaScript which will fail under Node — either
convert the harness to TypeScript or point imports at compiled JS: rename the
file to test-long-running-backup.ts and run via tsx so imports of migrate,
db/index, backupService (importBackup, exportBackup) and backgroundTaskService
(startBackgroundTask) work, or keep the .js harness and change those import
paths to the built JS artifacts (e.g., dist/ versions) so db, schema,
importBackup, exportBackup and startBackgroundTask resolve at runtime.

Comment thread test-long-running.js
Comment on lines +36 to +54
// 分批次插入数据
console.log('Inserting test data...');
for (let i = 0; i < 50; i++) {
const values = [];
for (let j = 0; j < 1000; j++) {
const data = `test_data_${i}_${j}_${Math.random()}`;
values.push(`('${data}')`);
}

await db.execute(`
INSERT INTO test_long_running (data) VALUES ${values.join(', ')}
`);

if (i % 10 === 0) {
const currentTime = (Date.now() - startTime) / 1000;
console.log(`Progress: ${i * 1000} records inserted (${currentTime.toFixed(2)} seconds)`);
// 模拟处理时间
await new Promise(resolve => setTimeout(resolve, 1000));
}

⚠️ Potential issue | 🟠 Major

This script doesn't actually run for the requested duration.

The loop only injects about 5 seconds of delay, so the 60/30/10 second targets depend on insert speed rather than the seconds argument. Use a deadline-based loop, or sleep the remaining time before evaluating duration >= seconds.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-long-running.js` around lines 36 - 54, The current for-loop based
insertion uses only intermittent 1s sleeps and finishes before the requested
runtime; change to a deadline-driven loop using startTime (or compute endTime =
startTime + seconds*1000) so inserts continue until Date.now() >= endTime, or
after the existing batch loop add an await sleep for the remaining time before
checking duration >= seconds; update the insertion block that calls db.execute
and the progress logging that uses startTime (and the intermittent setTimeout
sleep) to operate inside a while(Date.now() < endTime) loop (or equivalent
deadline check) and keep the periodic await new Promise(resolve =>
setTimeout(resolve, 1000)) between batches so the script actually runs for the
requested duration.

Comment thread test-long-running.js
Comment on lines +84 to +120
async function runTests() {
console.log('=== Long Running Database Operations Test ===');
console.log(`Database dialect: ${runtimeDbDialect}`);
console.log('===========================================');

// 测试超过60秒的操作
console.log('\nTest 1: Operation lasting more than 60 seconds');
const test1Result = await simulateLongRunningOperation(60);

// 测试多个长时间运行的操作
console.log('\nTest 2: Multiple long-running operations');
const results = [];
for (let i = 0; i < 3; i++) {
console.log(`\nRunning operation ${i + 1}/3`);
const result = await simulateLongRunningOperation(10);
results.push(result);
}

// 测试数据库连接稳定性
console.log('\nTest 3: Database connection stability');
await simulateLongRunningOperation(30);
try {
const result = await db.execute('SELECT 1');
console.log('✅ SUCCESS: Database connection is still stable');
} catch (error) {
console.error('❌ ERROR: Database connection failed after long operation:', error);
}

console.log('\n=== Test Summary ===');
console.log(`Test 1 (60+ seconds): ${test1Result ? 'PASS' : 'FAIL'}`);
console.log(`Test 2 (Multiple operations): ${results.every(r => r) ? 'PASS' : 'FAIL'}`);
console.log('Test 3 (Connection stability): Check logs above');
console.log('==================');
}

// 运行测试
runTests().catch(console.error);

⚠️ Potential issue | 🟠 Major

Fail the process when any scenario fails.

runTests() only prints PASS/FAIL; it never sets a non-zero exit code when Line 91, Line 98, or the connection-stability check fails. That makes the script unreliable for CI or automation because regressions still exit successfully.

🛠️ Suggested fix
 async function runTests() {
   console.log('=== Long Running Database Operations Test ===');
   console.log(`Database dialect: ${runtimeDbDialect}`);
   console.log('===========================================');
   
   // 测试超过60秒的操作
   console.log('\nTest 1: Operation lasting more than 60 seconds');
   const test1Result = await simulateLongRunningOperation(60);
   
   // 测试多个长时间运行的操作
   console.log('\nTest 2: Multiple long-running operations');
   const results = [];
   for (let i = 0; i < 3; i++) {
     console.log(`\nRunning operation ${i + 1}/3`);
     const result = await simulateLongRunningOperation(10);
     results.push(result);
   }
+  const test2Passed = results.every((r) => r);
   
   // 测试数据库连接稳定性
   console.log('\nTest 3: Database connection stability');
-  await simulateLongRunningOperation(30);
+  const test3OperationResult = await simulateLongRunningOperation(30);
+  let test3ConnectionOk = false;
   try {
-    const result = await db.execute('SELECT 1');
+    await db.execute('SELECT 1');
+    test3ConnectionOk = true;
     console.log('✅ SUCCESS: Database connection is still stable');
   } catch (error) {
     console.error('❌ ERROR: Database connection failed after long operation:', error);
   }
+  const test3Passed = test3OperationResult && test3ConnectionOk;
   
   console.log('\n=== Test Summary ===');
   console.log(`Test 1 (60+ seconds): ${test1Result ? 'PASS' : 'FAIL'}`);
-  console.log(`Test 2 (Multiple operations): ${results.every(r => r) ? 'PASS' : 'FAIL'}`);
-  console.log('Test 3 (Connection stability): Check logs above');
+  console.log(`Test 2 (Multiple operations): ${test2Passed ? 'PASS' : 'FAIL'}`);
+  console.log(`Test 3 (Connection stability): ${test3Passed ? 'PASS' : 'FAIL'}`);
   console.log('==================');
+
+  if (!test1Result || !test2Passed || !test3Passed) {
+    process.exitCode = 1;
+  }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (revised runTests with failure tracking):

async function runTests() {
  console.log('=== Long Running Database Operations Test ===');
  console.log(`Database dialect: ${runtimeDbDialect}`);
  console.log('===========================================');

  // Test an operation lasting more than 60 seconds
  console.log('\nTest 1: Operation lasting more than 60 seconds');
  const test1Result = await simulateLongRunningOperation(60);

  // Test multiple long-running operations
  console.log('\nTest 2: Multiple long-running operations');
  const results = [];
  for (let i = 0; i < 3; i++) {
    console.log(`\nRunning operation ${i + 1}/3`);
    const result = await simulateLongRunningOperation(10);
    results.push(result);
  }
  const test2Passed = results.every((r) => r);

  // Test database connection stability
  console.log('\nTest 3: Database connection stability');
  const test3OperationResult = await simulateLongRunningOperation(30);
  let test3ConnectionOk = false;
  try {
    await db.execute('SELECT 1');
    test3ConnectionOk = true;
    console.log('✅ SUCCESS: Database connection is still stable');
  } catch (error) {
    console.error('❌ ERROR: Database connection failed after long operation:', error);
  }
  const test3Passed = test3OperationResult && test3ConnectionOk;

  console.log('\n=== Test Summary ===');
  console.log(`Test 1 (60+ seconds): ${test1Result ? 'PASS' : 'FAIL'}`);
  console.log(`Test 2 (Multiple operations): ${test2Passed ? 'PASS' : 'FAIL'}`);
  console.log(`Test 3 (Connection stability): ${test3Passed ? 'PASS' : 'FAIL'}`);
  console.log('==================');

  if (!test1Result || !test2Passed || !test3Passed) {
    process.exitCode = 1;
  }
}

// Run tests
runTests().catch(console.error);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-long-running.js` around lines 84 - 120, runTests currently only logs
PASS/FAIL but never sets a non-zero exit code; update runTests to track failures
(e.g., a boolean like hasFailure) and set it to true when test1Result is falsy,
when any entry in results is falsy, or when the db.execute check throws; at the
end of runTests, if hasFailure is true call process.exit(1) (or throw an Error)
so CI fails; additionally change the top-level caller (the
runTests().catch(...)) to call process.exit(1) in its catch handler to ensure
unexpected exceptions also exit non-zero. Use the function/variables runTests,
test1Result, results, and db.execute to locate where to add the checks and exit
logic.

@cita-777
Owner

cita-777 commented Apr 10, 2026

Please address the comments from coderabbitai; if coderabbitai made an incorrect suggestion, reply at the corresponding location.

@cita-777
Owner

Also, should PR #460 be closed?


Labels

area: db - Database and schema related changes
area: docs - Docs and README changes
area: web - Web UI changes
size: XXL - 2000 or more lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants