fix(vt): sync scan status from AV engines when Code Insight unavailable #591
Shuai-DaiDai wants to merge 1 commit into openclaw:main
When VirusTotal returns scan results with AV engine stats but no Code Insight AI analysis, the skill status was stuck on 'Pending'. This fix adds fallback logic to check last_analysis_stats (malicious/suspicious/harmless/undetected) to determine scan status.
Functions updated:
- pollPendingScans: Check AV engines before requesting rescan
- backfillPendingScans: Check AV engines before marking as no results
- rescanActiveSkills: Check AV engines before keeping as pending
- backfillActiveSkillsVTCache: Check AV engines before skipping
Fixes #33435
Someone is attempting to deploy a commit to the Amantus Machina Team on Vercel. A member of the Team first needs to authorize it.
Greptile Summary
This PR adds AV engine stats ( Key findings:
Confidence Score: 3/5
Last reviewed commit: b244574
  } else if (stats.harmless > 0 || stats.undetected > 0) {
    // No detections and some harmless/undetected engines = clean
    status = 'clean'
  }
Path: convex/vt.ts
Line: 579-582
**Inconsistency with `fetchResults` "clean" determination**
The PR description states this logic "matches the existing logic in `fetchResults` function," but it does not. The `fetchResults` function only marks a file clean when `harmless > 0`:
```typescript
// fetchResults (line 303) — existing reference implementation
} else if (stats.harmless > 0) {
status = 'clean'
}
```
The new code additionally accepts `undetected > 0` as sufficient for "clean":
```typescript
} else if (stats.harmless > 0 || stats.undetected > 0) {
```
In VirusTotal, `undetected` means an engine ran but produced no verdict — it did not classify the file as harmless. A file where every engine returns `undetected` (e.g., 0 harmless, 64 undetected) would be published as "clean" by the polling functions but remain "pending" via `fetchResults`. This creates a real inconsistency: the stored `vtAnalysis.status` in the DB would be `"clean"`, but any fresh call to `fetchResults` for UI display would return `"pending"` — which could confuse debugging and monitoring.
This same divergence is also present at the corresponding fallback blocks in `backfillPendingScans` (~line 795), `rescanActiveSkills` (~line 924), and `backfillActiveSkillsVTCache` (~line 1258).
If broadening the clean criteria to include `undetected > 0` is intentional, `fetchResults` should be updated to match.
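To make the divergence concrete, here is a minimal sketch that runs both rules on the all-undetected example. The `EngineStats` type and both function names are illustrative and do not come from `convex/vt.ts`. It also assumes the `malicious` and `suspicious` branches of `fetchResults` match the fallback, since only its clean branch is quoted above.

```typescript
// Illustrative only: these names and types are assumptions, not code from convex/vt.ts.
type EngineStats = {
  malicious: number
  suspicious: number
  harmless: number
  undetected: number
}
type Status = 'malicious' | 'suspicious' | 'clean' | 'pending'

// Rule quoted from fetchResults: clean requires at least one harmless verdict.
function statusPerFetchResults(stats: EngineStats): Status {
  if (stats.malicious > 0) return 'malicious'
  if (stats.suspicious > 0) return 'suspicious'
  if (stats.harmless > 0) return 'clean'
  return 'pending'
}

// Rule added by this PR: undetected alone is also enough for clean.
function statusPerFallback(stats: EngineStats): Status {
  if (stats.malicious > 0) return 'malicious'
  if (stats.suspicious > 0) return 'suspicious'
  if (stats.harmless > 0 || stats.undetected > 0) return 'clean'
  return 'pending'
}

const allUndetected: EngineStats = { malicious: 0, suspicious: 0, harmless: 0, undetected: 64 }
console.log(statusPerFetchResults(allUndetected)) // pending
console.log(statusPerFallback(allUndetected)) // clean
```

The two rules agree whenever at least one engine reports `harmless`, or any engine reports `malicious` or `suspicious`. They differ only for files with zero `harmless` and a nonzero `undetected` count.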
if (stats) {
  if (stats.malicious > 0) {
    status = 'malicious'
  } else if (stats.suspicious > 0) {
    status = 'suspicious'
  } else if (stats.harmless > 0 || stats.undetected > 0) {
    // No detections and some harmless/undetected engines = clean
    status = 'clean'
  }
}
Path: convex/vt.ts
Line: 574-583
**Duplicated stats-to-status logic across 4 functions**
The same 10-line block for deriving `status` from `last_analysis_stats` is copy-pasted identically into `pollPendingScans`, `backfillPendingScans`, `rescanActiveSkills`, and `backfillActiveSkillsVTCache`. Extracting it into a small helper would reduce duplication and make any future threshold changes (e.g., updating the `undetected` logic) a single-line fix:
```typescript
function statusFromEngineStats(
stats: VTFileResponse['data']['attributes']['last_analysis_stats'],
): string | null {
if (!stats) return null
if (stats.malicious > 0) return 'malicious'
if (stats.suspicious > 0) return 'suspicious'
if (stats.harmless > 0 || stats.undetected > 0) return 'clean'
return null
}
```
This also makes the divergence from `fetchResults` easier to spot and reason about.
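One possible way to let `fetchResults` and the four polling functions share the helper while the `undetected` question is still open is to make that threshold an explicit argument. The sketch below assumes a local `EngineStats` stand-in for the real `VTFileResponse` type. The `undetectedCountsAsClean` parameter is hypothetical and is not an existing option in the codebase.

```typescript
// Local stand-in for the stats type; real code would reference VTFileResponse instead.
type EngineStats = {
  malicious: number
  suspicious: number
  harmless: number
  undetected: number
}
type EngineStatus = 'malicious' | 'suspicious' | 'clean'

// `undetectedCountsAsClean` is a hypothetical switch, named here for illustration only.
function statusFromEngineStats(
  stats: EngineStats | undefined,
  undetectedCountsAsClean = false,
): EngineStatus | null {
  if (!stats) return null
  if (stats.malicious > 0) return 'malicious'
  if (stats.suspicious > 0) return 'suspicious'
  if (stats.harmless > 0) return 'clean'
  if (undetectedCountsAsClean && stats.undetected > 0) return 'clean'
  return null // no verdict: the caller keeps its own default, e.g. 'pending'
}

// Strict rule (as in fetchResults) versus the broader rule this PR introduces.
const sample: EngineStats = { malicious: 0, suspicious: 0, harmless: 0, undetected: 64 }
const strict = statusFromEngineStats(sample) ?? 'pending'
const broad = statusFromEngineStats(sample, true) ?? 'pending'
console.log(strict, broad)
```

With this shape, settling on one rule later means changing a default value rather than editing five call sites.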
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b244574131
updated++
continue
Keep polling Code Insight after engine-only fallback
This branch treats AV engine stats as a terminal verdict and continues without requesting reanalysis, which means the hash drops out of the pending poll path once vtAnalysis.status is set to clean/suspicious/malicious. getPendingScanSkillsInternal explicitly skips those final VT statuses (convex/skills.ts lines 1878-1884), so we never pick up a later code_insight result for the same version. In cases where Code Insight arrives later with a stricter verdict than the initial engine stats (for example, engines clean but Code Insight suspicious), the skill remains misclassified until the next daily rescan.
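One way to address this, sketched under assumptions, is to record where a verdict came from and keep engine-only verdicts in the poll set. The `source` field and the `needsCodeInsightPoll` predicate below are hypothetical. Neither exists in the current schema or in `getPendingScanSkillsInternal`.

```typescript
// Hypothetical shape: `source` is not a field in the existing vtAnalysis record.
type VtAnalysis = {
  status: 'pending' | 'clean' | 'suspicious' | 'malicious'
  source?: 'engines' | 'code_insight'
}

// A version stays in the poll set while it has no verdict,
// or only a provisional verdict derived from AV engine stats.
function needsCodeInsightPoll(analysis: VtAnalysis | undefined): boolean {
  if (!analysis || analysis.status === 'pending') return true
  return analysis.source === 'engines'
}

console.log(needsCodeInsightPoll({ status: 'clean', source: 'engines' })) // true
console.log(needsCodeInsightPoll({ status: 'clean', source: 'code_insight' })) // false
```

A real implementation would probably also need a cap on how long an engine-only verdict keeps being polled, since Code Insight may never arrive for some files.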
Problem
When a VirusTotal scan completes with AV engine results (e.g., 0/64 detections, "Benign") but Code Insight AI analysis is not yet available, the skill scan status was stuck on "Pending" indefinitely.
Fixes #33435
Root Cause
The VT polling functions only checked for "code_insight" AI analysis results and ignored the "last_analysis_stats" AV engine scan results.
Solution
Added fallback logic to check "last_analysis_stats" when Code Insight is unavailable.
This matches the existing logic in the "fetchResults" function.
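The fallback order described in this section can be sketched as follows. This is not the PR's diff. `ScanAttributes` and `codeInsightVerdict` are simplified, hypothetical stand-ins for the real VirusTotal response shape, and the thresholds are those of the fallback block quoted in the review comments.

```typescript
// Simplified, hypothetical shapes: the real VT response is structured differently.
type EngineStats = { malicious: number; suspicious: number; harmless: number; undetected: number }
type Verdict = 'malicious' | 'suspicious' | 'clean'
type ScanAttributes = {
  codeInsightVerdict?: Verdict // stand-in for the Code Insight result
  last_analysis_stats?: EngineStats
}

function resolveStatus(attrs: ScanAttributes): Verdict | 'pending' {
  // 1. Code Insight wins whenever it is present.
  if (attrs.codeInsightVerdict) return attrs.codeInsightVerdict
  // 2. Otherwise fall back to AV engine counts, using the PR's thresholds.
  const stats = attrs.last_analysis_stats
  if (stats) {
    if (stats.malicious > 0) return 'malicious'
    if (stats.suspicious > 0) return 'suspicious'
    if (stats.harmless > 0 || stats.undetected > 0) return 'clean'
  }
  // 3. Nothing usable yet, so the scan stays pending.
  return 'pending'
}

// Engines finished with no detections, Code Insight not yet available.
console.log(resolveStatus({ last_analysis_stats: { malicious: 0, suspicious: 0, harmless: 0, undetected: 64 } })) // clean
```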
Testing