Skip to content

fix(vt): sync scan status from AV engines when Code Insight unavailable#591

Open
Shuai-DaiDai wants to merge 1 commit intoopenclaw:mainfrom
Shuai-DaiDai:fix/vt-pending-av-engines-fallback
Open

fix(vt): sync scan status from AV engines when Code Insight unavailable#591
Shuai-DaiDai wants to merge 1 commit intoopenclaw:mainfrom
Shuai-DaiDai:fix/vt-pending-av-engines-fallback

Conversation

@Shuai-DaiDai
Copy link

Problem

When VirusTotal scan completes with AV engine results (e.g., 0/64 detections, "Benign") but Code Insight AI analysis is not yet available, skill scan status was stuck on "Pending" indefinitely.

Fixes #33435

Root Cause

The VT polling functions only checked for "code_insight" AI analysis results and ignored the "last_analysis_stats" AV engine scan results:

  • "pollPendingScans" - would request rescan instead of using engine stats
  • "backfillPendingScans" - would skip skills with no Code Insight
  • "rescanActiveSkills" - would mark as pending instead of using engine stats
  • "backfillActiveSkillsVTCache" - would skip skills with no Code Insight

Solution

Added fallback logic to check "last_analysis_stats" when Code Insight is unavailable:

  • If "malicious > 0" → status = "malicious"
  • If "suspicious > 0" → status = "suspicious"
  • If "harmless > 0 || undetected > 0" → status = "clean"

This matches the existing logic in "fetchResults" function.

Testing

  • Test with skill that has engine stats but no Code Insight
  • Verify status updates to clean/malicious/suspicious correctly
  • Confirm pending skills are processed faster

When VirusTotal returns scan results with AV engine stats but no Code Insight
AI analysis, the skill status was stuck on 'Pending'. This fix adds fallback
logic to check last_analysis_stats (malicious/suspicious/harmless/undetected)
to determine scan status.

Functions updated:
- pollPendingScans: Check AV engines before requesting rescan
- backfillPendingScans: Check AV engines before marking as no results
- rescanActiveSkills: Check AV engines before keeping as pending
- backfillActiveSkillsVTCache: Check AV engines before skipping

Fixes #33435
@vercel
Copy link
Contributor

vercel bot commented Mar 3, 2026

Someone is attempting to deploy a commit to the Amantus Machina Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR adds AV engine stats (last_analysis_stats) as a fallback verdict source across four VT polling/backfill functions (pollPendingScans, backfillPendingScans, rescanActiveSkills, backfillActiveSkillsVTCache) when VirusTotal's Code Insight AI analysis is unavailable. The approach is sound and directly fixes the "stuck on Pending" issue described in #33435.

Key findings:

  • Logic inconsistency with fetchResults: The PR description claims the new logic "matches the existing logic in fetchResults", but the new code broadens the "clean" condition to harmless > 0 || undetected > 0, while fetchResults only uses harmless > 0. A file with only undetected engine results (no engine explicitly said "harmless") would be published as "clean" by the polling functions but appear as "pending" in fetchResults, creating a DB/UI status inconsistency. If broadening to undetected is intentional, fetchResults should be updated to match.
  • Code duplication: The identical 10-line stats-to-status parsing block is copy-pasted four times. Extracting it into a shared helper would make future threshold changes safer and easier to reason about.

Confidence Score: 3/5

  • The PR resolves a real user-facing bug but introduces a subtle divergence from the reference implementation that could cause status inconsistencies between stored DB state and UI display.
  • The core logic is directionally correct and the four modified code paths are structurally consistent with one another. However, the undocumented broadening of the "clean" criterion to include undetected > 0 (vs. the reference fetchResults that only uses harmless > 0) is a meaningful behavioral difference that could lead to confusing discrepancies between what is stored in the DB and what fetchResults returns for the same file. This warrants discussion before merging.
  • convex/vt.ts — specifically the else if (stats.harmless > 0 || stats.undetected > 0) condition in all four new fallback blocks, and the corresponding condition in fetchResults at line 303.

Last reviewed commit: b244574

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +579 to +582
} else if (stats.harmless > 0 || stats.undetected > 0) {
// No detections and some harmless/undetected engines = clean
status = 'clean'
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistency with fetchResults "clean" determination

The PR description states this logic "matches the existing logic in fetchResults function," but it does not. The fetchResults function only marks a file clean when harmless > 0:

// fetchResults (line 303) — existing reference implementation
} else if (stats.harmless > 0) {
  status = 'clean'
}

The new code additionally accepts undetected > 0 as sufficient for "clean":

} else if (stats.harmless > 0 || stats.undetected > 0) {

In VirusTotal, undetected means an engine ran but produced no verdict — it did not classify the file as harmless. A file where every engine returns undetected (e.g., 0 harmless, 64 undetected) would be published as "clean" by the polling functions but remain "pending" via fetchResults. This creates a real inconsistency: the stored vtAnalysis.status in the DB would be "clean", but any fresh call to fetchResults for UI display would return "pending" — which could confuse debugging and monitoring.

This same divergence is also present at the corresponding fallback blocks in backfillPendingScans (~line 795), rescanActiveSkills (~line 924), and backfillActiveSkillsVTCache (~line 1258).

If broadening the clean criteria to include undetected > 0 is intentional, fetchResults should be updated to match.

Prompt To Fix With AI
This is a comment left during a code review.
Path: convex/vt.ts
Line: 579-582

Comment:
**Inconsistency with `fetchResults` "clean" determination**

The PR description states this logic "matches the existing logic in `fetchResults` function," but it does not. The `fetchResults` function only marks a file clean when `harmless > 0`:

```typescript
// fetchResults (line 303) — existing reference implementation
} else if (stats.harmless > 0) {
  status = 'clean'
}
```

The new code additionally accepts `undetected > 0` as sufficient for "clean":

```typescript
} else if (stats.harmless > 0 || stats.undetected > 0) {
```

In VirusTotal, `undetected` means an engine ran but produced no verdict — it did not classify the file as harmless. A file where every engine returns `undetected` (e.g., 0 harmless, 64 undetected) would be published as "clean" by the polling functions but remain "pending" via `fetchResults`. This creates a real inconsistency: the stored `vtAnalysis.status` in the DB would be `"clean"`, but any fresh call to `fetchResults` for UI display would return `"pending"` — which could confuse debugging and monitoring.

This same divergence is also present at the corresponding fallback blocks in `backfillPendingScans` (~line 795), `rescanActiveSkills` (~line 924), and `backfillActiveSkillsVTCache` (~line 1258).

If broadening the clean criteria to include `undetected > 0` is intentional, `fetchResults` should be updated to match.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +574 to +583
if (stats) {
if (stats.malicious > 0) {
status = 'malicious'
} else if (stats.suspicious > 0) {
status = 'suspicious'
} else if (stats.harmless > 0 || stats.undetected > 0) {
// No detections and some harmless/undetected engines = clean
status = 'clean'
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated stats-to-status logic across 4 functions

The same 10-line block for deriving status from last_analysis_stats is copy-pasted identically into pollPendingScans, backfillPendingScans, rescanActiveSkills, and backfillActiveSkillsVTCache. Extracting it into a small helper would reduce duplication and make any future threshold changes (e.g., updating the undetected logic) a single-line fix:

function statusFromEngineStats(
  stats: VTFileResponse['data']['attributes']['last_analysis_stats'],
): string | null {
  if (!stats) return null
  if (stats.malicious > 0) return 'malicious'
  if (stats.suspicious > 0) return 'suspicious'
  if (stats.harmless > 0 || stats.undetected > 0) return 'clean'
  return null
}

This also makes the divergence from fetchResults easier to spot and reason about.

Prompt To Fix With AI
This is a comment left during a code review.
Path: convex/vt.ts
Line: 574-583

Comment:
**Duplicated stats-to-status logic across 4 functions**

The same 10-line block for deriving `status` from `last_analysis_stats` is copy-pasted identically into `pollPendingScans`, `backfillPendingScans`, `rescanActiveSkills`, and `backfillActiveSkillsVTCache`. Extracting it into a small helper would reduce duplication and make any future threshold changes (e.g., updating the `undetected` logic) a single-line fix:

```typescript
function statusFromEngineStats(
  stats: VTFileResponse['data']['attributes']['last_analysis_stats'],
): string | null {
  if (!stats) return null
  if (stats.malicious > 0) return 'malicious'
  if (stats.suspicious > 0) return 'suspicious'
  if (stats.harmless > 0 || stats.undetected > 0) return 'clean'
  return null
}
```

This also makes the divergence from `fetchResults` easier to spot and reason about.

How can I resolve this? If you propose a fix, please make it concise.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b244574131

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +607 to +608
updated++
continue

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep polling Code Insight after engine-only fallback

This branch treats AV engine stats as a terminal verdict and continues without requesting reanalysis, which means the hash drops out of the pending poll path once vtAnalysis.status is set to clean/suspicious/malicious. getPendingScanSkillsInternal explicitly skips those final VT statuses (convex/skills.ts lines 1878-1884), so we never pick up a later code_insight result for the same version. In cases where Code Insight arrives later with a stricter verdict than the initial engine stats (for example, engines clean but Code Insight suspicious), the skill remains misclassified until the next daily rescan.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant