Skip to content

IA-1: anomaly_root_cause agent returns "unknown" instead of structured fault type (Llama 3.1 8B baseline) #10

@adris-misra

Description

@adris-misra

Description

The anomaly_root_cause agent, when invoked from IABENCH-v1 task IA-1
with Ollama + Llama 3.1 8B, returns "unknown" as the predicted fault
type for every synthetic-data scenario. F1 is 0.000 not because
predictions are wrong, but because the agent does not emit a recognizable
fault label at all.

Surfaced by the PR 1 health-check run on the IABENCH-v1 foundation PR.

Steps to reproduce

  1. Check out main branch with PR 1 merged
  2. Ensure Ollama is running locally with llama3.1:8b pulled
  3. In PowerShell: $env:PYTHONPATH = "."
  4. Run: industrial-agents bench --suite all --provider ollama --model llama3.1:8b
  5. Open benchmarks/results/iabench_all_llama3.1_8b.json
  6. Inspect the IA-1 task's details[] array — all entries show
    predicted: "unknown"

Expected behaviour

For each anomaly scenario, the agent should return a recognizable
structured fault type (e.g., bearing-wear, hydraulic-leak,
filter-clog) that can be canonicalized and compared against ground
truth. Some predictions may be wrong, but the agent should at minimum
emit a fault label from the taxonomy, producing F1 > 0.0.

Actual behaviour

All three scenarios returned predicted: "unknown":

Asset Signal Ground truth Prediction
motor_01 vibration_rms bearing-wear unknown
press_01 clamp_pressure_bar hydraulic-leak unknown
hydraulic_01 filter_dp_bar filter-clog unknown

Result: F1=0.000. Run is flagged reliable: true (no exceptions, just
genuine "unknown" responses from the LLM).

Likely causes:

  • Agent's prompt template doesn't constrain output to the structured
    fault taxonomy
  • Llama 3.1 8B may not be capable of reliable structured output for this
    task
  • Response parser may be too strict in extracting the label

Suggested next step: inspect the agent's raw LLM output for one scenario,
then either tighten the prompt with a JSON schema constraint, add few-shot
examples, or document Llama 3.1 8B as insufficient for IA-1.

Reference: benchmarks/results/iabench_all_llama3.1_8b.json (2026-06-07
UTC run)

Framework version

v0.1.0-pre (bench/iabench-foundation @ 967baf6)

LLM provider

ollama

Environment

Windows 11, Python 3.12, Llama 3.1 8B (Q4_K_M quant via ollama pull)

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions