agentic-inference

This repository investigates how iterative inference steps, self-reflection, and feedback loops affect model reasoning performance.

Why Multi-Step Inference Matters

Single-pass inference forces models to produce answers in one forward pass, which can lead to errors when complex reasoning is required. Multi-step inference allows models to refine their thinking by evaluating intermediate outputs and adjusting their approach. This is particularly relevant for tasks that benefit from self-correction, such as mathematical problem-solving where an initial attempt might contain calculation errors or logical missteps.

Agents

BasicAgent

A baseline agent that queries the model once and returns the answer, with no retries or reflection. It serves as the single-pass reference point for the comparison.
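A minimal sketch of what such a baseline might look like; the class and the `model_fn` callable are illustrative assumptions, not the repository's actual code.

```python
from typing import Callable, Tuple


class BasicAgent:
    """Queries the model exactly once: no retries, no reflection."""

    def __init__(self, model_fn: Callable[[str], str]):
        # `model_fn` stands in for any callable mapping prompt -> response.
        self.model_fn = model_fn

    def answer(self, question: str) -> Tuple[str, int]:
        # Always exactly one inference step.
        return self.model_fn(question), 1
```

Returning the step count alongside the answer makes the cost comparison against the reflexive agent straightforward.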

ReflexiveAgent

An agent that attempts a solution, evaluates whether the output seems correct using simple heuristics (checking for uncertainty keywords, very short responses, or question repetition), and retries with the previous output as context if uncertain. Limited to a maximum of 3 attempts.
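The reflect-and-retry loop described above could be sketched roughly as follows. The heuristic thresholds, keyword list, and `model_fn` callable are assumptions for illustration; the repository's actual values may differ.

```python
from typing import Callable, Tuple

# Assumed keyword list and thresholds -- tune to taste.
UNCERTAINTY_KEYWORDS = ("not sure", "i think", "maybe", "unclear")
MAX_ATTEMPTS = 3


def seems_uncertain(question: str, response: str) -> bool:
    """Cheap heuristics: uncertainty keywords, very short replies,
    or the model merely echoing the question back."""
    lowered = response.lower()
    if any(kw in lowered for kw in UNCERTAINTY_KEYWORDS):
        return True
    if len(response.split()) < 3:  # suspiciously short answer
        return True
    if question.strip().lower() in lowered:  # question repetition
        return True
    return False


class ReflexiveAgent:
    def __init__(self, model_fn: Callable[[str], str]):
        self.model_fn = model_fn

    def answer(self, question: str) -> Tuple[str, int]:
        prompt = question
        response = ""
        for attempt in range(1, MAX_ATTEMPTS + 1):
            response = self.model_fn(prompt)
            if not seems_uncertain(question, response):
                return response, attempt
            # Retry, feeding the previous output back as context.
            prompt = (f"{question}\n\nYour previous attempt was:\n"
                      f"{response}\n\nReconsider and answer again.")
        # Give up after MAX_ATTEMPTS and keep the last response.
        return response, MAX_ATTEMPTS
```

Note the loop returns early on the first confident-looking response, so the step count per question ranges from 1 to 3.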

Running Evaluation

  1. Run the evaluation script to compare both agents on the dataset:

    python benchmarks/evaluate.py

    This generates benchmarks/results.json with accuracy, inference step counts, and per-question results.

  2. Generate the visualization:

    python plots/plot_results.py

    This creates plots/reasoning_improvement.png showing the accuracy comparison.
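For a quick look at the output without regenerating the plot, the results file can be summarized directly. The JSON schema below (per-agent `accuracy` and `steps` keys) is an assumption; adapt the keys to what benchmarks/evaluate.py actually writes.

```python
import json
from pathlib import Path


def summarize(results: dict) -> dict:
    """Collapse per-agent stats into {agent: (accuracy, steps)}."""
    return {name: (s["accuracy"], s["steps"]) for name, s in results.items()}


if __name__ == "__main__":
    results = json.loads(Path("benchmarks/results.json").read_text())
    for agent, (acc, steps) in summarize(results).items():
        print(f"{agent}: accuracy={acc:.2%}, total steps={steps}")
```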

Performance Observations

The reflexive agent typically uses more inference steps (1-3 per question) compared to the basic agent (always 1 step). Whether this translates to improved accuracy depends on the model's ability to self-correct and the effectiveness of the uncertainty detection heuristics. The trade-off between computational cost (more steps) and potential accuracy gains is a key consideration in evaluating these approaches.
