This repository investigates how iterative inference steps, self-reflection, and feedback loops affect model reasoning performance.
Single-pass inference forces models to produce answers in one forward pass, which can lead to errors when complex reasoning is required. Multi-step inference allows models to refine their thinking by evaluating intermediate outputs and adjusting their approach. This is particularly relevant for tasks that benefit from self-correction, such as mathematical problem-solving where an initial attempt might contain calculation errors or logical missteps.
Two agents are compared:

- **Basic agent** — queries the model once and returns the answer without any retries or reflection. This serves as the single-pass inference baseline.
- **Reflexive agent** — attempts a solution, evaluates whether the output seems correct using simple heuristics (checking for uncertainty keywords, very short responses, or question repetition), and retries with the previous output as context if the answer looks uncertain. Limited to a maximum of three attempts.
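The retry loop described above can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: the `model_fn` callable, the keyword list, and the exact thresholds are assumptions.

```python
# Sketch of a reflexive retry loop. model_fn, the keyword set, and the
# thresholds below are illustrative assumptions, not the repo's code.
UNCERTAINTY_KEYWORDS = {"not sure", "unclear", "cannot determine", "maybe"}
MAX_ATTEMPTS = 3

def seems_uncertain(question: str, answer: str) -> bool:
    """Heuristic confidence check: uncertainty keywords, very short
    responses, or the answer merely repeating the question."""
    lowered = answer.lower()
    if any(kw in lowered for kw in UNCERTAINTY_KEYWORDS):
        return True
    if len(answer.split()) < 3:
        return True
    if question.strip().lower() in lowered:
        return True
    return False

def reflexive_answer(model_fn, question: str) -> tuple[str, int]:
    """Query model_fn up to MAX_ATTEMPTS times, feeding the previous
    output back as context whenever the heuristics flag uncertainty.
    Returns the final answer and the number of inference steps used."""
    prompt = question
    answer = ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        answer = model_fn(prompt)
        if not seems_uncertain(question, answer):
            return answer, attempt
        prompt = (f"{question}\n\nYour previous attempt was:\n{answer}\n"
                  "Re-check it and give a corrected answer.")
    return answer, MAX_ATTEMPTS
```

The basic agent is the degenerate case of this loop: call `model_fn` once and return whatever comes back.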
- Run the evaluation script to compare both agents on the dataset: `python benchmarks/evaluate.py`. This generates `benchmarks/results.json` with accuracy, inference step counts, and per-question results.
- Generate the visualization: `python plots/plot_results.py`. This creates `plots/reasoning_improvement.png` showing the accuracy comparison.
The reflexive agent typically uses more inference steps (1-3 per question) compared to the basic agent (always 1 step). Whether this translates to improved accuracy depends on the model's ability to self-correct and the effectiveness of the uncertainty detection heuristics. The trade-off between computational cost (more steps) and potential accuracy gains is a key consideration in evaluating these approaches.
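One way to weigh that trade-off is to aggregate the per-question results into accuracy and mean step count. The sketch below assumes each record carries `correct` and `steps` fields; the actual schema of `benchmarks/results.json` may differ.

```python
# Post-hoc summary of per-question results. The field names "correct"
# and "steps" are assumptions about the results schema, not confirmed.
def summarize(results: list[dict]) -> dict:
    """Return overall accuracy and mean inference steps per question,
    so accuracy gains can be compared against extra compute spent."""
    n = len(results)
    accuracy = sum(bool(r["correct"]) for r in results) / n
    mean_steps = sum(r["steps"] for r in results) / n
    return {"accuracy": accuracy, "mean_steps": mean_steps}
```

For the basic agent `mean_steps` is always 1.0, so any accuracy difference divided by the step overhead gives a rough cost-effectiveness comparison between the two agents.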