ThinkBrake: Mitigating Overthinking in Tool Reasoning

Example of overthinking in tool reasoning: the model reaches a correct solution but continues reasoning and produces an incorrect final output.

Small reasoning models (SRMs) often overthink during tool use: they reach a correct tool-argument configuration, then continue reasoning and overwrite it with an incorrect final call. ThinkBrake is a training-free decoding heuristic that addresses this issue by monitoring the log-probability margin between </think> and the current top token at sentence boundaries, triggering early termination when the margin becomes small.
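The stopping rule described above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the names `should_brake` and `at_sentence_boundary`, the punctuation-based boundary check, and the use of the threshold as an upper bound on the log-probability margin are all assumptions made for the example.

```python
import math
from typing import Mapping

END_THINK = "</think>"
# Assumption: a sentence boundary is any position where the text generated so
# far ends with sentence-final punctuation or a newline.
SENTENCE_END = (".", "!", "?", "\n")


def at_sentence_boundary(generated_text: str) -> bool:
    """Return True if the generated text currently ends a sentence."""
    return generated_text.rstrip(" ").endswith(SENTENCE_END)


def should_brake(logprobs: Mapping[str, float], threshold: float) -> bool:
    """Decide whether to stop thinking at this decoding step.

    `logprobs` maps candidate next tokens to their log-probabilities.
    The margin is log p(top token) - log p(</think>); thinking stops
    when that margin is at most `threshold` (hypothetical convention).
    """
    if END_THINK not in logprobs:
        # </think> is not among the candidates, so there is no margin to test.
        return False
    top_logprob = max(logprobs.values())
    margin = top_logprob - logprobs[END_THINK]
    return margin <= threshold


# </think> is nearly as likely as the top token: margin = log(0.5 / 0.4).
close_call = {"Wait": math.log(0.5), END_THINK: math.log(0.4)}
# </think> is far less likely than the top token: margin = log(0.9 / 0.05).
keep_going = {"Wait": math.log(0.9), END_THINK: math.log(0.05)}

if at_sentence_boundary("So the argument should be 5."):
    print(should_brake(close_call, threshold=0.25))
    print(should_brake(keep_going, threshold=0.25))
```

In a real decoder the check would run only at sentence boundaries, and a positive result would force `</think>` as the next token so the model moves on to the final tool call.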

Installation

Quick Start

1. Clone the repository

git clone https://github.com/holi-lab/ThinkBrake.git
cd ThinkBrake

2. Install dependencies

Using uv (recommended):

bash install.sh

Or manually:

uv init . -p 3.10
uv venv -p 3.10

source .venv/bin/activate
pip install -e .
pip install bfcl-eval vllm flashinfer-python

Usage

Environment Setup

Set the output directory for experiment results:

export THINK_BRAKE_PROJECT_ROOT=/path/to/your/outputs
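If you write your own helper scripts around the outputs, failing early when the variable is missing avoids silently writing results to an unexpected place. The sketch below shows one way to do that; `resolve_output_root` is a hypothetical helper, not a function provided by this repository.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "THINK_BRAKE_PROJECT_ROOT"


def resolve_output_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the output directory named by THINK_BRAKE_PROJECT_ROOT.

    Hypothetical helper: raises if the variable is unset or empty instead
    of falling back to a default location.
    """
    env = os.environ if env is None else env
    value = env.get(ENV_VAR)
    if not value:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before running the scripts")
    return Path(value).expanduser()
```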

Running Experiments

1. Generate Predictions

Run the generation script with your desired model and test categories:

python scripts/generate.py \
  --model Qwen/Qwen3-4B-Thinking-2507 \
  --test-category non_live live \
  --temperature 0.7 \
  --gpu-memory-utilization 0.95

Available test categories:

non_live: Simple function calling tasks
live: Live function calling tasks
single_turn: All single-turn categories
Individual categories: simple_python, simple_java, simple_javascript, multiple, parallel, parallel_multiple, live_simple, live_multiple, live_parallel, live_parallel_multiple

2. Evaluate Results

Evaluate the generated predictions:

python scripts/evaluate.py \
  --model Qwen/Qwen3-4B-Thinking-2507 \
  --test-category non_live live \
  --threshold 0.25
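To compare several thresholds, the evaluation command can be assembled programmatically. The sketch below only builds argument lists from the flags shown above; the helper names are hypothetical, the threshold values other than 0.25 are illustrative, and whether `scripts/evaluate.py` can score a threshold that was not used during generation is not stated here.

```python
import sys
from typing import List, Sequence


def build_evaluate_command(model: str, categories: Sequence[str], threshold: float) -> List[str]:
    """Build the argv for one scripts/evaluate.py run (flags as in the README)."""
    return [
        sys.executable,
        "scripts/evaluate.py",
        "--model", model,
        "--test-category", *categories,
        "--threshold", str(threshold),
    ]


def sweep_commands(model: str, categories: Sequence[str], thresholds: Sequence[float]) -> List[List[str]]:
    """Return one evaluate command per threshold, in the given order."""
    return [build_evaluate_command(model, categories, t) for t in thresholds]


commands = sweep_commands(
    "Qwen/Qwen3-4B-Thinking-2507",
    ["non_live", "live"],
    [0.1, 0.25, 0.5],  # illustrative values; only 0.25 appears in the README
)
for cmd in commands:
    print(" ".join(cmd[1:]))
```

Each command can then be executed from the repository root with `subprocess.run(cmd, check=True)`.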

Using Shell Scripts

For convenience, you can use the provided shell scripts:

# Generate predictions
bash run_generate.sh

# Evaluate results
bash run_evaluate.sh

Edit these scripts to customize other parameters.

Supported Models

Currently supported models:

Qwen3-4B-Thinking-2507: Qwen/Qwen3-4B-Thinking-2507
Qwen3-0.6B: Qwen/Qwen3-0.6B
Qwen3-1.7B: Qwen/Qwen3-1.7B
Qwen3-8B: Qwen/Qwen3-8B

Citation

@article{oh2025thinkbrake,
  title={ThinkBrake: Mitigating Overthinking in Tool Reasoning}, 
  author={Minjae Oh and Sangjun Song and Seungkyu Lee and Sungmin Jo and Yohan Jo},
  year={2025},
  eprint={2510.00546},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.00546}, 
}

