CombiBench is the first benchmark focused on combinatorial problems, written in the formal language Lean 4. It is a manually curated benchmark of 100 combinatorics problems spanning a range of difficulties and knowledge levels, intended to serve as a standard for evaluating the combinatorial reasoning capabilities of automated theorem proving systems and to advance the field. For problems that require first providing a solution and then proving its correctness, we follow the style of PutnamBench.
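As a hypothetical illustration of this fill-in-the-blank style (the name and statement below are invented for exposition and are not an actual CombiBench problem), the answer is declared as a `sorry` placeholder that a system must supply, and a theorem then asserts the placeholder is the correct value:

```lean
import Mathlib

-- Hypothetical determination-style problem in the PutnamBench style:
-- a prover must fill in the answer and then prove it correct.
abbrev handshake_solution : ℕ := sorry

/-- Five people each shake hands with every other person exactly once.
The number of handshakes is `handshake_solution` (expected answer: 10). -/
theorem handshake_count : Nat.choose 5 2 = handshake_solution := by
  sorry
```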
We host a leaderboard and welcome evaluation results accompanied by a preprint or publication. Please reach out privately at [email protected] with any requests for additions to the leaderboard.
We collected all combinatorics problems from the official IMO since 2000, except for one problem that relies on a figure. From Brualdi's book, we randomly sampled 3 problems from each of 14 chapters, so that the resulting 42 problems are evenly distributed across those chapters. We also randomly selected 10 middle-school-level combinatorics problems from the mathematics problem collection website Hackmath, and randomly collected 12 problems from other mathematics competitions.
| Source | Count |
|---|---|
| Hackmath | 10 |
| Brualdi's book | 42 |
| IMO | 36 |
| APMO | 2 |
| Balticway | 1 |
| EGMO | 1 |
| IMO-Shortlist | 4 |
| IZHO | 2 |
| BXMO | 1 |
| USAMO | 1 |
Note: the complete proofs of Problem 3 and Problem 5 from IMO 2024 have already been formalized in mathlib4 (Archive/Imo2024Q3 and Archive/Imo2024Q5). We therefore reuse the statements of these two problems directly, along with the definitions used in those statements. We are very grateful to Joseph Myers, the author of these two formalizations, and we also appreciate his suggestions on the formalization of our problems.
This project requires Python >= 3.10 and Lean >= 4.15.0.
Install the Python dependencies:

```bash
pip install -e .
```

or, with uv:

```bash
uv venv .venv --prompt combibench
source .venv/bin/activate
uv sync
```
Follow https://github.com/project-numina/kimina-lean-server to configure a Lean server and obtain a custom URL and API key.
We support API providers such as OpenAI, Anthropic, TogetherAI, and Google GenerativeAI.
Refer to evaluation/config/template.json5 to configure the dataset, Lean server, LLM server, generation parameters, prompt, and parallelism settings.
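As a rough illustration only, a config covering those categories might look like the sketch below. The key names here are guesses for exposition, not the actual schema; consult template.json5 for the real fields:

```json5
{
  // Illustrative sketch only -- see evaluation/config/template.json5 for the real keys.
  dataset: "combibench",
  lean_server: {
    url: "http://localhost:8000",  // custom URL from your kimina-lean-server deployment
    api_key: "<API_KEY>",
  },
  llm: {
    provider: "openai",            // e.g. openai | anthropic | togetherai | google
    model: "<model name>",
  },
  generation: { temperature: 0.7, max_tokens: 4096 },
  prompt: "<prompt template>",
  parallel: { num_workers: 8 },
}
```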
To run one-stage Fine-Eval:

```bash
python evaluation/online_one_stage.py --config evaluation/config/template.json5
```

To run two-stage Fine-Eval:

```bash
python evaluation/online_two_stage.py --config evaluation/config/template.json5
```
Note that both evaluation methods support theorem-proving tasks as well as fill-in-the-blank tasks.
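Concretely, a fill-in-the-blank task leaves both the answer and the proof as `sorry` placeholders, as in the earlier sketch, while a theorem-proving task ships a fully specified statement where only the proof must be produced. A minimal hypothetical example of the latter (again invented for exposition, not taken from the benchmark):

```lean
import Mathlib

-- Hypothetical theorem-proving task: the statement is complete,
-- and a system only needs to supply the proof.
theorem bool_vector_count (n : ℕ) : Fintype.card (Fin n → Bool) = 2 ^ n := by
  sorry
```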
Contributions are welcome! If you notice any mistakes, please open an issue on the repository and we will address it.
This project is licensed under the MIT License. See the LICENSE file for full details.