Tree-of-Thought Retrieval: Enhancing Multi-Hop Question-Answering Beyond Chain-of-Thought

This repository provides the implementation of Tree-of-Thought Retrieval (ToTR) and Self-Consistency Retrieval (SCR), two novel frameworks designed to enhance retrieval-augmented generation for knowledge-intensive multi-hop question answering tasks. By exploring diverse reasoning paths, these methods aim to improve retrieval coverage and robustness, addressing limitations in existing approaches.

Instructions for Reproducing the Paper Results

This section describes how to reproduce the results presented in the paper in detail.

Important: Reproducing the results from scratch involves downloading the corpora used by the datasets, setting up a vector database (Elasticsearch), ingesting the corpora into it, and running inference on relatively large LMs, which may take up to a few days in total. You also need enough disk storage and GPU memory to download and run the LLMs. If these constraints prevent you from running the code, we encourage you to download the raw predictions and results and inspect them instead. After installing the necessary dependencies (see Installation), use the following command to download the results, then skip to step 7 (running notebooks/results.ipynb).

```
bash scripts/download_results.sh
```

To reproduce the paper results:

Install the dependencies following the Installation section. You also need to install the vllm dependency group, as our config files assume that you use vLLM's OpenAI-compatible server.
Set up a vector database (Elasticsearch) following the Preparing Database for RAG section. We recommend building Elasticsearch indices only for the datasets to be used, namely hotpotqa, multihoprag, and musique.
Run Elasticsearch in the background. (This should already be done when preparing the database.)
Start vLLM's OpenAI-compatible server using the following command:
```
bash scripts/serve_vllm.sh
```
Run the benchmark code following the Running benchmarks section. To use a different LM, modify scripts/serve_vllm.sh to serve the desired LM and change the model name in benchmark/bench.py. The predictions and performance results will be saved to the results directory.
You may also run the ablation study using the following command:
```
python benchmark/ablation.py --verbose
```
To evaluate a different model, you also need to change the model name in benchmark/ablation.py manually.
Run the Python notebook, notebooks/results.ipynb, to generate plots for the paper results. Because ToTR, SCR, and ReAct employ temperature sampling and asynchronous execution, it is difficult to obtain deterministic results, so your reproduced results may differ slightly from those in the paper.
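
Given this run-to-run variation, one option is to repeat a run a few times and report the mean and spread of a metric rather than a single number. The sketch below is illustrative only: `summarize_runs` is a hypothetical helper that is not part of this repository, and the scores are made-up placeholders, not results from the paper.

```python
import statistics


def summarize_runs(scores: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of per-run metric values."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Placeholder values for illustration, NOT results from the paper.
    example_scores = [0.50, 0.60, 0.70]
    mean, std = summarize_runs(example_scores)
    print(f"mean={mean:.3f} std={std:.3f}")
```

How the per-run scores are read from the results directory depends on the file layout there, which this sketch does not assume.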

Installation

Clone this repository.
```
git clone https://github.com/kaiitunnz/totr.git
cd totr
```

Create and activate a Conda environment.
```
conda create -n totr python=3.11 -y
conda activate totr
```

Install Poetry for dependency management.
```
pip install poetry
```
Run the following command to install all the dependencies.
```
poetry install --with lint,vllm
```
The lint group is optional but strongly recommended for code linting and formatting. You can omit vllm if you do not plan to use vLLM as the LLM server.
Download the spaCy English model.
```
python -m spacy download en_core_web_sm
```
Set up a RAG database following the Preparing Database for RAG section.
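
After these steps, a quick way to confirm that the environment is complete is to check that the expected packages can be found. The following is a minimal sketch using only the standard library. `missing_modules` is a hypothetical helper, and the module names in the list are assumptions based on the commands above (the spaCy model installs as an importable `en_core_web_sm` package, and `vllm` is only present if that dependency group was installed).

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust the list to the groups you installed.
    required = ["spacy", "en_core_web_sm", "vllm"]
    missing = missing_modules(required)
    print("missing:", missing if missing else "none")
```

Run it inside the activated totr environment so that it inspects the same interpreter Poetry installed into.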

Preparing Database for RAG

Install Elasticsearch 7.10 (source: IRCoT). See the following options:

MacOS (Homebrew)

```
# source: https://www.elastic.co/guide/en/elasticsearch/reference/current/brew.html
brew tap elastic/tap
brew install elastic/tap/elasticsearch-full # if this fails, run 'brew untap elastic/tap' first, then tap and install again
```

To run the server:
```
brew services start elastic/tap/elasticsearch-full # to start the server
brew services stop elastic/tap/elasticsearch-full # to stop the server
```

MacOS (wget)

```
# source: https://www.elastic.co/guide/en/elasticsearch/reference/current/targz.html
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-darwin-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-darwin-x86_64.tar.gz.sha512
shasum -a 512 -c elasticsearch-7.10.2-darwin-x86_64.tar.gz.sha512
tar -xzf elasticsearch-7.10.2-darwin-x86_64.tar.gz
# remove the downloaded archive and checksum file (the darwin ones fetched above)
rm elasticsearch-7.10.2-darwin-x86_64.tar.gz elasticsearch-7.10.2-darwin-x86_64.tar.gz.sha512
```

To run the server:
```
cd elasticsearch-7.10.2/
./bin/elasticsearch # to start the server
pkill -f elasticsearch # to stop the server
```

Linux (wget)

```
# source: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/targz.html
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-linux-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-linux-x86_64.tar.gz.sha512
shasum -a 512 -c elasticsearch-7.10.2-linux-x86_64.tar.gz.sha512
tar -xzf elasticsearch-7.10.2-linux-x86_64.tar.gz
rm elasticsearch-7.10.2-linux-x86_64.tar.gz elasticsearch-7.10.2-linux-x86_64.tar.gz.sha512
```

To run the server:
```
cd elasticsearch-7.10.2/
./bin/elasticsearch # to start the server
pkill -f elasticsearch # to stop the server
```

Download the datasets using the following command. The downloaded files will be stored in the raw_data directory.
```
bash scripts/download_raw_data.sh
```
Build indices for the downloaded datasets in Elasticsearch. First, ensure that Elasticsearch is running in the background. Then, run the following command:
```
python -m totr.retriever.build_es_index --all
```
You can also build indices for specific datasets only, for example:
```
python -m totr.retriever.build_es_index --datasets hotpotqa multihoprag musique
```
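
Once indexing finishes, it may be worth confirming that the running server is the expected 7.10 release and that the indices exist. The sketch below uses Elasticsearch's root endpoint (`GET /`) and the `_cat/indices` API with `format=json`. The helper names are hypothetical, the address assumes Elasticsearch's default `localhost:9200`, and the assumption that each index is named after its dataset is not confirmed by this README.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed: Elasticsearch's default HTTP address


def parse_es_version(root_payload: dict) -> tuple[int, ...]:
    """Extract the version tuple from the JSON returned by `GET /`."""
    return tuple(int(part) for part in root_payload["version"]["number"].split("."))


def missing_indices(cat_payload: list[dict], expected: list[str]) -> list[str]:
    """Compare `GET /_cat/indices?format=json` rows with the expected index names."""
    present = {row["index"] for row in cat_payload}
    return [name for name in expected if name not in present]


def fetch_json(path: str, base_url: str = ES_URL, timeout: float = 5.0):
    """GET a JSON document from the Elasticsearch HTTP API."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `parse_es_version(fetch_json("/"))[:2]` should equal `(7, 10)`, and `missing_indices(fetch_json("/_cat/indices?format=json"), ["hotpotqa", "multihoprag", "musique"])` should return an empty list if the index names match the dataset names.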

Project Structure

benchmark: Benchmarking tasks, utility code, and baseline implementations.
configs: Configuration files for different models, datasets, and systems.
datasets: Dataset-related files.
notebooks: Useful notebooks, for example, results.ipynb, which generates plots for the paper results.
prompts: Collection of prompts used in the experiments.
results: Benchmark results.
scripts: Utility scripts for developing and running experiments.
src: ToTR's implementation.
tests: Basic test cases (no unit tests yet). They can serve as a reference for how different functions and classes are used.

Utility scripts

scripts/format.sh: Script for code linting, formatting, type-checking, and spell-checking.

Usage:
```
bash scripts/format.sh --all
```
scripts/serve_tgi.sh: Script for starting the HuggingFace TGI server.

Usage:
```
bash scripts/serve_tgi.sh
```
scripts/serve_vllm.sh: Script for starting the vLLM server.

Usage:
```
bash scripts/serve_vllm.sh
```
scripts/sbatch_vllm.sh: Script for starting the vLLM server on a Slurm cluster. See the Running LLM server on Slurm cluster section for example usage.

Running LLM server on Slurm cluster

Log in to your Slurm login node.

Clone this repository and set up the environment with the following commands:

```
# Clone this repository
git clone https://github.com/kaiitunnz/totr.git
cd totr

# Create and activate a Conda environment
conda create -n totr python=3.11 -y
conda activate totr

# Install the dependencies for running an LLM server
pip install poetry
poetry install --only vllm
```

Submit a batch job using the following command. You may need to set appropriate arguments for sbatch in scripts/sbatch_vllm.sh.
```
sbatch scripts/sbatch_vllm.sh
```
Check your allocated node with the following command:
```
squeue -u $USER
```
Log out of the Slurm login node and start SSH tunneling with the following command:
```
ssh -L 8010:<gpu-node>:8010 <user>@<login-node-address>
```
or in the background with the following command (in this case, you need to kill the process yourself):
```
ssh -fN -L 8010:<gpu-node>:8010 <user>@<login-node-address>
```
Now you can run benchmarking scripts or connect to the LLM server from your local host at the following address: http://localhost:8010.
To stop the LLM server before it times out, run the following command with the appropriate job ID obtained from the squeue command.
```
scancel <job-id>
```
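
The node lookup and tunnel setup above can also be scripted. The following sketch assumes you run `squeue -u $USER -h -o "%i %T %N"` (job ID, state, and node list, without a header) and pass its output in as text. `running_nodes` and `tunnel_command` are hypothetical helpers, the user and host names are placeholders, and the parsing assumes a single-node job.

```python
import shlex


def running_nodes(squeue_output: str) -> dict[str, str]:
    """Map job ID -> node for RUNNING jobs, given `squeue -h -o "%i %T %N"` output."""
    nodes = {}
    for line in squeue_output.splitlines():
        fields = line.split()
        # Pending jobs have no node yet, so their lines have only two fields.
        if len(fields) == 3 and fields[1] == "RUNNING":
            nodes[fields[0]] = fields[2]
    return nodes


def tunnel_command(gpu_node: str, login: str, port: int = 8010) -> str:
    """Build the background port-forwarding command for the given node."""
    args = ["ssh", "-fN", "-L", f"{port}:{gpu_node}:{port}", login]
    return shlex.join(args)


if __name__ == "__main__":
    # Placeholder squeue output and login address, for illustration only.
    sample = "101 RUNNING gpu01\n102 PENDING\n"
    for job_id, node in running_nodes(sample).items():
        print(job_id, tunnel_command(node, "alice@login.example.org"))
```

The job IDs returned by `running_nodes` are the same values that `scancel` expects.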

Running benchmarks

Run the following command:

```
python benchmark/bench.py --verbose --test
```

The benchmark results will be saved to the results directory. You may omit the --test flag if you want to perform validation instead.
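
Since the benchmark depends on both Elasticsearch and the LLM server, it can help to confirm that both respond before starting a long run. The sketch below is a minimal reachability check with the standard library. The helper names are hypothetical, and the URLs are assumptions: Elasticsearch's default port 9200, and the vLLM OpenAI-compatible server's `/v1/models` endpoint on port 8010 (the port used in the Slurm section). Adjust them to match your configuration.

```python
import http.client
import urllib.request


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


# Assumed endpoints; change them if your services listen elsewhere.
SERVICES = {
    "elasticsearch": "http://localhost:9200",
    "vllm": "http://localhost:8010/v1/models",
}


def unreachable_services(services: dict[str, str]) -> list[str]:
    """Return the names of services whose URL did not answer."""
    return [name for name, url in services.items() if not is_reachable(url)]
```

Calling `unreachable_services(SERVICES)` returns an empty list when both services answer; any name it returns points at the service to start first.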
