
ClueEval: Murder Mysteries as LLM Evals

ClueEval is a project designed to evaluate the reasoning capabilities of Large Language Models (LLMs) by challenging them to solve generated mystery stories.

Purpose

ClueEval creates mystery stories intended to test deductive reasoning: each story contains enough clues for a careful solver to identify the killer.

How It Works

  1. Story Generation: The system randomly generates a basic mystery, including a killer and victim, in story/random_details.py. Then, story/writer.py uses an LLM to create a unique murder mystery, giving each character their own story and perspective.

  2. Narrative Creation: A detailed narrative is generated, including both the true events and misleading information.

  3. Clue Assembly: The system compiles a set of clues, some relevant to solving the mystery and others serving as red herrings.

  4. Prose: The set of clues is turned into prose.

  5. Evaluation: Whodunnit? The clues contained in the prose should be enough to identify the killer; these are fair-play mysteries. A runnable sketch of the full pipeline follows this list.
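
Taken together, these steps form a single pipeline. The following is a minimal, runnable sketch of that flow; the function names, suspects, and dictionary layout are illustrative stand-ins, not the repository's actual API, and the LLM-driven steps are stubbed out.

    import random

    SUSPECTS = ["Colonel Mustard", "Miss Scarlett", "Professor Plum"]

    def generate_random_details():
        # Step 1: randomly pick a killer and a victim (cf. story/random_details.py).
        killer = random.choice(SUSPECTS)
        victim = random.choice([s for s in SUSPECTS if s != killer])
        return {"killer": killer, "victim": victim}

    def generate_mystery():
        details = generate_random_details()
        # Steps 2-4 would call an LLM (cf. story/writer.py) to write the
        # narrative, assemble clues and red herrings, and render them as prose.
        prose = f"{details['victim']} was found dead in the library..."
        return prose, details["killer"]

    # Step 5: the prose goes to the solver; the killer is the ground-truth answer.
    story, answer = generate_mystery()
    print(story)
    print("Ground truth:", answer)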

How to Run

  1. Ensure you have Python 3.10+ installed on your system.

  2. Clone this repository:

    git clone https://github.com/qemqemqem/ClueEval.git
    cd ClueEval
    
  3. Install the required dependencies:

    pip install -r requirements.txt
    
  4. Set up your OpenAI API key as an environment variable (a quick sanity check is sketched after this list):

    export OPENAI_API_KEY='your-api-key-here'
    
  5. Run the main script (interactive mode):

    python main.py --interactive_mode
    

    This will give you a mystery to solve. Read it and decide who you think is the killer!

  6. Run the main script (generation mode):

    python main.py 10
    

    This will generate 10 mysteries and store them in generated_questions. Note that each mystery takes a couple of minutes to generate, as there is a lot of back-and-forth with the LLM. One way to inspect the output is sketched after this list.

  7. Run lm_eval: For information on running the CLUE evaluation task using lm_eval, please refer to the README in the lm_eval/tasks/clue_eval/ directory.
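
Since the scripts read the API key from the environment (step 4), a short Python check can confirm it is set before a long run. This is a minimal sketch, not part of the repository:

    import os

    # Fail fast with a clear message if the key is missing.
    if not os.environ.get("OPENAI_API_KEY"):
        raise SystemExit("OPENAI_API_KEY is not set; export it before running main.py.")
    print("OPENAI_API_KEY is set.")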
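
After a generation run (step 6), the mysteries land in generated_questions. The exact file format is defined by the repository, so this minimal sketch only lists what was written rather than parsing it:

    from pathlib import Path

    out_dir = Path("generated_questions")
    if not out_dir.exists():
        raise SystemExit("No generated_questions directory yet; run `python main.py 10` first.")

    # Print each generated file with its size; parsing is left to the
    # repository's own tooling.
    for path in sorted(out_dir.iterdir()):
        print(f"{path.name}\t{path.stat().st_size} bytes")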

Project Structure

  • story/: Contains the core logic for story generation and processing.
  • utils/: Utility functions, including GPT API interactions; a sketch of this kind of call appears after this list.
  • config/: Configuration files, including prompts and element lists.
  • lm_eval/: Contains the CLUE evaluation task and results. See the README in lm_eval/tasks/clue_eval/ for detailed information on running the evaluation.
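
For orientation, the GPT interaction in utils/ presumably boils down to a chat-completion call like the one below. This is a generic sketch using the official openai Python client, not the repository's actual wrapper; the model name and prompts are illustrative placeholders, and the project's real prompts live in config/.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # A generic chat-completion request; model and prompts are placeholders.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a golden-age mystery writer."},
            {"role": "user", "content": "Introduce a suspect in one paragraph."},
        ],
    )
    print(response.choices[0].message.content)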

Contributing

We welcome contributions to ClueEval! Please feel free to submit issues, feature requests, or pull requests.

Acknowledgments

  • Inspired by golden age mystery authors.
  • Narrative generation uses OpenAI's GPT models.
  • Anthropic Claude wrote most of the code, although I did some of the work too. I take responsibility for all the bugs.

Happy mystery solving!
