A fine-tuned language model for generating game dialogue in the style of Horizon Dawn. The demo walks through the dialogue generation process with different character and scene inputs.
This project uses a fine-tuned GPT-2 small model (124M parameters) to generate game dialogue for different scenes and characters. It includes:
- Data processing for game dialogues
- Model fine-tuning on a dataset of game dialogues
- A FastAPI web API for serving the model
- A simple web interface for generating dialogue
```
HorizonDawn-Dialogue-Generator/
├── api/
│   ├── main.py                    # FastAPI application
│   └── dialogue_routes.py         # API routes for dialogue generation
├── data/
│   ├── raw/                       # Raw JSON dialogue files
│   ├── processed/                 # Processed CSV files
│   └── process_data.py            # Data processing script
├── models/
│   ├── train.py                   # Training script
│   ├── dialogue_generator_small/  # Smaller model checkpoint
│   └── dialogue_generator_full/   # Full model checkpoint
├── web/
│   ├── static/                    # CSS, JS files
│   └── templates/                 # HTML templates
├── requirements.txt               # Project dependencies
└── run.py                         # Script to run the application
```
- Python 3.8+
- PyTorch
- Transformers
- FastAPI
- Uvicorn
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/HorizonDawn-Dialogue-Generator.git
   cd HorizonDawn-Dialogue-Generator
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
   ```

3. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```
1. Place your JSON dialogue files in the `data/raw/` directory.
2. Run the processing script:

   ```bash
   cd data
   python process_data.py
   ```

This will create structured CSV files in `data/processed/`.
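As a rough illustration of what `process_data.py` might do, the sketch below flattens raw JSON dialogue files into one CSV. The record schema (`scene`, `character`, `line` keys) is an assumption for illustration; the real script handles multiple JSON formats.

```python
import csv
import json
from pathlib import Path


def process_raw_dialogues(raw_dir="raw", out_path="processed/dialogues.csv"):
    """Flatten raw JSON dialogue files into a single CSV.

    Assumes each JSON file holds a list of records with 'scene',
    'character', and 'line' keys (hypothetical schema).
    Returns the number of rows written.
    """
    rows = []
    for json_file in sorted(Path(raw_dir).glob("*.json")):
        records = json.loads(json_file.read_text())
        for rec in records:
            rows.append({
                "scene": rec.get("scene", ""),
                "character": rec.get("character", ""),
                "line": rec.get("line", ""),
            })

    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["scene", "character", "line"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A CSV with one row per dialogue line keeps the training script simple: each row can be turned into a single training example.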
- Train the full model:

  ```bash
  cd models
  python train.py
  ```
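Since the model is fine-tuned with causal language modeling, each processed row has to be rendered as plain text first. A small sketch of how training examples might be formatted (the exact format used by `train.py` is an assumption, but it mirrors the inference prompt "Generate dialogue for scene '...':"):

```python
def build_training_example(scene, turns):
    """Format one scene's dialogue as a single text string for
    causal-LM fine-tuning.

    `turns` is a list of (character, line) pairs. The prompt line
    matches the format used at inference time, so the model learns
    to continue it with speaker-attributed dialogue.
    """
    prompt = f"Generate dialogue for scene '{scene}':"
    body = "\n".join(f"{character}: {line}" for character, line in turns)
    return f"{prompt}\n{body}"
```

These strings would then be tokenized and passed to a standard causal-LM training loop (e.g. Hugging Face `Trainer` with GPT-2).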
1. Start the FastAPI server:

   ```bash
   python run.py
   ```

   Or run it directly with uvicorn:

   ```bash
   uvicorn api.main:app --reload
   ```

2. Access the web interface by opening your browser and navigating to:

   ```
   http://localhost:8000
   ```

3. Access the API documentation at:

   ```
   http://localhost:8000/docs
   ```
You can make POST requests to the API endpoint:

```bash
curl -X POST "http://localhost:8000/api/generate_dialogue" \
  -H "Content-Type: application/json" \
  -d '{"scene":"Forest Encounter", "character":"Aloy", "length":200}'
```

You can also query the model directly with a testing script (example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate_dialogue(scene_name, model_path="models/dialogue_generator_full"):
    # Load model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)

    # Create prompt
    prompt = f"Generate dialogue for scene '{scene_name}':"

    # Generate text
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_length=200,
        temperature=0.8,
        do_sample=True,
        top_p=0.92,
        no_repeat_ngram_size=2,
    )

    # Decode and return
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return result


# Example usage
dialogue = generate_dialogue("Forest Encounter")
print(dialogue)
```
**Small model (`dialogue_generator_small`)**

- Base: GPT-2 (124M parameters)
- Training: 2 epochs on 20 examples
- Use case: Quick testing and development

**Full model (`dialogue_generator_full`)**

- Base: GPT-2 (124M parameters)
- Training: 5 epochs on 100 examples
- Use case: Production-ready dialogue generation
- Input: "Generate dialogue for scene 'Forest Encounter':"
- Output: [Model-generated dialogue based on the scene prompt]
- The model is trained using causal language modeling
- Data processing handles multiple JSON formats for flexibility
- Compatible with Apple Silicon's MPS acceleration
- Handles dialogue formatting with proper speaker attribution
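To make use of MPS acceleration on Apple Silicon alongside CUDA and CPU fallbacks, device selection might look like the following sketch (a common PyTorch pattern, not necessarily the exact code in this project):

```python
import torch


def pick_device():
    """Choose the best available device: CUDA, Apple Silicon MPS, or CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # The MPS backend exists on Apple Silicon with recent PyTorch builds;
    # guard the attribute access so this also runs on older versions.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```

Both the model and its input tensors must be moved to the selected device (e.g. `model.to(device)` and `inputs.to(device)`) before calling `generate`.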
- Improve the web interface for easier dialogue generation
- Add support for larger models (GPT-2 Medium/Large)
- Expand training dataset with more diverse dialogue examples
This project is licensed under the MIT License. See the LICENSE file for more details.
