This project implements a Bigram Language Model from scratch using PyTorch. The model is trained to generate text based on a given dataset.
- `train.py`: Script to train the language model.
- `model.py`: Defines the architecture of the Bigram Language Model.
- `data_loader.py`: Handles data loading and preprocessing.
- `config.py`: Contains configuration settings and hyperparameters.
- `main.py`: Entry point to start the training process.
- `logs/`: Directory to store training logs and hyperparameter tuning results.
- `experiments/`: Directory to store experiment scripts and configurations.
- `checkpoints/`: Directory to save model checkpoints during training.
- Python 3.7+
- PyTorch
- Other dependencies listed in `requirements.txt`
- Clone the repository:

  ```bash
  git clone https://github.com/tedoaba/GPT-from-scratch.git
  cd GPT-from-scratch
  ```
- Create a virtual environment and activate it:

  ```bash
  python -m venv .venv
  source .venv/Scripts/activate   # Windows (Git Bash); on Linux/macOS use: source .venv/bin/activate
  ```
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Prepare your dataset:
  - Place your text data in a file named `data.txt` in the project root directory.
- Configure the model:
  - Adjust the hyperparameters and settings in `config.py` as needed.
- Train the model:

  ```bash
  python main.py
  ```
- Generate text:
  - After training, the model will generate text based on the patterns it has learned from the dataset (see the sampling sketch after this list).
- Tune hyperparameters:
  - Run the hyperparameter tuning script to find the best parameters for training:

    ```bash
    python experiments/hyperparameter_tuning.py
    ```

  - The best hyperparameters will be saved in `logs/hyperparameters/best_hyperparameters.txt`.
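
Text generation itself is handled inside the project's scripts; for reference, autoregressive sampling from a trained PyTorch language model typically follows the sketch below. The `generate` helper, the `(logits, loss)` return convention, and the variable names are assumptions for illustration, not the project's actual API.

```python
import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size):
    """Sketch of autoregressive sampling; assumes model(idx) returns (logits, loss)
    with logits of shape (batch, time, vocab_size). Names are illustrative only."""
    model.eval()
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]            # crop the context to the block size
        logits, _ = model(idx_cond)                # forward pass over the cropped context
        logits = logits[:, -1, :]                  # keep only the last time step
        probs = torch.softmax(logits, dim=-1)      # turn logits into probabilities
        idx_next = torch.multinomial(probs, 1)     # sample one token per sequence
        idx = torch.cat((idx, idx_next), dim=1)    # append and continue
    return idx

# Example usage (assuming `model`, `decode`, `block_size`, and `device` exist):
# context = torch.zeros((1, 1), dtype=torch.long, device=device)
# print(decode(generate(model, context, 500, block_size)[0].tolist()))
```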
The `config.py` file contains various settings and hyperparameters for the model:

- `batch_size`: Number of samples per batch.
- `block_size`: Length of the context window for training.
- `max_iters`: Maximum number of training iterations.
- `eval_interval`: Interval for evaluating the model on validation data.
- `learning_rate`: Learning rate for the optimizer.
- `device`: Device to run the model on (`cuda` or `cpu`).
- `eval_iters`: Number of iterations for evaluation.
- `n_embed`: Size of the embedding vectors.
- `num_head`: Number of attention heads.
- `n_layer`: Number of layers in the model.
- `dropout`: Dropout rate for regularization.
- `seed`: Random seed for reproducibility.
- `data_path`: Path to the training dataset.
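
For orientation, a `config.py` exposing these fields might look like the sketch below; the values shown are illustrative defaults, not necessarily the ones shipped with this repository.

```python
# Illustrative sketch of config.py -- the values are examples, not the project's defaults.
import torch

batch_size = 64        # number of samples per batch
block_size = 256       # length of the context window for training
max_iters = 5000       # maximum number of training iterations
eval_interval = 500    # evaluate on validation data every `eval_interval` iterations
learning_rate = 3e-4   # learning rate for the optimizer
device = "cuda" if torch.cuda.is_available() else "cpu"
eval_iters = 200       # number of iterations averaged per evaluation
n_embed = 384          # size of the embedding vectors
num_head = 6           # number of attention heads
n_layer = 6            # number of layers (blocks) in the model
dropout = 0.2          # dropout rate for regularization
seed = 1337            # random seed for reproducibility
data_path = "data.txt" # dataset path for training
```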
The model consists of the following components:
- Embedding Layers: Token and position embeddings.
- Attention Mechanism: Multi-head self-attention.
- Feedforward Network: Two-layer feedforward network.
- Stacked Blocks: Multiple layers of attention and feedforward blocks.
- Output Layer: Linear layer to predict the next token.
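
A minimal PyTorch sketch of this kind of stack is shown below. It uses `torch.nn.MultiheadAttention` with a causal mask for the attention mechanism and follows a common nanoGPT-style layout; it illustrates the listed components and is not a copy of the project's `model.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    """One transformer block: multi-head self-attention followed by a feedforward network."""
    def __init__(self, n_embed, num_head, block_size, dropout):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embed)
        self.attn = nn.MultiheadAttention(n_embed, num_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embed)
        self.ffwd = nn.Sequential(                      # two-layer feedforward network
            nn.Linear(n_embed, 4 * n_embed),
            nn.ReLU(),
            nn.Linear(4 * n_embed, n_embed),
            nn.Dropout(dropout),
        )
        # causal mask: position t may only attend to positions <= t
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):
        T = x.size(1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=self.mask[:T, :T], need_weights=False)
        x = x + attn_out                                # residual connection around attention
        x = x + self.ffwd(self.ln2(x))                  # residual connection around feedforward
        return x

class BigramLanguageModel(nn.Module):
    """Token + position embeddings, stacked blocks, and a linear head over the vocabulary."""
    def __init__(self, vocab_size, n_embed, num_head, n_layer, block_size, dropout):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, n_embed)      # token embeddings
        self.pos_emb = nn.Embedding(block_size, n_embed)        # position embeddings
        self.blocks = nn.Sequential(*[Block(n_embed, num_head, block_size, dropout)
                                      for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embed)
        self.lm_head = nn.Linear(n_embed, vocab_size)           # predicts the next token

    def forward(self, idx, targets=None):
        B, T = idx.shape
        x = self.token_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        x = self.ln_f(self.blocks(x))
        logits = self.lm_head(x)                                # (B, T, vocab_size)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(B * T, -1), targets.view(B * T))
        return logits, loss
```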
The training process involves:
- Loading and preprocessing the data.
- Initializing the model and optimizer.
- Iteratively training the model and evaluating its performance.
- Generating text from the trained model.
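
In condensed form, such a loop might look like the sketch below; `get_batch`, the config dictionary, and the `(logits, loss)` forward convention are stand-ins for the project's own helpers in `data_loader.py` and `train.py`, not its exact code.

```python
import torch

def train(model, get_batch, config):
    """Condensed training-loop sketch; names and helpers are illustrative placeholders."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=config["learning_rate"])
    model.train()
    for step in range(config["max_iters"]):
        # periodically estimate the validation loss
        if step % config["eval_interval"] == 0:
            model.eval()
            with torch.no_grad():
                xb, yb = get_batch("val")
                _, val_loss = model(xb, yb)
            print(f"step {step}: val loss {val_loss.item():.4f}")
            model.train()

        xb, yb = get_batch("train")              # sample a batch of contexts and targets
        _, loss = model(xb, yb)                  # forward pass returns (logits, loss)
        optimizer.zero_grad(set_to_none=True)
        loss.backward()                          # backpropagate
        optimizer.step()                         # update parameters
    return model
```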
This project is licensed under the MIT License. See the LICENSE file for details.
- Inspired by the GPT architecture and various tutorials on language modeling with PyTorch.