# GPT

A toy implementation of GPT-2, because language models are cool.

Due to compute constraints I cannot train the full-size GPT-2 model. The largest one I could train is a 352M-parameter variant (it can be run with the train.sh script); it converged to a loss of 3.1, which is alright.
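For a rough sense of where a number like 352M comes from, here is a back-of-the-envelope parameter count for a GPT-2-medium-style configuration (24 layers, 1024-wide embeddings). This is a sketch under assumed hyperparameters, not necessarily the exact config used by train.sh:

```python
# Rough parameter count for a GPT-2-style transformer.
# Assumes a GPT-2-medium-like configuration; the repo's 352M config may differ.
def gpt2_param_count(n_layer=24, n_embd=1024, vocab_size=50257, block_size=1024):
    # Token and position embeddings (token embedding weights are typically
    # tied with the output head, so they are counted once).
    embed = vocab_size * n_embd + block_size * n_embd
    # Per block: attention (qkv + output projection), weights plus biases.
    attn = 4 * n_embd * n_embd + 4 * n_embd
    # Per block: 4x-wide MLP (fc + projection), weights plus biases.
    mlp = 2 * 4 * n_embd * n_embd + 5 * n_embd
    # Two layer norms per block, each with weight and bias.
    ln = 2 * 2 * n_embd
    per_block = attn + mlp + ln
    final_ln = 2 * n_embd
    return embed + n_layer * per_block + final_ln

print(f"{gpt2_param_count() / 1e6:.0f}M parameters")  # ~355M for this config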


The total footprint of the code is quite small, so it is fairly easy to modify. train.py exposes a CLI for setting all of the model's hyperparameters, which makes it easy to train and iterate on the model.
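As an illustration of the pattern only (the actual flag names and defaults live in train.py, so treat everything below as placeholders), a hyperparameter CLI of this kind usually looks something like:

```python
# Hypothetical sketch of a hyperparameter CLI; not the real train.py interface.
import argparse

parser = argparse.ArgumentParser(description="Train a GPT-2-style model")
parser.add_argument("--n-layer", type=int, default=24)    # transformer blocks
parser.add_argument("--n-head", type=int, default=16)     # attention heads
parser.add_argument("--n-embd", type=int, default=1024)   # embedding width
parser.add_argument("--batch-size", type=int, default=32)
parser.add_argument("--lr", type=float, default=3e-4)
args = parser.parse_args()
print(args)
```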