diff --git a/DEVELOPERS.md b/DEVELOPERS.md
index 0992c7026..d23c9d05f 100644
--- a/DEVELOPERS.md
+++ b/DEVELOPERS.md
@@ -30,6 +30,7 @@ Download the model data
 python3 download_model.py 124M
 python3 download_model.py 355M
 python3 download_model.py 774M
+python3 download_model.py 1558M
 ```
 
 ## Docker Installation
diff --git a/Dockerfile.cpu b/Dockerfile.cpu
index c923234a3..b6e4f9496 100644
--- a/Dockerfile.cpu
+++ b/Dockerfile.cpu
@@ -8,3 +8,4 @@ RUN pip3 install -r requirements.txt
 RUN python3 download_model.py 124M
 RUN python3 download_model.py 355M
 RUN python3 download_model.py 774M
+RUN python3 download_model.py 1558M
diff --git a/Dockerfile.gpu b/Dockerfile.gpu
index e59880e5d..5ac049aff 100644
--- a/Dockerfile.gpu
+++ b/Dockerfile.gpu
@@ -17,3 +17,4 @@ RUN pip3 install -r requirements.txt
 RUN python3 download_model.py 124M
 RUN python3 download_model.py 355M
 RUN python3 download_model.py 774M
+RUN python3 download_model.py 1558M
diff --git a/README.md b/README.md
index 1b2d5e81a..048b4c659 100644
--- a/README.md
+++ b/README.md
@@ -2,11 +2,11 @@
 
 # gpt-2
 
-Code from the paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).
+Code and models from the paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).
 
-We have currently released small (124M parameter), medium (355M parameter), and large (774M parameter) versions of GPT-2*, with only the full model as of yet unreleased. We have also [released a dataset](https://github.com/openai/gpt-2-output-dataset) for researchers to study their behaviors.
+You can read about GPT-2 and its staged release in our [original blog post](https://blog.openai.com/better-language-models/), [6 month follow-up post](https://openai.com/blog/gpt-2-6-month-follow-up/), and [final post](https://www.openai.com/blog/gpt-2-1-5b-release/).
 
-You can read about GPT-2 and release decisions in our [original blog post](https://blog.openai.com/better-language-models/) and [6 month follow-up post](https://openai.com/blog/gpt-2-6-month-follow-up/).
+We have also [released a dataset](https://github.com/openai/gpt-2-output-dataset) for researchers to study their behaviors.
 
 * *Note that our original parameter counts were wrong due to an error (in our previous blog posts and paper). Thus you may have seen small referred to as 117M and medium referred to as 345M.*