Now that we have the --number-lines command line argument, it'd be nice to check systematically how much the score increases as more data is used for training, and also to find the upper bound on the amount of data we can use before GCP crashes with a memory error.
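
A rough sketch of what such a sweep could look like, in case it helps. Everything here is an assumption rather than existing code: `sweep_number_lines` and `train_and_score` are hypothetical names, and `train_and_score(n)` stands in for whatever runs training with `--number-lines n` and returns the score.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def sweep_number_lines(
    train_and_score: Callable[[int], float],
    line_counts: Iterable[int],
) -> Tuple[List[Tuple[int, float]], Optional[int]]:
    """Train on increasing amounts of data and record the score for each.

    train_and_score is a hypothetical callable: it should run training
    with --number-lines set to n and return the resulting score.

    Returns (results, first_failed), where results is a list of
    (number_of_lines, score) pairs and first_failed is the smallest
    line count that hit a memory error, or None if every run succeeded.
    """
    results: List[Tuple[int, float]] = []
    # Go from small to large so the first failure marks the upper bound.
    for n in sorted(line_counts):
        try:
            score = train_and_score(n)
        except MemoryError:
            # Stop at the first run that runs out of memory.
            return results, n
        results.append((n, score))
    return results, None
```

This assumes the failure surfaces as a Python `MemoryError`. If the process is killed by the machine instead, `train_and_score` would have to detect that itself (for example from a non-zero exit code of a subprocess) and raise `MemoryError`.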