In lines 165-170 of `engine_pretraining.py`, why is `loss2` computed from `y` and `r`? This seems inconsistent with what the paper describes.