I've noticed that the matrix factorisation uses multithreading, but the threads are run one after another rather than concurrently. I also took some timing measurements, and it seems that the number of threads (one or several) has no impact on performance. Am I missing something, or is the multithreading just leftover code from an attempt to parallelise mini-batch SGD?
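To illustrate why sequentially run threads would give no speedup, here is a minimal Python sketch. It does not reproduce the project's code, which isn't shown here. The names `sgd_worker`, `run_sequential_threads` and `run_concurrent_threads` are hypothetical, and `time.sleep` stands in for real work. If each thread is joined before the next one is started, the threads never overlap, so adding more threads cannot reduce wall-clock time:

```python
import threading
import time


# Hypothetical stand-in for one thread's share of the factorisation work.
def sgd_worker(worker_id, log, lock):
    with lock:
        log.append(("start", worker_id))
    time.sleep(0.01)  # simulated work; sleep releases the GIL
    with lock:
        log.append(("end", worker_id))


def run_sequential_threads(n_threads):
    """Start each thread and join it before starting the next: no overlap."""
    log, lock = [], threading.Lock()
    for i in range(n_threads):
        t = threading.Thread(target=sgd_worker, args=(i, log, lock))
        t.start()
        t.join()  # blocks until this thread finishes, so the next one waits
    return log


def run_concurrent_threads(n_threads):
    """Start all threads first, then join them: execution can overlap."""
    log, lock = [], threading.Lock()
    threads = [threading.Thread(target=sgd_worker, args=(i, log, lock))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_sequential_threads(3))
    # [('start', 0), ('end', 0), ('start', 1), ('end', 1), ('start', 2), ('end', 2)]
    print(run_concurrent_threads(3))  # ordering varies; starts usually precede ends
```

In the sequential variant the log always alternates start/end per worker. The concurrent variant only helps when the work can actually run in parallel. In CPython, pure-Python CPU-bound code is still serialised by the GIL, whereas native code that releases it can run in parallel.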