A collection of projects for MLOps beginners
- https://github.com/visenger/awesome-mlops
- https://github.com/kelvins/awesome-mlops
- https://github.com/techiescamp/how-to-mlops
- A framework for continuous delivery and automation of machine learning (Google) https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf
- Exploring large-model training strategies and solutions in recommender systems: https://zhuanlan.zhihu.com/p/637952505
- Parameter server paper: https://www.cs.cmu.edu/~muli/file/ps.pdf
- PS-lite: https://github.com/dmlc/ps-lite
- NCCL collective communication principles and optimization, explained in one article https://zhuanlan.zhihu.com/p/720502061
- NVIDIA Collective Communication Library (NCCL) Documentation https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/index.html
- Understanding NCCL Tuning to Accelerate GPU-to-GPU Communication https://developer.nvidia.com/zh-cn/blog/understanding-nccl-tuning-to-accelerate-gpu-to-gpu-communication/
- MLOps for All, "Why Kubernetes?" https://mlops-for-all.github.io/en/docs/introduction/why_kubernetes