Course website: https://hanlab.mit.edu/courses/2023-fall-65940
Early NAS methods using RNN-based controllers
- Neural Architecture Search with Reinforcement Learning, ICLR 2017
- Learning Transferable Architectures for Scalable Image Recognition (NASNet), CVPR 2018
- MnasNet: Platform-Aware Neural Architecture Search for Mobile, CVPR 2019
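These early methods share a loop: an RNN controller samples a sequence of architecture choices, the sampled child network is trained and evaluated, and its validation accuracy is fed back as a reward through a policy gradient (REINFORCE in the earliest work). A minimal sketch of that loop, with a toy op set and a placeholder reward standing in for child-network accuracy (all names and sizes here are illustrative, not taken from the papers):

```python
import torch
import torch.nn as nn

OPS = ["conv3x3", "conv5x5", "maxpool3x3", "identity"]  # toy search space
NUM_LAYERS = 4
HIDDEN = 64

class Controller(nn.Module):
    """RNN that emits one op choice per layer."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(len(OPS) + 1, HIDDEN)    # +1 for a start token
        self.cell = nn.LSTMCell(HIDDEN, HIDDEN)
        self.head = nn.Linear(HIDDEN, len(OPS))

    def sample(self):
        h = torch.zeros(1, HIDDEN)
        c = torch.zeros(1, HIDDEN)
        token = torch.tensor([len(OPS)])                    # start token id
        choices, log_probs = [], []
        for _ in range(NUM_LAYERS):
            h, c = self.cell(self.embed(token), (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            choices.append(OPS[action.item()])
            token = action                                  # feed choice back in
        return choices, torch.stack(log_probs).sum()

controller = Controller()
optimizer = torch.optim.Adam(controller.parameters(), lr=3e-4)
baseline = 0.0

for step in range(100):
    arch, log_prob = controller.sample()
    # Placeholder reward; in the real methods this is the validation accuracy
    # of the child network built from `arch` after (partial) training.
    reward = arch.count("conv3x3") / NUM_LAYERS
    baseline = 0.9 * baseline + 0.1 * reward                # moving-average baseline
    loss = -(reward - baseline) * log_prob                  # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```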
Differentiable NAS methods
- DARTS: Differentiable Architecture Search, ICLR 2019
- ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, ICLR 2019
- FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search, CVPR 2019
- Single Path One-Shot Neural Architecture Search with Uniform Sampling, ECCV 2020
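DARTS relaxes the discrete choice of operation on each edge into a softmax-weighted mixture, so the architecture parameters can be optimized by gradient descent together with the network weights (ProxylessNAS and Single Path One-Shot instead sample individual paths to cut memory). A minimal sketch of one such mixed edge, with an illustrative op set:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One DARTS-style edge: a softmax over candidate ops, all run in parallel."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.Conv2d(channels, channels, 5, padding=2, bias=False),
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Identity(),
        ])
        # Architecture parameters: one logit per candidate op.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

edge = MixedOp(channels=16)
x = torch.randn(2, 16, 32, 32)
out = edge(x)                    # differentiable w.r.t. both weights and alpha
print(out.shape)                 # torch.Size([2, 16, 32, 32])
# After the bilevel optimization, the op with the largest alpha is kept:
best = int(edge.alpha.argmax())
```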
State of the art (used in the lab's notebook)
MCUNets
- MCUNet: Tiny Deep Learning on IoT Devices [Lin et al., NeurIPS 2020]
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [Lin et al., NeurIPS 2021]
COCO datasets
Inverted MobileNet blocks
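The inverted MobileNet (MobileNetV2-style) block expands channels with a 1x1 convolution, applies a depthwise 3x3 convolution, then projects back down with a linear 1x1 convolution, adding a skip connection when shapes match; MCUNet's search space is built from blocks of this kind. A simplified sketch (fixed kernel size, no squeeze-and-excitation):

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style inverted bottleneck (simplified)."""
    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_skip = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            # 1x1 expansion to a wider "inverted" bottleneck
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution (groups == channels)
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 1x1 linear projection back down (no activation)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out

block = InvertedResidual(in_ch=16, out_ch=16, stride=1, expand_ratio=6)
print(block(torch.randn(1, 16, 32, 32)).shape)   # torch.Size([1, 16, 32, 32])
```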
Efficiency constraints in the real world
- Visual Wake Words with TensorFlow Lite Micro (TensorFlow Blog)
- MCUNet: Tiny Deep Learning on IoT Devices [Lin et al., NeurIPS 2020]
- On-Device Training Under 256KB Memory [Lin et al., NeurIPS 2022]
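For microcontroller targets the binding constraints are flash (model weights) and SRAM (peak activation memory), not only latency. A rough, hedged way to sanity-check both for a PyTorch model, assuming int8 storage and using forward hooks to record per-layer output sizes (a coarse proxy; the real peak depends on the inference engine's memory scheduling):

```python
import torch
import torch.nn as nn

def profile(model, input_shape=(1, 3, 96, 96)):
    """Rough flash/SRAM proxy: parameter count and largest single layer output."""
    param_bytes = sum(p.numel() for p in model.parameters())  # = bytes if weights are int8
    act_bytes = []

    def hook(module, inp, out):
        if isinstance(out, torch.Tensor):
            act_bytes.append(out.numel())                      # = bytes if activations are int8

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if len(list(m.children())) == 0]
    with torch.no_grad():
        model(torch.randn(*input_shape))
    for h in handles:
        h.remove()

    print(f"weights ~{param_bytes / 1024:.1f} KB (int8 assumption)")
    print(f"peak act ~{max(act_bytes) / 1024:.1f} KB (largest single output, int8 assumption)")

profile(nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
))
```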
Parallel computing and multithreading
- Parallel Computing Tutorial
- What Is Multithreading In OS? Understanding The Details
- Multithreading Models in Operating System
- Stanford CS 149: Parallel Computing
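A minimal multithreading example to go with the references above: several worker threads drain a shared task queue, with a lock protecting the shared result list. (In CPython the GIL limits speedups for CPU-bound pure-Python work; threads help most for I/O-bound tasks or when the heavy computation runs in native libraries.)

```python
import threading
import queue

def worker(q: queue.Queue, results: list, lock: threading.Lock):
    """Each thread repeatedly takes a task from the shared queue."""
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return
        value = item * item            # stand-in for real work (I/O, preprocessing, ...)
        with lock:                     # protect the shared result list
            results.append(value)
        q.task_done()

q = queue.Queue()
for i in range(100):
    q.put(i)

results, lock = [], threading.Lock()
threads = [threading.Thread(target=worker, args=(q, results, lock)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))   # 100
```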
Vision transformers and efficient attention
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [Dosovitskiy et al., ICLR 2021]
- Segment Anything Model (SAM)
- Segment Anything Model 2 (SAM 2)
- EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction [Cai et al., ICCV 2023]
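The "image is worth 16x16 words" idea: cut the image into fixed-size patches, linearly project each patch into a token, and feed the token sequence to a standard Transformer encoder. A minimal patch-embedding sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Turn an image into a sequence of patch tokens (ViT-style)."""
    def __init__(self, img_size=224, patch_size=16, in_ch=3, dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A conv with kernel = stride = patch size is exactly a per-patch linear projection.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                       # x: (B, 3, 224, 224)
        x = self.proj(x)                        # (B, dim, 14, 14)
        return x.flatten(2).transpose(1, 2)     # (B, 196, dim) = 196 "words"

tokens = PatchEmbed()(torch.randn(2, 3, 224, 224))
print(tokens.shape)                             # torch.Size([2, 196, 768])
# These tokens (plus a class token and position embeddings) then go into a
# standard Transformer encoder, e.g. nn.TransformerEncoder.
```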
Vision-language models
- Flamingo: a Visual Language Model for Few-Shot Learning [Alayrac et al., NeurIPS 2022]
- PaLM-E: An Embodied Multimodal Language Model [Driess et al., 2023]
Generative adversarial networks
- Generative Adversarial Networks [Goodfellow et al., 2014]
- Overview of GAN Structure
- Tutorial
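The GAN structure covered in the references above, in code: a discriminator learns to separate real from generated samples, while the generator is trained to fool it (non-saturating loss, toy 2-D data; all sizes are illustrative):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # z -> fake sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 2) + 3.0             # toy "real" data: a shifted Gaussian
    z = torch.randn(64, 8)

    # Discriminator step: real -> 1, fake -> 0 (generator frozen via detach).
    fake = G(z).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step (non-saturating): make D label fresh fakes as real.
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```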
Diffusion models
- Denoising Diffusion Probabilistic Models [Ho et al., NeurIPS 2020]
- Generative Modeling by Estimating Gradients of the Data Distribution [Song & Ermon, NeurIPS 2019]
- CMU 16-726: Learning-Based Image Synthesis
- Deep Unsupervised Learning using Nonequilibrium Thermodynamics [Sohl-Dickstein et al., ICML 2015]
- High-Resolution Image Synthesis with Latent Diffusion Models
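The core DDPM recipe from these references: corrupt clean data with Gaussian noise according to a fixed schedule, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, and train a network to predict the added noise with a simple MSE. A minimal training-step sketch (the denoiser here is a placeholder MLP on toy 2-D data, not a U-Net):

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # linear noise schedule as in DDPM
alpha_bar = torch.cumprod(1.0 - betas, dim=0)      # cumulative product \bar{alpha}_t

# Placeholder noise-prediction network eps_theta(x_t, t); a U-Net in practice.
model = nn.Sequential(nn.Linear(2 + 1, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    x0 = torch.randn(64, 2) + 3.0                  # stand-in for real training data
    t = torch.randint(0, T, (64,))
    eps = torch.randn_like(x0)
    a = alpha_bar[t].unsqueeze(1)                  # (64, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps     # forward (noising) process
    # Predict the added noise and regress it with MSE (the simplified DDPM loss).
    pred = model(torch.cat([x_t, t.unsqueeze(1).float() / T], dim=1))
    loss = ((pred - eps) ** 2).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```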
Conditional control and efficient diffusion
- Adding Conditional Control to Text-to-Image Diffusion Models [Zhang et al., ICCV 2023]
- Classifier-Free Diffusion Guidance [Ho & Salimans, 2021]
- High-Resolution Image Synthesis with Latent Diffusion Models [Rombach et al., CVPR 2022]
- Denoising Diffusion Implicit Models [Song et al., ICLR 2021]
- On Distillation of Guided Diffusion Models [Meng et al., CVPR 2023]
- Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models [Li et al., NeurIPS 2022]
- Q-Diffusion: Quantizing Diffusion Models [Li et al., ICCV 2023]
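Classifier-free guidance ties several of the entries above together: one network is trained with the condition randomly dropped, and at sampling time the conditional and unconditional noise predictions are extrapolated, eps_hat = eps_uncond + s * (eps_cond - eps_uncond), where s > 1 strengthens the condition (one common parameterization of the Ho & Salimans formula). A sketch of the sampling-time combination, with a toy placeholder denoiser and an illustrative scale value:

```python
import torch

def guided_eps(model, x_t, t, cond, null_cond, guidance_scale=7.5):
    """Classifier-free guidance: combine conditional and unconditional predictions."""
    eps_cond = model(x_t, t, cond)          # eps_theta(x_t, t, c)
    eps_uncond = model(x_t, t, null_cond)   # eps_theta(x_t, t, empty condition)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-in for a conditional denoiser, just to make the function runnable.
def toy_model(x_t, t, cond):
    return x_t * 0.0 + cond.mean()

x_t = torch.randn(4, 2)
eps_hat = guided_eps(toy_model, x_t, t=torch.tensor([500]),
                     cond=torch.ones(4, 8), null_cond=torch.zeros(4, 8))
print(eps_hat.shape)   # torch.Size([4, 2])
```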
Distributed training
- Scaling Distributed Machine Learning with the Parameter Server [Li et al., OSDI 2014]
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models [Rajbhandari et al., SC 2020]
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [Shoeybi et al., 2019]
- Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM [Narayanan et al., SC 2021]
- DeepSpeed: Extreme-scale model training for everyone
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning [Zheng et al., OSDI 2022]
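Megatron-style tensor (intra-operator) parallelism splits a single weight matrix across devices, e.g. by columns, so each device computes a slice of the layer's output that is then gathered. A single-process simulation of a column-parallel linear layer (plain tensors, no actual multi-GPU communication):

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 512)                 # activations: (batch, in_features)
W = torch.randn(512, 1024)              # full weight: (in_features, out_features)

# Reference: the unpartitioned layer.
y_full = x @ W

# "Two devices": split W column-wise, each device holds half the output features.
W0, W1 = W.chunk(2, dim=1)              # each (512, 512)
y0 = x @ W0                             # computed on device 0
y1 = x @ W1                             # computed on device 1
y_parallel = torch.cat([y0, y1], dim=1) # all-gather along the feature dimension

print(torch.allclose(y_full, y_parallel))   # True: same result, half the weight per device
```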
Gradient compression and communication-efficient training
- Sparse Communication for Distributed Gradient Descent [Aji & Heafield, EMNLP 2017]
- Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training [Lin et al., ICLR 2018]
- Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes [Sun et al., 2019]
- PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization [Vogels et al., NeurIPS 2019]
- signSGD with Majority Vote is Communication Efficient and Fault Tolerant [Bernstein et al., ICLR 2019]
- ATOMO: Communication-efficient Learning via Atomic Sparsification [Wang et al., NeurIPS 2018]
- 1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs [Seide et al., Interspeech 2014]
- Scalable Distributed DNN Training Using Commodity GPU Cloud Computing [Strom, Interspeech 2015]
- TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning [Wen et al., NeurIPS 2017]
- Delayed Gradient Averaging: Tolerate the Communication Latency in Federated Learning [Zhu et al., NeurIPS 2021]
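Most of these communication-efficiency papers share one pattern: send only a compressed version of each gradient and keep whatever was not sent as a local residual that is added back the next step (error feedback), as in Deep Gradient Compression's top-k sparsification. A simplified single-tensor sketch, with no momentum correction and no actual communication:

```python
import torch

def topk_compress(grad, residual, ratio=0.01):
    """Keep only the largest-magnitude entries; accumulate the rest locally."""
    acc = grad + residual                          # error feedback: add leftover from last step
    k = max(1, int(acc.numel() * ratio))
    _, idx = acc.abs().flatten().topk(k)
    sparse = torch.zeros_like(acc).flatten()
    sparse[idx] = acc.flatten()[idx]               # this sparse tensor is what gets sent
    sparse = sparse.view_as(acc)
    new_residual = acc - sparse                    # the unsent part stays local
    return sparse, new_residual

grad = torch.randn(1000)
residual = torch.zeros_like(grad)
for step in range(3):
    sent, residual = topk_compress(grad, residual, ratio=0.01)
    print(step, int((sent != 0).sum()), residual.abs().sum().item())
```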