Course website: https://hanlab.mit.edu/courses/2023-fall-65940
Early NAS methods using RNN-based controllers
- Neural Architecture Search with Reinforcement Learning, ICLR 2017
- Learning Transferable Architectures for Scalable Image Recognition (NASNet), CVPR 2018
- MnasNet: Platform-Aware Neural Architecture Search for Mobile, CVPR 2019
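These early methods share a loop: an RNN controller samples a sequence of architecture choices, the sampled child network is trained and evaluated, and its validation accuracy is fed back as a reward through a policy gradient (REINFORCE in the earliest work). A minimal sketch of that loop, with a toy op set and a placeholder reward standing in for child-network accuracy (all names and sizes here are illustrative, not taken from the papers):

```python
import torch
import torch.nn as nn

OPS = ["conv3x3", "conv5x5", "maxpool3x3", "identity"]  # toy search space
NUM_LAYERS = 4
HIDDEN = 64

class Controller(nn.Module):
    """RNN that emits one op choice per layer."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(len(OPS) + 1, HIDDEN)    # +1 for a start token
        self.cell = nn.LSTMCell(HIDDEN, HIDDEN)
        self.head = nn.Linear(HIDDEN, len(OPS))

    def sample(self):
        h = torch.zeros(1, HIDDEN)
        c = torch.zeros(1, HIDDEN)
        token = torch.tensor([len(OPS)])                    # start token id
        choices, log_probs = [], []
        for _ in range(NUM_LAYERS):
            h, c = self.cell(self.embed(token), (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            choices.append(OPS[action.item()])
            token = action                                  # feed choice back in
        return choices, torch.stack(log_probs).sum()

controller = Controller()
optimizer = torch.optim.Adam(controller.parameters(), lr=3e-4)
baseline = 0.0

for step in range(100):
    arch, log_prob = controller.sample()
    # Placeholder reward; in the real methods this is the validation accuracy
    # of the child network built from `arch` after (partial) training.
    reward = arch.count("conv3x3") / NUM_LAYERS
    baseline = 0.9 * baseline + 0.1 * reward                # moving-average baseline
    loss = -(reward - baseline) * log_prob                  # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```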
Differentiable NAS methods
- DARTS: Differentiable Architecture Search, ICLR 2019
- ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, ICLR 2019
- FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search, CVPR 2019
- Single Path One-Shot Neural Architecture Search with Uniform Sampling, ECCV 2020
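DARTS relaxes the discrete choice of operation on each edge into a softmax-weighted mixture, so the architecture parameters can be optimized by gradient descent together with the network weights (ProxylessNAS and Single Path One-Shot instead sample individual paths to cut memory). A minimal sketch of one such mixed edge, with an illustrative op set:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One DARTS-style edge: a softmax over candidate ops, all run in parallel."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.Conv2d(channels, channels, 5, padding=2, bias=False),
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Identity(),
        ])
        # Architecture parameters: one logit per candidate op.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

edge = MixedOp(channels=16)
x = torch.randn(2, 16, 32, 32)
out = edge(x)                    # differentiable w.r.t. both weights and alpha
print(out.shape)                 # torch.Size([2, 16, 32, 32])
# After the bilevel optimization, the op with the largest alpha is kept:
best = int(edge.alpha.argmax())
```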
State of the art (used in the lab's notebook)
MCUNets
- MCUNet: Tiny Deep Learning on IoT Devices [Lin et al., NeurIPS 2020]
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [Lin et al., NeurIPS 2021]
COCO datasets
Inverted MobileNet blocks
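The inverted MobileNet (MobileNetV2-style) block expands channels with a 1x1 convolution, applies a depthwise 3x3 convolution, then projects back down with a linear 1x1 convolution, adding a skip connection when shapes match; MCUNet's search space is built from blocks of this kind. A simplified sketch (fixed kernel size, no squeeze-and-excitation):

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style inverted bottleneck (simplified)."""
    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_skip = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            # 1x1 expansion to a wider "inverted" bottleneck
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution (groups == channels)
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 1x1 linear projection back down (no activation)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out

block = InvertedResidual(in_ch=16, out_ch=16, stride=1, expand_ratio=6)
print(block(torch.randn(1, 16, 32, 32)).shape)   # torch.Size([1, 16, 32, 32])
```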
Efficiency constraints in the real world
- Visual Wake Words with TensorFlow Lite Micro (TensorFlow Blog)
- MCUNet: Tiny Deep Learning on IoT Devices [Lin et al., NeurIPS 2020]
- On-Device Training Under 256KB Memory [Lin et al., NeurIPS 2022]
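For microcontroller targets the binding constraints are flash (model weights) and SRAM (peak activation memory), not only latency. A rough, hedged way to sanity-check both for a PyTorch model, assuming int8 storage and using forward hooks to record per-layer output sizes (a coarse proxy; the real peak depends on the inference engine's memory scheduling):

```python
import torch
import torch.nn as nn

def profile(model, input_shape=(1, 3, 96, 96)):
    """Rough flash/SRAM proxy: parameter count and largest single layer output."""
    param_bytes = sum(p.numel() for p in model.parameters())  # = bytes if weights are int8
    act_bytes = []

    def hook(module, inp, out):
        if isinstance(out, torch.Tensor):
            act_bytes.append(out.numel())                      # = bytes if activations are int8

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if len(list(m.children())) == 0]
    with torch.no_grad():
        model(torch.randn(*input_shape))
    for h in handles:
        h.remove()

    print(f"weights ~{param_bytes / 1024:.1f} KB (int8 assumption)")
    print(f"peak act ~{max(act_bytes) / 1024:.1f} KB (largest single output, int8 assumption)")

profile(nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
))
```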
Parallel computing and multithreading
- Parallel Computing Tutorial
- What Is Multithreading In OS? Understanding The Details
- Multithreading Models in Operating System
- Stanford CS 149: Parallel Computing
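A minimal multithreading example to go with the references above: several worker threads drain a shared task queue, with a lock protecting the shared result list. (In CPython the GIL limits speedups for CPU-bound pure-Python work; threads help most for I/O-bound tasks or when the heavy computation runs in native libraries.)

```python
import threading
import queue

def worker(q: queue.Queue, results: list, lock: threading.Lock):
    """Each thread repeatedly takes a task from the shared queue."""
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return
        value = item * item            # stand-in for real work (I/O, preprocessing, ...)
        with lock:                     # protect the shared result list
            results.append(value)
        q.task_done()

q = queue.Queue()
for i in range(100):
    q.put(i)

results, lock = [], threading.Lock()
threads = [threading.Thread(target=worker, args=(q, results, lock)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))   # 100
```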
Vision transformers and efficient attention
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [Dosovitskiy et al., ICLR 2021]
- Segment Anything Model (SAM)
- Segment Anything Model 2 (SAM 2)
- EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction [Cai et al., ICCV 2023]
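The "image is worth 16x16 words" idea: cut the image into fixed-size patches, linearly project each patch into a token, and feed the token sequence to a standard Transformer encoder. A minimal patch-embedding sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Turn an image into a sequence of patch tokens (ViT-style)."""
    def __init__(self, img_size=224, patch_size=16, in_ch=3, dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A conv with kernel = stride = patch size is exactly a per-patch linear projection.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                       # x: (B, 3, 224, 224)
        x = self.proj(x)                        # (B, dim, 14, 14)
        return x.flatten(2).transpose(1, 2)     # (B, 196, dim) = 196 "words"

tokens = PatchEmbed()(torch.randn(2, 3, 224, 224))
print(tokens.shape)                             # torch.Size([2, 196, 768])
# These tokens (plus a class token and position embeddings) then go into a
# standard Transformer encoder, e.g. nn.TransformerEncoder.
```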
Vision-language models
- Flamingo: a Visual Language Model for Few-Shot Learning [Alayrac et al., NeurIPS 2022]
- PaLM-E: An Embodied Multimodal Language Model [Driess et al., 2023]
Generative adversarial networks
- Generative Adversarial Networks [Goodfellow et al., 2014]
- Overview of GAN Structure
- Tutorial
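The GAN structure covered in the references above, in code: a discriminator learns to separate real from generated samples, while the generator is trained to fool it (non-saturating loss, toy 2-D data; all sizes are illustrative):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # z -> fake sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 2) + 3.0             # toy "real" data: a shifted Gaussian
    z = torch.randn(64, 8)

    # Discriminator step: real -> 1, fake -> 0 (generator frozen via detach).
    fake = G(z).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step (non-saturating): make D label fresh fakes as real.
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```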
Diffusion models
- Denoising Diffusion Probabilistic Models [Ho et al., NeurIPS 2020]
- Generative Modeling by Estimating Gradients of the Data Distribution [Song & Ermon, NeurIPS 2019]
- CMU 16-726: Learning-Based Image Synthesis
- Deep Unsupervised Learning using Nonequilibrium Thermodynamics [Sohl-Dickstein et al., ICML 2015]
- High-Resolution Image Synthesis with Latent Diffusion Models
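The core DDPM recipe from these references: corrupt clean data with Gaussian noise according to a fixed schedule, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, and train a network to predict the added noise with a simple MSE. A minimal training-step sketch (the denoiser here is a placeholder MLP on toy 2-D data, not a U-Net):

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # linear noise schedule as in DDPM
alpha_bar = torch.cumprod(1.0 - betas, dim=0)      # cumulative product \bar{alpha}_t

# Placeholder noise-prediction network eps_theta(x_t, t); a U-Net in practice.
model = nn.Sequential(nn.Linear(2 + 1, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    x0 = torch.randn(64, 2) + 3.0                  # stand-in for real training data
    t = torch.randint(0, T, (64,))
    eps = torch.randn_like(x0)
    a = alpha_bar[t].unsqueeze(1)                  # (64, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps     # forward (noising) process
    # Predict the added noise and regress it with MSE (the simplified DDPM loss).
    pred = model(torch.cat([x_t, t.unsqueeze(1).float() / T], dim=1))
    loss = ((pred - eps) ** 2).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```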
Conditional control and efficient diffusion
- Adding Conditional Control to Text-to-Image Diffusion Models [Zhang et al., ICCV 2023]
- Classifier-Free Diffusion Guidance [Ho & Salimans, 2021]
- High-Resolution Image Synthesis with Latent Diffusion Models [Rombach et al., CVPR 2022]
- Denoising Diffusion Implicit Models [Song et al., ICLR 2021]
- On Distillation of Guided Diffusion Models [Meng et al., CVPR 2023]
- Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models [Li et al., NeurIPS 2022]
- Q-Diffusion: Quantizing Diffusion Models [Li et al., ICCV 2023]
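Classifier-free guidance ties several of the entries above together: one network is trained with the condition randomly dropped, and at sampling time the conditional and unconditional noise predictions are extrapolated, eps_hat = eps_uncond + s * (eps_cond - eps_uncond), where s > 1 strengthens the condition (one common parameterization of the Ho & Salimans formula). A sketch of the sampling-time combination, with a toy placeholder denoiser and an illustrative scale value:

```python
import torch

def guided_eps(model, x_t, t, cond, null_cond, guidance_scale=7.5):
    """Classifier-free guidance: combine conditional and unconditional predictions."""
    eps_cond = model(x_t, t, cond)          # eps_theta(x_t, t, c)
    eps_uncond = model(x_t, t, null_cond)   # eps_theta(x_t, t, empty condition)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-in for a conditional denoiser, just to make the function runnable.
def toy_model(x_t, t, cond):
    return x_t * 0.0 + cond.mean()

x_t = torch.randn(4, 2)
eps_hat = guided_eps(toy_model, x_t, t=torch.tensor([500]),
                     cond=torch.ones(4, 8), null_cond=torch.zeros(4, 8))
print(eps_hat.shape)   # torch.Size([4, 2])
```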
Distributed training
- Scaling Distributed Machine Learning with the Parameter Server [Li et al., OSDI 2014]
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models [Rajbhandari et al., SC 2020]
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [Shoeybi et al., 2019]
- Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM [Narayanan et al., SC 2021]
- DeepSpeed: Extreme-scale model training for everyone
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning [Zheng et al., OSDI 2022]
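Megatron-style tensor (intra-operator) parallelism splits a single weight matrix across devices, e.g. by columns, so each device computes a slice of the layer's output that is then gathered. A single-process simulation of a column-parallel linear layer (plain tensors, no actual multi-GPU communication):

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 512)                 # activations: (batch, in_features)
W = torch.randn(512, 1024)              # full weight: (in_features, out_features)

# Reference: the unpartitioned layer.
y_full = x @ W

# "Two devices": split W column-wise, each device holds half the output features.
W0, W1 = W.chunk(2, dim=1)              # each (512, 512)
y0 = x @ W0                             # computed on device 0
y1 = x @ W1                             # computed on device 1
y_parallel = torch.cat([y0, y1], dim=1) # all-gather along the feature dimension

print(torch.allclose(y_full, y_parallel))   # True: same result, half the weight per device
```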
Gradient compression and communication-efficient training
- Sparse Communication for Distributed Gradient Descent [Aji & Heafield, EMNLP 2017]
- Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training [Lin et al., ICLR 2018]
- Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes [Sun et al., 2019]
- PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization [Vogels et al., NeurIPS 2019]
- signSGD with Majority Vote is Communication Efficient and Fault Tolerant [Bernstein et al., ICLR 2019]
- ATOMO: Communication-efficient Learning via Atomic Sparsification [Wang et al., NeurIPS 2018]
- 1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs [Seide et al., Interspeech 2014]
- Scalable Distributed DNN Training Using Commodity GPU Cloud Computing [Strom, Interspeech 2015]
- TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning [Wen et al., NeurIPS 2017]
- Delayed Gradient Averaging: Tolerate the Communication Latency in Federated Learning [Zhu et al., NeurIPS 2021]
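Most of these communication-efficiency papers share one pattern: send only a compressed version of each gradient and keep whatever was not sent as a local residual that is added back the next step (error feedback), as in Deep Gradient Compression's top-k sparsification. A simplified single-tensor sketch, with no momentum correction and no actual communication:

```python
import torch

def topk_compress(grad, residual, ratio=0.01):
    """Keep only the largest-magnitude entries; accumulate the rest locally."""
    acc = grad + residual                          # error feedback: add leftover from last step
    k = max(1, int(acc.numel() * ratio))
    _, idx = acc.abs().flatten().topk(k)
    sparse = torch.zeros_like(acc).flatten()
    sparse[idx] = acc.flatten()[idx]               # this sparse tensor is what gets sent
    sparse = sparse.view_as(acc)
    new_residual = acc - sparse                    # the unsent part stays local
    return sparse, new_residual

grad = torch.randn(1000)
residual = torch.zeros_like(grad)
for step in range(3):
    sent, residual = topk_compress(grad, residual, ratio=0.01)
    print(step, int((sent != 0).sum()), residual.abs().sum().item())
```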