From 42d4613d5a3127ae5ebff21e88603459d1f0f9e0 Mon Sep 17 00:00:00 2001
From: Hao Li
Date: Mon, 13 Jun 2016 01:43:03 -0700
Subject: [PATCH] split deep_learning.md into several parts

---
 deep_learning.md | 113 -----------------------------------------------
 dl_cnn.md        |  73 ++++++++++++++++++++++++++++
 dl_opt.md        |  43 ++++++++++++++++
 dl_sys.md        |  67 +++++++++++++++++++++++++
 4 files changed, 183 insertions(+), 113 deletions(-)
 delete mode 100644 deep_learning.md
 create mode 100644 dl_cnn.md
 create mode 100644 dl_opt.md
 create mode 100644 dl_sys.md

diff --git a/deep_learning.md b/deep_learning.md
deleted file mode 100644
index 5ecf660..0000000
--- a/deep_learning.md
+++ /dev/null
@@ -1,113 +0,0 @@
-#Deep Learning
-
-### Distributed System
-- Google
- 2015 [TensorFlow:Large-Scale Machine Learning on Heterogeneous Distributed Systems](http://download.tensorflow.org/paper/whitepaper2015.pdf) [[slides 1]](http://static.googleusercontent.com/media/research.google.com/en//people/jeff/BayLearn2015.pdf) [[slides 2]](http://vision.stanford.edu/teaching/cs231n/slides/jon_talk.pdf)
- 2012 NIPS [Large Scale Distributed Deep Networks](http://static.googleusercontent.com/media/research.google.com/en/us/archive/large_deep_networks_nips2012.pdf) (DistBelief)
- 2014 NIPSW [Techniques and Systems for Training Large Neural Networks Quickly](http://stanford.edu/~rezab/nips2014workshop/slides/jeff.pdf)
-- Microsoft
- 2014 OSDI [Project Adam: Building an Efficient and Scalable Deep Learning Training System](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf)
-- Baidu / Stanford
- 2015 arXiv [Deep Speech 2: End-to-End Speech Recognition in English and Mandarin](http://arxiv.org/abs/1512.02595)
- 2015 arXiv [Deep Image: Scaling up Image Recognition](http://arxiv.org/abs/1501.02876)
- 2013 ICML [Deep learning with COTS HPC systems](http://jmlr.org/proceedings/papers/v28/coates13.pdf)
-- Yahoo!
- 2015 [Large Scale Distributed Deep Learning on Hadoop Clusters](http://yahoohadoop.tumblr.com/post/129872361846/large-scale-distributed-deep-learning-on-hadoop)
-- DMLC
- 2015 NIPSW [MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems](http://www.cs.cmu.edu/~muli/file/mxnet-learning-sys.pdf) [[GTC'16 Tutorial](http://www.cs.cmu.edu/~muli/file/mxnet_gtc16.pdf)]
- 2014 NIPSW [Minerva: A Scalable and Highly Efficient Training Platform for Deep Learnin](http://stanford.edu/~rezab/nips2014workshop/submits/minerva.pdf)
- 2014 ICLR [Purine: A bi-graph based deep learning framework](http://arxiv.org/abs/1412.6249)
-- Petuum
- 2016 EuroSys [STRADS: A Distributed Framework for Scheduled Model Parallel Machine Learning](http://www.istc-cc.cmu.edu/publications/papers/2016/strads-kim-eurosys16.pdf)
-
-### Parallization
-- 2015 arXiv [Fast Algorithms for Convolutional Neural Networks](http://arxiv.org/abs/1509.09308) (Winograd) [Blog](http://www.nervanasys.com/winograd/)
-- 2015 Intel [Single Node Caffe Scoring and Training on Intel® Xeon E5-Series Processors](https://software.intel.com/en-us/articles/single-node-caffe-scoring-and-training-on-intel-xeon-e5-series-processors)
-- 2015 arXiv [Caffe con Troll: Shallow Ideas to Speed Up Deep Learning](http://arxiv.org/abs/1504.04343)
-- 2015 ICMLW [Massively Parallel Methods for Deep Reinforcement Learning](https://8109f4a4-a-62cb3a1a-s-sites.googlegroups.com/site/deeplearning2015/1.pdf?attachauth=ANoY7cocCvmoqZlkfUFQkSwV8fULURfVSzDdFv0dyk8uU1ztfeCHFIK4Kb6JoEQ3iZLUiYBynddwePUhd-3ssJZkANn-PXFU7m1U_wE5Eb4eHbZj3YR41bLF1AEr5T5EDth97i9DdkipHses1XTMDu_wpw8zs0-RGb7WVQRF8ZOhvG1AW47CRkAI8X0iv-oLtWy9fGSSa-JR9JpSwFUtjt_0_UXu4BUUwg==&attredirects=0)
-- 2015 arXiv [Convolutional Neural Networks at Constrained Time Cost](http://arxiv.org/pdf/1412.1710v1.pdf)
-- 2014 arXiv [One weird trick for parallelizing convolutional neural networks](http://arxiv.org/pdf/1404.5997v2.pdf)
-- 2014 NIPS [On the Computational Efficiency of Training Neural Networks](http://papers.nips.cc/paper/5267-on-the-computational-efficiency-of-training-neural-networks.pdf)
-- 2011 NIPSW [Improving the speed of neural networks on CPUs](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37631.pdf)
-
-
-### CNNs Architectures
-
-##### ImageNet Records
-- 2016 arXiv [Identity Mappings in Deep Residual Networks](http://arxiv.org/pdf/1603.05027v1.pdf)
-- 2016 arXiv [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](http://arxiv.org/abs/1602.07261) (Inception V4)
-- 2015 arXiv [Deep Residual Learning for Image Recognition](http://arxiv.org/abs/1512.03385) (ResNet)
-- 2015 arXiv [Rethinking the Inception Architecture for Computer Vision](http://arxiv.org/abs/1512.00567) (Inception V3)
-- 2015 ICML [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](http://jmlr.org/proceedings/papers/v37/ioffe15.pdf) (Inception V2)
-- 2015 ICCV [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](http://research.microsoft.com/en-us/um/people/kahe/publications/iccv15imgnet.pdf) (PReLU)
-- 2015 ICLR [Very Deep Convolutional Networks For Large-scale Image Recognition](http://arxiv.org/abs/1409.1556) (VGG)
-- 2015 CVPR [Going Deeper with Convolutions](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43022.pdf) (GoogleNet/Inception V1)
-- 2012 NIPS [ImageNet Classification with Deep Convolutional Neural
Networks](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) (AlexNet)
-
-##### Arichitecture Design
-
-- 2016 arXiv [Benefits of depth in neural networks](http://arxiv.org/abs/1602.04485)
-- 2016 AAAI [On the Depth of Deep Neural Networks: A Theoretical View](http://arxiv.org/abs/1506.05232)
-- 2016 arXiv [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size](http://arxiv.org/abs/1602.07360)
-- 2015 CVPR [Convolutional Neural Networks at Constrained Time Cost](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/He_Convolutional_Neural_Networks_2015_CVPR_paper.pdf)
-- 2015 ICLR [FitNets: Hints for Thin Deep Nets](http://arxiv.org/pdf/1412.6550v4.pdf)
-- 2014 NIPS [Do Deep Nets Really Need to be Deep?](http://papers.nips.cc/paper/5484-do-deep-nets-really-need-to-be-deep.pdf)
-- 2014 ICLRW [Understanding Deep Architectures using a Recursive Convolutional Network](http://arxiv.org/abs/1312.1847)
-- 2014 ECCV [Visualizing and Understanding Convolutional Networks](https://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf)
-- 2009 ICCV [What is the Best Multi-Stage Architecture for Object Recognition?](http://yann.lecun.com/exdb/publis/pdf/jarrett-iccv-09.pdf)
-- 1994 T-NN [SVD-NET: An Algorithm that Automatically Selects Network Structure](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=286929)
-
-##### Network Binarization
-- 2016 arXiv [XNOR-Net: ImageNet Classification Using Binary
-Convolutional Neural Networks](http://arxiv.org/pdf/1603.05279v1.pdf)
-- 2016 arXiv [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](http://arxiv.org/abs/1602.02830)
-- 2015 NIPS [BinaryConnect: Training Deep Neural Networks with binary weights during propagations](https://papers.nips.cc/paper/5647-binaryconnect-training-deep-neural-networks-with-binary-weights-during-propagations.pdf)
-
-##### Model Compression / Parameter Pruning
-- 2016 ICLR [Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding](http://arxiv.org/abs/1510.00149)
-- 2016 ICLR [Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications](http://arxiv.org/abs/1511.06530)
-- 2015 arXiv [Training CNNs with Low-Rank Filters for Efficient Image Classification](http://arxiv.org/abs/1511.06744)
-- 2015 arXiv [Structured Pruning of Deep Convolutional Neural Networks](http://arxiv.org/abs/1512.08571)
-- 2015 arXiv [Data-free parameter pruning for Deep Neural Networks](http://arxiv.org/abs/1507.06149)
-- 2015 ICCV [An exploration of parameter redundancy in deep networks with circulant projections](http://felixyu.org/pdf/ICCV15_circulant.pdf)
-- 2015 ICML [Compressing Neural Networks with the Hashing Trick](http://jmlr.org/proceedings/papers/v37/chenc15.pdf)
-- 2015 NIPS [Learning both Weights and Connections for Efficient Neural Networks](http://arxiv.org/abs/1506.02626)
-- 2014 arXiv [Compressing deep convolutional networks
-using vector quantization](http://arxiv.org/abs/1412.6115)
-- 2014 NIPSW [Distilling the Knowledge in a Neural Network](https://fb56552f-a-62cb3a1a-s-sites.googlegroups.com/site/deeplearningworkshopnips
2014/65.pdf?attachauth=ANoY7cr8J-eqASFdYZeOQK8d9aGCtxzQpaVNCcjKgt1THV7e9FKNuTlrH4QCPmgMg2jynAz3ehjOU_2q9SMsnBYZq3_Jlxf1NnWcBejaVZi4vNHZ41H2DK8R-MJsk3MqfMDXOfEPxhAAOwUBH7oE-EtEKDoYa-16eqZ5djaoT4VXdir383rikNv6YF68dhm84kw04VCzH5XpA_8ucgW3iBr77bkjaYvNvC6YsUuC3PyVEPIusOZaM94%3D&attredirects=0)
-- 1989 NIPS [Optimal Brain Damage](http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf)
-
-
-
-### Optimization
-- 2016 ICLR [Data-Dependent Path Normalization in Neural Networks](http://arxiv.org/pdf/1511.06747v4.pdf)
-- 2016 Blog [An overview of gradient descent optimization algorithms](http://sebastianruder.com/optimizing-gradient-descent/index.html)
-- 2015 arXiv [Adding Gradient Noise Improves Learning for Very Deep Networks](http://arxiv.org/abs/1511.06807)
-- 2015 DL Summer School [Non-Smooth, Non-Finite, and Non-Convex Optimization](http://www.iro.umontreal.ca/~memisevr/dlss2015/2015_DLSS_NonSmoothNonFiniteNonConvex.pdf)
-- 2015 NIPS [Training Very Deep Networks](http://papers.nips.cc/paper/5850-training-very-deep-networks.pdf)
-- 2015 NIPS [Deep learning with Elastic Averaging SGD](https://www.cs.nyu.edu/~zsx/nips2015.pdf) (EASGD)
-- 2015 CVPR [Convolutional Neural Networks at Constrained Time Cost](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/He_Convolutional_Neural_Networks_2015_CVPR_paper.pdf)
-- 2015 ICMLW [Highway Networks](http://arxiv.org/pdf/1505.00387v2.pdf)
-- 2015 ICLR [Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging](http://arxiv.org/pdf/1409.1556v6.pdf)
-- 2015 ICLR [Adam: A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980) (Adam)
-- 2015 AISTATS [Deeply-Supervised Nets](http://jmlr.org/proceedings/papers/v38/lee15a.pdf)
-- 2014 JMLR [Dropout: A Simple Way to Prevent Neural Networks from
-Overfitting](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf) (Dropout)
-- 2014 NIPS [Identifying and attacking the saddle point problem in high-dimensional non-convex optimization](http://papers.nips.cc/paper/5486-identifying-and-attacking-the-saddle-point-problem-in-high-dimensional-non-convex-optimization.pdf)
-- 2014 OSLW [On the Computational Complexity of Deep Learning](http://lear.inrialpes.fr/workshop/osl2015/slides/osl2015_shalev_shwartz.pdf)
-- 2013 ICML [On the importance of initialization and momentum in deep learning](http://www.cs.utoronto.ca/~ilya/pubs/2013/1051_2.pdf)
-- 2011 ICML [On optimization methods for deep learning](http://ai.stanford.edu/~quocle/LeNgiCoaLahProNg11.pdf)
-- 2010 AISTATS [Understanding the difficulty of training deep feedforward neural networks](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf)
-
-### RNN
-- 2016 arXiv [Architectural Complexity Measures of Recurrent Neural Networks](http://arxiv.org/abs/1602.08210)
-- 2015 Andrej's Blog [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
-- 2015 ICML [An Empirical Exploration of Recurrent Network Architectures](http://jmlr.org/proceedings/papers/v37/jozefowicz15.pdf)
-- 2013 ICML [On the difficulty of training recurrent neural networks](http://www.jmlr.org/proceedings/papers/v28/pascanu13.pdf)
-
-### Applications
-- 2015 CVPR [FaceNet: A Unified Embedding for Face Recognition and Clustering](http://arxiv.org/abs/1503.03832)
-- 2012 CVPR [Towards Good Practice in Large-Scale Learning for Image Classification](http://hal.inria.fr/docs/00/69/00/14/PDF/cvpr2012.pdf)
-- 2012 ICML [Building High-level Features Using Large Scale Unsupervised
Learning](http://static.googleusercontent.com/media/research.google.com/en/us/archive/unsupervised_icml2012.pdf)
-
diff --git a/dl_cnn.md b/dl_cnn.md
new file mode 100644
index 0000000..9963fda
--- /dev/null
+++ b/dl_cnn.md
@@ -0,0 +1,73 @@
+## Convolutional Neural Networks
+---
+
+##### ImageNet Records
+- 2016 arXiv [Identity Mappings in Deep Residual Networks](http://arxiv.org/pdf/1603.05027v1.pdf)
+- 2016 arXiv [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](http://arxiv.org/abs/1602.07261) (Inception V4)
+- 2015 arXiv [Deep Residual Learning for Image Recognition](http://arxiv.org/abs/1512.03385) (ResNet)
+- 2015 arXiv [Rethinking the Inception Architecture for Computer Vision](http://arxiv.org/abs/1512.00567) (Inception V3)
+- 2015 ICML [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](http://jmlr.org/proceedings/papers/v37/ioffe15.pdf) (Inception V2)
+- 2015 ICCV [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](http://research.microsoft.com/en-us/um/people/kahe/publications/iccv15imgnet.pdf) (PReLU)
+- 2015 ICLR [Very Deep Convolutional Networks For Large-scale Image Recognition](http://arxiv.org/abs/1409.1556) (VGG)
+- 2015 CVPR [Going Deeper with Convolutions](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43022.pdf) (GoogleNet/Inception V1)
+- 2012 NIPS [ImageNet Classification with Deep Convolutional Neural Networks](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) (AlexNet)
+
+##### Architecture Design
+
+- 2016 arXiv [Benefits of depth in neural networks](http://arxiv.org/abs/1602.04485)
+- 2016 AAAI [On the Depth of Deep Neural Networks: A Theoretical View](http://arxiv.org/abs/1506.05232)
+- 2016 arXiv [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size](http://arxiv.org/abs/1602.07360)
+- 2015 CVPR [Convolutional Neural Networks at Constrained Time Cost](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/He_Convolutional_Neural_Networks_2015_CVPR_paper.pdf)
+- 2015 ICLR [FitNets: Hints for Thin Deep Nets](http://arxiv.org/pdf/1412.6550v4.pdf)
+- 2014 NIPS [Do Deep Nets Really Need to be Deep?](http://papers.nips.cc/paper/5484-do-deep-nets-really-need-to-be-deep.pdf)
+- 2014 ICLRW [Understanding Deep Architectures using a Recursive Convolutional Network](http://arxiv.org/abs/1312.1847)
+- 2014 ECCV [Visualizing and Understanding Convolutional Networks](https://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf)
+- 2009 ICCV [What is the Best Multi-Stage Architecture for Object Recognition?](http://yann.lecun.com/exdb/publis/pdf/jarrett-iccv-09.pdf)
+- 1994 T-NN [SVD-NET: An Algorithm that Automatically Selects Network Structure](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=286929)
+
+##### Network Binarization
+- 2016 arXiv [XNOR-Net: ImageNet Classification Using Binary
+Convolutional Neural Networks](http://arxiv.org/pdf/1603.05279v1.pdf)
+- 2016 arXiv [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](http://arxiv.org/abs/1602.02830)
+- 2015 NIPS [BinaryConnect: Training Deep Neural Networks with binary weights during propagations](https://papers.nips.cc/paper/5647-binaryconnect-training-deep-neural-networks-with-binary-weights-during-propagations.pdf)
+
+##### Model Compression / Parameter Pruning
+- 2016 ICLR [Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding](http://arxiv.org/abs/1510.00149)
+- 2016 ICLR [Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications](http://arxiv.org/abs/1511.06530)
+- 2015 arXiv [Training CNNs with Low-Rank Filters for Efficient Image Classification](http://arxiv.org/abs/1511.06744)
+- 2015 arXiv [Structured Pruning of Deep Convolutional Neural Networks](http://arxiv.org/abs/1512.08571)
+- 2015 arXiv [Data-free parameter pruning for Deep Neural Networks](http://arxiv.org/abs/1507.06149)
+- 2015 ICCV [An exploration of parameter redundancy in deep networks with circulant projections](http://felixyu.org/pdf/ICCV15_circulant.pdf)
+- 2015 ICML [Compressing Neural Networks with the Hashing Trick](http://jmlr.org/proceedings/papers/v37/chenc15.pdf)
+- 2015 NIPS [Learning both Weights and Connections for Efficient Neural Networks](http://arxiv.org/abs/1506.02626) (see the toy sketch after this list)
+- 2014 arXiv [Compressing deep convolutional networks
+using vector quantization](http://arxiv.org/abs/1412.6115)
+- 2014 NIPSW [Distilling the Knowledge in a Neural Network](https://fb56552f-a-62cb3a1a-s-sites.googlegroups.com/site/deeplearningworkshopnips2014/65.pdf?attachauth=ANoY7cr8J-eqASFdYZeOQK8d9aGCtxzQpaVNCcjKgt1THV7e9FKNuTlrH4QCPmgMg2jynAz3ehjOU_2q9SMsnBYZq3_Jlxf1NnWcBejaVZi4vNHZ41H2DK8R-MJsk3MqfMDXOfEPxhAAOwUBH7oE-EtEKDoYa-16eqZ5djaoT4VXdir383rikNv6YF68dhm84kw04VCzH5XpA_8ucgW3iBr77bkjaYvNvC6YsUuC3PyVEPIusOZaM94%3D&attredirects=0)
+- 1989 NIPS [Optimal Brain Damage](http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf)
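+
+A toy illustration, not taken from any paper above: magnitude-based weight
+pruning in the spirit of "Learning both Weights and Connections". The sparsity
+level and the absence of retraining are simplifications of this sketch.
+
+```python
+import numpy as np
+
+def prune_by_magnitude(w, sparsity=0.9):
+    """Zero out the smallest-magnitude entries of w until `sparsity` is reached."""
+    k = int(np.ceil(sparsity * w.size))            # number of weights to drop
+    threshold = np.partition(np.abs(w), k - 1, axis=None)[k - 1]
+    mask = np.abs(w) > threshold                   # keep only the large weights
+    return w * mask, mask                          # the paper retrains under this mask
+
+w = np.random.randn(256, 256)
+w_pruned, mask = prune_by_magnitude(w)
+print(1.0 - mask.mean())                           # realized sparsity, roughly 0.9
+```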
+
+
+
+#### Applications
+- 2015 CVPR [FaceNet: A Unified Embedding for Face Recognition and Clustering](http://arxiv.org/abs/1503.03832)
+- 2012 CVPR [Towards Good Practice in Large-Scale Learning for Image Classification](http://hal.inria.fr/docs/00/69/00/14/PDF/cvpr2012.pdf)
+- 2012 ICML [Building High-level Features Using Large Scale Unsupervised Learning](http://static.googleusercontent.com/media/research.google.com/en/us/archive/unsupervised_icml2012.pdf)
\ No newline at end of file
diff --git a/dl_opt.md b/dl_opt.md
new file mode 100644
index 0000000..9432846
--- /dev/null
+++ b/dl_opt.md
@@ -0,0 +1,43 @@
+## Optimization
+---
+
+- 2016 ICLR [Data-Dependent Path Normalization in Neural Networks](http://arxiv.org/pdf/1511.06747v4.pdf)
+- 2016 Blog [An overview of gradient descent optimization algorithms](http://sebastianruder.com/optimizing-gradient-descent/index.html)
+- 2015 arXiv [Adding Gradient Noise Improves Learning for Very Deep Networks](http://arxiv.org/abs/1511.06807)
+- 2015 DL Summer School [Non-Smooth, Non-Finite, and Non-Convex Optimization](http://www.iro.umontreal.ca/~memisevr/dlss2015/2015_DLSS_NonSmoothNonFiniteNonConvex.pdf)
+- 2015 NIPS [Training Very Deep Networks](http://papers.nips.cc/paper/5850-training-very-deep-networks.pdf)
+- 2015 NIPS [Deep learning with Elastic Averaging SGD](https://www.cs.nyu.edu/~zsx/nips2015.pdf) (EASGD)
+- 2015 CVPR [Convolutional Neural Networks at Constrained Time Cost](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/He_Convolutional_Neural_Networks_2015_CVPR_paper.pdf)
+- 2015 ICMLW [Highway Networks](http://arxiv.org/pdf/1505.00387v2.pdf)
+- 2015 ICLR [Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging](http://arxiv.org/abs/1410.7455)
+- 2015 ICLR [Adam: A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980) (Adam; see the sketch at the end of this file)
+- 2015 AISTATS [Deeply-Supervised Nets](http://jmlr.org/proceedings/papers/v38/lee15a.pdf)
+- 2014 JMLR [Dropout: A Simple Way to Prevent Neural Networks from
+Overfitting](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf) (Dropout)
+- 2014 NIPS [Identifying and attacking the saddle point problem in high-dimensional non-convex optimization](http://papers.nips.cc/paper/5486-identifying-and-attacking-the-saddle-point-problem-in-high-dimensional-non-convex-optimization.pdf)
+- 2014 OSLW [On the Computational Complexity of Deep Learning](http://lear.inrialpes.fr/workshop/osl2015/slides/osl2015_shalev_shwartz.pdf)
+- 2013 ICML [On the importance of initialization and momentum in deep learning](http://www.cs.utoronto.ca/~ilya/pubs/2013/1051_2.pdf)
+- 2011 ICML [On optimization methods for deep learning](http://ai.stanford.edu/~quocle/LeNgiCoaLahProNg11.pdf)
+- 2010 AISTATS [Understanding the difficulty of training deep feedforward neural networks](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf)
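+
+A worked illustration of the update rule from the Adam paper above (a minimal
+sketch; the hyperparameters are the paper's defaults, and the quadratic
+objective is an arbitrary stand-in):
+
+```python
+import numpy as np
+
+def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
+    """One Adam update: moment estimates followed by bias correction."""
+    m = b1 * m + (1 - b1) * grad           # first moment (mean of gradients)
+    v = b2 * v + (1 - b2) * grad ** 2      # second moment (uncentered variance)
+    m_hat = m / (1 - b1 ** t)              # bias-corrected estimates
+    v_hat = v / (1 - b2 ** t)
+    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
+
+theta = np.array([5.0, -3.0])              # minimize f(x) = ||x||^2
+m = v = np.zeros_like(theta)
+for t in range(1, 5001):
+    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
+print(theta)                               # close to [0, 0]
+```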
diff --git a/dl_sys.md b/dl_sys.md
new file mode 100644
index 0000000..c77b462
--- /dev/null
+++ b/dl_sys.md
@@ -0,0 +1,67 @@
+## Deep Learning Systems
+---
+
+## Deep Learning Software / Frameworks
+- **[Caffe](http://caffe.berkeleyvision.org/)**
+ 2015 [Large Scale Distributed Deep Learning on Hadoop Clusters](http://yahoohadoop.tumblr.com/post/129872361846/large-scale-distributed-deep-learning-on-hadoop)
+ 2014 MM [Caffe: Convolutional Architecture for Fast Feature Embedding](http://arxiv.org/abs/1408.5093)
+- **[CNTK](https://www.cntk.ai/)**
+ 2014 MSR-TR [An introduction to computational networks and the computational network toolkit](http://research.microsoft.com/apps/pubs/?id=226641)
+ 2014 OSDI [Project Adam: Building an Efficient and Scalable Deep Learning Training System](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf)
+- **[MXNet](http://mxnet.dmlc.ml/en/latest/)**
+ 2016 arXiv [Training Deep Nets with Sublinear Memory Cost](https://arxiv.org/abs/1604.06174)
+ 2015 NIPSW [MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems](http://www.cs.cmu.edu/~muli/file/mxnet-learning-sys.pdf) [[GTC'16 Tutorial](http://www.cs.cmu.edu/~muli/file/mxnet_gtc16.pdf)]
+ 2014 NIPSW [Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning](http://stanford.edu/~rezab/nips2014workshop/submits/minerva.pdf)
+ 2014 ICLR [Purine: A bi-graph based deep learning framework](http://arxiv.org/abs/1412.6249)
+- **[Neon](https://www.nervanasys.com/technology/neon/)**
+ 2015 arXiv [Fast Algorithms for Convolutional Neural Networks](http://arxiv.org/abs/1509.09308) (Winograd) [[Blog]](http://www.nervanasys.com/winograd/)
+- **[SparkNet](https://github.com/amplab/SparkNet)**
+ 2016 arXiv [SparkNet: Training Deep Networks in Spark](http://arxiv.org/abs/1511.06051)
+- **[TensorFlow](https://www.tensorflow.org/)**
+ 2016 arXiv [TensorFlow: A system for large-scale machine learning](http://arxiv.org/abs/1605.08695)
+ 2015 [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](http://download.tensorflow.org/paper/whitepaper2015.pdf) [[slides 1]](http://static.googleusercontent.com/media/research.google.com/en//people/jeff/BayLearn2015.pdf) [[slides 2]](http://vision.stanford.edu/teaching/cs231n/slides/jon_talk.pdf)
+ 2012 NIPS [Large Scale Distributed Deep Networks](http://static.googleusercontent.com/media/research.google.com/en/us/archive/large_deep_networks_nips2012.pdf) (DistBelief)
+ 2014 NIPSW [Techniques and Systems for Training Large Neural Networks Quickly](http://stanford.edu/~rezab/nips2014workshop/slides/jeff.pdf)
+- **[Theano](http://deeplearning.net/software/theano/)**
+ 2016 arXiv [Theano: A Python framework for fast computation of mathematical expressions](http://arxiv.org/abs/1605.02688)
+- **[Torch](http://torch.ch/)**
+ 2011 NIPSW [Torch7: A Matlab-like Environment for Machine Learning](http://cs.nyu.edu/~koray/files/2011_torch7_nipsw.pdf)
+
+
+## Specific Systems
+- 2015 arXiv [Deep Speech 2: End-to-End Speech Recognition in English and Mandarin](http://arxiv.org/abs/1512.02595)
+- 2015 arXiv [Deep Image: Scaling up Image Recognition](http://arxiv.org/abs/1501.02876)
+- 2013 ICML [Deep learning with COTS HPC systems](http://jmlr.org/proceedings/papers/v28/coates13.pdf)
+
+## Parallelization
+- 2015 Intel [Single Node Caffe Scoring and Training on Intel® Xeon E5-Series Processors](https://software.intel.com/en-us/articles/single-node-caffe-scoring-and-training-on-intel-xeon-e5-series-processors)
+- 2015 arXiv [Caffe con Troll: Shallow Ideas to Speed Up Deep Learning](http://arxiv.org/abs/1504.04343)
+- 2015 ICMLW [Massively Parallel Methods for Deep Reinforcement Learning](https://8109f4a4-a-62cb3a1a-s-sites.googlegroups.com/site/deeplearning2015/1.pdf?attachauth=ANoY7cocCvmoqZlkfUFQkSwV8fULURfVSzDdFv0dyk8uU1ztfeCHFIK4Kb6JoEQ3iZLUiYBynddwePUhd-3ssJZkANn-PXFU7m1U_wE5Eb4eHbZj3YR41bLF1AEr5T5EDth97i9DdkipHses1XTMDu_wpw8zs0-RGb7WVQRF8ZOhvG1AW47CRkAI8X0iv-oLtWy9fGSSa-JR9JpSwFUtjt_0_UXu4BUUwg==&attredirects=0)
+- 2015 arXiv [Convolutional Neural Networks at Constrained Time Cost](http://arxiv.org/pdf/1412.1710v1.pdf)
+- 2014 arXiv [One weird trick for parallelizing convolutional neural networks](http://arxiv.org/pdf/1404.5997v2.pdf) (see the sketch at the end of this file)
+- 2014 NIPS [On the Computational Efficiency of Training Neural Networks](http://papers.nips.cc/paper/5267-on-the-computational-efficiency-of-training-neural-networks.pdf)
+- 2011 NIPSW [Improving the speed of neural networks on CPUs](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37631.pdf)
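+
+Most of the data-parallel schemes above boil down to "compute gradients on
+shards, then combine". A single-process sketch of synchronous gradient
+averaging (the worker count and the least-squares objective are placeholders,
+not from any paper above):
+
+```python
+import numpy as np
+
+def sync_sgd_step(theta, shards, grad_fn, lr=0.1):
+    """Average per-shard gradients, then take one SGD step (simulated workers)."""
+    grads = [grad_fn(theta, shard) for shard in shards]   # one gradient per worker
+    return theta - lr * np.mean(grads, axis=0)            # mean = all-reduce equivalent
+
+# Toy problem: least-squares fit with data split across 4 "workers".
+rng = np.random.RandomState(0)
+X, y = rng.randn(400, 3), rng.randn(400)
+shards = [(X[i::4], y[i::4]) for i in range(4)]
+grad = lambda th, s: 2 * s[0].T @ (s[0] @ th - s[1]) / len(s[1])
+theta = np.zeros(3)
+for _ in range(200):
+    theta = sync_sgd_step(theta, shards, grad)
+```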