Summary of Deep Neural Network Optimization on Resource-Constrained Devices
11 Feb 2018

Low-Rank Decomposition
- Sparse Convolutional Neural Networks
Pruning
- Learning both Weights and Connections for Efficient Neural Networks
Classical
- Dynamic Network Surgery for Efficient DNNs
Classical
- Pruning Convolutional Neural Networks for Resource Efficient Inference
Structured Pruning
- Pruning Filters for Efficient ConvNets
Ranks filters by the absolute values (L1 norm) of their weights and prunes the smallest ones; see the sketch after this list.
- Data-free Parameter Pruning for Deep Neural Networks
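
The filter-ranking step in Pruning Filters for Efficient ConvNets boils down to scoring each filter by the L1 norm of its weights and dropping the lowest-scoring ones. Below is a minimal NumPy sketch of that step only; the shapes and prune ratio are illustrative, and the paper's follow-up step of removing the matching input channels in the next layer is omitted.

```python
import numpy as np

def prune_filters_by_l1(conv_weight, prune_ratio=0.5):
    """Rank the filters of a conv layer by the L1 norm of their weights
    and keep only the largest-norm ones.

    conv_weight: array of shape (out_channels, in_channels, kH, kW)
    Returns the pruned weight tensor and the indices of the kept filters.
    """
    out_channels = conv_weight.shape[0]
    # L1 norm of each filter = sum of the absolute values of its weights
    l1_norms = np.abs(conv_weight).reshape(out_channels, -1).sum(axis=1)
    n_keep = max(1, int(round(out_channels * (1.0 - prune_ratio))))
    keep = np.sort(np.argsort(l1_norms)[-n_keep:])  # indices of kept filters
    return conv_weight[keep], keep

# Toy usage: a hypothetical 64-filter 3x3 layer, pruning half of its filters.
w = np.random.randn(64, 32, 3, 3)
w_pruned, kept = prune_filters_by_l1(w, prune_ratio=0.5)
print(w_pruned.shape)  # (32, 32, 3, 3)
```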
Quantization (Binarization, Ternarization)
- Towards the Limit of Network Quantization
Weight clustering with an approximate Hessian-weighted distortion measure; see the quantization sketch after this list.
- Trained Ternary Quantization
- DoReFa-Net
- Training and Inference with Integers in Deep Neural Networks (WAGE)
- XNOR-Net
- Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1
Classical
- BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations
Classical
- Loss-Aware Binarization of Deep Networks
- Ternary Weight Networks
- Incremental Network Quantization
- Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM
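
As a rough illustration of the "clustering + approximate Hessian" idea behind Towards the Limit of Network Quantization, the sketch below clusters a layer's weights into a small shared codebook with a Hessian-weighted k-means. The diagonal-Hessian proxy (e.g. accumulated squared gradients), the cluster count, and all names are assumptions of this sketch, not the paper's exact procedure.

```python
import numpy as np

def hessian_weighted_kmeans(w, hess_diag, n_clusters=16, n_iters=20):
    """Cluster network weights into n_clusters shared values, weighting the
    quantization error of each weight by an approximate diagonal Hessian term.

    w:         1-D array of layer weights
    hess_diag: 1-D array of the same shape, e.g. accumulated squared gradients
               used here as a cheap Hessian approximation (an assumption)
    """
    # Initialise centroids uniformly over the weight range
    centroids = np.linspace(w.min(), w.max(), n_clusters)
    for _ in range(n_iters):
        # Assign each weight to its nearest centroid
        assign = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        # Update each centroid as the Hessian-weighted mean of its members
        for k in range(n_clusters):
            mask = assign == k
            if mask.any():
                centroids[k] = np.average(w[mask], weights=hess_diag[mask] + 1e-12)
    return centroids[assign], assign  # quantized weights and cluster indices

# Toy usage with random weights and squared gradients as the Hessian proxy.
w = np.random.randn(10000) * 0.1
h = np.random.rand(10000)
w_q, idx = hessian_weighted_kmeans(w, h, n_clusters=16)
```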
Optimization from Software
- Faster CNNs with Direct Sparse Convolutions and Guided Pruning
Approaches acceleration from a memory-access perspective, speeding up convolution through low-level operations; includes a derivation and proof of when sparsity is actually useful, with thorough comparison experiments. See the sparse-convolution sketch after this list.
- SBNet: Sparse Blocks Networks for Fast Inference
The idea is simple: gather the non-sparse parts of the input together and process them with dense convolution, which is already highly optimized; the approach is very engineering-heavy, though. See the gather/scatter sketch after this list.
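
For Faster CNNs with Direct Sparse Convolutions, the core trick is to keep the pruned filters in a sparse format and let each non-zero weight multiply a shifted view of the input directly, instead of building an im2col buffer. Below is a minimal NumPy sketch under simplifying assumptions (stride 1, no padding); the function names are illustrative and this is far from the paper's optimized low-level kernels.

```python
import numpy as np

def to_sparse_filters(w, thresh=0.0):
    """Store each filter as its non-zero (channel, row, col) indices and values,
    as produced by pruning.  w: (out_channels, in_channels, R, S)."""
    sparse = []
    for k in range(w.shape[0]):
        nz = np.argwhere(np.abs(w[k]) > thresh)      # (c, r, s) index triples
        vals = w[k][tuple(nz.T)]
        sparse.append((nz, vals))
    return sparse

def direct_sparse_conv2d(x, sparse_filters, R, S):
    """Direct sparse convolution (stride 1, no padding): each non-zero weight
    multiplies a shifted window of the input, so no im2col buffer is built.
    x: input of shape (C, H, W)."""
    C, H, W = x.shape
    OH, OW = H - R + 1, W - S + 1
    out = np.zeros((len(sparse_filters), OH, OW), dtype=x.dtype)
    for k, (idx, vals) in enumerate(sparse_filters):
        for (c, r, s), v in zip(idx, vals):
            out[k] += v * x[c, r:r + OH, s:s + OW]   # shifted input view
    return out

# Toy usage: a ~90%-sparse 3x3 conv applied to an 8-channel 16x16 input.
w = np.random.randn(4, 8, 3, 3) * (np.random.rand(4, 8, 3, 3) > 0.9)
x = np.random.randn(8, 16, 16)
y = direct_sparse_conv2d(x, to_sparse_filters(w), 3, 3)
print(y.shape)  # (4, 14, 14)
```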
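
For SBNet, the pattern is: gather the active blocks, run the well-optimized dense kernel on the packed batch, and scatter the results back. The toy NumPy sketch below shows that pattern; the block size, the mask, and the stand-in dense operator are assumptions, and the overlapping halo regions needed for spatial kernels larger than 1x1 are ignored.

```python
import numpy as np

def sbnet_style_conv(x, mask, dense_op, block=8):
    """Gather -> dense compute -> scatter, in the style of SBNet.
    x:        input of shape (H, W, C)
    mask:     binary activity mask of shape (H // block, W // block)
    dense_op: any dense operator applied to the batch of gathered blocks,
              e.g. a 1x1 convolution (here just a stand-in callable).
    """
    H, W, C = x.shape
    out = np.zeros_like(x)
    coords = np.argwhere(mask > 0)
    if coords.size == 0:
        return out
    # Gather: collect only the active blocks into one dense batch
    blocks = np.stack([x[i * block:(i + 1) * block, j * block:(j + 1) * block]
                       for i, j in coords])
    # Dense compute: run the optimized dense kernel on the packed batch
    blocks = dense_op(blocks)
    # Scatter: write the processed blocks back to their original positions
    for (i, j), b in zip(coords, blocks):
        out[i * block:(i + 1) * block, j * block:(j + 1) * block] = b
    return out

# Toy usage: a 1x1 "conv" as the dense op, active on roughly 30% of the blocks.
x = np.random.randn(64, 64, 16)
mask = (np.random.rand(8, 8) < 0.3).astype(np.float32)
w = np.random.randn(16, 16)
y = sbnet_style_conv(x, mask, lambda b: b @ w, block=8)
```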
Optimize Convolution
- Fast Algorithms for Convolutional Neural Networks
- Enabling Sparse Winograd Convolution by Native Pruning
Image Compression (Quantization)
- Learning Convolutional Networks for Content-weighted Image Compression
Miscellaneous
- Compressing Neural Networks with the Hashing Trick
- Interleaved Group Convolutions for Deep Neural Networks
Splits the input feature maps into partitions along the channel dimension and convolves each partition separately (the grouped-convolution idea from early AlexNet), then groups feature maps drawn from different partitions together for a second convolution; sketched below.
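
A minimal NumPy sketch of the interleaving step described above, using 1x1 group convolutions as stand-ins for both stages (in the paper the primary group convolution is spatial and the secondary is 1x1); the shapes and group counts are illustrative assumptions.

```python
import numpy as np

def group_pointwise_conv(x, weights):
    """Apply a 1x1 convolution independently to each channel group.
    x: (H, W, C), split evenly over len(weights) groups;
    weights[g]: (C_g, C_g) mixing matrix for group g."""
    groups = np.split(x, len(weights), axis=-1)
    return np.concatenate([g @ w for g, w in zip(groups, weights)], axis=-1)

def interleave_channels(x, n_groups):
    """Permute channels so each new group takes one channel from every old
    group (the interleaving between the two group convolutions)."""
    H, W, C = x.shape
    return x.reshape(H, W, n_groups, C // n_groups).transpose(0, 1, 3, 2).reshape(H, W, C)

# Toy usage: primary group conv over L=4 partitions, interleave, then a
# secondary group conv over C//L partitions (1x1 convs stand in for both).
H, W, C, L = 16, 16, 32, 4
x = np.random.randn(H, W, C)
primary = [np.random.randn(C // L, C // L) for _ in range(L)]
secondary = [np.random.randn(L, L) for _ in range(C // L)]
y = group_pointwise_conv(x, primary)    # convolve within each partition
y = interleave_channels(y, L)           # regroup channels across partitions
y = group_pointwise_conv(y, secondary)  # second group convolution
```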