[Paper Reading] Model Compression and Acceleration for Deep Neural Networks

Ya-Liang Allen Chang
May 23, 2019

Problem Definition

A summary of the different approaches for compressing and accelerating deep neural networks, following the survey by Cheng et al. (2018).

Methods

Four categories:

  1. Parameter pruning and sharing: The parameter pruning and sharing-based methods explore the redundancy in the model parameters and try to remove the redundant and noncritical ones.
  2. Low-rank factorization: Low-rank factorization-based techniques use matrix/tensor decomposition to estimate the informative parameters of the deep convolutional neural networks (CNNs).
  3. Transferred/compact convolutional filters: The transferred/compact convolutional filters-based approaches design special structural convolutional filters to reduce the storage and computation complexity.
  4. Knowledge distillation (KD): The KD methods learn a distilled model and train a more compact neural network to reproduce the output of a larger network.

Parameter pruning and sharing

Quantization and binarization

Network quantization compresses the original network by reducing the number of bits required to represent each weight.
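To make this concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper) of 8-bit uniform quantization and XNOR-Net-style sign binarization of a weight tensor; the function names and the per-tensor scaling factor are assumptions for illustration.

```python
import numpy as np

def binarize(weights):
    """Binarize weights to {-alpha, +alpha} with alpha = mean(|W|).

    This mirrors the XNOR-Net-style trick: each full-precision weight is
    replaced by its sign times one per-tensor scaling factor, so storage
    drops to roughly 1 bit per weight.
    """
    alpha = np.abs(weights).mean()   # per-tensor scaling factor (assumed here)
    return alpha * np.sign(weights)

def quantize_uniform(weights, num_bits=8):
    """Uniformly quantize weights to 2**num_bits levels over [min, max]."""
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / (2 ** num_bits - 1)
    codes = np.round((weights - w_min) / scale)   # integer codes to store
    return codes * scale + w_min                  # de-quantized values

w = np.random.randn(64, 128).astype(np.float32)
print(np.abs(w - quantize_uniform(w, num_bits=8)).max())  # small rounding error
print(np.unique(binarize(w)))                             # two values: -alpha, +alpha
```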

Drawbacks

  • The accuracy of such binary nets is significantly lowered when dealing with large CNNs such as GoogLeNet.
  • Existing binarization schemes are based on simple matrix approximations and ignore the effect of binarization on the accuracy loss.

Pruning and sharing

Drawbacks

  • Pruning with l1 or l2 regularization requires more iterations to converge.
  • All pruning criteria require manual setup of sensitivity for layers, which demands fine-tuning of the parameters and could be cumbersome for some applications.
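As a concrete (hypothetical) illustration of the pruning idea, here is a minimal magnitude-based pruning sketch in NumPy; the single global threshold is my own simplification, and the manual per-layer sensitivity setup the drawbacks mention is exactly what this toy version glosses over.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights so a `sparsity` fraction is pruned.

    Real pipelines pick a separate threshold per layer (the manual
    sensitivity setup noted above); here one global quantile is used.
    """
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold   # True = keep, False = prune
    return weights * mask, mask

w = np.random.randn(256, 256)
pruned, mask = magnitude_prune(w, sparsity=0.9)
print(f"fraction kept: {mask.mean():.2f}")   # ~0.10
```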

Designing the structural matrix

Drawbacks

  • The structural constraint will cause loss in accuracy, since the constraint might bring bias to the model.
  • Finding a proper structural matrix is difficult: there is no theoretical way to derive one.
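For intuition, here is a small NumPy sketch (my own example) of one popular structural matrix, the circulant matrix: storing only its first column gives O(n) parameters instead of O(n^2), and the matrix-vector product reduces to an FFT.

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix defined by first column c with vector x.

    Circulant matrices are diagonalized by the Fourier basis, so the
    product is a circular convolution: O(n log n) time and O(n) storage
    instead of O(n^2) for a dense projection.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

n = 8
c = np.random.randn(n)   # the only parameters that need storing
x = np.random.randn(n)

# Sanity check against the explicit dense circulant matrix.
C = np.column_stack([np.roll(c, j) for j in range(n)])
print(np.allclose(C @ x, circulant_matvec(c, x)))   # True
```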

Low-rank factorization and sparsity

Drawbacks

  • The implementation is not easy, since it involves a decomposition operation that is computationally expensive.
  • Current methods perform low-rank approximation layer by layer and thus cannot perform global parameter compression, which is important because different layers hold different information.
  • Factorization requires extensive model retraining to achieve convergence compared with the original model.
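As a toy illustration (not from the survey), truncated SVD applied to a dense layer's weight matrix shows both the parameter savings and the approximation error that the retraining mentioned above must then recover.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (m x n) as U_r @ V_r with U_r (m x rank), V_r (rank x n).

    Replacing a dense layer y = W @ x with y = U_r @ (V_r @ x) cuts both
    parameters and multiply-adds from m*n to rank*(m + n); fine-tuning is
    then needed to recover accuracy.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

W = np.random.randn(1024, 512)
U_r, V_r = low_rank_factorize(W, rank=64)
print(W.size, U_r.size + V_r.size)                        # 524288 vs 98304
print(np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W))  # relative error
```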

Transferred/compact convolutional filters

Drawbacks

  • These methods can achieve competitive performance for wide/flat architectures (like VGGNet) but not for narrow/special ones (like GoogLeNet and ResNet).
  • The transfer assumptions are sometimes too strong to guide the algorithm, making the results unstable on some data sets.
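Here is a minimal PyTorch sketch of the transferred-filter idea; the horizontal flip used as the fixed transform T(W) is my own choice for illustration, not the survey's. Only the base filters are learned, yet the layer emits twice the output channels.

```python
import torch
import torch.nn.functional as F

def transferred_conv(x, base_filters):
    """Convolve with base filters plus their horizontally flipped copies.

    Only `base_filters` are learned parameters; the flipped copies come
    from a fixed transform T(W), so the layer yields twice the output
    channels at the storage cost of one filter bank.
    """
    flipped = torch.flip(base_filters, dims=[3])       # T(W): mirror along width
    bank = torch.cat([base_filters, flipped], dim=0)   # double the output channels
    return F.conv2d(x, bank, padding=1)

x = torch.randn(1, 3, 32, 32)
w = torch.randn(8, 3, 3, 3)           # 8 learned base filters (out, in, kH, kW)
print(transferred_conv(x, w).shape)   # torch.Size([1, 16, 32, 32])
```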

KD

Drawbacks

  • KD can only be applied to classification tasks with a softmax loss function, which hinders its usage.
  • The model assumptions are sometimes too strict to make the performance competitive with other types of approaches.
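To show the softmax dependence concretely, here is a minimal PyTorch sketch of the classic Hinton-style distillation loss (the temperature and mixing values are illustrative assumptions, and this particular loss is the standard KD recipe rather than anything prescribed by the survey).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style knowledge distillation loss (illustrative T and alpha).

    The student matches the teacher's temperature-softened softmax outputs
    via KL divergence, mixed with ordinary cross-entropy on the hard labels.
    The reliance on softmax outputs is exactly why KD is tied to
    classification tasks, as noted in the drawbacks above.
    """
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

student_logits = torch.randn(16, 10)
teacher_logits = torch.randn(16, 10)
labels = torch.randint(0, 10, (16,))
print(distillation_loss(student_logits, teacher_logits, labels))
```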

Other types of approaches

Benchmarks, evaluation, and databases

Discussion and challenges

Technique challenges

Possible solutions

Reference

Cheng, Yu, et al. “Model compression and acceleration for deep neural networks: The principles, progress, and challenges.” IEEE Signal Processing Magazine 35.1 (2018): 126–136.
