[Paper Reading] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Ya-Liang Allen Chang
3 min read · May 23, 2019


Problem Definition

To develop an efficient model for mobile devices and embedded applications

Contribution

  • Designed an efficient network architecture and a set of two hyper-parameters (a width multiplier and a resolution multiplier) that build very small, low-latency models which can easily be matched to the design requirements of mobile and embedded vision applications.

Related Work

  • Two categories: compressing pretrained networks or training small networks directly
  • Depthwise separable convolutions (subsequently used by Inception models)
  • Flattened networks
  • Factorized Networks
  • Xception network (scales up depthwise separable filters)
  • SqueezeNet (bottleneck approach)
  • Structured transform networks
  • Deep fried convnets
  • Shrinking, factorizing or compressing pretrained networks
  • Compression based on product quantization, hashing, and pruning, vector quantization and Huffman coding
  • Additionally various factorizations
  • Distillation (uses a larger network to teach a smaller network)
  • Low bit networks

MobileNet Architecture

Depthwise Separable Convolution
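The key idea is to factor a standard convolution into a depthwise convolution (one filter per input channel) followed by a 1×1 pointwise convolution that combines channels. The savings can be sketched with the paper's cost formulas; the layer shapes below are illustrative, not taken from the paper:

```python
# Multiply-add cost of a standard convolution vs. a depthwise separable
# convolution, per the MobileNets paper:
#   standard:  D_K * D_K * M * N * D_F * D_F
#   separable: D_K * D_K * M * D_F * D_F  +  M * N * D_F * D_F
# D_K: kernel size, M: input channels, N: output channels, D_F: feature map size

def standard_conv_cost(d_k, m, n, d_f):
    return d_k * d_k * m * n * d_f * d_f

def separable_conv_cost(d_k, m, n, d_f):
    depthwise = d_k * d_k * m * d_f * d_f   # one D_K x D_K filter per input channel
    pointwise = m * n * d_f * d_f           # 1x1 conv mixes channels
    return depthwise + pointwise

# Illustrative layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map
std = standard_conv_cost(3, 512, 512, 14)
sep = separable_conv_cost(3, 512, 512, 14)
print(sep / std)  # equals 1/N + 1/D_K^2 = 1/512 + 1/9, roughly a 9x reduction
```

With a 3×3 kernel, the reduction factor 1/N + 1/D_K² is close to 1/9, which is where most of MobileNet's efficiency comes from.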

Network Structure and Training

Width Multiplier: Thinner Models

Resolution Multiplier: Reduced Representation
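The two shrinking hyper-parameters compose: the width multiplier α thins every layer's channel counts, and the resolution multiplier ρ shrinks the input (and hence every feature map). A minimal sketch of the paper's scaled cost formula; the α and ρ values below are illustrative:

```python
# Depthwise separable cost with width multiplier alpha and resolution
# multiplier rho, per the paper:
#   D_K*D_K*(alpha*M)*(rho*D_F)^2 + (alpha*M)*(alpha*N)*(rho*D_F)^2
def scaled_separable_cost(d_k, m, n, d_f, alpha=1.0, rho=1.0):
    am, an = alpha * m, alpha * n    # thinned channel counts
    df = rho * d_f                   # reduced feature map resolution
    return d_k * d_k * am * df * df + am * an * df * df

base = scaled_separable_cost(3, 512, 512, 14)
# alpha = 0.5 halves the channels; rho = 160/224 corresponds to a 160x160 input
small = scaled_separable_cost(3, 512, 512, 14, alpha=0.5, rho=160 / 224)
print(small / base)  # roughly alpha^2 * rho^2, since the pointwise term dominates
```

Because cost falls roughly quadratically in both α and ρ, small settings trade accuracy for large compute reductions, which is how the paper sweeps out its latency/accuracy curves.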

Experiments

Model Choices

Model Shrinking Hyperparameters

Fine Grained Recognition

Large Scale Geolocalization

Face Attributes

Object Detection

Face Embeddings

Conclusion

They proposed a new model architecture called MobileNets, based on depthwise separable convolutions, and investigated some of the important design decisions leading to an efficient model.

Reference

Howard, Andrew G., et al. “Mobilenets: Efficient convolutional neural networks for mobile vision applications.” arXiv preprint arXiv:1704.04861 (2017).
