[Paper Reading] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
3 min read · May 23, 2019
Problem Definition
To develop efficient models for mobile devices and embedded vision applications
Contribution
- Designed an efficient network architecture and a set of two hyper-parameters (a width multiplier and a resolution multiplier) in order to build very small, low-latency models that can be easily matched to the design requirements of mobile and embedded vision applications.
Related Work
- Two categories: compressing pretrained networks or training small networks directly
- Depthwise separable convolutions (subsequently used in Inception models)
- Flattened networks
- Factorized Networks
- Xception network (scales up depthwise separable filters)
- SqueezeNet (uses a bottleneck approach to design a very small network)
- Structured transform networks
- Deep fried convnets
- Shrinking, factorizing or compressing pretrained networks
- Compression based on product quantization, hashing, pruning, vector quantization, and Huffman coding
- Additionally various factorizations
- Distillation (uses a larger network to teach a smaller network)
- Low bit networks
MobileNet Architecture
Depthwise Separable Convolution
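A depthwise separable convolution factors a standard convolution into a depthwise convolution (one filter per input channel) followed by a 1×1 pointwise convolution that combines channels. The sketch below compares multiply-add counts using the paper's symbols (D_K kernel size, M input channels, N output channels, D_F feature-map size); the example layer sizes are illustrative assumptions, not a specific MobileNet layer.

```python
def standard_conv_cost(dk, m, n, df):
    # Standard convolution: D_K * D_K * M * N * D_F * D_F multiply-adds
    return dk * dk * m * n * df * df

def separable_conv_cost(dk, m, n, df):
    # Depthwise part (D_K * D_K * M * D_F * D_F)
    # plus pointwise 1x1 part (M * N * D_F * D_F)
    return dk * dk * m * df * df + m * n * df * df

# Illustrative layer: 3x3 kernel, 512 in/out channels, 14x14 feature map
std = standard_conv_cost(3, 512, 512, 14)
sep = separable_conv_cost(3, 512, 512, 14)
ratio = sep / std
print(ratio)  # ~0.113, matching 1/N + 1/D_K^2 = 1/512 + 1/9
```

The reduction ratio simplifies to 1/N + 1/D_K², so with 3×3 kernels the factorized form needs roughly 8 to 9 times fewer multiply-adds than a standard convolution.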
Network Structure and Training
Width Multiplier: Thinner Models
Resolution Multiplier: Reduced Representation
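The two hyper-parameters compose: the width multiplier α thins the channels (M → αM, N → αN) and the resolution multiplier ρ shrinks the feature map (D_F → ρD_F), so a depthwise separable layer's cost drops by roughly α²ρ². A minimal sketch, assuming the same illustrative layer sizes as above:

```python
def mobilenet_layer_cost(dk, m, n, df, alpha=1.0, rho=1.0):
    """Multiply-adds for one depthwise separable layer with width
    multiplier alpha (thins channels) and resolution multiplier rho
    (shrinks the feature map)."""
    m_a, n_a = int(alpha * m), int(alpha * n)  # alpha * M, alpha * N channels
    df_r = int(rho * df)                       # rho * D_F spatial resolution
    # depthwise term + pointwise term at the reduced sizes
    return dk * dk * m_a * df_r * df_r + m_a * n_a * df_r * df_r

base = mobilenet_layer_cost(3, 512, 512, 14)
small = mobilenet_layer_cost(3, 512, 512, 14, alpha=0.5, rho=0.5)
print(small / base)  # ~1/16, i.e. roughly alpha^2 * rho^2
```

This gives two knobs for trading accuracy against latency and model size without redesigning the architecture.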
Experiments
Model Choices
Model Shrinking Hyperparameters
Fine Grained Recognition
Large Scale Geolocalization
Face Attributes
Object Detection
Face Embeddings
Conclusion
They proposed a new model architecture called MobileNets based on depthwise separable convolutions, and investigated some of the important design decisions leading to an efficient model.
Reference
Howard, Andrew G., et al. “Mobilenets: Efficient convolutional neural networks for mobile vision applications.” arXiv preprint arXiv:1704.04861 (2017).