[Paper Reading] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
3 min read · May 23, 2019
Problem Definition
To develop efficient models for mobile devices and embedded vision applications
Contribution
- Designed an efficient network architecture and a set of two hyper-parameters (a width multiplier and a resolution multiplier) in order to build very small, low-latency models that can be easily matched to the design requirements of mobile and embedded vision applications.
Related Work
- Two categories: compressing pretrained networks or training small networks directly
- Depthwise separable convolutions (subsequently used in Inception models)
- Flattened networks
- Factorized Networks
- Xception network (scales up depthwise separable filters)
- SqueezeNet (uses a bottleneck approach to design a very small network)
- Structured transform networks
- Deep fried convnets
- Shrinking, factorizing or compressing pretrained networks
- Compression based on product quantization, hashing, pruning, vector quantization, and Huffman coding
- Additionally various factorizations
- Distillation (uses a larger network to teach a smaller network)
- Low bit networks
MobileNet Architecture
Depthwise Separable Convolution
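A depthwise separable convolution factors a standard convolution into a depthwise convolution (one filter per input channel) followed by a 1×1 pointwise convolution that combines channels. The sketch below compares multiply-add counts using the paper's symbols (D_K kernel size, M input channels, N output channels, D_F feature-map size); the example layer sizes are illustrative assumptions, not a specific MobileNet layer.

```python
def standard_conv_cost(dk, m, n, df):
    # Standard convolution: D_K * D_K * M * N * D_F * D_F multiply-adds
    return dk * dk * m * n * df * df

def separable_conv_cost(dk, m, n, df):
    # Depthwise part (D_K * D_K * M * D_F * D_F)
    # plus pointwise 1x1 part (M * N * D_F * D_F)
    return dk * dk * m * df * df + m * n * df * df

# Illustrative layer: 3x3 kernel, 512 in/out channels, 14x14 feature map
std = standard_conv_cost(3, 512, 512, 14)
sep = separable_conv_cost(3, 512, 512, 14)
ratio = sep / std
print(ratio)  # ~0.113, matching 1/N + 1/D_K^2 = 1/512 + 1/9
```

The reduction ratio simplifies to 1/N + 1/D_K², so with 3×3 kernels the factorized form needs roughly 8 to 9 times fewer multiply-adds than a standard convolution.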
Network Structure and Training
Width Multiplier: Thinner Models
Resolution Multiplier: Reduced Representation
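The two hyper-parameters compose: the width multiplier α thins the channels (M → αM, N → αN) and the resolution multiplier ρ shrinks the feature map (D_F → ρD_F), so a depthwise separable layer's cost drops by roughly α²ρ². A minimal sketch, assuming the same illustrative layer sizes as above:

```python
def mobilenet_layer_cost(dk, m, n, df, alpha=1.0, rho=1.0):
    """Multiply-adds for one depthwise separable layer with width
    multiplier alpha (thins channels) and resolution multiplier rho
    (shrinks the feature map)."""
    m_a, n_a = int(alpha * m), int(alpha * n)  # alpha * M, alpha * N channels
    df_r = int(rho * df)                       # rho * D_F spatial resolution
    # depthwise term + pointwise term at the reduced sizes
    return dk * dk * m_a * df_r * df_r + m_a * n_a * df_r * df_r

base = mobilenet_layer_cost(3, 512, 512, 14)
small = mobilenet_layer_cost(3, 512, 512, 14, alpha=0.5, rho=0.5)
print(small / base)  # ~1/16, i.e. roughly alpha^2 * rho^2
```

This gives two knobs for trading accuracy against latency and model size without redesigning the architecture.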
Experiments
Model Choices
Model Shrinking Hyperparameters
Fine Grained Recognition
Large Scale Geolocalization
Face Attributes
Object Detection
Face Embeddings
Conclusion
They proposed a new model architecture called MobileNets based on depthwise separable convolutions, and investigated some of the important design decisions leading to an efficient model.
Reference
Howard, Andrew G., et al. “Mobilenets: Efficient convolutional neural networks for mobile vision applications.” arXiv preprint arXiv:1704.04861 (2017).