[Paper Reading] Ranjan et al., “Deep Learning for Understanding Faces: Machines May Be Just as Good, or Better, than Humans”, IEEE Signal Processing Magazine, 2018

Mar 13, 2019

Problem Definition

Provide an overview of deep-learning methods used for face recognition

What we can learn from faces

Three modules are typically needed for an automatic face identification and verification system:

A face detector that localizes faces in images or videos and returns a precise bounding box for each face (it should be robust to variations in pose, illumination, and scale)
A fiducial point detector that localizes the important facial landmarks used for face alignment
A feature descriptor, extracted from the aligned face, that encodes identity information

Given two face representations, a score is computed between them using a metric. When the metric is a distance (dissimilarity), a value below a chosen threshold indicates that the two faces belong to the same subject.
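The thresholding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: cosine distance is one common choice of metric (the notes do not fix a specific one), the function names `cosine_distance` and `same_subject` are hypothetical, the 0.4 threshold is a placeholder that would normally be tuned on a validation set, and the 4-D vectors are made-up stand-ins for real face embeddings.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def same_subject(a: Sequence[float], b: Sequence[float],
                 threshold: float = 0.4) -> bool:
    """Declare a match when the distance falls below the threshold.

    The default threshold is an assumed placeholder, not a value
    taken from the paper.
    """
    return cosine_distance(a, b) < threshold


# Toy 4-D "embeddings", invented purely for illustration.
anchor = [0.9, 0.1, 0.0, 0.4]
probe_same = [0.8, 0.2, 0.1, 0.5]
probe_other = [-0.1, 0.9, 0.4, 0.0]

print(same_subject(anchor, probe_same))
print(same_subject(anchor, probe_other))
```

Moving the threshold trades false accepts against false rejects, which is why it is usually selected on held-out pairs rather than fixed in advance.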

The performance of CNN-based methods has improved steadily, largely thanks to the availability of large annotated data sets of unconstrained faces.

Face detection in unconstrained images

Region based: Faster R-CNN

Sliding-window based: DP2MFD, DDFD

Single-shot detector (SSD)

Finding crucial facial keypoints and head orientation

Model based: AAM, ASM, and CLM…

Cascaded regression based: CCL…

Face identification and verification

Robust feature learning for faces using deep learning

Discriminative metric learning for faces

Implementation

Training data sets for face recognition

Performance summary

Facial attributes

MTL for facial analysis

Open Issues

Face detection: robustness to variations in illumination, facial expression, and viewpoint, as well as to occlusion, blur, and low resolution
Fiducial detection: making the alignment system more robust to challenges such as extreme pose, low illumination, and small, blurry face images; encoding more abstract information such as identity, pose, and attributes
Face identification/verification: choosing informative pairs or triplets under memory constraints and training the network end to end with online methods (e.g., stochastic gradient descent) on large-scale data sets; incorporating full-motion video processing into deep networks to enable video-based face analytics
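To make the "informative triplets" issue concrete, here is a minimal sketch of one common strategy, often called batch-hard mining: for each anchor in a mini-batch, pick the farthest same-identity sample and the closest different-identity sample, then apply a hinge-style triplet loss. This is an assumed illustration, not necessarily the selection scheme used by any method surveyed in the paper. The function names, the margin of 0.2, the use of plain (not squared) Euclidean distance, and the 2-D toy embeddings are all hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hardest_triplets(embeddings: List[Sequence[float]],
                     labels: List[int]) -> List[Tuple[int, int, int]]:
    """For each anchor, pick the farthest positive and closest negative.

    Returns (anchor, positive, negative) index triples. Anchors without
    any positive or any negative in the batch are skipped.
    """
    triplets = []
    for i, (e_i, y_i) in enumerate(zip(embeddings, labels)):
        pos = [j for j, y in enumerate(labels) if y == y_i and j != i]
        neg = [j for j, y in enumerate(labels) if y != y_i]
        if not pos or not neg:
            continue
        p = max(pos, key=lambda j: l2(e_i, embeddings[j]))
        n = min(neg, key=lambda j: l2(e_i, embeddings[j]))
        triplets.append((i, p, n))
    return triplets


def triplet_loss(embeddings: List[Sequence[float]],
                 triplets: List[Tuple[int, int, int]],
                 margin: float = 0.2) -> float:
    """Mean of max(0, d(a, p) - d(a, n) + margin) over the triplets."""
    if not triplets:
        return 0.0
    losses = [
        max(0.0, l2(embeddings[a], embeddings[p])
            - l2(embeddings[a], embeddings[n]) + margin)
        for a, p, n in triplets
    ]
    return sum(losses) / len(losses)


# Toy batch: two identities (labels 0 and 1), 2-D embeddings.
batch = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [3.0, 0.0]]
ids = [0, 0, 1, 1]

mined = hardest_triplets(batch, ids)
print(mined)
print(triplet_loss(batch, mined))
```

Mining inside the mini-batch keeps memory bounded, because only the current batch's pairwise distances are needed, but the quality of the selected triplets then depends on how the batches are sampled.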

Conclusions

This paper presents an overview of recent developments in the design of automatic face recognition systems.

Reference

Ranjan, Rajeev, et al. “Deep learning for understanding faces: Machines may be just as good, or better, than humans.” IEEE Signal Processing Magazine 35.1 (2018): 66–83.