[Paper Reading] Ranjan et al., “Deep Learning for Understanding Faces: Machines May Be Just as Good, or Better, than Humans”, IEEE Signal Processing Magazine, 2018
Problem Definition
- Provide an overview of deep-learning methods used for face recognition
What we can learn from faces
Three modules are typically needed for an automatic face identification and verification system:
- A face detector that localizes faces in images or videos and returns a precise bounding box for each face; it should be robust to variations in pose, illumination, and scale
- A fiducial point detector that localizes the important facial landmarks used for face alignment
- A feature descriptor that extracts identity-encoding features from the aligned face
Given two face representations, a score is computed between them using a metric. If the metric is a distance, a value below a threshold indicates that the two faces belong to the same subject; if it is a similarity (e.g., cosine similarity), a value above the threshold does.
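The thresholding step can be illustrated with a minimal sketch. It assumes the descriptors are plain lists of floats and are L2-normalized before comparison. The function names, the toy vectors, and the threshold value of 0.5 are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a descriptor to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Similarity score: higher means more alike."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Distance between unit-length descriptors: lower means more alike."""
    a, b = l2_normalize(a), l2_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_subject_by_distance(a, b, threshold):
    # For a distance, "same subject" means the value is BELOW the threshold.
    return euclidean_distance(a, b) < threshold


def same_subject_by_similarity(a, b, threshold):
    # For a similarity, the comparison is reversed.
    return cosine_similarity(a, b) > threshold


# Toy 3-D descriptors (illustrative only; real descriptors have hundreds of dimensions).
anchor = [0.9, 0.1, 0.2]
same = [0.8, 0.2, 0.25]
other = [-0.1, 0.9, 0.3]

print(same_subject_by_distance(anchor, same, threshold=0.5))
print(same_subject_by_distance(anchor, other, threshold=0.5))
```

For unit-length descriptors the two views are equivalent, since the squared Euclidean distance equals 2 - 2 × cosine similarity. In practice the threshold would be tuned on a validation set.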
The performance of CNN-based methods has improved steadily, largely due to the availability of large annotated unconstrained face data sets.
Face detection in unconstrained images
Region based: Faster R-CNN
Sliding-window based: DP2MFD, DDFD
Single-shot detector (SSD)
Finding crucial facial keypoints and head orientation
Model based: AAM, ASM, and CLM…
Cascaded regression based: CCL…
Face identification and verification
Robust feature learning for faces using deep learning
Discriminative metric learning for faces
Implementation
Training data sets for face recognition
Performance summary
Facial attributes
MTL for facial analysis
Open Issues
- Face detection: handling variations in illumination, facial expression, and viewpoint, as well as occlusion, blur, and low resolution
- Fiducial detection: make alignment systems more robust to extreme pose, low illumination, and small, blurry face images; encode more abstract information such as identity, pose, and attributes
- Face identification/verification: choose informative pairs or triplets under memory constraints and train the network end to end with online methods (e.g., stochastic gradient descent) on large-scale data sets; incorporate full-motion video processing into deep networks to enable video-based face analytics
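To make the triplet-selection issue concrete, here is a minimal sketch of a common triplet-loss formulation together with a brute-force filter that keeps only triplets with non-zero loss inside a mini-batch. It is a generic illustration, not the method of the surveyed paper. The function names, the margin of 0.2, and the toy 2-D embeddings are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: the positive should be closer than the negative by at least `margin`."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def informative_triplets(embeddings, labels, margin=0.2):
    """Return (anchor, positive, negative) index triplets whose loss is non-zero.

    Triplets with zero loss contribute no gradient, so skipping them saves
    memory and compute. This exhaustive search is cubic in the batch size
    and is only meant for small mini-batches.
    """
    triplets = []
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue  # positive must be a different sample of the same subject
            for neg in range(n):
                if labels[neg] == labels[a]:
                    continue  # negative must come from another subject
                if triplet_loss(embeddings[a], embeddings[p], embeddings[neg], margin) > 0.0:
                    triplets.append((a, p, neg))
    return triplets


# Toy mini-batch: two subjects, two samples each (illustrative values).
batch = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.3, 0.0]]
subject_ids = [0, 0, 1, 1]
print(informative_triplets(batch, subject_ids))
```

In this toy batch only the triplets involving the sample at `[0.3, 0.0]`, which lies close to the other subject, survive the filter. Restricting the search to one mini-batch at a time is one way to keep selection compatible with stochastic gradient descent under memory constraints.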
Conclusions
This paper presents an overview of recent developments in designing automatic face recognition systems.
Reference
Ranjan, Rajeev, et al. “Deep learning for understanding faces: Machines may be just as good, or better, than humans.” IEEE Signal Processing Magazine 35.1 (2018): 66–83.