[Paper Reading] Learning From Noisy Large-Scale Datasets With Minimal Supervision, Veit et al., CVPR 2017
2 min read · Mar 7, 2019
Problem Definition
- Labels in large-scale datasets are often noisy, but a small subset of clean annotations is available
- The key is to use the clean annotations to reduce label noise in the large-scale dataset
Contributions
- Introduce a semi-supervised learning framework for multi-label image classification that leverages small sets of clean annotations in conjunction with massive sets of noisy annotations
- Benchmark on the Open Images dataset
- Demonstrate that the proposed learning approach is more effective in leveraging small amounts of labeled data than traditional fine-tuning
Method
- Goal: design an efficient and effective approach that exploits both the quality of the labels in the clean subset V and the size of the full noisy dataset T
- Label cleaning network g: learns a mapping from the noisy labels y to the human-verified labels v, conditioned on the input image
- Image classifier h: learns to annotate images by imitating the cleaning network g, using g's predictions as ground-truth targets (see the sketch below)
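To make the two-network setup concrete, here is a minimal PyTorch sketch, not the authors' code: the shared backbone, the L1 cleaning loss on the verified subset, and the cross-entropy classification loss against g's cleaned labels follow the paper's description, but `NoisyLabelCleaner`, `joint_loss`, and all layer widths are illustrative assumptions (the paper additionally uses an identity skip connection so g only learns a residual correction, omitted here for brevity).

```python
import torch
import torch.nn as nn

class NoisyLabelCleaner(nn.Module):
    """Hypothetical sketch: label cleaning network g and image classifier h
    sharing one CNN backbone. Layer widths are illustrative."""
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone  # e.g. a ResNet trunk returning feat_dim features
        # g: maps (image features, noisy labels) -> cleaned labels
        self.g = nn.Sequential(
            nn.Linear(feat_dim + num_classes, 512),
            nn.ReLU(),
            nn.Linear(512, num_classes),
        )
        # h: maps image features -> label predictions (logits)
        self.h = nn.Linear(feat_dim, num_classes)

    def forward(self, images, noisy_labels):
        feats = self.backbone(images)
        cleaned = torch.sigmoid(self.g(torch.cat([feats, noisy_labels], dim=1)))
        predicted = self.h(feats)
        return cleaned, predicted

def joint_loss(cleaned, predicted, verified, has_verified):
    # Cleaning loss: absolute error between g's cleaned labels and the
    # human-verified labels, computed only on examples with verification
    # (assumes each batch contains at least one verified example).
    clean_loss = (cleaned[has_verified] - verified[has_verified]).abs().sum(dim=1).mean()
    # Classification loss: h is trained against g's cleaned labels,
    # which act as detached ground-truth targets for every image.
    cls_loss = nn.functional.binary_cross_entropy_with_logits(
        predicted, cleaned.detach())
    return clean_loss + cls_loss
```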
Experiments
- Dataset: Open Images dataset; the training set contains 9M images with 78M annotations over 6,012 classes, unevenly distributed
- Evaluation: mean average precision (mAP), reported both per class and over all classes (see the sketch below)
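As a reference point, per-class AP can be computed with scikit-learn's `average_precision_score` and macro-averaged over classes; `mean_average_precision` below is a hypothetical helper, and the paper's exact evaluation protocol may differ.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true, y_score):
    """y_true: (N, C) binary ground-truth matrix; y_score: (N, C) scores.
    Returns macro mAP over classes, skipping classes with no positives."""
    aps = []
    for c in range(y_true.shape[1]):
        if y_true[:, c].sum() > 0:  # AP is undefined without positives
            aps.append(average_precision_score(y_true[:, c], y_score[:, c]))
    return float(np.mean(aps))
```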
Results
Summary
- The authors propose an effective way to clean the Open Images dataset by jointly training the label cleaning network and the image classifier in a multi-task setup
Questions
- Why use linear layers?
- What’s the performance on other datasets?
- How much does the percentage of clean annotations affect the result?
Reference
Veit, Andreas, et al. “Learning from noisy large-scale datasets with minimal supervision.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.