[Paper Reading] Learning From Noisy Large-Scale Datasets With Minimal Supervision, Veit et al., CVPR 2017

Ya-Liang Allen Chang
Mar 7, 2019

Problem Definition

  • Labels in large-scale datasets are often noisy, but a small subset of clean annotations is usually available
  • The key idea is to use the clean annotations to reduce label noise in the large-scale dataset

Contributions

  • Introduce a semi-supervised learning framework for multilabel image classification that leverages a small set of clean
    annotations in conjunction with a massive set of noisy annotations
  • Benchmark on the Open Images Dataset
  • Demonstrate that the proposed approach leverages small amounts of labeled data
    more effectively than traditional fine-tuning

Method

  • Goal: design an efficient and effective approach that leverages both the quality of the labels in the clean subset V and the size of the whole dataset T
  • Label cleaning network g: learns a mapping from the noisy labels y to the human-verified labels v, conditioned on the input image
  • Image classifier h: learns to annotate images by imitating the label cleaning network g, using g’s cleaned labels as ground-truth targets (a minimal sketch of the two networks follows this list)
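
The two networks and the joint objective can be sketched as follows. This is a minimal PyTorch sketch assuming precomputed CNN image features; the layer sizes, the skip connection, and the exact loss forms (L1 cleaning loss, binary cross-entropy against the cleaned labels) are my assumptions based on the description above, not the authors’ released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 6012   # classes in Open Images (per the paper)
FEAT_DIM = 2048      # assumed size of precomputed CNN image features

class LabelCleaningNetwork(nn.Module):
    """g: maps noisy labels y to cleaned labels, conditioned on the image."""
    def __init__(self, hidden=512):
        super().__init__()
        # Linear embeddings of the noisy label vector and the image features
        # (layer sizes are guesses, not the paper's exact architecture).
        self.label_proj = nn.Linear(NUM_CLASSES, hidden)
        self.image_proj = nn.Linear(FEAT_DIM, hidden)
        self.out = nn.Linear(2 * hidden, NUM_CLASSES)

    def forward(self, noisy_labels, image_feats):
        joint = torch.cat([self.label_proj(noisy_labels),
                           self.image_proj(image_feats)], dim=1)
        # Skip connection: predict a correction to the noisy labels,
        # then clamp to the valid label range [0, 1].
        return (noisy_labels + self.out(joint)).clamp(0.0, 1.0)

class ImageClassifier(nn.Module):
    """h: annotates images directly from image features."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(FEAT_DIM, NUM_CLASSES)

    def forward(self, image_feats):
        return torch.sigmoid(self.fc(image_feats))

g, h = LabelCleaningNetwork(), ImageClassifier()

def joint_loss(noisy_labels, verified_labels, image_feats, is_verified):
    """g is supervised on the human-verified subset V; h is trained
    against g's cleaned labels on every sample (the whole dataset T)."""
    cleaned = g(noisy_labels, image_feats)
    predicted = h(image_feats)
    # Cleaning loss: absolute error against verified labels, computed only
    # where verification exists (assumes the batch contains some of V).
    clean_loss = (cleaned - verified_labels).abs().sum(1)[is_verified].mean()
    # Classification loss: cleaned labels serve as (detached) targets.
    cls_loss = F.binary_cross_entropy(predicted, cleaned.detach())
    return clean_loss + cls_loss

# Toy forward/backward pass with random stand-in data (batch of 4).
feats = torch.randn(4, FEAT_DIM)
noisy = torch.randint(0, 2, (4, NUM_CLASSES)).float()
verified = torch.randint(0, 2, (4, NUM_CLASSES)).float()
loss = joint_loss(noisy, verified, feats,
                  is_verified=torch.tensor([True, True, False, False]))
loss.backward()
```

Detaching the cleaned labels in the classification loss is one design choice that lets h imitate g without h’s error distorting the cleaning network; the paper’s exact gradient flow may differ.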

Experiments

  • Dataset: Open Images, with a training set of 9M images carrying 78M machine-generated annotations over 6012 classes; the annotations are not evenly distributed across classes
  • Evaluation: mean average precision (mAP), reported per class and over all annotations (see the sketch below)
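
As a reference for the metric, here is one way to compute the two variants with scikit-learn; the pooled reading of "for all" is my interpretation, and the toy random data merely stands in for real predictions.

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
n_samples, n_classes = 1000, 10              # toy sizes, not the real 6012
y_true = rng.integers(0, 2, size=(n_samples, n_classes))
y_score = rng.random((n_samples, n_classes))

# Per-class AP, then the mean over classes (mAP).
per_class_ap = [average_precision_score(y_true[:, c], y_score[:, c])
                for c in range(n_classes)]
print(f"mAP over classes: {np.mean(per_class_ap):.3f}")

# One reading of "for all": AP with all annotations pooled together.
print(f"AP, all annotations pooled: "
      f"{average_precision_score(y_true.ravel(), y_score.ravel()):.3f}")
```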

Results

  • The proposed approach outperforms the traditional fine-tuning baseline on the Open Images benchmark

Summary

  • The authors propose an effective way to clean the Open Images dataset via multi-task learning, jointly training a label cleaning network and an image classifier

Questions

  • Why use linear layers?
  • What’s the performance on other datasets?
  • How much does the percentage of clean annotations affect the result?

Reference

Veit, Andreas, et al. “Learning from noisy large-scale datasets with minimal supervision.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
