[Paper Reading] Learning From Noisy Large-Scale Datasets With Minimal Supervision, Veit et al., CVPR 2017
2 min read · Mar 7, 2019
Problem Definition
- Labels in large-scale datasets are often noisy, but a small subset of clean annotations is available
- The key is to use the clean annotations to reduce label noise in the large-scale dataset
Contributions
- Introduce a semi-supervised learning framework for multi-label image classification that leverages small sets of clean annotations in conjunction with massive sets of noisy annotations
- Benchmark on the Open Images dataset
- Demonstrate that the proposed learning approach is more effective in leveraging small amounts of labeled data than traditional fine-tuning
Method
- Goal: design an efficient and effective approach that exploits both the quality of the labels in the clean subset V and the size of the full noisy dataset T
- Label cleaning network g: learns a mapping from the noisy labels y to the human-verified labels v, conditioned on the input image
- Image classifier h: learns to annotate images by imitating the cleaning network g, using g's predictions as ground-truth targets (see the sketch below)
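To make the two-network setup concrete, here is a minimal PyTorch sketch, not the authors' code: the shared backbone, the L1 cleaning loss on the verified subset, and the cross-entropy classification loss against g's cleaned labels follow the paper's description, but `NoisyLabelCleaner`, `joint_loss`, and all layer widths are illustrative assumptions (the paper additionally uses an identity skip connection so g only learns a residual correction, omitted here for brevity).

```python
import torch
import torch.nn as nn

class NoisyLabelCleaner(nn.Module):
    """Hypothetical sketch: label cleaning network g and image classifier h
    sharing one CNN backbone. Layer widths are illustrative."""
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone  # e.g. a ResNet trunk returning feat_dim features
        # g: maps (image features, noisy labels) -> cleaned labels
        self.g = nn.Sequential(
            nn.Linear(feat_dim + num_classes, 512),
            nn.ReLU(),
            nn.Linear(512, num_classes),
        )
        # h: maps image features -> label predictions (logits)
        self.h = nn.Linear(feat_dim, num_classes)

    def forward(self, images, noisy_labels):
        feats = self.backbone(images)
        cleaned = torch.sigmoid(self.g(torch.cat([feats, noisy_labels], dim=1)))
        predicted = self.h(feats)
        return cleaned, predicted

def joint_loss(cleaned, predicted, verified, has_verified):
    # Cleaning loss: absolute error between g's cleaned labels and the
    # human-verified labels, computed only on examples with verification
    # (assumes each batch contains at least one verified example).
    clean_loss = (cleaned[has_verified] - verified[has_verified]).abs().sum(dim=1).mean()
    # Classification loss: h is trained against g's cleaned labels,
    # which act as detached ground-truth targets for every image.
    cls_loss = nn.functional.binary_cross_entropy_with_logits(
        predicted, cleaned.detach())
    return clean_loss + cls_loss
```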
Experiments
- Dataset: Open Images dataset; the training set contains 9M images with 78M annotations over 6,012 classes, unevenly distributed
- Evaluation: mean average precision (mAP), reported both per class and over all classes (see the sketch below)
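As a reference point, per-class AP can be computed with scikit-learn's `average_precision_score` and macro-averaged over classes; `mean_average_precision` below is a hypothetical helper, and the paper's exact evaluation protocol may differ.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true, y_score):
    """y_true: (N, C) binary ground-truth matrix; y_score: (N, C) scores.
    Returns macro mAP over classes, skipping classes with no positives."""
    aps = []
    for c in range(y_true.shape[1]):
        if y_true[:, c].sum() > 0:  # AP is undefined without positives
            aps.append(average_precision_score(y_true[:, c], y_score[:, c]))
    return float(np.mean(aps))
```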
Results
Summary
- The authors propose an effective way to clean the Open Images dataset by jointly training the label cleaning network and the image classifier in a multi-task setup
Questions
- Why use linear layers?
- What’s the performance on other datasets?
- How much does the percentage of clean annotations affect the result?
Reference
Veit, Andreas, et al. “Learning from noisy large-scale datasets with minimal supervision.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.