Boosting Semi-Supervised Face Recognition with Noise Robustness

by Yuchi Liu, et al.

Although deep face recognition benefits significantly from large-scale training data, a current bottleneck is the labelling cost. A feasible solution to this problem is semi-supervised learning, which exploits a small portion of labelled data and large amounts of unlabelled data. The major challenge, however, is that label errors accumulate through auto-labelling and compromise training. This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise introduced by auto-labelling. Specifically, we introduce a multi-agent method, named GroupNet (GN), that enables our solution to identify wrongly labelled samples and preserve clean ones. We show that GN alone achieves leading accuracy in traditional supervised face recognition even when noisy labels make up over 50% of the training data. Further, we develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which builds on the robust training ability provided by GN. It starts with a small amount of labelled data and subsequently conducts high-confidence labelling on a large amount of unlabelled data to boost further training. The more data NRoLL labels, the higher the confidence in the labels of the dataset. To evaluate the competitiveness of our method, we run NRoLL under a demanding condition in which only one-fifth of the labelled MSCeleb is available and the rest is used as unlabelled data. On a wide range of benchmarks, our method compares favorably against state-of-the-art methods.
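The abstract describes the learn-then-label loop only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' GroupNet or NRoLL. It swaps in a generic technique: several toy "agents" (nearest-centroid classifiers, each trained with one fold of the labelled set held out) vote, and a labelled sample is dropped as noisy when a majority of agents reject its given label; the cleaned agents then pseudo-label an unlabelled point only if all of them agree with confidence at or above a threshold. All names (`train_agent`, `filter_noisy`, `learn_label`) and values (`threshold=0.95`, `temperature=10.0`, three agents) are hypothetical.

```python
import math
from collections import defaultdict


def train_agent(samples):
    """Hypothetical 'agent': nearest-centroid classifier (label -> mean 2-D point)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), label in samples:
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}


def predict(agent, point, temperature=10.0):
    """Softmax over negative squared distances to centroids; returns (label, confidence)."""
    scores = {lab: -((point[0] - cx) ** 2 + (point[1] - cy) ** 2) / temperature
              for lab, (cx, cy) in agent.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {lab: math.exp(s - top) for lab, s in scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def train_agents(labelled, n_agents):
    # Agent i is trained on every sample except fold i (index % n_agents == i).
    return [train_agent([s for j, s in enumerate(labelled) if j % n_agents != i])
            for i in range(n_agents)]


def filter_noisy(agents, labelled):
    # Keep a sample only if a strict majority of agents predict its given label.
    kept = []
    for point, label in labelled:
        votes = sum(predict(a, point)[0] == label for a in agents)
        if 2 * votes > len(agents):
            kept.append((point, label))
    return kept


def learn_label(labelled, unlabelled, n_agents=3, threshold=0.95, rounds=2):
    """Alternate noise filtering and high-confidence pseudo-labelling (toy sketch)."""
    labelled, unlabelled = list(labelled), list(unlabelled)
    for _ in range(rounds):
        # 1) Drop samples whose labels the agents collectively reject.
        labelled = filter_noisy(train_agents(labelled, n_agents), labelled)
        # 2) Retrain on the cleaned set and label only unanimous, confident points.
        agents = train_agents(labelled, n_agents)
        still_unlabelled = []
        for point in unlabelled:
            preds = [predict(a, point) for a in agents]
            unanimous = len({lab for lab, _ in preds}) == 1
            if unanimous and min(conf for _, conf in preds) >= threshold:
                labelled.append((point, preds[0][0]))
            else:
                still_unlabelled.append(point)
        unlabelled = still_unlabelled
    return labelled, unlabelled


if __name__ == "__main__":
    # Two toy clusters plus one deliberately mislabelled point near cluster "A".
    seed = ([(p, "A") for p in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]]
            + [(p, "B") for p in [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0), (10.5, 10.5)]]
            + [((0.2, 0.8), "B")])
    pool = [(0.3, 0.3), (10.2, 10.4), (0.9, 0.1), (10.8, 10.9), (5.5, 5.5)]
    labelled, rest = learn_label(seed, pool)
    print(len(labelled), rest)
```

On this toy data the mislabelled point is voted out, the four points near a cluster are pseudo-labelled, and the point midway between the clusters stays unlabelled because no agent is confident enough. Raising `threshold` trades labelling coverage for label precision, which is the sense in which the abstract's "high-confidence labelling" limits error accumulation.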




Data Programming using Semi-Supervision and Subset Selection

The paradigm of data programming <cit.> has shown a lot of promise in us...

Training Object Detectors With Noisy Data

The availability of a large quantity of labelled training data is crucia...

Trust Your Model: Iterative Label Improvement and Robust Training by Confidence Based Filtering and Dataset Partitioning

State-of-the-art, high capacity deep neural networks not only require la...

Deep Learning Classification With Noisy Labels

Deep Learning systems have shown tremendous accuracy in image classifica...

Scalable Teacher Forcing Network for Semi-Supervised Large Scale Data Streams

The large-scale data stream problem refers to high-speed information flo...

Semi-supervised Object Detection via Virtual Category Learning

Due to the costliness of labelled data in real-world applications, semi-...

Learning Improved Representations by Transferring Incomplete Evidence Across Heterogeneous Tasks

Acquiring ground truth labels for unlabelled data can be a costly proced...