A Light CNN for Deep Face Representation with Noisy Labels
Convolution neural network (CNN) has significantly pushed forward the development of face recognition and analysis techniques. Current CNN models tend to be deeper and larger to better fit large amounts of training data. When training data are from internet, their labels are often ambiguous and inaccurate. This paper presents a light CNN framework to learn a compact embedding on the large-scale face data with massive noisy labels. First, we introduce the concept of maxout activation into each convolutional layer of CNN, which results in a Max-Feature-Map (MFM). Different from Rectified Linear Unit that suppresses a neuron by a threshold (or bias), MFM suppresses a neuron by a competitive relationship. MFM can not only separate noisy signals and informative signals but also plays a role of feature selection. Second, a network of five convolution layers and four Network in Network (NIN) layers are implemented to reduce the number of parameters and improve performance. Lastly, a semantic bootstrapping method is accordingly designed to make the prediction of the models be better consistent with noisy labels. Experimental results show that the proposed framework can utilize large-scale noisy data to learn a light model in terms of both computational cost and storage space. The learnt single model with a 256-D representation achieves state-of-the-art results on five face benchmarks without fine-tuning. The light CNN model is released on https://github.com/AlfredXiangWu/face_verification_experiment.
READ FULL TEXT