Sum-Product Networks for Robust Automatic Speaker Recognition
The performance of a speaker recognition system degrades considerably in the presence of noise. One approach to significantly increasing robustness is to use the marginal probability density function of the spectral features that reliably represent speech. Current state-of-the-art speaker recognition systems employ non-probabilistic models, such as convolutional neural networks (CNNs), which cannot use marginalisation. As an alternative, we propose the use of sum-product networks (SPNs), a deep probabilistic graphical model that is compatible with marginalisation. SPN speaker models are evaluated here on real-world non-stationary and coloured noise sources at multiple SNR levels. In terms of speaker recognition accuracy, SPN speaker models employing marginalisation are more robust than recent CNN-based speaker recognition systems that pre-process the noisy speech. Additionally, the SPN speaker models have significantly fewer parameters than the CNN-based speaker recognition systems. The results presented in this work show that SPN speaker models are a robust, parameter-efficient alternative for speaker recognition. Availability: the SPN speaker recognition system is available at https://github.com/anicolson/SPN-Spk-Rec
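Marginalisation in an SPN is cheap because each leaf is a univariate density: integrating a leaf over its variable gives 1, so an unreliable feature can be dropped by setting its leaf value to 1 and evaluating the network bottom-up as usual. The following is a minimal sketch of that idea, not the authors' implementation. It assumes hand-specified toy SPNs with Gaussian leaves over two spectral features, hypothetical speaker names `spk1` and `spk2`, and a hypothetical reliability mask (in practice such a mask would have to be estimated from the noisy speech). The speaker with the highest marginal likelihood is selected.

```python
import math


def gaussian_pdf(x, mean, std):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


class Leaf:
    """Gaussian leaf over a single spectral feature (index `var`)."""

    def __init__(self, var, mean, std):
        self.var, self.mean, self.std = var, mean, std

    def value(self, x, reliable):
        # Marginalising the leaf's variable integrates its density to 1.
        if not reliable[self.var]:
            return 1.0
        return gaussian_pdf(x[self.var], self.mean, self.std)


class Product:
    """Product node: children cover disjoint sets of features."""

    def __init__(self, children):
        self.children = children

    def value(self, x, reliable):
        result = 1.0
        for child in self.children:
            result *= child.value(x, reliable)
        return result


class Sum:
    """Sum node: weighted mixture of children over the same features."""

    def __init__(self, weights, children):
        self.weights, self.children = weights, children

    def value(self, x, reliable):
        return sum(w * c.value(x, reliable) for w, c in zip(self.weights, self.children))


def toy_speaker_model(means_a, means_b):
    """Hypothetical two-component SPN over features 0 and 1 (unit variances)."""
    return Sum([0.6, 0.4], [
        Product([Leaf(0, means_a[0], 1.0), Leaf(1, means_a[1], 1.0)]),
        Product([Leaf(0, means_b[0], 1.0), Leaf(1, means_b[1], 1.0)]),
    ])


speakers = {
    "spk1": toy_speaker_model((0.0, 0.0), (1.0, 1.0)),
    "spk2": toy_speaker_model((3.0, 3.0), (4.0, 4.0)),
}

x = (0.5, 9.0)            # observation whose feature 1 is corrupted by noise
reliable = (True, False)  # hypothetical mask: feature 1 is marginalised out

# Score each speaker with the unreliable feature marginalised out.
scores = {name: spn.value(x, reliable) for name, spn in speakers.items()}
best = max(scores, key=scores.get)

# For comparison, score each speaker using all features, including the corrupted one.
full_scores = {name: spn.value(x, (True, True)) for name, spn in speakers.items()}
full_best = max(full_scores, key=full_scores.get)

print(best, full_best)  # prints "spk1 spk2"
```

In this toy example, the corrupted feature pushes the full-evidence decision towards `spk2`, whereas the marginal likelihood over the reliable feature alone selects `spk1`. A real system would work with log-probabilities over many spectral features and frames to avoid numerical underflow.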