Reliable Visualization for Deep Speaker Recognition

04/08/2022
by Pengqi Li, et al.

Despite the impressive success of convolutional neural networks (CNNs) in speaker recognition, our understanding of CNNs' internal functions is still limited. A major obstacle is that some popular visualization tools, for example those producing saliency maps, are difficult to apply. The reason is that speaker information does not show clear spatial patterns in the temporal-frequency space, which makes the visualization results hard to interpret and, in turn, makes it hard to confirm the reliability of a visualization tool. In this paper, we conduct an extensive analysis of three popular CAM-based visualization methods, Grad-CAM, Score-CAM and Layer-CAM, to investigate their reliability for speaker recognition tasks. Experiments conducted on a state-of-the-art ResNet34SE model show that the Layer-CAM algorithm can produce reliable visualization, and thus can be used as a promising tool to explain CNN-based speaker models. The source code and examples are available on our project page: http://project.cslt.org/.
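
To make the difference between two of the compared methods concrete, the following is a minimal NumPy sketch of how Grad-CAM and Layer-CAM combine a layer's activations with the gradients of the target speaker score. It follows the commonly cited definitions of the two methods and is not taken from the authors' code; the function names, the (channels, frequency, time) array layout, and the toy inputs are assumptions made for illustration only.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM sketch: one weight per channel (assumed layout: C x F x T)."""
    # Average each channel's gradient over the frequency and time axes.
    weights = gradients.mean(axis=(1, 2))              # shape (C,)
    # Weighted sum of the activation maps over channels.
    cam = np.tensordot(weights, activations, axes=1)   # shape (F, T)
    return np.maximum(cam, 0.0)                        # keep positive evidence


def layer_cam(activations, gradients):
    """Layer-CAM sketch: one weight per channel AND per location."""
    # Positive gradients act as element-wise weights.
    weights = np.maximum(gradients, 0.0)               # shape (C, F, T)
    cam = (weights * activations).sum(axis=0)          # shape (F, T)
    return np.maximum(cam, 0.0)


# Toy input (hypothetical values, not real speaker features):
# two channels on a 2 x 2 frequency-time grid.
acts = np.ones((2, 2, 2))
grads = np.zeros((2, 2, 2))
grads[0] = np.array([[1.0, -1.0],
                     [1.0, -1.0]])

grad_cam_map = grad_cam(acts, grads)
layer_cam_map = layer_cam(acts, grads)
```

In this toy case the positive and negative gradients of the first channel cancel when averaged, so the Grad-CAM map is flat, while Layer-CAM keeps the location-wise positive gradients and still highlights the first time column. This only illustrates the mechanical difference between the two weighting schemes; it says nothing by itself about which method is more reliable on real speaker models.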

research
04/30/2018

How convolutional neural network see the world - A survey of convolutional neural network visualization methods

Nowadays, the Convolutional Neural Networks (CNNs) have achieved impress...
research
10/26/2019

Sum-Product Networks for Robust Automatic Speaker Recognition

The performance of a speaker recognition system degrades considerably in...
research
05/07/2020

AutoSpeech: Neural Architecture Search for Speaker Recognition

Speaker recognition systems based on Convolutional Neural Networks (CNNs...
research
05/25/2023

Visualizing data augmentation in deep speaker recognition

Visualization is of great value in understanding the internal mechanisms...
research
08/30/2021

RSKNet-MTSP: Effective and Portable Deep Architecture for Speaker Verification

The convolutional neural network (CNN) based approaches have shown great...
research
04/08/2022

Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion

Recent research showed that an autoencoder trained with speech of a sing...
