Multi-stream Convolutional Neural Network with Frequency Selection for Robust Speaker Verification

12/21/2020
by Wei Yao, et al.

Speaker verification aims to verify whether an input speech utterance comes from the claimed speaker. Conventionally, such systems are deployed in a single-stream scenario, in which the feature extractor operates over the full frequency range. In this paper, we hypothesize that a machine can learn enough knowledge to perform the classification task when listening to a partial frequency range instead of the full frequency range, a technique we call frequency selection, and we further propose a novel multi-stream Convolutional Neural Network (CNN) framework that applies this technique to speaker verification. The proposed framework accommodates diverse temporal embeddings generated by multiple streams to make acoustic modeling more robust. To diversify these temporal embeddings, we apply feature augmentation with frequency selection: the full frequency band is manually segmented into several sub-bands, and the feature extractor of each stream selects which sub-bands to use as its target frequency domain. Unlike the conventional single-stream solution, in which each utterance is processed only once, this framework processes every utterance with multiple streams in parallel. The input utterance for each stream is pre-processed by a frequency selector restricted to a specified frequency range and post-processed by mean normalization. The normalized temporal embeddings of all streams then flow into a pooling layer that generates fused embeddings. We conduct extensive experiments on the VoxCeleb dataset, and the results show that the multi-stream CNN significantly outperforms the single-stream baseline, with a 20.53% improvement in minimum Decision Cost Function (minDCF).
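
The abstract describes a per-stream data flow: segment the full band into sub-bands, let each stream keep only the sub-bands it selects, extract temporal embeddings, mean-normalize them, and pool all streams into one fused embedding. Below is a minimal sketch of that flow under stated assumptions, not the authors' implementation: the input is a spectrogram stored as frames x frequency bins in nested Python lists; sub-bands are equal-width; `toy_encoder` is a hypothetical placeholder for each stream's CNN; mean normalization is assumed to subtract each dimension's temporal mean; and the pooling layer is assumed to be frame-wise concatenation followed by max pooling over time. All function names are illustrative.

```python
from statistics import fmean
from typing import List, Sequence

Matrix = List[List[float]]  # rows are time frames, columns are features


def split_into_subbands(num_bins: int, num_subbands: int) -> List[range]:
    """Manually segment the full band [0, num_bins) into contiguous, equal-width sub-bands."""
    edges = [i * num_bins // num_subbands for i in range(num_subbands + 1)]
    return [range(edges[i], edges[i + 1]) for i in range(num_subbands)]


def frequency_selector(spec: Matrix, subbands: List[range], chosen: Sequence[int]) -> Matrix:
    """Keep only the frequency bins that belong to this stream's chosen sub-bands."""
    bins = [b for idx in chosen for b in subbands[idx]]
    return [[frame[b] for b in bins] for frame in spec]


def toy_encoder(feats: Matrix) -> Matrix:
    """Hypothetical placeholder for a per-stream CNN: a 2-dim (mean, max) embedding per frame."""
    return [[fmean(frame), max(frame)] for frame in feats]


def mean_normalize(emb: Matrix) -> Matrix:
    """Assumed mean normalization: subtract each embedding dimension's mean over time."""
    means = [fmean(col) for col in zip(*emb)]
    return [[x - m for x, m in zip(row, means)] for row in emb]


def fuse_and_pool(streams: List[Matrix]) -> List[float]:
    """Assumed pooling layer: concatenate streams frame by frame, then max-pool over time."""
    frames = [[x for row in rows for x in row] for rows in zip(*streams)]
    return [max(col) for col in zip(*frames)]


def multi_stream_embedding(spec: Matrix, num_subbands: int,
                           stream_bands: List[Sequence[int]]) -> List[float]:
    """Process one utterance with every stream (sequentially here) and fuse the results."""
    subbands = split_into_subbands(len(spec[0]), num_subbands)
    streams = [mean_normalize(toy_encoder(frequency_selector(spec, subbands, bands)))
               for bands in stream_bands]
    return fuse_and_pool(streams)


if __name__ == "__main__":
    # Toy 4-frame x 8-bin "spectrogram"; three streams select overlapping sub-band pairs.
    spec = [[float(t + f) for f in range(8)] for t in range(4)]
    emb = multi_stream_embedding(spec, num_subbands=4, stream_bands=[[0, 1], [2, 3], [1, 2]])
    print(len(emb), emb)
```

In this sketch the streams share one encoder type and differ only in the sub-bands they see, so any diversity in the fused embedding comes from frequency selection, which is the role the abstract assigns to it.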

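The reported gain is measured in minimum Decision Cost Function (minDCF). As a reading aid, here is a minimal sketch of the standard NIST-style computation: for each candidate threshold θ, DCF(θ) = C_miss · P_miss(θ) · P_target + C_fa · P_fa(θ) · (1 − P_target), the result is divided by min(C_miss · P_target, C_fa · (1 − P_target)), and minDCF is the minimum over θ. The operating point P_target = 0.01, C_miss = C_fa = 1 is a common choice in speaker verification papers but is an assumption here; the abstract does not state which parameters were used. `min_dcf` is a hypothetical helper, not part of any library, and the scores below are toy values, not results from the paper.

```python
from typing import List, Tuple


def min_dcf(scores: List[float], labels: List[int], p_target: float = 0.01,
            c_miss: float = 1.0, c_fa: float = 1.0) -> Tuple[float, float]:
    """Return (normalized minDCF, threshold that attains it).

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    A trial is accepted when its score is >= the threshold.
    """
    n_tar = sum(1 for lab in labels if lab == 1)
    n_non = len(labels) - n_tar
    if n_tar == 0 or n_non == 0:
        raise ValueError("need at least one target and one non-target trial")
    # Candidate thresholds: every observed score, plus +inf (reject every trial).
    thresholds = sorted(set(scores)) + [float("inf")]
    best_dcf, best_thr = float("inf"), thresholds[0]
    for thr in thresholds:  # O(N^2) scan, kept simple for readability
        p_miss = sum(1 for s, lab in zip(scores, labels) if lab == 1 and s < thr) / n_tar
        p_fa = sum(1 for s, lab in zip(scores, labels) if lab == 0 and s >= thr) / n_non
        dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
        if dcf < best_dcf:
            best_dcf, best_thr = dcf, thr
    # Normalize by the cost of the better trivial system (accept all or reject all).
    return best_dcf / min(c_miss * p_target, c_fa * (1 - p_target)), best_thr


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]  # toy scores, not results from the paper
    labels = [1, 1, 0, 0]
    print(min_dcf(scores, labels))  # perfectly separated -> (0.0, 0.8)
```

Because the cost is normalized, lower is better, and a system that does no better than always accepting or always rejecting scores 1.0.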

research
12/31/2020

Generalized Operating Procedure for Deep Learning: an Unconstrained Optimal Design Perspective

Deep learning (DL) has brought about remarkable breakthrough in processi...
research
06/24/2019

Self Multi-Head Attention for Speaker Recognition

Most state-of-the-art Deep Learning (DL) approaches for speaker recognit...
research
05/26/2022

DT-SV: A Transformer-based Time-domain Approach for Speaker Verification

Speaker verification (SV) aims to determine whether the speaker's identi...
research
03/13/2018

Deep CNN based feature extractor for text-prompted speaker recognition

Deep learning is still not a very common tool in speaker verification fi...
research
06/19/2019

Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification

In this paper, we propose a new pooling method called spatial pyramid en...
research
10/22/2020

Graph Attention Networks for Speaker Verification

This work presents a novel back-end framework for speaker verification u...
research
05/21/2020

Multistream CNN for Robust Acoustic Modeling

This paper presents multistream CNN, a novel neural network architecture...