RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification

04/17/2019
by   Jee-weon Jung, et al.
0

Recently, direct modeling of raw waveforms using deep neural networks has been widely studied for a number of tasks in audio domains. In speaker verification, however, utilization of raw waveforms is in its preliminary phase, requiring further investigation. In this study, we explore end-to-end deep neural networks that input raw waveforms to improve various aspects: front-end speaker embedding extraction including model architecture, pre-training scheme, additional objective functions, and back-end classification. Adjustment of model architecture using a pre-training scheme can extract speaker embeddings, giving a significant improvement in performance. Additional objective functions simplify the process of extracting speaker embeddings by merging conventional two-phase processes: extracting utterance-level features such as i-vectors or x-vectors and the feature enhancement phase, e.g., linear discriminant analysis. Effective back-end classification models that suit the proposed speaker embedding are also explored. We propose an end-to-end system that comprises two deep neural networks, one front-end for utterance-level speaker embedding extraction and the other for back-end classification. Experiments conducted on the VoxCeleb1 dataset demonstrate that the proposed model achieves state-of-the-art performance among systems without data augmentation. The proposed system is also comparable to the state-of-the-art x-vector system that adopts heavy data augmentation.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/01/2022

Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances

Conventional automatic speaker verification systems can usually be decom...
research
10/25/2018

Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings

Input utterance with short duration is one of the most critical threats ...
research
06/08/2018

Analysis of Length Normalization in End-to-End Speaker Verification System

The classical i-vectors and the latest end-to-end deep speaker embedding...
research
05/03/2018

Supervector Compression Strategies to Speed up I-Vector System Development

The front-end factor analysis (FEFA), an extension of principal componen...
research
09/13/2019

Probing the Information Encoded in x-vectors

Deep neural network based speaker embeddings, such as x-vectors, have be...
research
04/01/2020

Improved RawNet with Filter-wise Rescaling for Text-independent Speaker Verification using Raw Waveforms

Recent advances in deep learning have facilitated the design of speaker ...
research
07/31/2018

Prosodic-Enhanced Siamese Convolutional Neural Networks for Cross-Device Text-Independent Speaker Verification

In this paper a novel cross-device text-independent speaker verification...

Please sign up or login with your details

Forgot password? Click here to reset