DeepTalk: Vocal Style Encoding for Speaker Recognition and Speech Synthesis

12/09/2020
by Anurag Chowdhury, et al.

Automatic speaker recognition algorithms typically characterize speech audio using short-term spectral features that encode the physiological and anatomical aspects of speech production. Such algorithms do not fully capitalize on the speaker-dependent characteristics present in behavioral speech features. In this work, we propose a prosody encoding network called DeepTalk for extracting vocal style features directly from raw audio data. The DeepTalk method outperforms several state-of-the-art speaker recognition systems across multiple challenging datasets. Speaker recognition performance improves further when DeepTalk is combined with a state-of-the-art speaker recognition system based on physiological speech features. We also integrate DeepTalk into a current state-of-the-art speech synthesizer to generate synthetic speech. A detailed analysis of the synthetic speech shows that DeepTalk captures the F0 contours essential for vocal style modeling. Furthermore, DeepTalk-based synthetic speech is shown to be almost indistinguishable from real speech in the context of speaker recognition.

research
08/26/2020

DeepVOX: Discovering Features from Raw Audio for Speaker Recognition in Degraded Audio Signals

Automatic speaker recognition algorithms typically use pre-defined filte...
research
05/09/2018

Speaker Recognition using Deep Belief Networks

Short time spectral features such as mel frequency cepstral coefficients...
research
05/13/2023

Vocal Style Factorization for Effective Speaker Recognition in Affective Scenarios

The accuracy of automated speaker recognition is negatively impacted by ...
research
02/22/2018

Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics

Learning speaker-specific features is vital in many applications like sp...
research
06/22/2017

Speaker Recognition with Cough, Laugh and "Wei"

This paper proposes a speaker recognition (SRE) task with trivial speech...
research
11/16/2022

Psychophysiology-aided Perceptually Fluent Speech Analysis of Children Who Stutter

This first-of-its-kind paper presents a novel approach named PASAD that ...
research
08/30/2021

Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring

Automatic Speech Scoring (ASS) is the computer-assisted evaluation of a ...
