SVSNet: An End-to-end Speaker Voice Similarity Assessment Model

07/20/2021
by Cheng-Hung Hu, et al.

Neural evaluation metrics for speech generation tasks have recently attracted considerable attention. In this paper, we propose SVSNet, the first end-to-end neural network model to assess the speaker voice similarity between natural speech and synthesized speech. Unlike most neural evaluation metrics, which use hand-crafted features, SVSNet takes the raw waveform directly as input so that more of the speech information is available for prediction. SVSNet consists of encoder, co-attention, distance calculation, and prediction modules and is trained end to end. Experimental results on the Voice Conversion Challenge 2018 and 2020 (VCC2018 and VCC2020) datasets show that SVSNet notably outperforms well-known baseline systems in assessing speaker similarity at both the utterance and system levels.
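The four-module data flow named in the abstract (encoder, co-attention, distance calculation, prediction) can be illustrated with a toy sketch. This is not the authors' implementation: every function below is a hypothetical stand-in with no learned parameters. The sketch assumes a fixed-length framing "encoder", scaled dot-product attention for the co-attention step, a mean absolute difference as the distance, and an exponential mapping as the prediction head. Averaging the two alignment directions is likewise a choice made for this sketch.

```python
import math

def encode(wave, frame_len=4):
    # Hypothetical stand-in for a learned encoder: cut the raw waveform
    # into non-overlapping frames. Needs at least frame_len samples.
    return [wave[i:i + frame_len]
            for i in range(0, len(wave) - frame_len + 1, frame_len)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys):
    # Co-attention stand-in: for each query frame, build a weighted
    # average of the key frames using scaled dot-product attention.
    dim = len(keys[0])
    aligned = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        aligned.append([sum(w * k[d] for w, k in zip(weights, keys))
                        for d in range(dim)])
    return aligned

def distance(frames, aligned):
    # Mean absolute difference between each frame and its aligned
    # counterpart, averaged over all frames.
    per_frame = [sum(abs(a - b) for a, b in zip(f, g)) / len(f)
                 for f, g in zip(frames, aligned)]
    return sum(per_frame) / len(per_frame)

def similarity_score(wave_x, wave_y):
    fx, fy = encode(wave_x), encode(wave_y)
    d_x = distance(fx, attend(fx, fy))  # y aligned to x
    d_y = distance(fy, attend(fy, fx))  # x aligned to y
    # Hypothetical prediction head: map the mean distance into (0, 1].
    return math.exp(-(d_x + d_y) / 2)

# Made-up sample values, for illustration only.
wave_a = [0.1, 0.4, -0.2, 0.3, 0.0, -0.5, 0.2, 0.1]
wave_b = [0.3, -0.1, 0.2, 0.0, -0.4, 0.1, 0.5, -0.2]
score = similarity_score(wave_a, wave_b)
```

Because both alignment directions are averaged, the toy score does not depend on the order of its two inputs. In a trained model, each stand-in would be replaced by a learned module and the output fitted to human similarity ratings.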


