Evaluation of the Speech Resynthesis Capabilities of the VoicePrivacy Challenge Baseline B1

08/22/2023
by Ünal Ege Gaznepoglu, et al.

Speaker anonymization systems continue to improve their ability to obfuscate the original speaker characteristics in a speech signal, but often create processing artifacts and unnatural-sounding voices as a tradeoff. Many of those systems stem from the VoicePrivacy Challenge (VPC) Baseline B1, which uses a neural vocoder to synthesize speech from a representation based on F0, x-vectors, and bottleneck features. Inspired by this, we investigate the reproduction capabilities of this baseline to assess how successful the shared methodology is in synthesizing human-like speech. We use four objective metrics to measure speech quality, waveform similarity, and F0 similarity. Our findings indicate that both the speech representation and the vocoder introduce artifacts, causing an unnatural perception. A MUSHRA-like listening test with 18 subjects corroborates our findings, motivating further research on the analysis and synthesis components of the VPC Baseline B1.
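
The abstract does not name the four objective metrics, so the following is only an illustrative sketch of what an F0 similarity measure could look like, not the authors' method. It assumes two frame-aligned F0 tracks in Hz with unvoiced frames marked as 0, and computes the Pearson correlation over frames voiced in both signals. The function name `f0_similarity` and the 0 Hz convention are assumptions made for this example.

```python
import numpy as np


def f0_similarity(f0_ref, f0_syn):
    """Pearson correlation of two F0 tracks over frames voiced in both.

    Assumption: unvoiced frames are marked with 0 Hz. This is a common
    convention, not something specified in the abstract.
    """
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_syn = np.asarray(f0_syn, dtype=float)

    # Tolerate small frame-count mismatches by truncating to the shorter track.
    n = min(len(f0_ref), len(f0_syn))
    f0_ref, f0_syn = f0_ref[:n], f0_syn[:n]

    # Compare only frames where both signals are voiced.
    voiced = (f0_ref > 0) & (f0_syn > 0)
    if voiced.sum() < 2:
        return float("nan")  # correlation is undefined with fewer than 2 points

    return float(np.corrcoef(f0_ref[voiced], f0_syn[voiced])[0, 1])
```

A value close to 1 would suggest that the resynthesized signal follows the pitch contour of the original. Correlation ignores a constant pitch offset or scaling, so in practice it would be paired with an error measure on the F0 values themselves.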

