Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms

09/13/2023
by   Chu Yuan Zhang, et al.

Recent strides in neural speech synthesis technologies, while enjoying widespread application, have nonetheless introduced a series of challenges, spurring interest in defences against the threat of misuse and abuse. Notably, source attribution of synthesized speech has value in forensics and intellectual property protection, but prior work in this area is limited in scope. To address these gaps, we present our findings on identifying the sources of synthesized speech. We investigate the existence of speech synthesis model fingerprints in the generated speech waveforms, focusing on the acoustic model and the vocoder, and study how each component shapes the fingerprint in the overall waveform. Our research, conducted on the multi-speaker LibriTTS dataset, yields two key insights: (1) vocoders and acoustic models impart distinct, model-specific fingerprints on the waveforms they generate, and (2) vocoder fingerprints are the more dominant of the two and may mask the fingerprints of the acoustic model. These findings strongly suggest the existence of model-specific fingerprints for both the acoustic model and the vocoder, highlighting their potential utility in source identification applications.
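The abstract does not specify the attribution method, but the underlying idea — that each generator leaves a characteristic spectral signature in its output — can be illustrated with a minimal sketch. The feature (average log-magnitude spectrum) and the nearest-centroid classifier below are illustrative assumptions, not the paper's actual approach:

```python
import numpy as np

def avg_log_spectrum(wav, n_fft=512, hop=256):
    """Illustrative fingerprint feature: the log-magnitude spectrum
    averaged over all analysis frames of a waveform."""
    win = np.hanning(n_fft)
    frames = [wav[i:i + n_fft] * win
              for i in range(0, len(wav) - n_fft, hop)]
    mags = np.abs(np.fft.rfft(frames, axis=-1))
    return np.log(mags.mean(axis=0) + 1e-8)

def fit_centroids(waves_by_model):
    """One mean fingerprint per candidate source model."""
    return {model: np.mean([avg_log_spectrum(w) for w in waves], axis=0)
            for model, waves in waves_by_model.items()}

def attribute(wav, centroids):
    """Assign a waveform to the source with the nearest fingerprint."""
    feat = avg_log_spectrum(wav)
    return min(centroids, key=lambda m: np.linalg.norm(feat - centroids[m]))
```

In this toy setup, two hypothetical "vocoders" with different spectral colorations are easily separated; a real system would instead train a classifier on outputs of actual acoustic-model/vocoder combinations.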


