Speaker Anonymization with Phonetic Intermediate Representations

07/11/2022
by Sarina Meyer, et al.

In this work, we propose a speaker anonymization pipeline that leverages high-quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings. Using phones as the intermediate representation ensures near-complete elimination of speaker identity information from the input while preserving the original phonetic content as much as possible. Our experimental results on the LibriSpeech and VCTK corpora reveal two key findings: 1) although automatic speech recognition produces imperfect transcriptions, our neural speech synthesis system can handle such errors, making our system feasible and robust, and 2) combining speaker embeddings from different resources is beneficial, and their appropriate normalization is crucial. Overall, our best system significantly outperforms the baselines provided in the Voice Privacy Challenge 2020 in terms of privacy robustness against a lazy-informed attacker, while maintaining high intelligibility and naturalness of the anonymized speech.
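
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`l2_normalize`, `combine_embeddings`, `anonymize`) and the three callables passed to `anonymize` are hypothetical placeholders, and L2 normalization followed by concatenation is only an assumed stand-in for the "appropriate normalization" and "combining", which the abstract does not specify.

```python
import math
from typing import Callable, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (assumed normalization, not from the paper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        # Avoid division by zero for an all-zero vector.
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def combine_embeddings(emb_a: Sequence[float], emb_b: Sequence[float]) -> List[float]:
    """Normalize two speaker embeddings separately, then concatenate them.

    Normalizing first keeps one embedding type from dominating the other
    purely because of its scale. Concatenation is an assumption here.
    """
    return l2_normalize(emb_a) + l2_normalize(emb_b)


def anonymize(
    audio: object,
    recognize_phones: Callable[[object], List[str]],
    pick_anonymous_embedding: Callable[[], List[float]],
    synthesize: Callable[[List[str], List[float]], object],
) -> object:
    """Hypothetical pipeline: speech -> phones -> speech with a new voice.

    Only the phone sequence is carried over from the input, so the
    synthesizer never sees the original speaker's acoustic signal.
    """
    phones = recognize_phones(audio)        # ASR output; may contain errors
    target = pick_anonymous_embedding()     # anonymized speaker embedding
    return synthesize(phones, target)       # TTS conditioned on both
```

In this sketch the recognizer, the embedding selector, and the synthesizer are passed in as arguments, so any concrete models can be substituted without changing the flow.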


