Voice Conversion With Just Nearest Neighbors

05/30/2023
by Matthew Baas, et al.

Any-to-any voice conversion aims to transform source speech into a target voice using only a few examples of the target speaker as a reference. Recent methods produce convincing conversions, but at the cost of increased complexity, which makes their results difficult to reproduce and build on. Instead, we keep it simple. We propose k-nearest neighbors voice conversion (kNN-VC): a straightforward yet effective method for any-to-any conversion. First, we extract self-supervised representations of the source and reference speech. To convert to the target speaker, we replace each frame of the source representation with its nearest neighbor in the reference. Finally, a pretrained vocoder synthesizes audio from the converted representation. Objective and subjective evaluations show that kNN-VC improves speaker similarity while achieving intelligibility scores comparable to those of existing methods. Code, samples, and trained models: https://bshall.github.io/knn-vc
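The matching step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the released kNN-VC code: random arrays stand in for the self-supervised features, the hypothetical helper `knn_match` uses cosine similarity as the distance, and it averages the `k` closest reference frames (setting `k=1` gives the single-nearest-neighbor replacement described above). Feature extraction and the vocoder are omitted.

```python
import numpy as np


def knn_match(source: np.ndarray, reference: np.ndarray, k: int = 4) -> np.ndarray:
    """Replace each source frame with the mean of its k nearest reference frames.

    Hypothetical helper for illustration. Assumptions: cosine similarity as the
    distance and averaging over k neighbours (k=1 means plain nearest neighbour).

    source:    (T_src, D) features of the source utterance
    reference: (T_ref, D) features pooled from the target speaker's reference audio
    Returns a (T_src, D) array built only from reference frames.
    """
    if not 1 <= k <= reference.shape[0]:
        raise ValueError("k must be between 1 and the number of reference frames")
    # L2-normalise each frame so the dot product equals cosine similarity.
    src = source / np.linalg.norm(source, axis=1, keepdims=True)
    ref = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    sim = src @ ref.T  # (T_src, T_ref) cosine similarities
    # Indices of the k most similar reference frames for every source frame.
    idx = np.argsort(-sim, axis=1)[:, :k]  # (T_src, k)
    # Gather the original (unnormalised) reference frames and average them.
    return reference[idx].mean(axis=1)  # (T_src, k, D) -> (T_src, D)


# Usage with random stand-in features (not real speech representations).
rng = np.random.default_rng(0)
source = rng.standard_normal((50, 16))
reference = rng.standard_normal((200, 16))
converted = knn_match(source, reference, k=4)
print(converted.shape)  # (50, 16)
```

Because every output frame is assembled only from reference frames, the converted sequence keeps the source's timing (one output frame per source frame) while its contents come from the target speaker's feature space.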
