AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion

09/14/2023
by   Wen-Chin Huang, et al.
0

Non-autoregressive (non-AR) sequence-to-seqeunce (seq2seq) models for voice conversion (VC) is attractive in its ability to effectively model the temporal structure while enjoying boosted intelligibility and fast inference thanks to non-AR modeling. However, the dependency of current non-AR seq2seq VC models on ground truth durations extracted from an external AR model greatly limits its generalization ability to smaller training datasets. In this paper, we first demonstrate the above-mentioned problem by varying the training data size. Then, we present AAS-VC, a non-AR seq2seq VC model based on automatic alignment search (AAS), which removes the dependency on external durations and serves as a proper inductive bias to provide the required generalization ability for small datasets. Experimental results show that AAS-VC can generalize better to a training dataset of only 5 minutes. We also conducted ablation studies to justify several model design choices. The audio samples and implementation are available online.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/14/2021

FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion

This paper proposes a non-autoregressive extension of our previously pro...
research
04/14/2021

Non-autoregressive sequence-to-sequence voice conversion

This paper proposes a novel voice conversion (VC) method based on non-au...
research
05/02/2020

Improving Non-autoregressive Neural Machine Translation with Monolingual Data

Non-autoregressive (NAR) neural machine translation is usually done via ...
research
04/04/2021

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition

The autoregressive (AR) models, such as attention-based encoder-decoder ...
research
07/14/2022

Scene Text Recognition with Permuted Autoregressive Sequence Models

Context-aware STR methods typically use internal autoregressive (AR) lan...
research
06/08/2022

Autoregressive Perturbations for Data Poisoning

The prevalence of data scraping from social media as a means to obtain d...

Please sign up or login with your details

Forgot password? Click here to reset