Deep generative factorization for speech signal

10/27/2020
by   Haoran Sun, et al.

Various information factors are blended in speech signals, which forms the primary difficulty for most speech information processing tasks. An intuitive idea is to factorize the speech signal into individual information factors (e.g., phonetic content and speaker trait), though this turns out to be highly challenging. This paper presents a speech factorization approach based on a novel factorial discriminative normalization flow model (factorial DNF). Experiments conducted on a two-factor case involving phonetic content and speaker trait demonstrate that the proposed factorial DNF has a powerful capability to factorize speech signals and outperforms several comparative models in terms of information representation and manipulation.

02/27/2018

Deep factorization for speech signal

Various informative factors mixed in speech signals, leading to great di...
10/29/2019

On Investigation of Unsupervised Speech Factorization Based on Normalization Flow

Speech signals are complex composites of various information, including ...
10/30/2019

Mixture factorized auto-encoder for unsupervised hierarchical deep factorization of speech signal

Speech signal is constituted and contributed by various informative fact...
03/31/2016

Multi-task Recurrent Model for Speech and Speaker Recognition

Although highly correlated, speech and speaker recognition have been reg...
11/02/2020

Speaker anonymisation using the McAdams coefficient

Anonymisation has the goal of manipulating speech signals in order to de...
07/07/2021

Effective and Differentiated Use of Control Information for Multi-speaker Speech Synthesis

In multi-speaker speech synthesis, data from a number of speakers usuall...
03/10/2021

Variable-rate discrete representation learning

Semantically meaningful information content in perceptual signals is usu...