Deep generative factorization for speech signal

by Haoran Sun, et al.

Various information factors are blended in speech signals, and this blending is the primary difficulty for most speech information processing tasks. An intuitive idea is to factorize a speech signal into individual information factors (e.g., phonetic content and speaker trait), though this turns out to be highly challenging. This paper presents a speech factorization approach based on a novel factorial discriminative normalization flow model (factorial DNF). Experiments conducted on a two-factor case involving phonetic content and speaker trait demonstrate that the proposed factorial DNF has a powerful capability to factorize speech signals and outperforms several comparative models in terms of information representation and manipulation.
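The abstract does not spell out the model, so the following is only a minimal sketch under stated assumptions. A DNF-style model is assumed to map a feature vector x to a latent code z through an invertible transform and to score z with the change-of-variables rule, log p(x) = log p(z) + log|det ∂z/∂x|, using a class-conditional Gaussian prior. The factorial variant is assumed here to split z into a phonetic block and a speaker block, each with its own class-conditional Gaussian. A single invertible linear map stands in for the deep flow, and the class `FactorialLinearFlow`, its parameters, and `swap_speaker` are hypothetical names, not the authors' code.

```python
import numpy as np


class FactorialLinearFlow:
    """Toy factorial flow (hypothetical): z = W x + b, with z split into a
    phonetic block z[:phone_dim] and a speaker block z[phone_dim:].
    A deep invertible network would replace the linear map in practice."""

    def __init__(self, W, b, phone_dim, phone_means, speaker_means):
        self.W = np.asarray(W, dtype=float)
        self.b = np.asarray(b, dtype=float)
        self.phone_dim = phone_dim
        # One Gaussian mean per phone class / per speaker (assumed unit covariance).
        self.phone_means = np.asarray(phone_means, dtype=float)
        self.speaker_means = np.asarray(speaker_means, dtype=float)
        # For a linear flow the Jacobian is W, so log|det| is a constant.
        self.log_abs_det = np.linalg.slogdet(self.W)[1]

    def forward(self, x):
        """Map an observation x to its latent code z."""
        return self.W @ x + self.b

    def inverse(self, z):
        """Map a latent code z back to observation space."""
        return np.linalg.solve(self.W, z - self.b)

    @staticmethod
    def _log_std_normal(v):
        """log N(v; 0, I) for a 1-D vector v."""
        return -0.5 * (v @ v) - 0.5 * v.size * np.log(2 * np.pi)

    def log_likelihood(self, x, phone, speaker):
        """log p(x | phone, speaker) via the change-of-variables rule."""
        z = self.forward(x)
        z_phone, z_spk = z[: self.phone_dim], z[self.phone_dim:]
        log_prior = self._log_std_normal(z_phone - self.phone_means[phone]) + \
            self._log_std_normal(z_spk - self.speaker_means[speaker])
        return log_prior + self.log_abs_det

    def swap_speaker(self, x_src, x_tgt):
        """Keep the phonetic block of x_src, take the speaker block of x_tgt,
        and invert the flow to get a manipulated observation."""
        z_src, z_tgt = self.forward(x_src), self.forward(x_tgt)
        z_new = np.concatenate([z_src[: self.phone_dim], z_tgt[self.phone_dim:]])
        return self.inverse(z_new)


flow = FactorialLinearFlow(
    W=np.eye(4), b=np.zeros(4), phone_dim=2,
    phone_means=[[0.0, 0.0], [5.0, 5.0]],
    speaker_means=[[0.0, 0.0], [3.0, 3.0]],
)
x = np.array([0.0, 0.0, 3.0, 3.0])
# The matching phone class scores higher than the mismatched one.
print(flow.log_likelihood(x, 0, 1) > flow.log_likelihood(x, 1, 1))
```

Swapping the speaker block while keeping the phonetic block, then inverting the flow, illustrates one way a factorized latent space could support the "information manipulation" the abstract mentions. Because the flow is invertible, no information is discarded; the factorization lives entirely in how the prior partitions z.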


Deep factorization for speech signal

Various informative factors mixed in speech signals, leading to great di...

On Investigation of Unsupervised Speech Factorization Based on Normalization Flow

Speech signals are complex composites of various information, including ...

Mixture factorized auto-encoder for unsupervised hierarchical deep factorization of speech signal

Speech signal is constituted and contributed by various informative fact...

Multi-task Recurrent Model for Speech and Speaker Recognition

Although highly correlated, speech and speaker recognition have been reg...

Speaker anonymisation using the McAdams coefficient

Anonymisation has the goal of manipulating speech signals in order to de...

Effective and Differentiated Use of Control Information for Multi-speaker Speech Synthesis

In multi-speaker speech synthesis, data from a number of speakers usuall...

Variable-rate discrete representation learning

Semantically meaningful information content in perceptual signals is usu...