Speech Modeling with a Hierarchical Transformer Dynamical VAE

03/07/2023
by Xiaoyu Lin, et al.

Dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extend the VAE to model a sequence of observed data and a corresponding sequence of latent vectors. In almost all DVAEs in the literature, the temporal dependencies within each sequence and across the two sequences are modeled with recurrent neural networks. In this paper, we propose to model speech signals with the Hierarchical Transformer DVAE (HiT-DVAE), a DVAE with two levels of latent variables (sequence-wise and frame-wise) in which the temporal dependencies are implemented with the Transformer architecture. We show that HiT-DVAE outperforms several other DVAEs for speech spectrogram modeling while enabling a simpler training procedure, which reveals its high potential for downstream low-level speech processing tasks such as speech enhancement.
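
To make the two-level latent structure concrete, a generic causal factorization of such a model can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the abstract: x_t is a spectrogram frame, z_t the frame-wise latent vector, w the sequence-wise latent vector, and T the number of frames. The exact conditioning sets used by HiT-DVAE may differ from this sketch.

```latex
% Assumed notation (not from the abstract):
%   x_{1:T} : observed spectrogram frames
%   z_{1:T} : frame-wise latent vectors
%   w       : sequence-wise latent vector, drawn once per sequence
p(x_{1:T}, z_{1:T}, w)
  = p(w) \prod_{t=1}^{T}
      p(z_t \mid x_{1:t-1}, z_{1:t-1}, w)\,
      p(x_t \mid x_{1:t-1}, z_{1:t}, w)
```

This is just the chain rule applied in temporal order, with w sampled once and shared across all frames. In a recurrent DVAE, the dependence on the past is summarized by a hidden state; in a Transformer-based model, it would presumably be realized through causally masked attention over the past frames.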


research
06/11/2021

A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling

The Variational Autoencoder (VAE) is a powerful deep generative model th...
research
06/23/2021

Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders

Dynamical variational auto-encoders (DVAEs) are a class of deep generati...
research
11/25/2018

Sequential Variational Autoencoders for Collaborative Filtering

Variational autoencoders were proven successful in domains such as compu...
research
08/28/2020

Dynamical Variational Autoencoders: A Comprehensive Review

The Variational Autoencoder (VAE) is a powerful deep generative model th...
research
12/18/2017

Deep generative models of genetic variation capture mutation effects

The functions of proteins and RNAs are determined by a myriad of interac...
research
09/24/2018

Classify, predict, detect, anticipate and synthesize: Hierarchical recurrent latent variable models for human activity modeling

Human activity modeling operates on two levels: high-level action modeli...
research
07/11/2022

Hierarchical Latent Structure for Multi-Modal Vehicle Trajectory Forecasting

Variational autoencoder (VAE) has widely been utilized for modeling data...
