Heiga Zen

research

∙ 06/13/2023

SayTap: Language to Quadrupedal Locomotion

Large language models (LLMs) have demonstrated the potential to perform ...

Yujin Tang, et al.

research

∙ 05/30/2023

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

This paper introduces a new speech dataset called “LibriTTS-R” designed ...

Yuma Koizumi, et al.

research

∙ 05/27/2023

Translatotron 3: Speech to Speech Translation with Monolingual Data

This paper presents Translatotron 3, a novel approach to train a direct ...

Eliya Nachmani, et al.

research

∙ 03/03/2023

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

Speech restoration (SR) is a task of converting degraded speech signals ...

Yuma Koizumi, et al.

research

∙ 10/28/2022

Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation

Adapting a neural text-to-speech (TTS) model to a target speaker typical...

Nobuyuki Morioka, et al.

research

∙ 10/27/2022

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

This paper proposes Virtuoso, a massively multilingual speech-text joint...

Takaaki Saeki, et al.

research

∙ 10/03/2022

WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration

Denoising diffusion probabilistic models (DDPMs) and generative adversar...

Yuma Koizumi, et al.

research

∙ 08/28/2022

Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks

Transfer tasks in text-to-speech (TTS) synthesis - where one or more asp...

Lev Finkelstein, et al.

research

∙ 04/07/2022

MAESTRO: Matched Speech Text Representations through Modality Matching

We present Maestro, a self-supervised training method to unify represent...

Zhehuai Chen, et al.

research

∙ 03/31/2022

SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping

Neural vocoder using denoising diffusion probabilistic model (DDPM) has ...

Yuma Koizumi, et al.

research

∙ 01/11/2022

CVSS Corpus and Massively Multilingual Speech-to-Speech Translation

We introduce CVSS, a massively multilingual-to-English speech-to-speech ...

Ye Jia, et al.

research

∙ 06/17/2021

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

This paper introduces WaveGrad 2, a non-autoregressive generative model ...

Nanxin Chen, et al.

research

∙ 03/28/2021

PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS

This paper introduces PnG BERT, a new encoder model for neural TTS. This...

Ye Jia, et al.

research

∙ 03/26/2021

Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

This paper introduces Parallel Tacotron 2, a non-autoregressive neural t...

Isaac Elias, et al.

research

∙ 10/22/2020

Parallel Tacotron: Non-Autoregressive and Controllable TTS

Although neural end-to-end text-to-speech models can synthesize highly n...

Isaac Elias, et al.

research

∙ 10/08/2020

Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling

This paper presents Non-Attentive Tacotron based on the Tacotron 2 text-...

Jonathan Shen, et al.

research

∙ 09/02/2020

WaveGrad: Estimating Gradients for Waveform Generation

This paper introduces WaveGrad, a conditional model for waveform generat...

Nanxin Chen, et al.

research

∙ 02/06/2020

Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis

This paper proposes a hierarchical, fine-grained and interpretable laten...

Guangzhi Sun, et al.

research

∙ 02/06/2020

Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior

Recent neural text-to-speech (TTS) models with fine-grained latent featu...

Guangzhi Sun, et al.

research

∙ 07/09/2019

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

We present a multispeaker, multilingual text-to-speech (TTS) synthesis m...

Yu Zhang, et al.

research

∙ 04/05/2019

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

This paper introduces a new speech corpus called "LibriTTS" designed for...

Heiga Zen, et al.

research

∙ 02/21/2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

Lingvo is a TensorFlow framework offering a complete solution for collab...

Jonathan Shen, et al.

research

∙ 10/16/2018

Hierarchical Generative Modeling for Controllable Speech Synthesis

This paper proposes a neural end-to-end text-to-speech (TTS) model which...

Wei-Ning Hsu, et al.

research

∙ 09/27/2018

Sample Efficient Adaptive Text-to-Speech

We present a meta-learning approach for adaptive text-to-speech (TTS) wi...

Yutian Chen, et al.

research

∙ 11/28/2017

Parallel WaveNet: Fast High-Fidelity Speech Synthesis

The recently-developed WaveNet architecture is the current state of the ...

Aaron van den Oord, et al.

research

∙ 09/12/2016

WaveNet: A Generative Model for Raw Audio

This paper introduces WaveNet, a deep neural network for generating raw ...

Aaron van den Oord, et al.

research

∙ 06/20/2016

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices

Acoustic models based on long short-term memory recurrent neural network...

Heiga Zen, et al.
