J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis

01/26/2022
by Shinnosuke Takamichi, et al.

In this paper, we construct a Japanese audiobook speech corpus called "J-MAC" for speech synthesis research. With the success of reading-style speech synthesis, research targets are shifting to tasks that involve more complicated contexts. Audiobook speech synthesis is a good example: it requires cross-sentence context, expressiveness, and more. Unlike in reading-style speech, speaker-specific expressiveness in audiobook speech also becomes part of the context. To support this research, we propose a method of constructing a corpus from audiobooks read aloud by professional speakers. From many audiobooks and their texts, our method automatically extracts and refines the data without any language dependency. Specifically, we use vocal-instrumental separation to extract clean speech, connectionist temporal classification (CTC) to roughly align text and audio, and voice activity detection (VAD) to refine the alignment. J-MAC is open-sourced on our project page. We also conduct audiobook speech synthesis evaluations, and the results give insights into audiobook speech synthesis.
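The three-stage pipeline described above (vocal-instrumental separation, rough CTC alignment, VAD-based refinement) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses Spleeter for separation and py-webrtcvad for boundary refinement, and abstracts the CTC forced-alignment stage behind a hypothetical ctc_rough_align() interface, since the abstract does not name a specific alignment model. The intuition is that CTC gives rough sentence boundaries, and the VAD pass then trims leading and trailing non-speech from each segment.

```python
# Hedged sketch of a J-MAC-style corpus construction pipeline.
# Spleeter and webrtcvad calls follow their public APIs; ctc_rough_align()
# is a hypothetical placeholder for the paper's CTC alignment stage.
import wave

import webrtcvad
from spleeter.separator import Separator


def separate_vocals(audiobook_path: str, out_dir: str) -> None:
    """Stage 1: vocal-instrumental separation to strip BGM from the audiobook."""
    separator = Separator('spleeter:2stems')  # vocals + accompaniment
    separator.separate_to_file(audiobook_path, out_dir)


def ctc_rough_align(text: str, wav_path: str) -> list[tuple[str, float, float]]:
    """Stage 2 (hypothetical interface): CTC forced alignment returning
    (sentence, start_sec, end_sec) tuples. A real implementation could use
    a CTC acoustic model with standard trellis backtracking."""
    raise NotImplementedError


def refine_boundary(wav_path: str, start: float, end: float,
                    frame_ms: int = 30, pad: float = 0.3) -> tuple[float, float]:
    """Stage 3: shrink a rough [start, end] segment to the outermost speech
    frames found by WebRTC VAD (expects 16-bit mono PCM WAV input)."""
    vad = webrtcvad.Vad(3)  # most aggressive mode
    with wave.open(wav_path, 'rb') as wf:
        sr = wf.getframerate()
        assert sr in (8000, 16000, 32000, 48000) and wf.getnchannels() == 1
        pcm = wf.readframes(wf.getnframes())
    bytes_per_frame = int(sr * frame_ms / 1000) * 2  # 2 bytes per sample
    # Scan fixed-size frames inside a padded window around the rough alignment.
    lo, hi = max(0.0, start - pad), end + pad
    speech_times = []
    t = lo
    while t + frame_ms / 1000 <= hi:
        off = int(t * sr) * 2
        frame = pcm[off:off + bytes_per_frame]
        if len(frame) == bytes_per_frame and vad.is_speech(frame, sr):
            speech_times.append(t)
        t += frame_ms / 1000
    if not speech_times:
        return start, end  # no speech detected; keep the rough alignment
    return speech_times[0], speech_times[-1] + frame_ms / 1000
```

A driver would chain the stages per sentence: separate vocals once per audiobook, call ctc_rough_align() on the separated vocal track, then pass each rough segment through refine_boundary(). The padding and VAD aggressiveness here are illustrative defaults, not values from the paper.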

