JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification

12/17/2021
by Shinnosuke Takamichi, et al.

In this paper, we construct a new Japanese speech corpus called "JTubeSpeech." Although recent end-to-end learning requires large speech corpora, open-source corpora of that scale have not yet been established for languages other than English. We describe the construction of a corpus from YouTube videos and subtitles for speech recognition and speaker verification. Our method automatically filters the videos and subtitles with almost no language-dependent processing. We consistently employ Connectionist Temporal Classification (CTC)-based techniques for automatic speech recognition (ASR) and a speaker variation-based method for automatic speaker verification (ASV). We build 1) a large-scale Japanese ASR benchmark with more than 1,300 hours of data and 2) 900 hours of data for Japanese ASV.
