NeMo Toolbox for Speech Dataset Construction

04/11/2021
by Evelina Bakhturina, et al.

In this paper, we introduce a new toolbox for constructing speech datasets from long audio recordings and raw reference texts. We develop tools for each step of the speech dataset construction pipeline, including data preprocessing, audio-text alignment, data post-processing, and filtering. The proposed pipeline also supports human-in-the-loop review to address text-audio mismatch issues and remove samples that don't satisfy the quality requirements. We demonstrate the toolbox's efficiency by building the Russian LibriSpeech corpus (RuLS) from LibriVox audiobooks. The toolbox is open-sourced in the NeMo framework. The RuLS corpus is released on OpenSLR.


research
09/16/2017

AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline

An open-source Mandarin speech corpus called AISHELL-1 is released. It i...
research
06/11/2021

HUI-Audio-Corpus-German: A high quality TTS dataset

The increasing availability of audio data on the internet lead to a mult...
research
04/05/2019

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

This paper introduces a new speech corpus called "LibriTTS" designed for...
research
06/01/2017

Polish Read Speech Corpus for Speech Tools and Services

This paper describes the speech processing activities conducted at the P...
research
05/30/2023

Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling

The study of speech disorders can benefit greatly from time-aligned data...
research
06/15/2022

Text-Aware End-to-end Mispronunciation Detection and Diagnosis

Mispronunciation detection and diagnosis (MDD) technology is a key compo...
research
02/28/2023

Automatic Heteronym Resolution Pipeline Using RAD-TTS Aligners

Grapheme-to-phoneme (G2P) transduction is part of the standard text-to-s...
