Lhotse: a speech data representation library for the modern deep learning ecosystem

10/25/2021
by Piotr Żelasko, et al.

Speech data is notoriously difficult to work with due to the variety of codecs, recording lengths, and metadata formats. We present Lhotse, a speech data representation library that draws upon lessons learned from the Kaldi speech recognition toolkit and brings its concepts into the modern deep learning ecosystem. Lhotse provides a common JSON description format with corresponding Python classes, and data preparation recipes for over 30 popular speech corpora. Various datasets can be easily combined and re-purposed for different tasks. The library handles multi-channel recordings, long recordings, local and cloud storage, and lazy and on-the-fly operations, among other features. We introduce the Cut and CutSet concepts, which simplify common data wrangling tasks for audio and help incorporate the acoustic context of speech utterances. Finally, we show how Lhotse leverages PyTorch data API abstractions and adapts them to handle speech data for deep learning.
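To make the Cut idea more concrete, the following is a minimal, self-contained sketch of how a segment of a recording might be described together with its surrounding acoustic context and serialized to JSON. It is an illustration only and does not use Lhotse's actual classes or API: the names Recording, Supervision, Cut, and cut_with_context, as well as all field names, are hypothetical and chosen for this example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Recording:
    """Hypothetical description of one audio file (not Lhotse's class)."""
    id: str
    path: str              # local path or cloud URL
    duration: float        # seconds
    num_channels: int = 1


@dataclass
class Supervision:
    """Hypothetical labeled segment (e.g. a transcript) within a recording."""
    recording_id: str
    start: float           # seconds from the start of the recording
    duration: float
    text: str

    @property
    def end(self) -> float:
        return self.start + self.duration


@dataclass
class Cut:
    """Hypothetical view of a time span of a recording plus its labels."""
    id: str
    recording: Recording
    start: float
    duration: float
    supervisions: List[Supervision] = field(default_factory=list)


def cut_with_context(recording: Recording,
                     supervisions: List[Supervision],
                     utterance: Supervision,
                     context: float) -> Cut:
    """Build a cut around `utterance`, padded by `context` seconds per side.

    The span is clipped to the recording boundaries, and every supervision
    that overlaps the resulting span is attached to the cut.
    """
    start = max(0.0, utterance.start - context)
    end = min(recording.duration, utterance.end + context)
    overlapping = [
        s for s in supervisions
        if s.recording_id == recording.id and s.start < end and s.end > start
    ]
    return Cut(
        id=f"{recording.id}-{start:.2f}-{end:.2f}",
        recording=recording,
        start=start,
        duration=end - start,
        supervisions=overlapping,
    )


rec = Recording(id="rec1", path="audio/rec1.wav", duration=60.0)
sups = [
    Supervision("rec1", 2.0, 3.0, "hello"),
    Supervision("rec1", 5.5, 2.0, "world"),
    Supervision("rec1", 40.0, 1.0, "far away"),
]
cut = cut_with_context(rec, sups, utterance=sups[0], context=1.0)

# Nested dataclasses serialize to a plain JSON manifest entry.
manifest_entry = json.dumps(asdict(cut), indent=2)
```

In this sketch the cut stores only metadata (a reference to the recording plus a time span), so no audio is read until it is needed, which is one plausible way to support the lazy, on-the-fly operations the abstract mentions.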

