SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

04/05/2021
by   Patrick K. O'Neill, et al.
14

In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models. This adds complexity and limits performance, as many formatting tasks benefit from semantic information present in the acoustic signal but absent in transcription. Here we propose a new STT task: end-to-end neural transcription with fully formatted text for target labels. We present baseline Conformer-based models trained on a corpus of 5,000 hours of professionally transcribed earnings calls, achieving a CER of 1.7. As a contribution to the STT research community, we release the corpus free for non-commercial use at https://datasets.kensho.com/datasets/scribe.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/12/2018

TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation

In this paper, we present TED-LIUM release 3 corpus dedicated to speech ...
research
04/05/2019

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

This paper introduces a new speech corpus called "LibriTTS" designed for...
research
03/30/2018

Conditional End-to-End Audio Transforms

We present an end-to-end method for transforming audio from one style to...
research
10/28/2017

JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis

Thanks to improvements in machine learning techniques including deep lea...
research
05/18/2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

This paper introduces FunASR, an open-source speech recognition toolkit ...
research
02/21/2022

Spanish and English Phoneme Recognition by Training on Simulated Classroom Audio Recordings of Collaborative Learning Environments

Audio recordings of collaborative learning environments contain a consta...
research
07/17/2023

BASS: Block-wise Adaptation for Speech Summarization

End-to-end speech summarization has been shown to improve performance ov...

Please sign up or login with your details

Forgot password? Click here to reset