Wav2Letter: an End-to-End ConvNet-based Speech Recognition System

09/11/2016
by Ronan Collobert, et al.

This paper presents a simple end-to-end model for speech recognition that combines a convolutional-network-based acoustic model with graph decoding. The model is trained to output letters directly from transcribed speech, without requiring forced alignment of phonemes. We introduce an automatic segmentation criterion for training from sequence annotations without alignment; it performs on par with CTC while being simpler. We show competitive word error rates on the Librispeech corpus with MFCC features, and promising results from the raw waveform.
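Since the results above are reported as word error rate (WER), a minimal sketch of that metric may help: WER is the word-level edit distance (substitutions, insertions, deletions) between a hypothesis and a reference transcript, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not say which scoring tool or text normalization the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenization is plain whitespace splitting,
    which is an assumption, not the paper's scoring procedure.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("the" -> "a") and one insertion ("down")
    # against a three-word reference.
    print(word_error_rate("the cat sat", "a cat sat down"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.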


