Letter-Based Speech Recognition with Gated ConvNets

12/22/2017
by Vitaliy Liptchinsky et al.

In this paper we introduce a new speech recognition system, leveraging a simple letter-based ConvNet acoustic model. The acoustic model requires only audio transcription for training: no alignment annotations are needed, nor any forced-alignment step. At inference, our decoder takes only a word list and a language model, and is fed with letter scores from the acoustic model; no phonetic word lexicon is needed. Key ingredients of the acoustic model are Gated Linear Units and high dropout. We show near state-of-the-art word error rates on the LibriSpeech corpus using log-mel filterbanks, on both the "clean" and "other" configurations.
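As a rough illustration of the Gated Linear Unit mentioned in the abstract (a minimal NumPy sketch, not the authors' implementation), a GLU splits its input in two halves along the channel dimension and gates one half with a sigmoid of the other:

```python
import numpy as np

def glu(x, axis=-1):
    """Gated Linear Unit: split x into halves (a, b) along `axis`
    and return a * sigmoid(b), halving the channel dimension."""
    a, b = np.split(x, 2, axis=axis)
    return a * (1.0 / (1.0 + np.exp(-b)))

# Toy example: 4 input channels collapse to 2 gated outputs.
x = np.array([1.0, 2.0, 0.0, 0.0])
print(glu(x))  # sigmoid(0) = 0.5, so output is [0.5, 1.0]
```

In the actual acoustic model this gating is applied to the outputs of 1-D convolutions over the audio features, letting the network learn which activations to pass forward.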


Related research

09/11/2016 - Wav2Letter: an End-to-End ConvNet-based Speech Recognition System
This paper presents a simple end-to-end model for speech recognition, co...

06/10/2019 - Word-level Speech Recognition with a Dynamic Lexicon
We propose a direct-to-word sequence model with a dynamic lexicon. Our w...

08/21/2017 - The Microsoft 2017 Conversational Speech Recognition System
We describe the 2017 version of Microsoft's conversational speech recogn...

10/25/2022 - Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition
The recent emergence of joint CTC-Attention model shows significant impr...

10/12/2021 - Word Order Does Not Matter For Speech Recognition
In this paper, we study training of automatic speech recognition system ...

03/15/2018 - Advancing Acoustic-to-Word CTC Model
The acoustic-to-word model based on the connectionist temporal classific...

12/31/2018 - Advancing Acoustic-to-Word CTC Model with Attention and Mixed-Units
The acoustic-to-word model based on the Connectionist Temporal Classific...
