The IBM 2016 English Conversational Telephone Speech Recognition System

04/27/2016
by George Saon, et al.

We describe a collection of acoustic and language modeling techniques that lowered the word error rate of our English conversational telephone LVCSR system to a record 6.6% on the Switchboard subset of the Hub5 2000 evaluation testset. On the acoustic side, we use a score fusion of three strong models: recurrent nets with maxout activations, very deep convolutional nets with 3x3 kernels, and bidirectional long short-term memory nets that operate on FMLLR and i-vector features. On the language modeling side, we use an updated model "M" and hierarchical neural network LMs.
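The abstract does not say how the three acoustic models' scores are fused. As a rough illustration only, the sketch below assumes one common approach: a weighted sum of per-frame, per-state log scores. The function names, the toy scores, and the weights are all hypothetical and are not taken from the paper.

```python
import math


def log_softmax(scores):
    """Convert raw scores to log-probabilities (numerically stable)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def fuse_scores(model_scores, weights):
    """Weighted sum of log scores across models (assumed fusion rule).

    model_scores: one entry per model; each entry is a list of frames,
                  and each frame is a list of per-state log scores.
    weights:      one interpolation weight per model (hypothetical values).
    """
    if len(model_scores) != len(weights):
        raise ValueError("need exactly one weight per model")
    fused = []
    for t in range(len(model_scores[0])):
        num_states = len(model_scores[0][t])
        frame = [0.0] * num_states
        for scores, w in zip(model_scores, weights):
            for s in range(num_states):
                frame[s] += w * scores[t][s]
        fused.append(frame)
    return fused


if __name__ == "__main__":
    # Toy outputs for two frames and three states from three stand-in models.
    rnn = [log_softmax([2.0, 1.0, 0.1]), log_softmax([0.2, 1.5, 0.3])]
    cnn = [log_softmax([1.8, 1.2, 0.2]), log_softmax([0.1, 0.9, 1.1])]
    blstm = [log_softmax([2.2, 0.7, 0.3]), log_softmax([0.3, 1.7, 0.2])]

    fused = fuse_scores([rnn, cnn, blstm], [0.4, 0.3, 0.3])
    for t, frame in enumerate(fused):
        best = max(range(len(frame)), key=frame.__getitem__)
        print(f"frame {t}: best state {best}")
```

In a real system the fused scores would be passed to the decoder rather than reduced to a per-frame argmax, and the weights would typically be tuned on held-out data.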
