Multi-encoder multi-resolution framework for end-to-end speech recognition

11/12/2018
by Ruizhi Li, et al.

Attention-based methods and Connectionist Temporal Classification (CTC) networks have been promising research directions for end-to-end Automatic Speech Recognition (ASR). The joint CTC/Attention model has achieved great success by utilizing both architectures during multi-task training and joint decoding. In this work, we present a novel Multi-Encoder Multi-Resolution (MEMR) framework based on the joint CTC/Attention model. Two heterogeneous encoders with different architectures, temporal resolutions and separate CTC networks work in parallel to extract complementary acoustic information. A hierarchical attention mechanism is then used to combine the encoder-level information. To demonstrate the effectiveness of the proposed model, experiments are conducted on Wall Street Journal (WSJ) and CHiME-4, resulting in a relative Word Error Rate (WER) reduction of 18.0-32.1%. The WER obtained on the WSJ eval92 test set is the best reported for an end-to-end system on this benchmark.
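The two-level combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: dot-product scoring, the function names (`attend`, `hierarchical_attention`) and the toy vectors are all assumptions made for illustration. Each encoder first gets its own frame-level attention, and a second, stream-level attention then weights the resulting per-encoder context vectors. The encoders may emit different numbers of frames, which mirrors the different temporal resolutions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(query, frames):
    """Frame-level attention: weighted sum of one encoder's frames.

    Dot-product scoring is an assumption of this sketch.
    """
    weights = softmax([dot(query, h) for h in frames])
    dim = len(frames[0])
    return [sum(w * h[d] for w, h in zip(weights, frames)) for d in range(dim)]


def hierarchical_attention(query, encoder_outputs):
    """Fuse per-encoder contexts with a second, stream-level attention."""
    contexts = [attend(query, frames) for frames in encoder_outputs]
    stream_weights = softmax([dot(query, c) for c in contexts])
    dim = len(contexts[0])
    fused = [
        sum(w * c[d] for w, c in zip(stream_weights, contexts))
        for d in range(dim)
    ]
    return fused, stream_weights


# Toy inputs (hypothetical): one decoder query and two encoders whose
# outputs have different lengths, i.e. different temporal resolutions.
query = [1.0, 0.0]
enc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # 4 frames
enc_b = [[2.0, 0.0]]  # 1 frame, as if subsampled by a factor of 4

fused, stream_weights = hierarchical_attention(query, [enc_a, enc_b])
```

In this toy case the second encoder's context aligns more strongly with the query, so it receives the larger stream-level weight. A trained system would learn the scoring functions rather than use raw dot products.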


research
06/17/2019

Multi-Stream End-to-End Speech Recognition

Attention-based methods and Connectionist Temporal Classification (CTC) ...
research
11/12/2018

Stream attention-based multi-array end-to-end speech recognition

Automatic Speech Recognition (ASR) using multiple microphone arrays has ...
research
06/20/2023

Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition

Low-resource accented speech recognition is one of the important challen...
research
10/23/2019

A practical two-stage training strategy for multi-stream end-to-end speech recognition

The multi-stream paradigm of audio processing, in which several sources ...
research
02/07/2018

Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition

The performance of automatic speech recognition systems degrades with in...
research
08/03/2017

Sensor Transformation Attention Networks

Recent work on encoder-decoder models for sequence-to-sequence mapping h...
research
03/31/2022

NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism

Although deep learning and end-to-end models have been widely used and s...
