Stream attention-based multi-array end-to-end speech recognition

11/12/2018
by   Xiaofei Wang, et al.
0

Automatic Speech Recognition (ASR) using multiple microphone arrays has achieved great success in the far-field robustness. Taking advantage of all the information that each array shares and contributes is crucial in this task. Motivated by the advances of joint Connectionist Temporal Classification (CTC)/attention mechanism in the End-to-End (E2E) ASR, a stream attention-based multi-array framework is proposed in this work. Microphone arrays, acting as information streams, are activated by separate encoders and decoded under the instruction of both CTC and attention networks. In terms of attention, a hierarchical structure is adopted. On top of the regular attention networks, stream attention is introduced to steer the decoder toward the most informative encoders. Experiments have been conducted on AMI and DIRHA multi-array corpora using the encoder-decoder architecture. Compared with the best single-array results, the proposed framework has achieved relative Word Error Rates (WERs) reduction of 3.7 than conventional strategies as well.

READ FULL TEXT
research
06/17/2019

Multi-Stream End-to-End Speech Recognition

Attention-based methods and Connectionist Temporal Classification (CTC) ...
research
11/29/2017

Stream Attention for far-field multi-microphone ASR

A stream attention framework has been applied to the posterior probabili...
research
11/12/2018

Multi-encoder multi-resolution framework for end-to-end speech recognition

Attention-based methods and Connectionist Temporal Classification (CTC) ...
research
09/17/2021

Dual-Encoder Architecture with Encoder Selection for Joint Close-Talk and Far-Talk Speech Recognition

In this paper, we propose a dual-encoder ASR architecture for joint mode...
research
10/23/2019

A practical two-stage training strategy for multi-stream end-to-end speech recognition

The multi-stream paradigm of audio processing, in which several sources ...
research
04/22/2018

Multi-Head Decoder for End-to-End Speech Recognition

This paper presents a new network architecture called multi-head decoder...
research
02/05/2021

Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR

Performance degradation of an Automatic Speech Recognition (ASR) system ...

Please sign up or login with your details

Forgot password? Click here to reset