Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

09/15/2023
by Mohammad Zeineldeen, et al.

We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks. A special end-of-chunk (EOC) symbol advances the model from one chunk to the next, effectively replacing the conventional end-of-sequence symbol. This modification, while minor, makes our model equivalent to a transducer model that operates on chunks instead of frames, where EOC corresponds to the blank symbol. We further explore the remaining differences between a standard transducer and our model. Additionally, we examine relevant aspects such as long-form speech generalization, beam size, and length normalization. Through experiments on LibriSpeech and TED-LIUM-v2, and by concatenating consecutive sequences for long-form trials, we find that our streamable model maintains competitive performance compared to the non-streamable variant and generalizes very well to long-form speech.
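To make the chunk-and-EOC mechanism concrete, here is a minimal Python sketch of greedy chunk-synchronous decoding. It is an illustration under stated assumptions, not the authors' implementation: the names `EOC`, `split_into_chunks`, `greedy_chunked_decode`, and `toy_step` are hypothetical, the frames are plain floats rather than acoustic features, and the toy scoring rule stands in for a real attention decoder. The only idea taken from the abstract is that the decoder emits labels for the current chunk until it emits EOC, then moves to the next chunk, and decoding ends after the last chunk instead of on an end-of-sequence symbol.

```python
from typing import Callable, List, Sequence

EOC = "<eoc>"  # hypothetical name for the end-of-chunk symbol

# A step function maps (current chunk, labels so far, labels emitted in this chunk)
# to the next symbol. A real model would run an attention decoder here.
StepFn = Callable[[Sequence[float], List[str], int], str]


def split_into_chunks(frames: Sequence[float], chunk_size: int) -> List[Sequence[float]]:
    """Cut the frames into consecutive fixed-size windows; the last may be shorter."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]


def greedy_chunked_decode(
    frames: Sequence[float],
    chunk_size: int,
    step: StepFn,
    max_labels_per_chunk: int = 10,
) -> List[str]:
    """Emit labels chunk by chunk; EOC advances to the next chunk."""
    output: List[str] = []
    for chunk in split_into_chunks(frames, chunk_size):
        emitted_here = 0
        # The cap is a safety guard (an assumption of this sketch) in case
        # the step function never predicts EOC.
        while emitted_here < max_labels_per_chunk:
            symbol = step(chunk, output, emitted_here)
            if symbol == EOC:
                break  # plays the role of blank in a transducer: move on
            output.append(symbol)
            emitted_here += 1
    # No end-of-sequence symbol: decoding stops once the last chunk is done.
    return output


def toy_step(chunk: Sequence[float], prefix: List[str], emitted_here: int) -> str:
    """Toy rule: emit one label for a 'loud' chunk, otherwise EOC."""
    if emitted_here == 0 and max(chunk) > 0.5:
        return "speech"
    return EOC


frames = [0.1, 0.9, 0.2, 0.1, 0.0, 0.3, 0.8, 0.7]
print(greedy_chunked_decode(frames, chunk_size=3, step=toy_step))
```

With these toy inputs the first and third chunks each yield one label and the second yields only EOC. Because each chunk is processed as soon as it is complete, the same loop could consume a stream of arbitrary length.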

research
09/14/2023

Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation

Attention-based encoder-decoder (AED) speech recognition model has been ...
research
05/04/2023

Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks

Transducer and Attention based Encoder-Decoder (AED) are two widely used...
research
08/30/2018

End-to-end Speech Recognition with Adaptive Computation Steps

In this paper, we present Adaptive Computation Steps (ACS) algorithm, wh...
research
01/06/2020

Character-Aware Attention-Based End-to-End Speech Recognition

Predicting words and subword units (WSUs) as the output has shown to be ...
research
07/10/2020

Gated Recurrent Context: Softmax-free Attention for Online Encoder-Decoder Speech Recognition

Recently, attention-based encoder-decoder (AED) models have shown state-...
research
04/22/2018

Multi-Head Decoder for End-to-End Speech Recognition

This paper presents a new network architecture called multi-head decoder...
research
04/20/2018

A Mixed Hierarchical Attention based Encoder-Decoder Approach for Standard Table Summarization

Structured data summarization involves generation of natural language su...
