Conventional automatic speech recognition (ASR) systems based on Gaussian mixture models (GMMs) or hybrid deep neural network (DNN) hidden Markov models (HMMs) consist of several components that are trained separately, depend on pretrained alignments and require a complex search [11, 18, 4, 29]. Unlike these conventional approaches, attention-based sequence-to-sequence models constitute a single standalone neural network that is trained end-to-end, needs no explicit alignments or context-dependent phonetic labels as in HMMs, and simplifies inference. In these models, an implicit probabilistic notion of alignment is used as part of the neural network; however, it does not behave the same way as the alignment models of the conventional methods.
The widely used attention-based sequence-to-sequence systems are based on an encoder-decoder architecture, where one or more long short-term memory (LSTM) layers read the observation sequence and another LSTM decodes it to a variable length output sequence of characters or words. In such architectures, both input and output sequences are separately handled as a one-dimensional sequence over time. An attention mechanism is then added into the architecture to combine the encoder and the decoder by allowing the decoder to selectively focus on individual parts of the encoder state sequences [23, 2, 6, 3, 30].
The LSTM is well suited for sequence modeling, where the sequence is strongly correlated along a one-dimensional time axis. Handling dynamic lengths, encoding positional information, exploiting the previous context and tracking long-term dependencies via its gating strategy are some of the properties that make the LSTM appropriate for sequence-to-sequence modeling. Although an LSTM is essentially a one-dimensional model, it can be extended to process multi-dimensional data such as images or videos.
In this work, we investigate the use of the two-dimensional LSTM (2DLSTM) [9, 8] in sequence-to-sequence modeling as an alternative to the attention component. In this architecture, we apply a 2DLSTM on top of a deep bidirectional encoder to relate input and output representations in a 2D space. One dimension of the 2DLSTM processes the input sequence, and the other predicts the output (sub)words. In contrast to the attention-based sequence-to-sequence model, where the encoder states are fixed and cannot be re-interpreted while decoding, this model recomputes the encoding of the observation sequence as a function of the previously generated transcribed words. Our model is similar to an architecture used in machine translation described in . We believe that the 2DLSTM is able to capture the necessary monotonic alignments as well as maintain a notion of coverage internally in its cell states. Experimental results on the 300h-Switchboard task show competitive performance compared to an attention-based sequence-to-sequence system.
2 Related Work
A way of building multidimensional context into recurrent networks is provided by a strategy based on networks with tree-structured update graphs. In handwriting recognition (HWR), the 2DLSTM has shown successful results over convolutional neural networks (CNNs) in automatically extracting features from raw 2D images. In order to investigate deeper and larger 2DLSTM models, an algorithm exploiting GPU power has been implemented.
Different neural network architectures have been proposed in ASR to model 2D correlations in the input signal. One of them is a 2DLSTM layer that scans the input jointly over time and frequency for spatio-temporal modeling, aggregating more variations. Moreover, various architectures to model time-frequency patterns based on deep DNN, CNN, RNN and 2DLSTM layers have been compared for large vocabulary ASR.
As an alternative to the 2DLSTM concept, a network of one-dimensional LSTM cells arranged in a multidimensional grid has been introduced. In this topology, the LSTM cells communicate not only along the time sequence but also between the layers. The grid LSTM network has also been applied to the endpoint detection task in ASR to model both spectral and temporal variations. A 2D attention matrix has likewise been applied in a neural pitch accent recognition model, in which graphemes are encoded in one dimension and audio frames in the other.
Recently, the 2DLSTM layer has also been used for sequence-to-sequence modeling in machine translation, where it implicitly updates the source representation conditioned on the generated target words. In a similar direction, a 2D CNN-based network has been proposed in which the positions of the source and target words define a 2D grid for translation modeling.
Similar to , we apply a 2DLSTM layer to combine the acoustic model (the LSTM encoder) and the language model (the decoder) without any attention component. The 2DLSTM reconciles the context from both the input and the output sequences and re-interprets the encoder states whenever a new word is predicted. Compared to , our model is much deeper. We use max-pooling to select the most relevant encoder state, whereas that work uses the last horizontal state of the 2DLSTM. Furthermore, we utilize the pretraining scheme explained in  during training, and a faster decoding.
3 2D Long Short-Term Memory
The 2DLSTM is characterized as a general form of the standard LSTM [9, 15]. It has been proposed to process inherently two-dimensional data of arbitrary lengths; therefore, it uses both horizontal and vertical recurrences. The building blocks of both the LSTM and the 2DLSTM are shown in Figure 1. At grid position $(i, j)$, the 2DLSTM receives an input $x_{i,j}$, and its computation relies on both the vertical hidden state $s_{i-1,j}$ and the horizontal hidden state $s_{i,j-1}$. Besides the input, forget and output gates that are similar to those of the LSTM, the 2DLSTM employs an additional lambda gate. As written in Equation 5, its activation is computed analogously to the other gates [1, 9].
The internal cell state $c_{i,j}$ is computed from the sum of the two previous cell states $c_{i,j-1}$ and $c_{i-1,j}$, weighted by the lambda gate and its complement (see Equation 3). As in the LSTM, the internal cell state is combined with the output gate to yield the hidden state. $\tanh$ and $\sigma$ denote the hyperbolic tangent and sigmoid functions, and $W$, $U$ and $V$ are the weight matrices. For notational simplicity, we omit the bias vectors.
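As a concrete illustration, the following NumPy sketch implements one 2DLSTM cell step under the notation above. The dictionary-of-matrices layout and the assignment of the lambda gate to the horizontal predecessor cell are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def twodlstm_cell(x, s_horiz, s_vert, c_horiz, c_vert, W, U, V):
    """One 2DLSTM step at grid position (i, j).

    x               : input vector at (i, j)
    s_horiz, c_horiz: hidden/cell state of the horizontal neighbor (i, j-1)
    s_vert,  c_vert : hidden/cell state of the vertical neighbor   (i-1, j)
    W, U, V         : dicts of weight matrices for the candidate 'g' and
                      the gates 'i', 'f', 'o', 'l' (lambda); biases are
                      omitted, as in the text.
    """
    def affine(k):
        return W[k] @ x + U[k] @ s_horiz + V[k] @ s_vert

    g   = np.tanh(affine("g"))   # candidate cell input
    i   = sigmoid(affine("i"))   # input gate
    f   = sigmoid(affine("f"))   # forget gate
    o   = sigmoid(affine("o"))   # output gate
    lam = sigmoid(affine("l"))   # lambda gate, computed like the others (Eq. 5)

    # Eq. 3: the two predecessor cell states are mixed by the lambda gate
    # and its complement before the usual forget/input update.
    c = f * (lam * c_horiz + (1.0 - lam) * c_vert) + i * g
    s = o * np.tanh(c)           # output gate yields the hidden state
    return s, c
```

This keeps the one-dimensional LSTM as a special case: with the vertical recurrence and the lambda mixing removed, the update reduces to the standard cell.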
4 2D Sequence-to-Sequence Model
Bayes decision rule requires maximization of the class posterior given an input observation. In ASR, the classes are discrete label sequences of unknown length (e.g. word, subword or character sequences), denoted as $w_1^N$. Given an input observation sequence $x_1^T$ of variable length, where usually $T \gg N$, the posterior probability of a label sequence is defined as $p(w_1^N \mid x_1^T)$. This conditional distribution usually covers the alignment between the input observation sequence and the output word sequence either implicitly or explicitly.
In the attention-based sequence-to-sequence approach, the attention weights serve as an implicit probabilistic notion of alignment between output labels and encoder states. The freedom of the attention model to focus on the entire input sequence might contradict the monotonicity of alignments in ASR. In this work, we remove the attention component and investigate whether 2D sequence-to-sequence modeling is able to properly capture the monotonic input-output relation.
As shown in Figure 2, we apply a deep bidirectional LSTM encoder to scan an observation sequence. On top of each bidirectional LSTM layer, we conduct max-pooling over the time dimension to reduce the observation length, yielding encoder states $h_1^{T'}$, where $T'$ is the length reduced by a total reduction factor. Similar to , we then equip the network with a 2DLSTM layer to relate the encoder and the decoder states. At time step $t$, the 2DLSTM receives both the encoder state $h_t$ and the embedding vector of the last target word as inputs. One dimension of the 2DLSTM (the horizontal axis in the figure) sequentially reads the encoder states, and the other (the vertical axis) plays the role of the decoder; therefore, there is no additional decoder LSTM. Unlike the attention-based sequence-to-sequence model, where the encoder states are computed once at the beginning, our model repeatedly updates the encoder representations while generating a new output word. We note that this model does not use any attention component. The 2DLSTM state is derived from the current encoder state, the previous target embedding, and its horizontal and vertical predecessor states.
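The time reduction between encoder layers can be sketched as follows; truncating the remainder is an assumption here (the toolkit used may pad instead), and repeating factor-2 pooling three times composes into the overall reduction factor of 8 used in the experiments.

```python
import numpy as np

def max_pool_time(h, r):
    """Max-pool a (T, D) state sequence over time by factor r.

    T is truncated to a multiple of r; the exact handling of the
    remainder is an implementation detail left open by the text.
    """
    T, D = h.shape
    T_red = T // r
    return h[: T_red * r].reshape(T_red, r, D).max(axis=1)

# Three pooling steps of factor 2 after successive bidirectional LSTM
# layers give an overall time reduction of 8.
T, D = 80, 4
h = np.random.randn(T, D)
for _ in range(3):
    h = max_pool_time(h, 2)
print(h.shape)  # (10, 4)
```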
Note that the 2DLSTM state at a label/word step $i$ depends only on the preceding word sequence $w_1^{i-1}$, while it takes into account the whole temporal context of the input observation sequence.
At each decoder step, once the whole input sequence has been processed from $1$ to $T'$, we apply max-pooling over all horizontal states to obtain the context vector. We have also tried average-pooling and taking the last horizontal state instead of max-pooling, but neither performs better in this case. In order to generate the next output word, a transformation followed by a softmax operation is applied to the context vector.
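This output step can be sketched minimally as below; the single projection matrix `W_out` is a hypothetical stand-in for the transformation, which the text does not specify further.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def output_distribution(S_row, W_out):
    """Map one decoder row of 2DLSTM states to an output distribution.

    S_row : (T_red, D) horizontal states of the current decoder row
    W_out : (V, D) hypothetical output projection

    Max-pooling over the horizontal (time) axis yields the context
    vector, which is projected and normalized with a softmax.
    """
    context = S_row.max(axis=0)      # max-pooling over horizontal states
    return softmax(W_out @ context)
```

Average-pooling or taking `S_row[-1]` would be the one-line variants the text reports as not performing better.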
5 Experimental Results
We have conducted experiments on the Switchboard 300h task. We apply 40-dimensional Gammatone features using the RASR feature extractor. We use the full Hub5’00, including Switchboard (SWB) and CallHome (CH), as the development set and Hub5’01 as a test set. In order to enable an open-vocabulary system, we use byte-pair encoding (BPE) with 1k merge operations.
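The BPE merge-learning procedure of Sennrich et al. can be sketched in a few lines; the toy corpus and the end-of-word marker `</w>` follow the common formulation of the algorithm, not the exact tooling used in these experiments.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merge operations: repeatedly merge the most frequent
    adjacent symbol pair. The paper applies 1k such merges to the
    training transcriptions; here a toy corpus illustrates the idea."""
    # each word as a tuple of characters with an end-of-word marker
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])  # apply the merge
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = freq
        vocab = merged
    return merges
```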
As our baseline, we utilize an attention-based sequence-to-sequence architecture similar to that described in , with the exact same pretraining scheme and the same reduction factor. The baseline model includes a one-layer LSTM decoder with additive attention equipped with fertility feedback.
The feature vectors are passed into a stack of 6 bidirectional LSTM layers of size 1000 in each direction, followed by max-pooling operations. We downsample the input sequence by a total factor of 8, as described in . The 2DLSTM layer is equipped with 1000 nodes, and the output subwords are projected into a 620-dimensional embedding space. The models are trained end-to-end using the Adam optimizer, dropout, label smoothing and a warmup technique. We reduce the learning rate by a factor of 0.7 following a variant of the Newbob scheme, based on the perplexity on the development set over a few checkpoints.
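A Newbob-style decay of this kind can be sketched as follows; the `patience` window and the improvement threshold are illustrative assumptions, since the exact criterion is not spelled out here.

```python
def newbob_decay(lr, dev_ppls, factor=0.7, patience=3, rel_improvement=0.0):
    """Multiply the learning rate by `factor` when the development-set
    perplexity has not improved over the last `patience` checkpoints.

    dev_ppls: list of dev perplexities, one per checkpoint, oldest first.
    """
    if len(dev_ppls) <= patience:
        return lr  # not enough history to judge
    best_recent = min(dev_ppls[-patience:])
    best_before = min(dev_ppls[:-patience])
    # no (sufficient) improvement in the recent window -> decay
    if best_recent >= best_before * (1.0 - rel_improvement):
        return lr * factor
    return lr
```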
In our training, we use layer-wise pretraining for the encoder, where we start with two encoder layers and a single max-pool in between, with the same multiple-step reduction factor as in . Decoding is performed using beam search, and the subwords are merged into words. We do not utilize any language model (LM), neither in the baseline system nor in the 2D sequence-to-sequence model. The model is built using our in-house CUDA implementation of the 2DLSTM, utilizing optimized speedups in RETURNN. The code is open source and the configurations of the setups are available online at https://github.com/rwth-i6/returnn.
Table 1 compares the total number of parameters, perplexity and frame error rate (FER) on the development set between our model and the attention baseline. Both models have the same vocabulary size of almost 1K. Our model has 3M more parameters; the perplexity and the FER are comparable. We also compare our model against prior works based on the WER, listed in Table 2. As a simple significance test, the reported WERs are averaged over 3 runs. Although our 2D sequence-to-sequence model still lags behind the hybrid methods, it achieves competitive results against the attention baseline, and it outperforms the baseline on the Hub5’01 subset. We expect further improvements from including a separate LM in the search.
We also compare our model and the attention-based sequence-to-sequence model in terms of decoding speed. Since the whole output label sequence is known during training, the entire grid of 2DLSTM states can be computed at once, and at each step one row of it is taken. This cannot be done as a single operation during search, since the output sequence has yet to be predicted; therefore, during decoding, we compute the 2DLSTM states row by row, which slows down the search procedure. This algorithm is still faster than that of , where at each output step all previous 2DLSTM states are recomputed from scratch, which is not required. Table 3 lists the decoding speed of the models on the entire development set using a single GPU. In general, the decoding speed of our model is about 6 times slower than that of a standard attention-based model.
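The row-wise computation described above can be sketched as follows; `step_fn` and `emit_fn` are hypothetical stand-ins for the 2DLSTM cell and the output projection, and greedy generation replaces the actual beam search.

```python
import numpy as np

def decode_rowwise(encoder_states, step_fn, emit_fn, y0, num_steps):
    """Row-wise 2DLSTM decoding sketch.

    At each output step, only one new row of the 2DLSTM grid is
    computed, reusing the cached vertical states of the previous row
    rather than recomputing the whole grid from scratch.
    """
    T, D = encoder_states.shape
    s_below = np.zeros((T, D))   # vertical hidden states from the previous row
    c_below = np.zeros((T, D))   # vertical cell states from the previous row
    y = y0
    outputs = []
    for _ in range(num_steps):
        s_left = np.zeros(D)
        c_left = np.zeros(D)
        s_row = np.zeros((T, D))
        c_row = np.zeros((T, D))
        for j in range(T):       # sweep the horizontal (time) axis
            x = np.concatenate([encoder_states[j], y])
            s_left, c_left = step_fn(x, s_left, s_below[j], c_left, c_below[j])
            s_row[j], c_row[j] = s_left, c_left
        s_below, c_below = s_row, c_row    # cache this row for the next one
        context = s_row.max(axis=0)        # max-pool over horizontal states
        y = emit_fn(context)
        outputs.append(y)
    return outputs
```

Each output step thus costs one horizontal sweep of length T, instead of the i sweeps a from-scratch recomputation would need at step i.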
[Table 3: model vs. decoding speed (mins)]
6 Conclusion
We have applied a simple 2D sequence-to-sequence model as an alternative to the attention-based model. In our model, a 2DLSTM layer jointly combines the input and the output representations: it processes the observation sequence along the horizontal dimension and generates the output (sub)word sequence along the vertical axis. It has no additional LSTM decoder and does not rely on any attention component. Contrary to the attention-based sequence-to-sequence model, it repeatedly re-encodes the encoder representation whenever a new output (sub)word is generated. The experimental results are competitive with the baseline on the 300h-Switchboard Hub5’00 and show improvements on Hub5’01. Our future goal is to develop a bidirectional 2DLSTM so that the model is completely independent of standard LSTM layers, as well as to run more experiments on various speech tasks.
This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 694537, project ”SEQCLAS”) and from a Google Focused Award. The work reflects only the authors’ views and none of the funding parties is responsible for any use that may be made of the information it contains.
References
- (2018) Towards two-dimensional sequence to sequence model in neural machine translation. arXiv preprint arXiv:1810.03975.
- (2015) Neural machine translation by jointly learning to align and translate. CoRR abs/1409.0473.
- (2016) End-to-end attention-based large vocabulary speech recognition. In IEEE Inter. Conf. ICASSP, Shanghai, China, Mar. 20-25, 2016, pp. 4945–4949.
- (2012) Connectionist speech recognition: a hybrid approach. Vol. 247, Springer Science & Business Media.
- (2018) Sequence-to-sequence neural network model with 2D attention for learning Japanese pitch accents. In 19th Annual Conf. of Interspeech, Hyderabad, India, Sep. 2-6, 2018, pp. 1284–1287.
- (2015) Attention-based models for speech recognition. In Annual Conf. NIPS, Dec. 7-12, 2015, Montreal, Quebec, Canada, pp. 577–585.
- (2018) Pervasive attention: 2D convolutional neural networks for sequence-to-sequence prediction. CoRR abs/1808.03867.
- (2007) Multi-dimensional recurrent neural networks. CoRR abs/0705.2011.
- (2008) Supervised sequence labelling with recurrent neural networks. Ph.D. Thesis, Technical University Munich.
- (1997) Long short-term memory. Neural Comput. 9 (8), pp. 1735–1780.
- (1995) Comparison of a new hybrid connectionist SCHMM approach with other hybrid approaches for speech recognition. In IEEE Inter. Conf. ICASSP, Detroit, Michigan, USA, May 8-12, pp. 3311–3314.
- (2015) Grid long short-term memory. CoRR abs/1507.01526.
- (2014) Adam: a method for stochastic optimization. CoRR abs/1412.6980.
- (2016) CITlab ARGUS for historical handwritten documents. CoRR abs/1605.08412.
- (2016) Cells in multidimensional recurrent neural networks. The Journal of Machine Learning Research 17, pp. 3313–3349.
- (2017) Endpoint detection using grid long short-term memory networks for streaming speech recognition. In Proc. Interspeech 2017.
- (2016) Exploring multidimensional LSTMs for large vocabulary ASR. In IEEE Inter. Conf. ICASSP, Shanghai, China, Mar. 20-25, 2016, pp. 4940–4944.
- (1994) An application of recurrent nets to phone probability estimation. IEEE Trans. Neural Networks 5 (2), pp. 298–305.
- (2016) Modeling time-frequency patterns with LSTM vs. convolutional architectures for LVCSR tasks. In 17th Annual Conf. of Interspeech, San Francisco, CA, USA, Sep. 8-12, 2016, pp. 813–817.
- (2007) Gammatone features and feature combination for large vocabulary speech recognition. In IEEE Inter. Conf. ICASSP, Honolulu, Hawaii, USA, Apr. 15-20, pp. 649–652.
- (2016) Neural machine translation of rare words with subword units. In Proc. of the 54th ACL, Berlin, Germany, Aug. 7-12, Volume 1.
- (2014) Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15 (1), pp. 1929–1958.
- (2014) Sequence to sequence learning with neural networks. In Annual Conf. NIPS, Dec. 8-13, Montreal, Quebec, Canada, pp. 3104–3112.
- (2016) Rethinking the inception architecture for computer vision. In IEEE Conf. CVPR, Las Vegas, NV, USA, Jun. 27-30, 2016, pp. 2818–2826.
- (2017) Multitask learning with low-level auxiliary tasks for encoder-decoder based speech recognition. In 18th Annual Conf. of Interspeech, Stockholm, Sweden, Aug. 20-24, 2017, pp. 3532–3536.
- (2016) Handwriting recognition with large multidimensional long short-term memory recurrent neural networks. In 15th Intern. Conf. ICFHR, Shenzhen, China, Oct. 23-26, pp. 228–233.
- (2014) RASR/NN: the RWTH neural network toolkit for speech recognition. In IEEE Inter. Conf. ICASSP, Florence, Italy, May 4-9, 2014, pp. 3281–3285.
- (2018) RETURNN as a generic flexible neural toolkit with application to translation and speech recognition. In Proc. of ACL, Melbourne, Australia, Jul. 15-20, 2018, System Demonstrations, pp. 128–133.
- (2017) A comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition. In IEEE Inter. Conf. ICASSP, New Orleans, LA, USA, Mar. 5-9, 2017, pp. 2462–2466.
- (2018) Improved training of end-to-end attention models for speech recognition. In 19th Annual Conf. Interspeech, Hyderabad, India, Sep. 2-6, 2018, pp. 7–11.
- (2017) Advances in all-neural speech recognition. In IEEE Inter. Conf. ICASSP, New Orleans, LA, USA, Mar. 5-9, pp. 4805–4809.