A Lexical-aware Non-autoregressive Transformer-based ASR Model

05/18/2023
by Chong-En Lin, et al.

Non-autoregressive automatic speech recognition (ASR) has become a mainstream approach to ASR modeling because of its fast decoding speed and satisfactory results. To further boost performance, relaxing the conditional independence assumption and cascading large-scale pre-trained models are two active research directions. In addition to these strategies, we propose a lexical-aware non-autoregressive Transformer-based (LA-NAT) ASR framework, which consists of an acoustic encoder, a speech-text shared encoder, and a speech-text shared decoder. The acoustic encoder processes the input speech features as usual, while the speech-text shared encoder and decoder are designed to be trained on speech and text data simultaneously. By doing so, LA-NAT aims to make the ASR model aware of lexical information, so the resulting model is expected to achieve better results by leveraging the learned linguistic knowledge. A series of experiments is conducted on the AISHELL-1, CSJ, and TEDLIUM 2 datasets. According to the experiments, the proposed LA-NAT provides superior results to other recently proposed non-autoregressive ASR models. In addition, LA-NAT is more compact than most non-autoregressive ASR models, and it is about 58 times faster than the classic autoregressive model.
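To make the three-block layout concrete, below is a minimal PyTorch sketch of the design the abstract describes: an acoustic encoder feeding a speech-text shared encoder and shared decoder, with text-only data able to enter at the shared encoder. This is not the authors' implementation; all class names, layer counts, and dimensions are illustrative assumptions, and the shared decoder is simplified to a self-attention stack with a parallel (non-autoregressive) output head.

```python
# A minimal sketch of the LA-NAT layout (assumptions, not the authors' code).
import torch
import torch.nn as nn

class LANATSketch(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, vocab_size=4000,
                 n_heads=4, n_enc=6, n_shared=3, n_dec=3):
        super().__init__()
        # Acoustic encoder: processes input speech features, as in a
        # standard Transformer ASR encoder.
        self.feat_proj = nn.Linear(feat_dim, d_model)
        self.acoustic_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_enc)
        # Speech-text shared encoder: consumes either acoustic-encoder
        # states (speech path) or embedded tokens (text path), so the
        # model is also exposed to lexical, text-only data in training.
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.shared_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_shared)
        # Speech-text shared decoder, simplified here to self-attention
        # layers plus a head that predicts all positions in parallel
        # (typically trained with a CTC-style loss in NAT ASR).
        self.shared_decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_dec)
        self.output = nn.Linear(d_model, vocab_size)

    def forward_speech(self, feats):
        # feats: (batch, frames, feat_dim) -> (batch, frames, vocab)
        h = self.acoustic_encoder(self.feat_proj(feats))
        return self.output(self.shared_decoder(self.shared_encoder(h)))

    def forward_text(self, tokens):
        # tokens: (batch, length) -> (batch, length, vocab)
        h = self.shared_encoder(self.token_emb(tokens))
        return self.output(self.shared_decoder(h))
```

With dummy inputs, `forward_speech(torch.randn(2, 100, 80))` and `forward_text(torch.randint(0, 4000, (2, 20)))` each produce per-position vocabulary logits in a single forward pass; that single parallel pass, rather than a left-to-right loop, is what makes decoding non-autoregressive and fast.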


Related research

04/10/2021 - Non-autoregressive Transformer-based End-to-end ASR using BERT
Transformer-based models have led to a significant innovation in various...

10/12/2022 - A context-aware knowledge transferring strategy for CTC-based ASR
Non-autoregressive automatic speech recognition (ASR) modeling has recei...

09/04/2023 - AVATAR: Robust Voice Search Engine Leveraging Autoregressive Document Retrieval and Contrastive Learning
Voice, as input, has progressively become popular on mobiles and seems t...

09/27/2021 - Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates
The multi-decoder (MD) end-to-end speech translation model has demonstra...

05/23/2023 - Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding
Attention-based encoder-decoder (AED) models have shown impressive perfo...

05/25/2022 - Improving CTC-based ASR Models with Gated Interlayer Collaboration
For Automatic Speech Recognition (ASR), the CTC-based methods have becom...

09/15/2023 - Unimodal Aggregation for CTC-based Speech Recognition
This paper works on non-autoregressive automatic speech recognition. A u...
