Independent language modeling architecture for end-to-end ASR

11/25/2019
by Van Tung Pham, et al.

The attention-based end-to-end (E2E) automatic speech recognition (ASR) architecture allows for joint optimization of acoustic and language models within a single network. However, in a vanilla E2E ASR architecture, the decoder sub-network (subnet), which plays the role of the language model (LM), is conditioned on the encoder output. As a result, the acoustic encoder and the language model are entangled, and the LM cannot be trained separately on external text data. To address this problem, we propose a new architecture that decouples the decoder subnet from the encoder output. The decoupled subnet becomes an independently trainable LM subnet, which can easily be updated with external text data. We study two strategies for updating the new architecture. Experimental results show that 1) the independent LM architecture benefits from external text data, achieving a 9.3% relative error rate reduction on the Mandarin HKUST and English NSC datasets, and 2) the proposed architecture works well with an external LM and generalizes to different amounts of labelled data.

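To make the core idea concrete, the following is a minimal PyTorch-style sketch of a decoder whose LM path is conditioned only on the token history, while a separate attention path consumes the encoder output. Module names, dimensions, and the fusion scheme are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a decoder with an independently trainable LM subnet.
# The LM path sees only previous tokens, so it can be (pre)trained on
# external text; the attention path adds acoustic context from the encoder.
import torch
import torch.nn as nn


class IndependentLMDecoder(nn.Module):
    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        # LM subnet: depends only on the token history.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lm_rnn = nn.LSTM(d_model, d_model, batch_first=True)
        # Acoustic path: attends over the encoder output.
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        # Fusion of LM state and acoustic context, then output projection.
        self.out = nn.Linear(2 * d_model, vocab_size)

    def lm_logits(self, tokens):
        # Text-only forward pass: usable for training/updating the LM subnet
        # without any acoustic input (zero context is an assumption here).
        h, _ = self.lm_rnn(self.embed(tokens))
        return self.out(torch.cat([h, torch.zeros_like(h)], dim=-1))

    def forward(self, tokens, enc_out):
        # Joint ASR forward pass: LM state plus attention context.
        h, _ = self.lm_rnn(self.embed(tokens))        # (B, U, d_model)
        ctx, _ = self.attn(h, enc_out, enc_out)       # (B, U, d_model)
        return self.out(torch.cat([h, ctx], dim=-1))  # (B, U, vocab)


# Usage: update the LM path on external text, then train jointly with audio.
dec = IndependentLMDecoder(vocab_size=1000)
text = torch.randint(0, 1000, (8, 20))                # external text batch
enc_out = torch.randn(8, 50, 256)                     # encoder output (B, T, d_model)
lm_only = dec.lm_logits(text)                         # no acoustic input needed
joint = dec(text, enc_out)                            # full E2E decoding path
```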

Related research

02/12/2022 - USTED: Improving ASR with a Unified Speech and Text Encoder-Decoder
Improving end-to-end speech recognition by incorporating external text d...

11/03/2020 - Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition
The external language models (LM) integration remains a challenging task...

09/16/2023 - Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
Collecting audio-text pairs is expensive; however, it is much easier to ...

09/19/2023 - Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition
We present a novel integration of an instruction-tuned large language mo...

02/16/2019 - A Fully Differentiable Beam Search Decoder
We introduce a new beam search decoder that is fully differentiable, mak...

05/21/2020 - Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning
In this work, we study leveraging extra text data to improve low-resourc...

10/18/2021 - Automatic Learning of Subword Dependent Model Scales
To improve the performance of state-of-the-art automatic speech recognit...
