BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model

10/29/2022
by   Yosuke Higuchi, et al.

This paper presents BERT-CTC, a novel formulation of end-to-end speech recognition that adapts BERT for connectionist temporal classification (CTC). Our formulation relaxes the conditional independence assumptions used in conventional CTC and incorporates linguistic knowledge through the explicit output dependencies obtained from BERT contextual embeddings. BERT-CTC attends to the full contexts of the input and hypothesized output sequences via the self-attention mechanism. This mechanism encourages the model to learn inner- and inter-dependencies between the audio and token representations while maintaining CTC's training efficiency. During inference, BERT-CTC combines a mask-predict algorithm with CTC decoding, iteratively refining an output sequence. The experimental results reveal that BERT-CTC improves over conventional approaches across variations in speaking styles and languages. Finally, we show that the semantic representations in BERT-CTC are beneficial for downstream spoken language understanding tasks.
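To illustrate the inference procedure the abstract describes, here is a minimal sketch of the mask-predict refinement loop: start from a fully masked hypothesis, fill in all positions from the model's predictions, then re-mask the lowest-confidence tokens and predict again over a fixed number of iterations. The `predict_fn` interface and `toy_predictor` below are hypothetical stand-ins for BERT-CTC's actual conditional distribution, not the paper's implementation.

```python
MASK = "<mask>"

def mask_predict(predict_fn, length, num_iterations=4):
    """Iteratively refine a token sequence of the given length.

    predict_fn maps the current (partially masked) token list to a list of
    (token, confidence) pairs, one per position -- a hypothetical stand-in
    for BERT-CTC's conditional prediction step.
    """
    tokens = [MASK] * length
    confidences = [0.0] * length
    for it in range(num_iterations):
        # Fill every currently-masked position with the model's prediction.
        preds = predict_fn(tokens)
        for i, (tok, conf) in enumerate(preds):
            if tokens[i] == MASK:
                tokens[i] = tok
                confidences[i] = conf
        if it == num_iterations - 1:
            break
        # Re-mask the lowest-confidence positions; the number of masks
        # decays linearly over iterations (a common mask-predict schedule).
        num_mask = int(length * (1 - (it + 1) / num_iterations))
        worst = sorted(range(length), key=lambda i: confidences[i])[:num_mask]
        for i in worst:
            tokens[i] = MASK
            confidences[i] = 0.0
    return tokens

def toy_predictor(tokens):
    # Hypothetical deterministic model for demonstration only:
    # each position always gets the same token and confidence.
    vocab = ["hello", "world", "from", "bert", "ctc"]
    return [(vocab[i % len(vocab)], 0.5 + 0.1 * (i % 5))
            for i in range(len(tokens))]

print(mask_predict(toy_predictor, 5))
```

In BERT-CTC, each prediction step is additionally constrained by CTC decoding over the audio, so the refined hypothesis stays consistent with the acoustic evidence; the sketch above shows only the text-side refinement loop.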

Related research

WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment (04/22/2022)
Historically lower-level tasks such as automatic speech recognition (ASR...

Understanding Semantics from Speech Through Pre-training (09/24/2019)
End-to-end Spoken Language Understanding (SLU) is proposed to infer the ...

On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding (10/11/2022)
In this paper we examine the use of semantically-aligned speech represen...

BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder (11/02/2022)
We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic sp...

Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models (01/25/2022)
While Transformers have achieved promising results in end-to-end (E2E) a...

AMR Parsing via Graph-Sequence Iterative Inference (04/12/2020)
We propose a new end-to-end model that treats AMR parsing as a series of...

Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language (12/16/2022)
End-to-end text-to-speech synthesis (TTS) can generate highly natural sy...
