Unimodal Aggregation for CTC-based Speech Recognition

09/15/2023
by Ying Fang, et al.

This paper addresses non-autoregressive automatic speech recognition (ASR). It proposes unimodal aggregation (UMA), which segments and integrates the feature frames that belong to the same text token and thereby learns better feature representations for text tokens. An encoder produces both the frame-wise features and the frame-wise weights. The feature frames are then integrated according to the unimodal weights and further processed by a decoder. The model is trained with the connectionist temporal classification (CTC) loss. Compared with regular CTC, the proposed method learns better feature representations and shortens the sequence length, which lowers both recognition error and computational complexity. Experiments on three Mandarin datasets show that UMA performs better than or comparably to other advanced non-autoregressive methods, such as self-conditioned CTC. Moreover, integrating self-conditioned CTC into the proposed framework improves performance noticeably further.
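
The aggregation step described in the abstract can be illustrated with a small NumPy sketch. The abstract does not state how segment boundaries are found or how frames are combined, so two assumptions are made here: a new segment starts at each local minimum (valley) of the weight sequence, and each segment is reduced to the weighted average of its frames. The function name `unimodal_aggregate` and its arguments are hypothetical, and the encoder, decoder, and CTC loss are omitted.

```python
import numpy as np


def unimodal_aggregate(features, weights):
    """Merge frames into segment-level features (illustrative sketch only).

    features: (T, D) array of frame-wise encoder features.
    weights:  (T,) array of positive frame-wise scalar weights.
    Returns a (K, D) array with K <= T, one row per segment.
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    num_frames = len(weights)

    # ASSUMPTION: a segment boundary sits at each valley, i.e. a frame where
    # the weight has stopped falling and is about to rise again.
    starts = [0]
    for t in range(1, num_frames - 1):
        if weights[t] <= weights[t - 1] and weights[t] < weights[t + 1]:
            starts.append(t)
    bounds = starts + [num_frames]

    # ASSUMPTION: each segment is the weighted average of its frames.
    segments = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        w = weights[start:end]
        segments.append((w[:, None] * features[start:end]).sum(axis=0) / w.sum())
    return np.stack(segments)


# Toy input: five frames whose weights rise and fall twice.
frame_features = np.array([[1.0], [1.0], [3.0], [3.0], [3.0]])
frame_weights = np.array([0.1, 0.8, 0.2, 0.9, 0.3])
aggregated = unimodal_aggregate(frame_features, frame_weights)
print(aggregated.shape)
```

In this toy input the five frames collapse into two segments, which mirrors the shortened sequence length the abstract mentions: the decoder would operate on K aggregated vectors rather than T frames.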

research
02/15/2021

Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT

Attention-based encoder-decoder (AED) models have achieved promising per...
research
09/08/2022

Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM

Connectionist temporal classification (CTC) -based models are attractive...
research
04/21/2023

Non-autoregressive End-to-end Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding

This paper presents the use of non-autoregressive (NAR) approaches for j...
research
01/28/2022

Star Temporal Classification: Sequence Classification with Partially Labeled Data

We develop an algorithm which can learn from partially labeled and unseg...
research
10/28/2020

CASS-NAT: CTC Alignment-based Single Step Non-autoregressive Transformer for Speech Recognition

We propose a CTC alignment-based single step non-autoregressive transfor...
research
05/18/2023

A Lexical-aware Non-autoregressive Transformer-based ASR Model

Non-autoregressive automatic speech recognition (ASR) has become a mains...
research
07/01/2019

Learning to aggregate feature representations

The Algonauts challenge requires constructing a multi-subject encoder of...
