A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

10/11/2021
by Yosuke Higuchi, et al.

Non-autoregressive (NAR) models generate multiple outputs in a sequence simultaneously, which significantly speeds up inference at the cost of lower accuracy than autoregressive (AR) baselines. Because NAR models show great potential for real-time applications, a growing number of them have been explored in different fields to narrow the accuracy gap with AR models. In this work, we conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR). Experiments are performed in a state-of-the-art setting using ESPnet. The results on various tasks provide findings that help build an understanding of NAR ASR, such as the accuracy-speed trade-off and robustness to long-form utterances. We also show that the techniques can be combined for further improvement and applied to NAR end-to-end speech translation. All the implementations are publicly available to encourage further research in NAR speech processing.
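
To make the idea of generating multiple outputs simultaneously concrete, the following is a minimal sketch of greedy CTC decoding, a widely known NAR decoding scheme. It is an illustration only: the abstract does not state which NAR methods are compared, and the function name `ctc_greedy_decode`, the blank index `BLANK = 0`, and the toy scores are all assumptions made for this example. The point is that each frame's label is chosen independently of the others, so the per-frame step has no sequential dependency, unlike AR decoding, where each token waits for the previous one.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_scores):
    """Decode a token sequence from per-frame scores without any left-to-right loop.

    frame_scores: list of lists, one score per vocabulary entry for each frame.
    """
    # Pick the best label for every frame; no frame depends on another frame's
    # result, so this step could run in parallel across frames.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # Standard CTC post-processing: merge consecutive repeats, then drop blanks.
    collapsed = [label for label, _ in groupby(best)]
    return [label for label in collapsed if label != BLANK]


# Toy scores over a 3-entry vocabulary (index 0 is the blank).
scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
]
print(ctc_greedy_decode(scores))  # [1, 2]
```

Because the labels are predicted independently per frame, the output cannot condition on earlier tokens, which is one intuition for the accuracy gap with AR models that the abstract mentions.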
