Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation

05/11/2021
by Shun-Po Chuang, et al.

We study the possibility of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC), and use CTC-based automatic speech recognition as an auxiliary task to improve performance. CTC's success on translation is counter-intuitive given its monotonicity assumption, so we analyze its reordering capability. Kendall's tau distance is introduced as a quantitative metric, and gradient-based visualization provides an intuitive way to take a closer look into the model. Our analysis shows that transformer encoders have the ability to change the word order, and it points out future research directions on non-autoregressive speech translation that are worth exploring further.
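As a rough illustration of the reordering metric named in the abstract, the sketch below counts pairwise order disagreements (Kendall's tau distance) between a reference word order and a model-induced word order. The function name and the toy inputs are illustrative assumptions, not the paper's exact evaluation procedure.

from itertools import combinations

def kendall_tau_distance(order_a, order_b):
    """Count pairwise disagreements between two orderings of the same items.

    order_a, order_b: sequences containing the same items (e.g. source-word
    indices in reference order vs. the order induced by the model's output).
    Returns the number of item pairs whose relative order differs;
    0 means the two orderings are identical.
    """
    pos_b = {item: i for i, item in enumerate(order_b)}
    distance = 0
    for x, y in combinations(order_a, 2):
        # x precedes y in order_a; the pair is discordant if that order flips in order_b.
        if pos_b[x] > pos_b[y]:
            distance += 1
    return distance

# Example: a hypothesis that swaps two adjacent words has distance 1.
print(kendall_tau_distance([0, 1, 2, 3], [0, 2, 1, 3]))  # -> 1

A larger distance indicates more reordering between source and target word order, which is what makes CTC's monotonic alignment assumption an interesting constraint to analyze.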


Related research

10/21/2020: Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin
Nigerian Pidgin remains one of the most popular languages in West Africa...

04/07/2021: Pushing the Limits of Non-Autoregressive Speech Recognition
We combine recent advancements in end-to-end speech recognition to non-a...

10/11/2021: A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation
Non-autoregressive (NAR) models simultaneously generate multiple outputs...

11/20/2019: A Comparative Study on End-to-end Speech to Text Translation
Recent advances in deep learning show that end-to-end speech to text tra...

05/16/2020: Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition
Non-autoregressive transformer models have achieved extremely fast infer...

05/12/2020: DiscreTalk: Text-to-Speech as a Machine Translation Problem
This paper proposes a new end-to-end text-to-speech (E2E-TTS) model base...
