
Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation

by Shun-Po Chuang, et al.

We study the possibility of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC), and use CTC-based automatic speech recognition as an auxiliary task to improve performance. CTC's success on translation is counter-intuitive given its monotonicity assumption, so we analyze its reordering capability. Kendall's tau distance is introduced as a quantitative metric, and gradient-based visualization offers an intuitive way to take a closer look inside the model. Our analysis shows that transformer encoders are able to change the word order, and it points to a future research direction in non-autoregressive speech translation that is worth further exploration.
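The abstract does not spell out how Kendall's tau distance is computed, so the following is only a minimal sketch under stated assumptions. It assumes the metric is the normalized count of discordant pairs between an alignment order and the monotonic (identity) order. Here `order[i]` is taken to be the source position aligned to the i-th output token; how such an alignment is obtained (for example, from gradient-based attributions) is an assumption, and the function name `kendall_tau_distance` is hypothetical. A value of 0 means no reordering and 1 means a fully reversed order.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_distance(order: Sequence[int]) -> float:
    """Normalized Kendall's tau distance between `order` and the identity order.

    Hypothetical helper: `order[i]` is assumed to be the source position
    aligned to the i-th output token. Returns 0.0 for a monotonic order
    and 1.0 for a fully reversed one. Tied positions are not counted
    as discordant.
    """
    n = len(order)
    if n < 2:
        return 0.0  # fewer than two tokens: no pairs, so no reordering
    # A pair (i, j) with i < j is discordant when its source positions are inverted.
    discordant = sum(1 for i, j in combinations(range(n), 2) if order[i] > order[j])
    # Normalize by the total number of pairs, n * (n - 1) / 2.
    return discordant / (n * (n - 1) / 2)


print(kendall_tau_distance([0, 1, 2, 3]))  # monotonic alignment
print(kendall_tau_distance([1, 0, 2, 3]))  # one swapped pair out of six
print(kendall_tau_distance([3, 2, 1, 0]))  # fully reversed
```

Swapping one adjacent pair in a four-token sequence gives 1/6, since only one of the six token pairs is inverted. This illustrates how the metric grows with the amount of reordering a model performs.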


Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin

Nigerian Pidgin remains one of the most popular languages in West Africa...

Pushing the Limits of Non-Autoregressive Speech Recognition

We combine recent advancements in end-to-end speech recognition to non-a...

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

Non-autoregressive (NAR) models simultaneously generate multiple outputs...

A Comparative Study on End-to-end Speech to Text Translation

Recent advances in deep learning show that end-to-end speech to text tra...

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition

Non-autoregressive transformer models have achieved extremely fast infer...

DiscreTalk: Text-to-Speech as a Machine Translation Problem

This paper proposes a new end-to-end text-to-speech (E2E-TTS) model base...

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

Training objectives based on predictive coding have recently been shown ...