Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation

02/21/2023
by Biao Zhang, et al.

For end-to-end speech translation, regularizing the encoder with the Connectionist Temporal Classification (CTC) objective, using the source transcript or target translation as labels, can greatly improve quality metrics. However, CTC demands an extra prediction layer over the vocabulary space, bringing in nonnegligible model parameters and computational overhead, even though this layer is typically not used at inference. In this paper, we re-examine the need for genuine vocabulary labels in CTC regularization and explore strategies for shrinking the CTC label space, targeting improved efficiency without quality degradation. We propose coarse labeling for CTC (CoLaCTC), which merges vocabulary labels via simple heuristic rules, such as truncation, division, or modulo (MOD) operations. Despite its simplicity, our experiments on 4 source and 8 target languages show that CoLaCTC with MOD in particular can compress the label space aggressively to 256 and even further, gaining training efficiency (1.18x-1.77x speedup depending on the original vocabulary size) while still delivering comparable or better performance than the CTC baseline. We also show that CoLaCTC generalizes successfully to CTC regularization regardless of whether transcripts or translations are used for labeling.
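To make the label-merging idea concrete, here is a minimal sketch of how vocabulary token ids could be collapsed into a coarse label space with the heuristic rules the abstract names. The function name, signature, and the exact form of the truncation and division rules are illustrative assumptions, not the authors' implementation.

```python
def coarse_labels(token_ids, vocab_size, space=256, rule="mod"):
    """Map vocabulary token ids (0..vocab_size-1) into a coarse label
    space of size `space` using a simple heuristic rule.

    Note: `rule` semantics here are assumptions for illustration only.
    """
    if rule == "mod":
        # Modulo: ids that differ by a multiple of `space` share a label.
        return [t % space for t in token_ids]
    if rule == "div":
        # Division: consecutive ids fall into the same bucket.
        bucket = max(1, vocab_size // space)
        return [min(t // bucket, space - 1) for t in token_ids]
    if rule == "trunc":
        # Truncation: clip ids above the coarse range to the last label.
        return [min(t, space - 1) for t in token_ids]
    raise ValueError(f"unknown rule: {rule}")


# Example: a 5000-token vocabulary compressed to 256 coarse labels.
print(coarse_labels([3, 300, 4999], vocab_size=5000, rule="mod"))
```

With the coarse labels in place, the CTC prediction layer only needs `space` output units instead of `vocab_size`, which is where the reported training speedup would come from; the layer is discarded at inference either way.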

