Efficient Speech Translation with Dynamic Latent Perceivers

10/28/2022
by   Ioannis Tsiamas, et al.

Transformers have been the dominant architecture for Speech Translation in recent years, achieving significant improvements in translation quality. Because speech signals are longer than their textual counterparts and Transformer self-attention scales quadratically with sequence length, a down-sampling step is essential for its adoption in Speech Translation. In this research, we instead propose to ease the complexity by using a Perceiver encoder to map the speech inputs to a fixed-length latent representation. Furthermore, we introduce a novel way of training Perceivers, Dynamic Latent Access (DLA), which unlocks larger latent spaces without any additional computational overhead. Speech-to-Text Perceivers with DLA can match the performance of a Transformer baseline across three language pairs in MuST-C. Finally, a DLA-trained model is easily adapted to use DLA at inference, and can be flexibly deployed under various computational budgets without significant drops in translation quality.
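
To make the fixed-length mapping concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a single cross-attention step with no learned projections, and it reads DLA as uniform random selection of k latents from a larger pool of n, which is an assumption of this example since the abstract does not specify the selection rule. The names cross_attend, select_latents, latent_pool, and k_active are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents, inputs):
    """Each latent vector (query) attends over all input frames (keys/values).

    Cost is O(len(latents) * len(inputs)) rather than O(len(inputs) ** 2).
    Learned projections are omitted to keep the sketch short.
    """
    dim = len(inputs[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for query in latents:
        scores = [scale * sum(q * x for q, x in zip(query, frame))
                  for frame in inputs]
        weights = softmax(scores)
        output.append([
            sum(w * frame[d] for w, frame in zip(weights, inputs))
            for d in range(dim)
        ])
    return output


def select_latents(latent_pool, k, rng):
    """Assumed DLA step: pick k of the n pooled latents, keeping pool order."""
    indices = sorted(rng.sample(range(len(latent_pool)), k))
    return [latent_pool[i] for i in indices]


rng = random.Random(0)
dim, n_pool, k_active = 8, 16, 4
latent_pool = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_pool)]

# Two stand-in "utterances" of different lengths map to the same latent shape.
short_speech = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
long_speech = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]

active = select_latents(latent_pool, k_active, rng)
encoded_short = cross_attend(active, short_speech)
encoded_long = cross_attend(active, long_speech)
print(len(encoded_short), len(encoded_long))  # both equal k_active
```

In this sketch the attention cost depends on k_active rather than on the pool size, so enlarging the pool does not add per-step compute, and choosing a different k_active at inference trades compute for representation size.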

research
10/16/2022

RedApt: An Adaptor for wav2vec 2 Encoding Faster and Smaller Speech Translation without Quality Compromise

Pre-trained speech Transformers in speech translation (ST) have facilita...
research
10/28/2020

Bridging the Modality Gap for Speech-to-Text Translation

End-to-end speech translation aims to translate speech in one language i...
research
12/12/2022

Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features

Speech-to-speech translation directly translates a speech utterance to a...
research
06/13/2019

Lattice Transformer for Speech Translation

Recent advances in sequence modeling have highlighted the strengths of t...
research
09/09/2021

Speechformer: Reducing Information Loss in Direct Speech Translation

Transformer-based models have gained increasing popularity achieving sta...
research
07/03/2023

Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation

Transformer models using segment-based processing have been an effective...
research
10/24/2022

Real-time Speech Interruption Analysis: From Cloud to Client Deployment

Meetings are an essential form of communication for all types of organiz...
