Efficient Transformer for Direct Speech Translation

07/07/2021
by Belen Alastruey, et al.

Transformer-based models have moved beyond text. Speech, however, poses a problem: audio inputs are far longer than text sequences, and because the cost of self-attention grows quadratically with sequence length, the standard Transformer cannot process them directly. The usual workaround is to add strided convolutional layers that shorten the sequence before it reaches the Transformer. In this paper, we propose a new approach to direct Speech Translation in which an efficient Transformer lets us work with the spectrogram without placing convolutional layers in front of the Transformer. The encoder thus learns directly from the spectrogram, and no information is lost. We built an encoder-decoder model whose encoder is an efficient Transformer, the Longformer, and whose decoder is a traditional Transformer decoder. Our results are close to those obtained with the standard approach, which suggests that this is a promising research direction.
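
To make the trade-off in the abstract concrete, the following minimal sketch compares how many query-key pairs self-attention has to score in three settings: full attention on the raw spectrogram, full attention after two strided convolutions, and Longformer-style sliding-window attention on the raw spectrogram. All numbers are assumptions chosen for illustration, not values from the paper: the frame count, the convolution kernel, stride and padding, and the window size are hypothetical, and the Longformer's optional global attention is not modeled.

```python
def conv_output_length(n_frames, kernel_size=3, stride=2, padding=1):
    """Output length of a 1-D convolution (standard formula, dilation 1).

    kernel_size, stride and padding are illustrative defaults, not the
    settings used in the paper.
    """
    return (n_frames + 2 * padding - kernel_size) // stride + 1


def full_attention_pairs(n):
    """Query-key pairs scored by full self-attention: every position
    attends to every position, so the count grows as n squared."""
    return n * n


def sliding_window_pairs(n, window):
    """Query-key pairs scored when each position attends only to itself
    and to neighbours at most window // 2 positions away on each side.
    Positions near the edges have fewer neighbours."""
    half = window // 2
    return sum(min(n - 1, i + half) - max(0, i - half) + 1 for i in range(n))


# Hypothetical input: 3000 spectrogram frames (e.g. 30 s at 10 ms per frame).
n_frames = 3000
window = 64  # hypothetical attention window

# Standard approach: two strided convolutions, then full attention.
reduced = conv_output_length(conv_output_length(n_frames))

print("frames after two strided convolutions:", reduced)
print("full attention, raw spectrogram:      ", full_attention_pairs(n_frames))
print("full attention, after convolutions:   ", full_attention_pairs(reduced))
print("sliding window, raw spectrogram:      ", sliding_window_pairs(n_frames, window))
```

Under these assumed settings the windowed count grows linearly with the number of frames rather than quadratically, which is what allows an efficient Transformer to read the spectrogram at full length instead of a downsampled version.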

research
07/29/2022

GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation

Transformer structure, stacked by a sequence of encoder and decoder netw...
research
06/23/2023

The Double Helix inside the NLP Transformer

We introduce a framework for analyzing various types of information in a...
research
01/02/2023

Transformer Based Geocoding

In this paper, we formulate the problem of predicting a geolocation from...
research
09/09/2021

Speechformer: Reducing Information Loss in Direct Speech Translation

Transformer-based models have gained increasing popularity achieving sta...
research
11/17/2020

s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis

Neural end-to-end text-to-speech (TTS), which adopts either a recurrent...
research
07/05/2022

Ultra-Low-Bitrate Speech Coding with Pretrained Transformers

Speech coding facilitates the transmission of speech over low-bandwidth ...
research
02/14/2023

Synthesizing audio from tongue motion during speech using tagged MRI via transformer

Investigating the relationship between internal tissue point motion of t...
