LegoNN: Building Modular Encoder-Decoder Models

06/07/2022
by   Siddharth Dalmia, et al.

State-of-the-art encoder-decoder models (e.g., for machine translation (MT) or automatic speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit: no component of the model can be (re-)used without the others. We describe LegoNN, a procedure for building encoder-decoder architectures with decoder modules that can be reused across various MT and ASR tasks without any fine-tuning. To achieve reusability, the interface between encoder and decoder modules is grounded to a sequence of marginal distributions over a discrete vocabulary pre-defined by the model designer. We present two approaches for ingesting these marginals: one is differentiable, allowing the flow of gradients across the entire network, and the other is gradient-isolating. To enable portability of decoder modules between MT tasks for different source languages and across other tasks like ASR, we introduce a modality-agnostic encoder with a length control mechanism that dynamically adapts the encoder's output length to match the expected input length range of pre-trained decoders. We present several experiments demonstrating the effectiveness of LegoNN models: a language-generation LegoNN decoder module trained on a German-English (De-En) MT task can be reused, with no fine-tuning, for the Europarl English ASR task and the Romanian-English (Ro-En) MT task, matching or beating the respective baseline models. When fine-tuned towards the target task for a few thousand updates, our LegoNN models improved the Ro-En MT task by 1.5 BLEU points and achieved a 12.5% WER reduction on the Europarl ASR task. Furthermore, to show its extensibility, we composed a LegoNN ASR model from three modules, each learned within a different end-to-end trained model on a different dataset, boosting the WER reduction to 19.5%.
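The interface idea in the abstract can be sketched concretely. Below is a minimal PyTorch sketch written from the abstract alone: the `MarginalInterface` and `MarginalIngestor` names, the vocabulary size, and all shapes are illustrative assumptions, not the authors' released implementation. It shows the two ingestion styles the abstract names, a differentiable path that lets gradients flow from the decoder back into the encoder, and a gradient-isolating path that detaches the marginals.

```python
# Minimal sketch of the LegoNN interface idea, based only on the abstract
# above. All names, shapes, and hyperparameters are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 10_000  # assumed size of the pre-defined interface vocabulary


class MarginalInterface(nn.Module):
    """Maps encoder states to per-position marginal distributions over
    the shared, pre-defined discrete vocabulary."""

    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, enc_states: torch.Tensor) -> torch.Tensor:
        # (batch, time, d_model) -> (batch, time, VOCAB_SIZE) marginals
        return F.softmax(self.proj(enc_states), dim=-1)


class MarginalIngestor(nn.Module):
    """Re-embeds the marginals for a decoder. With differentiable=True,
    gradients flow through the marginals into the encoder; with
    differentiable=False, the marginals are detached, isolating the
    decoder's gradients from the encoder (the two ingestion styles the
    abstract describes)."""

    def __init__(self, d_model: int, differentiable: bool = True):
        super().__init__()
        self.embed = nn.Linear(VOCAB_SIZE, d_model, bias=False)
        self.differentiable = differentiable

    def forward(self, marginals: torch.Tensor) -> torch.Tensor:
        if not self.differentiable:
            marginals = marginals.detach()  # gradient-isolating variant
        # Expected embedding under the marginal distribution at each position
        return self.embed(marginals)


# Toy usage: any encoder emitting the agreed-upon marginals can feed any
# decoder that ingests them, which is the modularity LegoNN targets.
interface = MarginalInterface(d_model=512)
ingestor = MarginalIngestor(d_model=512, differentiable=False)

enc_states = torch.randn(2, 50, 512)   # (batch, time, d_model)
marginals = interface(enc_states)      # interface output
dec_inputs = ingestor(marginals)       # re-embedded for the decoder
```

Because the two sides agree only on the vocabulary over which the marginals are defined, an encoder trained for one task can in principle feed a decoder trained for another, which is the kind of reuse the experiments above measure.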


