Dynamic Transformer for Efficient Machine Translation on Embedded Devices

07/17/2021
by Hishan Parry, et al.

The Transformer architecture is widely used for machine translation tasks. However, its resource-intensive nature makes it challenging to implement on constrained embedded devices, particularly where the available hardware resources can vary at run-time. We propose a dynamic machine translation model that scales the Transformer architecture based on the resources available at any particular time. The proposed approach, 'Dynamic-HAT', uses a HAT SuperTransformer as the backbone to search for SubTransformers with different accuracy-latency trade-offs at design time. The optimal SubTransformers are sampled from the SuperTransformer at run-time, depending on the latency constraint. Dynamic-HAT is tested on the Jetson Nano, where it uses inherited SubTransformers sampled directly from the SuperTransformer with a switching time of <1s. Using inherited SubTransformers results in a BLEU score loss of <1.5 because the SubTransformer configurations are not retrained from scratch after sampling. However, this loss in performance can be recovered by reducing the dimensions of the design space to tailor it to a family of target hardware. The reduced design space yields a BLEU score increase of approximately 1, with a wide range of performance scaling between 0.356s and 1.526s on the GPU and 2.9s and 7.31s on the CPU.
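
The run-time part of this scheme is essentially a lookup over configurations profiled at design time: candidates that meet the current latency budget are filtered, and the most accurate one is applied by slicing the SuperTransformer's weights, with no retraining at run-time. The sketch below illustrates that selection step under stated assumptions; the configuration fields, helper names, and table values are hypothetical and are not taken from the authors' released code or measured results.

    # Minimal sketch (assumed helper, not the authors' released code) of the
    # run-time selection step: pick the most accurate SubTransformer whose
    # design-time latency measurement fits the current latency budget.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class SubTransformerConfig:
        # Candidate architecture sampled from the SuperTransformer design space.
        encoder_layers: int
        decoder_layers: int
        embed_dim: int
        latency_s: float   # latency profiled on the target device at design time
        bleu: float        # validation BLEU with inherited (not retrained) weights

    def select_subtransformer(pareto_table: List[SubTransformerConfig],
                              latency_budget_s: float) -> Optional[SubTransformerConfig]:
        # Keep only configurations that meet the budget, then take the best BLEU.
        feasible = [c for c in pareto_table if c.latency_s <= latency_budget_s]
        return max(feasible, key=lambda c: c.bleu) if feasible else None

    # Illustrative (made-up) Pareto table for a Jetson Nano-class GPU.
    table = [
        SubTransformerConfig(6, 1, 512, latency_s=0.36, bleu=25.0),
        SubTransformerConfig(6, 3, 640, latency_s=0.80, bleu=26.2),
        SubTransformerConfig(6, 6, 640, latency_s=1.50, bleu=27.1),
    ]

    chosen = select_subtransformer(table, latency_budget_s=1.0)
    print(chosen)
    # Switching to `chosen` amounts to slicing out the corresponding weights of
    # the SuperTransformer (an inherited SubTransformer), which is why the
    # switch can complete in under a second.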

Related research

Learning Accurate Integer Transformer Machine-Translation Models (01/03/2020)
We describe a method for training accurate Transformer machine-translati...

Lite Transformer with Long-Short Range Attention (04/24/2020)
Transformer has become ubiquitous in natural language processing (e.g., ...

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing (05/28/2020)
Transformers are ubiquitous in Natural Language Processing (NLP) tasks, ...

Dancing along Battery: Enabling Transformer with Run-time Reconfigurability on Mobile Devices (02/12/2021)
A pruning-based AutoML framework for run-time reconfigurability, namely ...

Attention Link: An Efficient Attention-Based Low Resource Machine Translation Architecture (02/01/2023)
Transformers have achieved great success in machine translation, but tra...

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation (10/01/2019)
Neural sequence-to-sequence models, particularly the Transformer, are th...

Multilingual Neural Machine Translation with Deep Encoder and Multiple Shallow Decoders (06/05/2022)
Recent work in multilingual translation advances translation quality sur...
