MobileNMT: Enabling Translation in 15MB and 30ms

06/07/2023
by Ye Lin, et al.

Deploying NMT models on mobile devices is essential for privacy, low latency, and offline scenarios. To achieve high model capacity, NMT models are rather large, and running them on devices is challenging given limited storage, memory, computation, and power. Existing work either focuses on only a single metric such as FLOPs, or relies on a general-purpose engine that is not good at auto-regressive decoding. In this paper, we present MobileNMT, a system that can translate in 15MB and 30ms on devices. We propose a series of principles for model compression when combined with quantization. Further, we implement an engine that is friendly to INT8 and decoding. With the co-design of model and engine, compared with the existing system, we speed up 47.0x and save 99.5% of memory with only 11.6% loss of BLEU. The code is publicly available at https://github.com/zjersey/Lightseq-ARM.
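For readers unfamiliar with INT8 quantization, the following is a minimal sketch of symmetric per-tensor quantization in plain Python. It is a generic illustration, not the scheme used by MobileNMT: the function names `quantize_int8` and `dequantize_int8`, the [-127, 127] range, and the sample weights are all assumptions made for this example.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization (illustrative, not the paper's method).

    Maps floats to integers in [-127, 127] using a single scale factor.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the INT8 range used here.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


# Hypothetical weights chosen only for demonstration.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each value is stored in one byte instead of the four bytes of a 32-bit float, at the cost of a rounding error of at most about half the scale per value. This shows only the storage side of quantization; the memory saving reported in the abstract comes from the authors' combined model and engine design.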


research
10/13/2021

Power Consumption of Video-Decoders on Various Android Devices

The critical constraint of mobile devices is a limited battery life that...
research
09/16/2021

The NiuTrans System for WNGT 2020 Efficiency Task

This paper describes the submissions of the NiuTrans Team to the WNGT 20...
research
02/27/2020

MNN: A Universal and Efficient Inference Engine

Deploying deep learning models on mobile devices draws more and more att...
research
09/16/2021

The NiuTrans System for the WMT21 Efficiency Task

This paper describes the NiuTrans system for the WMT21 translation effic...
research
02/02/2019

An end-to-end Generative Retrieval Method for Sponsored Search Engine --Decoding Efficiently into a Closed Target Domain

In this paper, we present a generative retrieval method for sponsored se...
research
03/03/2023

Rotation Invariant Quantization for Model Compression

Post-training Neural Network (NN) model compression is an attractive app...
research
05/12/2023

Monitoring and Adapting ML Models on Mobile Devices

ML models are increasingly being pushed to mobile devices, for low-laten...
