ProFormer: Towards On-Device LSH Projection Based Transformers

04/13/2020
by   Chinnadhurai Sankar, et al.
0

At the heart of text based neural models lay word representations, which are powerful but occupy a lot of memory making it challenging to deploy to devices with memory constraints such as mobile phones, watches and IoT. To surmount these challenges, we introduce ProFormer – a projection based transformer architecture that is faster and lighter making it suitable to deploy to memory constraint devices and preserve user privacy. We use LSH projection layer to dynamically generate word representations on-the-fly without embedding lookup tables leading to significant memory footprint reduction from O(V.d) to O(T), where V is the vocabulary size, d is the embedding dimension size and T is the dimension of the LSH projection representation. We also propose a local projection attention (LPA) layer, which uses self-attention to transform the input sequence of N LSH word projections into a sequence of N/K representations reducing the computations quadratically by O(K^2). We evaluate ProFormer on multiple text classification tasks and observed improvements over prior state-of-the-art on-device approaches for short text classification and comparable performance for long text classification tasks. In comparison with a 2-layer BERT model, ProFormer reduced the embedding memory footprint from 92.16 MB to 1.3 KB and requires 16 times less computation overhead, which is very impressive making it the fastest and smallest on-device model.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/14/2019

On the Robustness of Projection Neural Networks For Efficient Text Representation: An Empirical Study

Recently, there has been strong interest in developing natural language ...
research
06/04/2019

Transferable Neural Projection Representations

Neural word representations are at the core of many state-of-the-art nat...
research
08/02/2017

ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections

Deep neural networks have become ubiquitous for applications related to ...
research
10/14/2022

HashFormers: Towards Vocabulary-independent Pre-trained Transformers

Transformer-based pre-trained language models are vocabulary-dependent, ...
research
05/09/2021

FNet: Mixing Tokens with Fourier Transforms

We show that Transformer encoder architectures can be massively sped up,...
research
07/22/2018

On Tree-structured Multi-stage Principal Component Analysis (TMPCA) for Text Classification

A novel sequence-to-vector (seq2vec) embedding method, called the tree-s...
research
11/16/2020

Text Information Aggregation with Centrality Attention

A lot of natural language processing problems need to encode the text se...

Please sign up or login with your details

Forgot password? Click here to reset