Distilled Embedding: Non-Linear Embedding Factorization Using Knowledge Distillation

10/02/2019
by Vasileios Lioutas, et al.

Word embeddings are a vital component of Natural Language Processing (NLP) systems and have been extensively researched. Better representations of words have come at the cost of huge memory footprints, which has made deploying NLP models on edge devices challenging due to memory limitations. Compressing embedding matrices without sacrificing model performance is essential for successful commercial edge deployment. In this paper, we propose Distilled Embedding, an (input/output) embedding compression method based on low-rank matrix decomposition with an added non-linearity. We first initialize the weights of our decomposition by learning to reconstruct the full word-embedding matrix, and then fine-tune on the downstream task, employing knowledge distillation on the factorized embedding. We conduct extensive experiments with various compression rates on machine translation, using different datasets and sharing a single word-embedding matrix between the input embedding and the vocabulary projection. We show that the proposed technique outperforms conventional low-rank matrix factorization and other recently proposed word-embedding matrix compression methods.
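As a rough illustration of the approach described in the abstract, the PyTorch sketch below factorizes a V x d embedding matrix into two low-rank factors with a non-linearity between them, and pre-trains the factors to reconstruct the original embeddings. The module and function names, the ReLU activation, and the training loop are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLinearFactorizedEmbedding(nn.Module):
    """Approximate a full V x d embedding matrix E with f(A) @ B,
    where A is V x r, B is r x d, and r << d (shapes assumed)."""

    def __init__(self, vocab_size: int, embed_dim: int, rank: int):
        super().__init__()
        self.low_rank = nn.Embedding(vocab_size, rank)          # factor A (V x r)
        self.act = nn.ReLU()                                    # added non-linearity (assumed)
        self.up_proj = nn.Linear(rank, embed_dim, bias=False)   # factor B (r x d)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Reconstructed embedding rows for the given token ids.
        return self.up_proj(self.act(self.low_rank(token_ids)))


def pretrain_by_reconstruction(full_embedding: torch.Tensor,
                               factored: NonLinearFactorizedEmbedding,
                               steps: int = 1000, lr: float = 1e-3) -> None:
    """Step 1 of the method: fit the factors to reconstruct the full matrix."""
    opt = torch.optim.Adam(factored.parameters(), lr=lr)
    ids = torch.arange(full_embedding.size(0))  # all vocabulary indices
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(factored(ids), full_embedding)
        loss.backward()
        opt.step()
```

During downstream fine-tuning, the paper applies knowledge distillation on the factorized embedding; a simple stand-in would add a distillation term (e.g., MSE between the factorized outputs and the frozen original embeddings) to the translation loss, with the same factorized matrix reused for the vocabulary projection since input and output embeddings are shared.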

Related research

- Online Embedding Compression for Text Classification using Low Rank Matrix Factorization (11/01/2018)
- GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking (06/18/2018)
- Sketching Transformed Matrices with Applications to Natural Language Processing (02/23/2020)
- Tensorized Embedding Layers for Efficient Model Compression (01/30/2019)
- On the Effectiveness of Low-Rank Matrix Factorization for LSTM Model Compression (08/27/2019)
- Direction is what you need: Improving Word Embedding Compression in Large Language Models (06/15/2021)
- Word Embedding Algorithms as Generalized Low Rank Models and their Canonical Form (11/06/2019)
