MorphTE: Injecting Morphology in Tensorized Embeddings

10/27/2022
by Guobing Gan, et al.

In the era of deep learning, word embeddings are essential for text tasks. However, storing and accessing these embeddings requires a large amount of space, which hinders the deployment of such models on resource-limited devices. Leveraging the powerful compression capability of the tensor product, we propose Morphologically-enhanced Tensorized Embeddings (MorphTE), a word embedding compression method with morphological augmentation. A word consists of one or more morphemes, the smallest units that bear meaning or serve a grammatical function. MorphTE represents a word embedding as an entangled form of its morpheme vectors via the tensor product, injecting prior semantic and grammatical knowledge into the learning of embeddings. Furthermore, the dimensionality of a morpheme vector and the number of morphemes are much smaller than their word-level counterparts, which greatly reduces the number of embedding parameters. We conduct experiments on tasks such as machine translation and question answering. Results on four translation datasets across different languages show that MorphTE compresses word embedding parameters by about 20 times without loss of performance and significantly outperforms related embedding compression methods.
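The core idea above can be illustrated with a minimal sketch: compose each word vector as the tensor (outer) product of a few small morpheme vectors, so only a compact morpheme table is stored. All sizes below are hypothetical toy settings, not the paper's, and this rank-1, two-morpheme version omits the rank summation MorphTE uses in full.

```python
import numpy as np

# Hypothetical sizes (not from the paper): 30,000 words,
# 1,000 distinct morphemes, target word-embedding dimension 64.
V, M, D = 30_000, 1_000, 64
q = 2                          # morphemes per word (padded/truncated)
d = int(round(D ** (1 / q)))   # morpheme vector dimension: 8, since 8 * 8 = 64

rng = np.random.default_rng(0)
morpheme_emb = rng.normal(size=(M, d))               # trainable morpheme table
word_to_morphemes = rng.integers(0, M, size=(V, q))  # fixed word segmentation

def word_embedding(word_id):
    """Compose a word vector as the tensor (outer) product of its
    morpheme vectors, flattened to dimension d ** q."""
    m1, m2 = word_to_morphemes[word_id]
    return np.outer(morpheme_emb[m1], morpheme_emb[m2]).reshape(-1)

vec = word_embedding(42)
assert vec.shape == (D,)

# Parameter comparison: full lookup table vs. morpheme table.
full_params = V * D    # 1,920,000
morph_params = M * d   # 8,000 -> ~240x smaller in this toy setting
```

Because the stored parameters scale with the morpheme vocabulary and the small dimension d rather than with V and D, the compression ratio grows with vocabulary size; the roughly 20x figure reported in the abstract reflects the paper's actual settings rather than this toy example.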


Related research:
- Exploration on Grounded Word Embedding: Matching Words and Images with Image-Enhanced Skip-Gram Model (09/08/2018)
- On the Dimensionality of Word Embedding (12/11/2018)
- Single Training Dimension Selection for Word Embedding with PCA (08/30/2019)
- All Word Embeddings from One Embedding (04/25/2020)
- Word-Embeddings Distinguish Denominal and Root-Derived Verbs in Semitic (08/11/2022)
- Enhancing Domain Word Embedding via Latent Semantic Imputation (05/21/2019)
- Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms (05/24/2018)
