All Word Embeddings from One Embedding

04/25/2020
by Sho Takase, et al.

In neural network-based models for natural language processing (NLP), word embeddings often account for the largest share of the parameters. Conventional models prepare a large embedding matrix whose size depends on the vocabulary size, so storing these models in memory and on disk is costly. In this study, to reduce the total number of parameters, the embeddings of all words are represented by transforming a single shared embedding. The proposed method, ALONE (all word embeddings from one), constructs the embedding of a word by modifying the shared embedding with a filter vector, which is word-specific but non-trainable. The constructed embedding is then fed into a feed-forward neural network to increase its expressiveness. Naively, the filter vectors would occupy the same memory as the conventional embedding matrix, which depends on the vocabulary size; to solve this issue, we also introduce a memory-efficient filter construction approach. We show that ALONE can serve as a sufficient word representation through an experiment on reconstructing pre-trained word embeddings. In addition, we conduct experiments on NLP application tasks: machine translation and summarization. We combined ALONE with the current state-of-the-art encoder-decoder model, the Transformer, and achieved comparable scores on WMT 2014 English-to-German translation and DUC 2004 very short summarization with fewer parameters.
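For illustration, a minimal PyTorch sketch of the idea follows: a single shared trainable vector is modified element-wise by a word-specific, non-trainable filter, and the result is passed through a small feed-forward network. The filter construction shown here (summing fixed random candidate vectors selected by a fixed per-word code, with arbitrary codebook sizes and hidden width) is an illustrative assumption, not necessarily the paper's exact memory-efficient scheme.

```python
import torch
import torch.nn as nn


class ALONEEmbedding(nn.Module):
    """Sketch of ALONE: every word embedding is derived from one shared vector.

    The word-specific filter is built from fixed (non-trainable) random
    candidate vectors chosen by a fixed per-word code; the codebook layout
    and the summation rule are illustrative assumptions.
    """

    def __init__(self, vocab_size, dim, num_codebooks=8, codebook_size=64, hidden=2048):
        super().__init__()
        # Single shared, trainable source embedding.
        self.shared = nn.Parameter(torch.randn(dim))
        # Fixed random candidate vectors used to build the filters (not trained).
        self.register_buffer(
            "candidates", torch.randn(num_codebooks, codebook_size, dim))
        # Fixed random code: which candidate each word picks from each codebook.
        self.register_buffer(
            "codes", torch.randint(codebook_size, (vocab_size, num_codebooks)))
        # Feed-forward network to increase expressiveness of the filtered embedding.
        self.ffn = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, word_ids):
        # word_ids: (batch, seq) -> codes: (batch, seq, num_codebooks)
        codes = self.codes[word_ids]
        book_idx = torch.arange(self.candidates.size(0), device=word_ids.device)
        # Gather one candidate per codebook, then sum into the filter vector.
        picked = self.candidates[book_idx, codes]      # (batch, seq, books, dim)
        filt = picked.sum(dim=-2)                      # (batch, seq, dim)
        # Modify the shared embedding element-wise, then apply the FFN.
        return self.ffn(self.shared * filt)


if __name__ == "__main__":
    emb = ALONEEmbedding(vocab_size=32000, dim=512)
    ids = torch.randint(32000, (2, 5))
    print(emb(ids).shape)  # torch.Size([2, 5, 512])
```

In this sketch the memory that grows with the vocabulary is only the small integer code table; the candidate vectors, the shared embedding, and the FFN are independent of vocabulary size, which is the point of the memory-efficient filter construction.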


