Compressing Word Embeddings Using Syllables

01/13/2022
by Laurent Mertens, et al.

This work examines the possibility of using syllable embeddings, instead of the commonly used n-gram embeddings, as subword embeddings. We investigate this for two languages: English and Dutch. To this end, we also translated two standard English word embedding evaluation datasets, WordSim353 and SemEval-2017, to Dutch. Furthermore, we provide the research community with datasets of syllabic decompositions for both languages. We compare our approach to full word and n-gram embeddings. Compared to full word embeddings, we obtain English models that are 20 to 30 times smaller while retaining 80% of the performance; for Dutch, performance retention is 70%. Although less accurate than the n-gram baseline we used, our models can be trained in a matter of minutes, as opposed to hours for the n-gram approach. We identify a path toward upgrading performance in future work. All code is made publicly available, as well as our collected English and Dutch syllabic decompositions and Dutch evaluation set translations.
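To make the idea of syllables as subword units concrete, here is a minimal sketch, not the authors' implementation. It assumes a word vector is composed as the mean of its syllable vectors; the abstract does not state the composition function. The decomposition table, the dimension `DIM`, and the helper names are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical syllabic decompositions; real ones would come from a
# syllable data set such as those the abstract mentions.
SYLLABLES = {
    "undo": ["un", "do"],
    "redo": ["re", "do"],
    "doing": ["do", "ing"],
    "undoing": ["un", "do", "ing"],
    "redoing": ["re", "do", "ing"],
}

DIM = 8  # illustrative embedding dimension, not taken from the paper


def build_syllable_table(decompositions, dim, seed=0):
    """Assign one randomly initialised vector to each distinct syllable."""
    rng = random.Random(seed)
    table = {}
    for syllables in decompositions.values():
        for syl in syllables:
            if syl not in table:
                table[syl] = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    return table


def word_vector(word, decompositions, table):
    """Compose a word vector as the mean of its syllable vectors.

    Averaging is an assumption made for this sketch.
    """
    vectors = [table[s] for s in decompositions[word]]
    return [sum(component) / len(vectors) for component in zip(*vectors)]


table = build_syllable_table(SYLLABLES, DIM)
vec = word_vector("undoing", SYLLABLES, table)

# Parameter counts: one vector per word versus one vector per distinct syllable.
word_params = len(SYLLABLES) * DIM
syllable_params = len(table) * DIM
```

In this toy vocabulary, five words share four distinct syllables, so the syllable table stores fewer parameters than a full word table would. The size of any saving on a real vocabulary depends on how often syllables are reused across words.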


research
02/22/2021

Co-occurrences using Fasttext embeddings for word similarity tasks in Urdu

Urdu is a widely spoken language in South Asia. Though immoderate litera...
research
07/27/2017

Analysis of Italian Word Embeddings

In this work we analyze the performances of two of the most used word em...
research
10/23/2019

A context sensitive real-time Spell Checker with language adaptability

We present a novel language adaptable spell checking system which detect...
research
04/24/2017

Streaming Word Embeddings with the Space-Saving Algorithm

We develop a streaming (one-pass, bounded-memory) word embedding algorit...
research
10/04/2019

DialectGram: Automatic Detection of Dialectal Variation at Multiple Geographic Resolutions

We propose DialectGram, a method to detect dialectical variation across ...
research
09/19/2020

Word class flexibility: A deep contextualized approach

Word class flexibility refers to the phenomenon whereby a single word fo...
research
07/10/2020

Topic Modeling on User Stories using Word Mover's Distance

Requirements elicitation has recently been complemented with crowd-based...
