Unsupervised Morphological Expansion of Small Datasets for Improving Word Embeddings

11/15/2017
by   Syed Sarfaraz Akhtar, et al.

We present a language-independent, unsupervised method for building word embeddings using morphological expansion of text. Our model addresses data sparsity and yields improved word embeddings by training on artificially generated sentences. We evaluate our method using small training sets on eleven test sets for the word similarity task across seven languages. Further, for English, we evaluate the impact of our approach using a large training set on three standard test sets. Our method improves results across all languages.
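
As a rough illustration of what training on artificially generated sentences can mean, the sketch below expands one sentence by swapping each word for a morphological variant. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say how variants are discovered or how sentences are generated, so the function `expand_sentence` and the hand-written table `toy_variants` are hypothetical. In an unsupervised setting the variant table would have to be induced from the corpus rather than written by hand.

```python
from typing import Dict, List


def expand_sentence(tokens: List[str], variants: Dict[str, List[str]]) -> List[List[str]]:
    """Return artificial sentences made by replacing one token at a time
    with each of its listed morphological variants."""
    generated = []
    for i, tok in enumerate(tokens):
        for alt in variants.get(tok, []):
            # Copy the sentence with only position i changed.
            generated.append(tokens[:i] + [alt] + tokens[i + 1:])
    return generated


# Hypothetical variant table, written by hand for illustration only.
toy_variants = {"walk": ["walks", "walked"], "dog": ["dogs"]}

sentence = "the dog can walk".split()
expanded = expand_sentence(sentence, toy_variants)

for new_sentence in expanded:
    print(" ".join(new_sentence))
```

The original sentence and the generated ones could then be pooled into a single training corpus for any standard embedding trainer, so that rarer word forms appear in more contexts than the small dataset provides on its own.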

research
11/15/2017

An Unsupervised Approach for Mapping between Vector Spaces

We present a language independent, unsupervised approach for transformin...
research
04/24/2017

A Trie-Structured Bayesian Model for Unsupervised Morphological Segmentation

In this paper, we introduce a trie-structured Bayesian model for unsuper...
research
04/29/2017

Extending and Improving Wordnet via Unsupervised Word Embeddings

This work presents an unsupervised approach for improving WordNet that b...
research
11/29/2016

Geometry of Compositionality

This paper proposes a simple test for compositionality (i.e., literal us...
research
05/04/2020

The Paradigm Discovery Problem

This work treats the paradigm discovery problem (PDP), the task of learn...
research
05/29/2018

Unsupervised Alignment of Embeddings with Wasserstein Procrustes

We consider the task of aligning two sets of points in high dimension, w...
research
05/21/2020

The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs

In this article we present the Frankfurt Latin Lexicon (FLL), a lexical ...
