Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

07/01/2016
by Stéphan Tulkens, et al.

Word embeddings have recently attracted considerable interest as a result of strong performance gains on a variety of tasks. However, most of this research has also underlined the importance of benchmark datasets, and the difficulty of constructing them for language-specific tasks. Still, many of the datasets used in these tasks could prove to be fruitful linguistic resources, allowing for unique insights into language use and variability. In this paper we evaluate multiple types of embeddings, created with both count-based and prediction-based architectures on a variety of corpora, in two language-specific tasks: relation evaluation and dialect identification. For the latter, we compare unsupervised methods with a traditional, hand-crafted dictionary. With this research, we provide the embeddings themselves and the relation evaluation task benchmark for use in further research, and demonstrate that the benchmarked embeddings are a useful unsupervised linguistic resource that can be used effectively in a downstream task.

research
04/08/2019

Evaluation of Greek Word Embeddings

Since word embeddings have been the most popular input for many NLP task...
research
09/06/2018

Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation

Following the recent success of word embeddings, it has been argued that...
research
08/07/2018

Word-Level Loss Extensions for Neural Temporal Relation Classification

Unsupervised pre-trained word embeddings are used effectively for many t...
research
06/06/2018

The Limitations of Cross-language Word Embeddings Evaluation

The aim of this work is to explore the possible limitations of existing ...
research
04/14/2021

UPB at SemEval-2021 Task 1: Combining Deep Learning and Hand-Crafted Features for Lexical Complexity Prediction

Reading is a complex process which requires proper understanding of text...
research
02/23/2021

Paraphrases do not explain word analogies

Many types of distributional word embeddings (weakly) encode linguistic ...
research
06/07/2017

Insights into Analogy Completion from the Biomedical Domain

Analogy completion has been a popular task in recent years for evaluatin...
