Automated Generation of Multilingual Clusters for the Evaluation of Distributed Representations

11/04/2016
by Philip Blair, et al.

We propose a language-agnostic way of automatically generating sets of semantically similar clusters of entities along with sets of "outlier" elements, which may then be used to perform an intrinsic evaluation of word embeddings in the outlier detection task. We used our methodology to create a gold-standard dataset, which we call WikiSem500, and evaluated multiple state-of-the-art embeddings. The results show a correlation between performance on this dataset and performance on sentiment analysis.
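The abstract does not spell out how the outlier detection task is scored. The sketch below assumes the compactness-score formulation commonly used for this task. Each word is scored by the mean pairwise cosine similarity of the set with that word removed, and the word whose removal leaves the most compact set is predicted as the outlier. Two summary metrics are also assumed: Outlier Position Percentage (OPP) and accuracy. All function names and the toy 2-D embeddings are hypothetical and chosen for illustration. This is not the authors' evaluation code.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def compactness(word, group, emb):
    """Mean similarity over ordered pairs of distinct words in `group` minus `word` (assumed scoring)."""
    rest = [w for w in group if w != word]
    n = len(rest)
    total = sum(cosine(emb[a], emb[b]) for a, b in permutations(rest, 2))
    return total / (n * (n - 1))


def detect_outlier(group, emb):
    # Predicted outlier: the word whose removal leaves the most compact set.
    return max(group, key=lambda w: compactness(w, group, emb))


def outlier_position(group, outlier, emb):
    # Rank of the true outlier when words are sorted by ascending compactness
    # (0 = worst, len(group) - 1 = correctly identified).
    ranked = sorted(group, key=lambda w: compactness(w, group, emb))
    return ranked.index(outlier)


def evaluate(test_sets, emb):
    """test_sets: list of (cluster_words, outlier_word); returns (OPP, accuracy) in percent."""
    op_sum = 0.0
    hits = 0
    for cluster, outlier in test_sets:
        group = list(cluster) + [outlier]
        pos = outlier_position(group, outlier, emb)
        op_sum += pos / len(cluster)  # normalise the rank to [0, 1]
        hits += pos == len(cluster)   # counts as correct only if ranked first
    n = len(test_sets)
    return 100.0 * op_sum / n, 100.0 * hits / n


# Hypothetical toy embeddings: three fruit vectors close together, one distant outlier.
emb = {
    "apple": [1.0, 0.1],
    "banana": [0.9, 0.2],
    "cherry": [1.0, 0.0],
    "car": [0.0, 1.0],
}
print(detect_outlier(["apple", "banana", "cherry", "car"], emb))
print(evaluate([(["apple", "banana", "cherry"], "car")], emb))
```

On these toy vectors, removing "car" leaves the three tightly grouped fruit vectors. That set is the most compact, so "car" is predicted as the outlier and both metrics reach 100. With real embeddings, `emb` would instead map each cluster entity or outlier string to its vector. Entities missing from an embedding's vocabulary would need a separate policy, which this sketch does not address.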
