On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms

06/17/2020
by Adam Sutton, et al.

Word embeddings are widely used in Natural Language Processing (NLP) applications. They are coordinates associated with each word in a dictionary, inferred from the statistical properties of these words in a large corpus. In this paper we introduce the notion of a "concept" as a list of words that share semantic content. We use this notion to analyse the learnability of certain concepts, defined as the ability of a classifier to recognise unseen members of a concept after training on a random subset of it. We first use this method to measure the learnability of concepts on pretrained word embeddings. We then develop a statistical analysis of concept learnability, based on hypothesis testing and ROC curves, in order to compare the relative merits of various embedding algorithms using a fixed corpus and fixed hyperparameters. We find that all embedding methods capture the semantic content of these word lists, but fastText performs better than the others.
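The learnability measure described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the embeddings are synthetic (`make_toy_embeddings` is a hypothetical helper), the classifier is a ridge-regularised least-squares linear model, the score is held-out accuracy, and the hypothesis test is a simple comparison against random word lists of the same size. The abstract does not state which classifier or test statistic the authors use.

```python
import numpy as np

def make_toy_embeddings(n_words=400, dim=10, concept_size=60, shift=3.0, seed=0):
    """Synthetic stand-in for pretrained embeddings (assumption, not real data).
    Words 0..concept_size-1 form the 'concept' and share a common offset."""
    rng = np.random.default_rng(seed)
    vectors = rng.normal(size=(n_words, dim))
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    concept_ids = np.arange(concept_size)
    vectors[concept_ids] += shift * direction
    return vectors, concept_ids

def fit_linear(X, y, ridge=1.0):
    """Ridge-regularised least-squares classifier with labels in {-1, +1}."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0, 1.0, -1.0)

def learnability(vectors, member_ids, rng, train_frac=0.5):
    """Train on a random subset of the list and report accuracy on the unseen
    members, using an equal number of randomly drawn other words as negatives."""
    n = len(member_ids)
    others = np.setdiff1d(np.arange(len(vectors)), member_ids)
    pos = rng.permutation(member_ids)
    neg = rng.choice(others, size=n, replace=False)
    k = int(train_frac * n)
    train_ids = np.concatenate([pos[:k], neg[:k]])
    test_ids = np.concatenate([pos[k:], neg[k:]])
    y_train = np.concatenate([np.ones(k), -np.ones(k)])
    y_test = np.concatenate([np.ones(n - k), -np.ones(n - k)])
    w = fit_linear(vectors[train_ids], y_train)
    return float(np.mean(predict(w, vectors[test_ids]) == y_test))

def random_list_p_value(vectors, concept_ids, n_null=200, seed=1):
    """Compare the concept's score with scores of random lists of equal size."""
    rng = np.random.default_rng(seed)
    observed = learnability(vectors, concept_ids, rng)
    null_scores = np.array([
        learnability(
            vectors,
            rng.choice(len(vectors), size=len(concept_ids), replace=False),
            rng,
        )
        for _ in range(n_null)
    ])
    # Add-one correction keeps the estimated p-value strictly positive.
    p = (1 + np.sum(null_scores >= observed)) / (1 + n_null)
    return observed, float(p)

vectors, concept_ids = make_toy_embeddings()
accuracy, p_value = random_list_p_value(vectors, concept_ids)
print(f"held-out accuracy: {accuracy:.2f}, p-value vs random lists: {p_value:.3f}")
```

With real data, `vectors` would be replaced by rows of a pretrained embedding matrix and `concept_ids` by the indices of the words in a curated list. Repeating the procedure for each embedding algorithm trained on the same corpus would give comparable scores, which is the kind of comparison the abstract describes.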


