A comprehensive comparative evaluation and analysis of Distributional Semantic Models

05/20/2021
by Alessandro Lenci, et al.

Distributional semantics has changed deeply over the last decades. First, predict models stole the thunder from traditional count ones; more recently, both have been replaced in many NLP applications by the contextualized vectors produced by Transformer neural language models. Although an extensive body of research has been devoted to Distributional Semantic Model (DSM) evaluation, we still lack a thorough comparison with respect to tested models, semantic tasks, and benchmark datasets. Moreover, previous work has mostly focused on task-driven evaluation, rather than on exploring the differences in how models represent the lexical semantic space. In this paper, we perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT. First, we investigate the performance of these embeddings on several semantic tasks, carrying out an in-depth statistical analysis to identify the major factors influencing the behavior of DSMs. The results show that (i) the alleged superiority of predict-based models is more apparent than real, and certainly not ubiquitous, and (ii) static DSMs surpass contextualized representations in most out-of-context semantic tasks and datasets. Furthermore, we borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models. RSA reveals important differences related to the frequency and part-of-speech of lexical items.
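To make the RSA methodology concrete, here is a minimal sketch of how two distributional spaces can be compared: each space is reduced to its representational similarity matrix (pairwise cosine similarities over a shared vocabulary), and the two matrices are then correlated. The function name and toy data are illustrative, not the paper's actual code.

```python
import numpy as np
from scipy.stats import spearmanr

def rsa(space_a, space_b):
    """Representational Similarity Analysis between two embedding spaces.

    Each space is an (n_words, dim) matrix over the same vocabulary, with
    rows in the same order. We build each space's representational
    similarity matrix (pairwise cosine similarities), then Spearman-correlate
    their upper triangles (excluding the trivial diagonal).
    """
    def sim_matrix(X):
        X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise rows
        return X @ X.T                                    # cosine similarity matrix

    iu = np.triu_indices(space_a.shape[0], k=1)           # upper triangle indices
    rho, _ = spearmanr(sim_matrix(space_a)[iu], sim_matrix(space_b)[iu])
    return rho

# Toy check: a space and a rescaled copy of itself induce identical
# similarity structure, so their RSA correlation is 1.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
print(round(rsa(A, 3.0 * A), 3))  # scaling preserves cosine similarities -> 1.0
```

Note that RSA compares spaces without requiring them to share a dimensionality or coordinate system, which is what makes it suitable for contrasting, say, 300-dimensional static vectors with 768-dimensional averaged BERT vectors.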


