A Study of Metrics of Distance and Correlation Between Ranked Lists for Compositionality Detection

03/10/2017
by Christina Lioma, et al.

Compositionality in language refers to the extent to which the meaning of a phrase can be decomposed into the meanings of its constituents and the way these constituents are combined. Based on the premise that substitution by synonyms preserves meaning, compositionality can be approximated as the semantic similarity between a phrase and a version of that phrase in which words have been replaced by their synonyms. Phrases can be represented in different ways (e.g., vectors [1] or language models [2]), and the choice of representation affects how semantic similarity is measured. We propose a new compositionality detection method that represents phrases as ranked lists of term weights. Our method approximates the semantic similarity between two ranked-list representations using a range of well-known distance and correlation metrics. In contrast to most state-of-the-art approaches to compositionality detection, our method is completely unsupervised. Experiments with a publicly available dataset of 1048 human-annotated phrases show that, compared to strong supervised baselines, our approach provides a superior measurement of compositionality using any of the distance and correlation metrics considered.
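
To make the ranked-list idea concrete, here is a minimal sketch, assuming each phrase has already been mapped to a {term: weight} dictionary. The abstract does not say how the term weights are derived, so the weights below are invented toy numbers, and the helper names (`ranked`, `kendall_tau_shared`, `jaccard_at_k`) are hypothetical. Kendall's tau restricted to shared terms and Jaccard overlap of the top-k terms are two standard correlation and overlap measures chosen for illustration; they are not necessarily the exact metrics or variants used in the paper.

```python
from itertools import combinations


def ranked(weights):
    """Turn a {term: weight} dict into a list of terms, highest weight first.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    return [term for term, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))]


def kendall_tau_shared(list_a, list_b):
    """Kendall's tau between two rankings, restricted to terms present in both."""
    in_b = set(list_b)
    shared = [t for t in list_a if t in in_b]  # keeps list_a's order
    pos_b = {t: i for i, t in enumerate(list_b)}
    n = len(shared)
    if n < 2:
        return 0.0  # fewer than two shared terms: no pairs to compare
    concordant = discordant = 0
    for x, y in combinations(shared, 2):
        # x precedes y in list_a; check whether list_b agrees
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def jaccard_at_k(list_a, list_b, k):
    """Overlap of the top-k terms of two rankings (1.0 = identical sets)."""
    top_a, top_b = set(list_a[:k]), set(list_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0


# Hypothetical toy term weights for a phrase and its synonym-substituted variant.
red_car = ranked({"road": 0.9, "engine": 0.8, "paint": 0.7, "driver": 0.5, "wheel": 0.3})
crimson_automobile = ranked({"road": 0.85, "paint": 0.8, "engine": 0.75, "wheel": 0.4, "driver": 0.35})

hot_dog = ranked({"sausage": 0.9, "bun": 0.8, "mustard": 0.6, "bark": 0.1})
warm_canine = ranked({"bark": 0.9, "fur": 0.8, "sun": 0.5, "sausage": 0.2})

print(kendall_tau_shared(red_car, crimson_automobile))  # 0.6
print(jaccard_at_k(red_car, crimson_automobile, 3))     # 1.0
print(kendall_tau_shared(hot_dog, warm_canine))         # -1.0
print(jaccard_at_k(hot_dog, warm_canine, 3))            # 0.0
```

Under the synonym-substitution premise, a compositional phrase ("red car") keeps a similar term ranking after substitution and scores high, while a non-compositional one ("hot dog") diverges and scores low. Restricting tau to shared terms ignores terms that appear in only one list, which is why it is paired here with a top-k overlap measure that does penalize such terms.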

research
10/07/2022

Are Representations Built from the Ground Up? An Empirical Examination of Local Composition in Language Models

Compositionality, the phenomenon where the meaning of a phrase can be de...
research
08/19/2017

ClaC: Semantic Relatedness of Words and Phrases

The measurement of phrasal semantic relatedness is an important metric f...
research
08/15/2019

A Multivariate Model for Representing Semantic Non-compositionality

Semantically non-compositional phrases constitute an intriguing research...
research
03/20/2019

Contextual Compositionality Detection with External Knowledge Bases and Word Embeddings

When the meaning of a phrase cannot be inferred from the individual mean...
research
09/10/2021

Euphemistic Phrase Detection by Masked Language Model

It is a well-known approach for fringe groups and organizations to use e...
research
10/21/2022

Describing Sets of Images with Textual-PCA

We seek to semantically describe a set of images, capturing both the att...
research
05/15/2016

A Proposal for Linguistic Similarity Datasets Based on Commonality Lists

Similarity is a core notion that is used in psychology and two branches ...
