Automatic Ambiguity Detection

05/28/2019
by Richard Sproat, et al.

Most work on sense disambiguation presumes that one knows beforehand (e.g., from a thesaurus) a set of polysemous terms. But published lists invariably give only partial coverage. For example, the English word "tan" has several obvious senses, but one may overlook its use as the abbreviation for "tangent". In this paper, we present an algorithm for identifying interesting polysemous terms and measuring their degree of polysemy, given an unlabeled corpus. The algorithm involves: (i) collecting all terms within a k-term window of the target term; (ii) computing the inter-term distances of the contextual terms, and reducing the multi-dimensional distance space to two dimensions using standard methods; (iii) converting the two-dimensional representation into radial coordinates and using isotonic/antitonic regression to compute the degree to which the distribution deviates from a single-peak model. The amount of deviation is the proposed polysemy index.
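
The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: classical multidimensional scaling stands in for the unspecified "standard methods" of step (ii); step (iii) is approximated by histogramming the angles of the two-dimensional points around their centroid and rotating the histogram so that its emptiest bin comes first; and the deviation is taken as the smallest squared error of any rise-then-fall fit, found by pool-adjacent-violators regression. All function names and the `bins` parameter are hypothetical.

```python
import numpy as np


def context_terms(tokens, target, k):
    """Step (i): collect every term within k positions of each occurrence of target."""
    found = []
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - k), min(len(tokens), i + k + 1)
            found.extend(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return found


def classical_mds(dist, dims=2):
    """Step (ii), assumed method: classical MDS of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:dims]  # indices of the largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def isotonic_fit(y):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks = []  # each block is [mean, count]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the last two block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    if not blocks:
        return np.array([])
    return np.concatenate([np.full(c, m) for m, c in blocks])


def single_peak_deviation(y):
    """Smallest squared error of any rise-then-fall (single-peak) fit to y."""
    y = np.asarray(y, dtype=float)
    best = np.inf
    for split in range(len(y) + 1):
        rise = isotonic_fit(y[:split])              # isotonic part
        fall = isotonic_fit(y[split:][::-1])[::-1]  # antitonic part
        err = np.sum((y[:split] - rise) ** 2) + np.sum((y[split:] - fall) ** 2)
        best = min(best, err)
    return float(best)


def polysemy_index(points, bins=12):
    """Step (iii), under the assumptions stated above: angular histogram deviation."""
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.mean(axis=0)                 # assumption: angles around the centroid
    theta = np.arctan2(pts[:, 1], pts[:, 0])     # radial coordinates, angle only
    counts, _ = np.histogram(theta, bins=bins, range=(-np.pi, np.pi))
    density = counts / counts.sum()
    density = np.roll(density, -int(np.argmin(density)))  # start at the emptiest bin
    return single_peak_deviation(density)
```

In this sketch a profile with one peak, such as `[1, 2, 3, 2, 1]`, has zero deviation, while a profile with two separated peaks has a positive one. The distance measure between contextual terms is left to the caller, since the abstract does not specify it.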


