Topological Data Analysis for Word Sense Disambiguation

03/01/2022
by Michael Rawson, et al.

We develop and test a novel unsupervised algorithm for word sense induction and disambiguation that uses topological data analysis. Typical approaches to the problem involve clustering based on simple, low-level distance features in word embedding space. Our approach relies on advanced mathematical concepts from topology, which provide a richer conceptualization of clusters for the word sense induction task. We apply a persistent homology barcode algorithm to the SemCor dataset and demonstrate that our approach gives low relative error on word sense induction. This shows the promise of topological algorithms for natural language processing, and we advocate for future work in this area.
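
To make the barcode idea concrete, here is a minimal sketch of a 0-dimensional persistent homology barcode computed with a union-find pass over pairwise distances. It is an illustration only and not the authors' algorithm: the function names `h0_barcode` and `estimate_sense_count`, the `gap` threshold, and the toy 2-D "embeddings" are all assumptions made for this example. Each point starts as its own connected component (a bar born at 0); a bar dies at the distance where its component merges into another. Bars that persist past the threshold are counted as candidate senses.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return 0-dimensional persistence bars (birth, death) for a point cloud.

    Illustrative sketch: every point is born at scale 0, and a bar dies at
    the distance where two connected components first merge.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # two components merge at scale d
            bars.append((0.0, d))    # one of them dies here
    bars.append((0.0, math.inf))     # the last component never dies
    return bars


def estimate_sense_count(points, gap):
    """Count bars whose persistence exceeds `gap` (a hypothetical threshold)."""
    return sum(1 for birth, death in h0_barcode(points) if death - birth > gap)


# Toy stand-in for context embeddings of one ambiguous word: two tight groups.
toy_embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
]
sense_count = estimate_sense_count(toy_embeddings, gap=1.0)
```

On this toy input, the four short bars reflect merges inside each tight group, while the two long-lived bars correspond to the two well-separated groups. In practice the threshold would have to be chosen from the data, and real embeddings are high-dimensional.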

research
03/29/2020

A Novel Method of Extracting Topological Features from Word Embeddings

In recent years, topological data analysis has been utilized for a wide ...
research
02/28/2013

KSU KDD: Word Sense Induction by Clustering in Topic Space

We describe our language-independent unsupervised word sense induction s...
research
05/05/2019

HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings

We present our system for semantic frame induction that showed the best ...
research
08/22/2022

Dialogue Term Extraction using Transfer Learning and Topological Data Analysis

Goal oriented dialogue systems were originally designed as a natural lan...
research
10/11/2022

Word Sense Induction with Hierarchical Clustering and Mutual Information Maximization

Word sense induction (WSI) is a difficult problem in natural language pr...
research
07/25/2017

ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing

In this paper, we present a novel unsupervised algorithm for word sense ...
research
01/11/2021

Clustering Word Embeddings with Self-Organizing Maps. Application on LaRoSeDa – A Large Romanian Sentiment Data Set

Romanian is one of the understudied languages in computational linguisti...
