ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing

07/25/2017
by   Andrei M. Butnaru, et al.

In this paper, we present a novel unsupervised algorithm for word sense disambiguation (WSD) at the document level. Our algorithm is inspired by a widely used approach in the field of genetics for whole genome sequencing, known as the Shotgun sequencing technique. The proposed WSD algorithm is based on three main steps. In the first step, a brute-force WSD algorithm is applied to short context windows (up to 10 words) selected from the document in order to generate a short list of likely sense configurations for each window. In the second step, these local sense configurations are assembled into longer composite configurations based on suffix and prefix matching. In the third step, the resulting configurations are ranked by their length, and the sense of each word is chosen based on a voting scheme that considers only the top k configurations in which the word appears. We compare our algorithm with other state-of-the-art unsupervised WSD algorithms and demonstrate better performance, sometimes by a very large margin. We also show that our algorithm can yield better performance than the Most Common Sense (MCS) baseline on one data set. Moreover, our algorithm has a very small number of parameters, is robust to parameter tuning, and, unlike other bio-inspired methods, it gives a deterministic solution (it does not involve random choices).
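The three steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how sense configurations are scored or how ties are broken, so the pairwise `relatedness` function, the toy sense labels, the lexicographic tie-breaking, and the parameter names `n` (window length), `c` (configurations kept per window) and `k` (voters per word) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, product


def local_configs(senses, start, n, relatedness, c):
    """Step 1: brute-force every sense assignment in one window, keep the best c."""
    scored = []
    for combo in product(*senses[start:start + n]):
        # Assumed scoring: sum of pairwise relatedness inside the window.
        score = sum(relatedness(a, b) for a, b in combinations(combo, 2))
        scored.append((score, combo))
    scored.sort(key=lambda sc: (-sc[0], sc[1]))  # lexicographic tie-break (assumption)
    return [(start, combo) for _, combo in scored[:c]]


def merge(a, b):
    """Join a and b when the suffix of a equals the prefix of b on their overlap."""
    (sa, xa), (sb, xb) = a, b
    end_a, end_b = sa + len(xa), sb + len(xb)
    if not (sa < sb < end_a < end_b):
        return None  # b must start inside a and extend past its end
    overlap = end_a - sb
    if xa[-overlap:] != xb[:overlap]:
        return None
    return (sa, xa + xb[overlap:])


def assemble(configs):
    """Step 2: merge repeatedly until no new composite configuration appears.

    Exhaustive pairwise merging; only suitable for small toy inputs.
    """
    known = set(configs)
    while True:
        new = set()
        for a in known:
            for b in known:
                merged = merge(a, b)
                if merged is not None and merged not in known:
                    new.add(merged)
        if not new:
            return known
        known |= new


def vote(configs, num_words, k):
    """Step 3: rank by length, then let the top k covering configurations vote."""
    ranked = sorted(configs, key=lambda cfg: (-len(cfg[1]), cfg))
    result = []
    for i in range(num_words):
        covering = [cfg for cfg in ranked if cfg[0] <= i < cfg[0] + len(cfg[1])][:k]
        votes = Counter(x[i - s] for s, x in covering)
        result.append(votes.most_common(1)[0][0] if votes else None)
    return result


def shotgun_wsd(senses, relatedness, n=3, c=2, k=3):
    """Run the three steps on a document given as a list of candidate-sense lists."""
    configs = []
    for start in range(max(1, len(senses) - n + 1)):
        configs.extend(local_configs(senses, start, n, relatedness, c))
    return vote(assemble(configs), len(senses), k)


def same_domain(a, b):
    """Hypothetical relatedness: 1 if two toy sense labels share a domain tag."""
    return 1.0 if a.split(".")[1] == b.split(".")[1] else 0.0


# Toy document: one list of made-up candidate senses per word.
toy = [
    ["bank.finance", "bank.river"],
    ["deposit.finance", "deposit.geology"],
    ["money.finance"],
    ["interest.finance", "interest.hobby"],
]
print(shotgun_wsd(toy, same_domain))
```

Every stage sorts its candidates with an explicit tie-break and nothing is sampled, so repeated runs on the same input return the same senses, which mirrors the determinism claimed in the abstract. The brute-force step is exponential in the window length, which is why windows are kept short.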


