Monolingual alignment of word senses and definitions in lexicographical resources

09/06/2022
by Sina Ahmadi, et al.

The focus of this thesis is broadly on the alignment of lexicographical data, particularly dictionaries. To tackle some of the challenges in this field, we address two main tasks: word sense alignment and translation inference. The first task aims to find an optimal alignment between the sense definitions of a headword in two different monolingual dictionaries. This is challenging, especially because the two resources differ in sense granularity, coverage and description. After describing the characteristics of various lexical semantic resources, we introduce a benchmark containing 17 datasets covering 15 languages, in which monolingual word senses and definitions are manually annotated across different resources by experts. Lexicographers' knowledge is incorporated into the benchmark through these annotations: for each sense pair, a semantic relation is selected, namely exact, narrower, broader, related or none. The benchmark can be used to evaluate word sense alignment systems. We use it to evaluate a few alignment techniques based on textual and non-textual semantic similarity detection and on semantic relation induction. Finally, we extend this work to translation inference, where translation pairs are induced to generate bilingual lexicons in an unsupervised way using various graph-analysis approaches. This task is of particular interest for creating lexicographical resources for less-resourced and under-represented languages, and it also helps increase the coverage of existing resources. From a practical point of view, the techniques and methods developed in this thesis are implemented in a tool that can facilitate the alignment task.
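
To make the word sense alignment task concrete, the following is a minimal sketch of a textual-similarity baseline. It is not the method developed in the thesis. It assumes token-overlap (Jaccard) similarity as the scoring function, and the function names, the threshold value and the example definitions for the headword "bank" are all hypothetical, invented for illustration.

```python
import re


def tokens(definition):
    """Lowercase a definition and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", definition.lower()))


def jaccard(def_a, def_b):
    """Token-overlap similarity between two definitions, in [0, 1]."""
    ta, tb = tokens(def_a), tokens(def_b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def align_senses(senses_a, senses_b, threshold=0.2):
    """Pair each sense in A with its most similar sense in B.

    Returns (index_a, index_b, score) triples. Senses whose best score
    falls below the (assumed) threshold are left unaligned.
    """
    alignment = []
    for i, def_a in enumerate(senses_a):
        scored = [(jaccard(def_a, def_b), j) for j, def_b in enumerate(senses_b)]
        if not scored:
            continue
        score, j = max(scored)
        if score >= threshold:
            alignment.append((i, j, score))
    return alignment


# Made-up definitions of the headword "bank" in two hypothetical dictionaries.
dict_a = [
    "a financial institution that accepts deposits",
    "the sloping land beside a river",
]
dict_b = [
    "sloping land along the edge of a river or lake",
    "a financial institution for receiving and lending money",
]

for i, j, score in align_senses(dict_a, dict_b):
    print(f"A[{i}] <-> B[{j}]  similarity={score:.2f}")
```

A baseline of this kind only decides whether two senses are linked. It does not predict which relation (exact, narrower, broader, related or none) holds between them, and it cannot handle one-to-many links that arise when the two dictionaries differ in sense granularity.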

