Huge Automatically Extracted Training Sets for Multilingual Word Sense Disambiguation

05/12/2018
by Tommaso Pasini, et al.

We release to the community six large-scale sense-annotated datasets in multiple languages to pave the way for supervised multilingual Word Sense Disambiguation (WSD). Our datasets cover all the nouns in the English WordNet and their translations in other languages, for a total of millions of sense-tagged sentences. Experiments prove that these corpora can be used effectively as training sets for supervised WSD systems, surpassing the state of the art for low-resourced languages and providing competitive results for English, where manually annotated training sets are available. The data is available at trainomatic.org.
