DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages

04/17/2021
by Dominik Schlechtweg, et al.

Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded, contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We describe in detail the multi-round incremental annotation process, the choice of a clustering algorithm to group usages into senses, and possible diachronic and synchronic uses for this dataset.
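
To make the idea of grouping usages into senses from proximity judgments concrete, here is a minimal sketch. It is not the procedure from the paper: the abstract does not name the clustering algorithm, so plain connected components stand in for it here. The usage IDs, the 1 to 4 judgment scale, the median aggregation, the 2.5 threshold, and the function names `build_usage_graph` and `cluster_usages` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def build_usage_graph(judgments):
    """Aggregate repeated judgments per usage pair into one median edge weight.

    `judgments` is an iterable of (usage_a, usage_b, score) tuples.
    The score scale (assumed here: 1 = unrelated, 4 = identical) is hypothetical.
    """
    by_pair = defaultdict(list)
    for u, v, score in judgments:
        if u == v:
            continue  # a usage compared with itself carries no information
        by_pair[frozenset((u, v))].append(score)
    return {pair: median(scores) for pair, scores in by_pair.items()}


def cluster_usages(graph, threshold=2.5):
    """Group usages via connected components over edges at or above `threshold`.

    This is a deliberately simple stand-in, not the paper's clustering algorithm.
    """
    neighbors = defaultdict(set)
    nodes = set()
    for pair, weight in graph.items():
        u, v = tuple(pair)
        nodes.update((u, v))
        if weight >= threshold:
            neighbors[u].add(v)
            neighbors[v].add(u)

    clusters, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical judgments for four usages of one word.
judgments = [
    ("u1", "u2", 4), ("u1", "u2", 3),  # two annotators, same pair
    ("u2", "u3", 1),
    ("u3", "u4", 4),
    ("u1", "u4", 2),
]
graph = build_usage_graph(judgments)
clusters = cluster_usages(graph)
```

With these made-up judgments, `u1` and `u2` end up in one cluster and `u3` and `u4` in another. Comparing how often each cluster occurs in usages from different time periods is one way such clusters could support diachronic analysis.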


