Bridging Natural Language Processing and Psycholinguistics: computationally grounded semantic similarity datasets for Basque and Spanish

04/19/2023
by J. Goikoetxea et al.

We present a computationally grounded word similarity dataset based on two well-known Natural Language Processing resources: text corpora and knowledge bases. The dataset aims to fill a gap in psycholinguistic research by providing several quantifications of semantic similarity for an extensive set of noun pairs, controlled for variables that play a significant role in lexical processing. Creating the dataset involved three steps: 1) computing four key psycholinguistic features for each noun (concreteness, frequency, and semantic and phonological neighbourhood density); 2) pairing nouns across these four variables; and 3) assigning each noun pair three types of word similarity measurements, computed from text, WordNet, and hybrid embeddings. The present dataset includes noun-pair information for Basque and European Spanish, and further work intends to extend it to more languages.
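Steps 2 and 3 of this pipeline can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the feature values and three-dimensional vectors are invented toy data, the matching rule (every z-scored feature within a hypothetical `max_gap` of the partner noun) is only one possible reading of "pairing nouns across these four variables", and cosine similarity is assumed as the similarity measure in each embedding space. The function names `matched_pairs` and `score_pairs` are likewise hypothetical.

```python
import math
import statistics
from itertools import combinations

# Hypothetical feature names; toy values below are invented, NOT taken from the dataset.
FEATURES = ("concreteness", "frequency", "semantic_nd", "phonological_nd")

NOUNS = {
    "casa":  {"concreteness": 4.9, "frequency": 5.1, "semantic_nd": 0.40, "phonological_nd": 14.0},
    "perro": {"concreteness": 4.8, "frequency": 4.6, "semantic_nd": 0.38, "phonological_nd": 11.0},
    "gato":  {"concreteness": 4.9, "frequency": 4.4, "semantic_nd": 0.37, "phonological_nd": 12.0},
    "idea":  {"concreteness": 1.9, "frequency": 4.9, "semantic_nd": 0.21, "phonological_nd": 3.0},
}

# Hypothetical 3-d vectors standing in for text, WordNet and hybrid embeddings.
EMBEDDINGS = {
    "text":    {"casa": [0.9, 0.1, 0.2], "perro": [0.2, 0.9, 0.1],
                "gato": [0.25, 0.85, 0.2], "idea": [0.1, 0.2, 0.9]},
    "wordnet": {"casa": [0.8, 0.3, 0.1], "perro": [0.3, 0.8, 0.2],
                "gato": [0.35, 0.75, 0.3], "idea": [0.2, 0.1, 0.95]},
    "hybrid":  {"casa": [0.85, 0.2, 0.15], "perro": [0.25, 0.85, 0.15],
                "gato": [0.3, 0.8, 0.25], "idea": [0.15, 0.15, 0.92]},
}


def zscores(nouns):
    """Standardise each feature so the four variables share a common scale."""
    z = {w: {} for w in nouns}
    for f in FEATURES:
        values = [nouns[w][f] for w in nouns]
        mu, sd = statistics.mean(values), statistics.pstdev(values)
        for w in nouns:
            z[w][f] = (nouns[w][f] - mu) / sd if sd else 0.0
    return z


def matched_pairs(nouns, max_gap=1.0):
    """Keep pairs whose z-scored features all differ by at most max_gap (assumed rule)."""
    z = zscores(nouns)
    return [(a, b) for a, b in combinations(sorted(nouns), 2)
            if all(abs(z[a][f] - z[b][f]) <= max_gap for f in FEATURES)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norms == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(u, v)) / norms


def score_pairs(pairs, embeddings):
    """Attach one cosine similarity per embedding type to each noun pair."""
    return {(a, b): {src: cosine(vecs[a], vecs[b]) for src, vecs in embeddings.items()}
            for a, b in pairs}


if __name__ == "__main__":
    for pair, sims in score_pairs(matched_pairs(NOUNS), EMBEDDINGS).items():
        print(pair, {src: round(s, 3) for src, s in sims.items()})
```

With this toy data only the pair gato/perro passes the matching step: *idea* differs too much in concreteness and *casa* in frequency. Each surviving pair then carries three similarity scores, one per embedding source, mirroring the three measurement types described above.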
