A Lightweight Regression Method to Infer Psycholinguistic Properties for Brazilian Portuguese

05/19/2017
by   Leandro B. dos Santos, et al.

Psycholinguistic properties of words have been used in various approaches to Natural Language Processing tasks, such as text simplification and readability assessment. Most of these properties are subjective, and gathering them requires costly and time-consuming surveys. Recent approaches use the limited datasets of psycholinguistic properties that exist and extend them automatically to large lexicons. However, some of the resources used by such approaches are not available for most languages. This study presents a method to infer psycholinguistic properties for Brazilian Portuguese (BP) using regressors built with a light set of features that are usually available for less-resourced languages: word length, frequency lists, lexical databases composed of school dictionaries, and word embedding models. The correlations between the inferred properties are close to those obtained in related work. The resulting resource contains 26,874 BP words annotated with concreteness, age of acquisition, imageability, and subjective frequency.
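
To make the idea of a regressor over lightweight features concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain least-squares linear model as a stand-in for whatever regressor the paper uses, and it includes only two of the listed feature types (word length and log frequency); dictionary and embedding features are left out for brevity. The words, frequencies, and ratings in `TOY_DATA`, as well as every function name, are invented for illustration.

```python
import math

# Toy (word, corpus frequency, concreteness rating) triples. All numbers are
# invented for illustration; they are NOT taken from the paper's resource.
TOY_DATA = [
    ("casa", 9000, 6.2),
    ("mesa", 4000, 6.5),
    ("cachorro", 2500, 6.4),
    ("ideia", 7000, 2.8),
    ("liberdade", 3000, 2.1),
    ("pensamento", 1500, 2.4),
]

def raw_features(word, freq):
    """Two lightweight features: word length and log10 corpus frequency."""
    return [float(len(word)), math.log10(freq)]

def fit_scaler(rows):
    """Per-column mean and standard deviation for z-score scaling."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(row, means, stds):
    """Apply z-score scaling to one feature row."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def predict(x, w, b):
    """Linear prediction: dot(w, x) + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def fit_linear(X, y, lr=0.1, epochs=2000):
    """Least-squares linear regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Residuals are computed once per epoch, before any weight changes.
        errs = [predict(x, w, b) - t for x, t in zip(X, y)]
        for j in range(d):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errs, X)) / n
        b -= lr * sum(errs) / n
    return w, b

def mse(pred, gold):
    """Mean squared error between predictions and gold ratings."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold)

raw = [raw_features(word, freq) for word, freq, _ in TOY_DATA]
targets = [rating for _, _, rating in TOY_DATA]
means, stds = fit_scaler(raw)
X = [scale(r, means, stds) for r in raw]
weights, bias = fit_linear(X, targets)

train_pred = [predict(x, weights, bias) for x in X]
baseline = [sum(targets) / len(targets)] * len(targets)

# Infer a rating for a word outside the toy set (hypothetical frequency).
new_pred = predict(scale(raw_features("janela", 3500), means, stds), weights, bias)
```

In a realistic setting the model would be trained on an existing set of human ratings, evaluated on held-out words by correlating predictions with human judgments, and then applied to the rest of the lexicon, which is the extension step the abstract describes.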


