TxPI-u: A Resource for Personality Identification of Undergraduates

Resources such as labeled corpora are necessary to train automatic models in natural language processing (NLP). Historically, most such resources, covering a broad range of problems, have been available mainly in English. One such problem is personality identification: given a psychological model (e.g., the Big Five model), the goal is to infer a subject's personality traits from, for instance, a text written by that subject. In this paper we introduce a new Spanish corpus called Texts for Personality Identification (TxPI). The corpus is intended to support the development of models that automatically assign a personality trait to the author of a text document. Our corpus, TxPI-u, contains information on 416 Mexican undergraduate students, along with demographic information such as age, gender, and the academic program in which they are enrolled. Finally, as an additional contribution, we present a set of baselines to provide a point of comparison for further research.


