Including Dialects and Language Varieties in Author Profiling

07/03/2017
by Alina Maria Ciobanu, et al.

This paper presents a computational approach to author profiling that takes gender and language variety into account. We apply an ensemble system that combines the output of multiple linear SVM classifiers trained on character and word n-grams. We evaluate the system using the dataset provided by the organizers of the 2017 PAN lab on author profiling. Our approach achieved 75% accuracy on gender identification on tweets written in four languages and 97% accuracy on language variety identification for Portuguese.
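As a rough illustration of the two ingredients named in the abstract, the following is a minimal sketch of character and word n-gram extraction and of combining several classifiers' outputs into one label. It makes several assumptions that do not come from the paper: the helper names `char_ngrams`, `word_ngrams`, and `plurality_vote` are hypothetical, whitespace tokenization and lowercasing are illustrative choices, and plurality voting is only one possible fusion rule, since the abstract does not say how the ensemble combines its members. The linear SVM classifiers themselves are not implemented here; their predictions are stood in for by plain label lists.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in text (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams.

    Assumption: lowercasing and whitespace tokenization; the paper's
    actual preprocessing is not described in the abstract.
    """
    tokens = text.lower().split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def plurality_vote(predictions):
    """Fuse one predicted label per classifier into a single label.

    Assumption: plurality voting stands in for the ensemble's fusion
    rule. Ties go to the label that appears first in the input list.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Feature extraction on a toy tweet.
tweet = "Bom dia a todos"
char_features = char_ngrams(tweet, 3)
word_features = word_ngrams(tweet, 2)

# Stand-ins for the labels that three separately trained classifiers
# (for example, one per n-gram feature type) might output for this tweet.
member_predictions = ["pt-BR", "pt-PT", "pt-BR"]
final_label = plurality_vote(member_predictions)
```

In a fuller version, each n-gram feature type would feed its own linear SVM, and the fusion step would receive those models' predictions instead of the hard-coded list.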
