N-GrAM: New Groningen Author-profiling Model

07/12/2017
by Angelo Basile, et al.

We describe our participation in the PAN 2017 shared task on Author Profiling, which involves identifying authors' gender and language variety for English, Spanish, Arabic and Portuguese. We describe both the final submitted system and a series of negative results. Our aim was to create a single model for both gender and language, and for all language varieties. Our best-performing system (on cross-validated results) is a linear support vector machine (SVM) with word unigrams and character 3- to 5-grams as features. Additional features and resources, including POS tags, additional datasets, geographic entities, and Twitter handles, hurt rather than improved performance. Cross-validation indicated high performance overall, and results on the test set confirmed this: averaged accuracy was 0.86, with performance on sub-tasks ranging from 0.68 to 0.98.
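The feature setup named in the abstract (word unigrams plus character 3- to 5-grams feeding a linear SVM) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the TF-IDF weighting, the default SVM hyperparameters, the function name `build_profiler`, and the toy texts and labels are all assumptions introduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC


def build_profiler():
    """Linear SVM over word unigrams and character 3- to 5-grams.

    TF-IDF weighting and default LinearSVC settings are assumptions;
    the abstract only names the n-gram ranges and the classifier type.
    """
    features = FeatureUnion([
        # Word unigrams
        ("word_unigrams", TfidfVectorizer(analyzer="word", ngram_range=(1, 1))),
        # Character 3-, 4-, and 5-grams
        ("char_3_to_5", TfidfVectorizer(analyzer="char", ngram_range=(3, 5))),
    ])
    return Pipeline([("features", features), ("svm", LinearSVC())])


# Hypothetical toy data standing in for per-author tweet collections.
TOY_TEXTS = [
    "what a lovely colour, my favourite neighbour said",
    "the theatre in the city centre was brilliant",
    "what a lovely color, my favorite neighbor said",
    "the theater in the city center was awesome",
]
TOY_LABELS = ["variety_a", "variety_a", "variety_b", "variety_b"]


if __name__ == "__main__":
    model = build_profiler().fit(TOY_TEXTS, TOY_LABELS)
    print(model.predict(["my favourite colour"]))
```

Under this sketch, the same pipeline would be trained once per sub-task (gender or variety, per language) by swapping the label set, which is one way to read the stated aim of a single model design for all settings.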


