Word embeddings for idiolect identification

02/10/2019
by Konstantinos Perifanos, et al.

The term idiolect refers to an individual's unique and distinctive use of language; it is the theoretical foundation of Authorship Attribution. In this paper we focus on learning distributed representations (embeddings) of social media users that reflect their writing style. These representations can be considered stylistic fingerprints of the authors. We explore the performance of the two main flavours of distributed representations, namely embeddings produced by Neural Probabilistic Language Models (such as word2vec) and by matrix factorization (such as GloVe).
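To make the idea of an embedding as a "stylistic fingerprint" concrete, here is a minimal sketch that is not the authors' method. It swaps in a simple baseline: each author is represented by the average of their token vectors, and an unknown text is attributed to the author whose fingerprint has the highest cosine similarity. The toy vectors, author names and helper names (`mean_vector`, `cosine`, `attribute`) are hypothetical. In practice the vectors would come from a trained word2vec or GloVe model.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_vector(tokens: List[str], emb: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary tokens (a crude stylistic fingerprint)."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        raise ValueError("no in-vocabulary tokens")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def attribute(tokens: List[str], fingerprints: Dict[str, Vector],
              emb: Dict[str, Vector]) -> str:
    """Return the author whose fingerprint is most similar to the text's mean vector."""
    query = mean_vector(tokens, emb)
    return max(fingerprints, key=lambda author: cosine(query, fingerprints[author]))


# Hypothetical 2-d toy embeddings standing in for a trained word2vec/GloVe model.
emb = {
    "lol": [1.0, 0.0],
    "omg": [0.9, 0.1],
    "therefore": [0.0, 1.0],
    "moreover": [0.1, 0.9],
}

# Hypothetical authors, each fingerprinted from a few of their own tokens.
fingerprints = {
    "alice": mean_vector(["lol", "omg"], emb),
    "bob": mean_vector(["therefore", "moreover"], emb),
}

print(attribute(["omg", "lol", "lol"], fingerprints, emb))  # prints "alice"
```

Averaging discards word order and frequency information, so this baseline only hints at what learned user embeddings can capture. It is shown here because it isolates the "nearest fingerprint" step of attribution.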

research
11/11/2020

Exploring the Value of Personalized Word Embeddings

In this paper, we introduce personalized word embeddings, and examine th...
research
05/09/2018

Incorporating Subword Information into Matrix Factorization Word Embeddings

The positive effect of adding subword information to word embeddings has...
research
12/11/2017

Social Media Writing Style Fingerprint

We present our approach for computer-aided social media text authorship ...
research
06/15/2021

Deriving Word Vectors from Contextualized Language Models using Topic-Aware Mention Selection

One of the long-standing challenges in lexical semantics consists in lea...
research
04/25/2020

When do Word Embeddings Accurately Reflect Surveys on our Beliefs About People?

Social biases are encoded in word embeddings. This presents a unique opp...
research
12/02/2015

Learning Semantic Similarity for Very Short Texts

Leveraging data on social media, such as Twitter and Facebook, requires in...
research
05/21/2019

A Comparative Analysis of Distributional Term Representations for Author Profiling in Social Media

Author Profiling (AP) aims at predicting specific characteristics from a...