Author2Vec: A Framework for Generating User Embedding

03/17/2020
by Xiaodong Wu, et al.

Online forums and social media platforms generate noisy but valuable data every day. In this paper, we propose Author2Vec, a novel end-to-end neural-network-based user embedding system. The model combines sentence representations generated by BERT (Bidirectional Encoder Representations from Transformers) with a novel unsupervised pre-training objective, authorship classification, to produce better user embeddings that encode useful user-intrinsic properties. The system was pre-trained on posts from 10k Reddit users and was analyzed and evaluated on two user classification benchmarks, depression detection and personality classification, on which it outperformed traditional count-based and prediction-based methods. We demonstrate that Author2Vec successfully encodes useful user attributes and that the generated user embeddings perform well in downstream classification tasks without further fine-tuning.
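The abstract names the ingredients (BERT sentence vectors, an authorship-classification pre-training objective, a per-user embedding) but not the exact architecture. The following is a minimal NumPy sketch of the general idea under stated assumptions, not the authors' model: sentence vectors are assumed to be precomputed (synthetic stand-ins here, not real BERT outputs), a single linear projection `W` maps each post into the embedding space, a softmax classifier `U` predicts which user wrote the post, and each user's embedding is the mean of their projected posts. The function names `pretrain_authorship` and `user_embeddings` and all hyperparameters are hypothetical.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pretrain_authorship(post_vecs, author_ids, n_users, emb_dim=16,
                        lr=0.1, epochs=300, seed=0):
    """Hypothetical sketch: learn a projection W such that projected post
    vectors predict their author through a softmax classifier U."""
    rng = np.random.default_rng(seed)
    n, d = post_vecs.shape
    W = rng.normal(scale=0.1, size=(d, emb_dim))        # post -> embedding space
    U = rng.normal(scale=0.1, size=(emb_dim, n_users))  # authorship classifier
    onehot = np.eye(n_users)[author_ids]
    losses = []
    for _ in range(epochs):
        h = post_vecs @ W                # projected posts, shape (n, emb_dim)
        p = softmax(h @ U)               # author probabilities, shape (n, n_users)
        losses.append(-np.mean(np.log(p[np.arange(n), author_ids] + 1e-12)))
        dlogits = (p - onehot) / n       # gradient of mean cross-entropy w.r.t. logits
        dU = h.T @ dlogits
        dW = post_vecs.T @ (dlogits @ U.T)
        U -= lr * dU                     # both gradients were computed before updating
        W -= lr * dW
    return W, losses

def user_embeddings(post_vecs, author_ids, W, n_users):
    """Mean-pool each user's projected posts into a single user vector (assumed pooling)."""
    h = post_vecs @ W
    return np.stack([h[author_ids == u].mean(axis=0) for u in range(n_users)])

# Synthetic stand-in for BERT sentence vectors: each user has a distinct mean vector plus noise.
rng = np.random.default_rng(1)
n_users, posts_per_user, d = 5, 20, 32
centers = rng.normal(size=(n_users, d))
author_ids = np.repeat(np.arange(n_users), posts_per_user)
post_vecs = centers[author_ids] + 0.5 * rng.normal(size=(len(author_ids), d))

W, losses = pretrain_authorship(post_vecs, author_ids, n_users)
emb = user_embeddings(post_vecs, author_ids, W, n_users)
print(emb.shape)  # (5, 16)
```

In this sketch the classifier `U` exists only to provide the self-supervised training signal; after pre-training it is discarded, and the frozen rows of `emb` could be passed to any downstream classifier, which mirrors the abstract's claim that the embeddings are used without further fine-tuning.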

research
08/28/2023

Domain-based user embedding for competing events on social media

Online social networks offer vast opportunities for computational social...
research
04/13/2021

Understanding Transformers for Bot Detection in Twitter

In this paper we shed light on the impact of fine-tuning over social med...
research
08/06/2021

Offensive Language and Hate Speech Detection with Deep Learning and Transfer Learning

Toxic online speech has become a crucial problem nowadays due to an expo...
research
02/22/2021

User Factor Adaptation for User Embedding via Multitask Learning

Language varies across users and their interested fields in social media...
research
08/09/2022

E2EG: End-to-End Node Classification Using Graph Topology and Text-based Node Attributes

Node classification utilizing text-based node attributes has many real-w...
research
07/22/2019

Realistic Channel Models Pre-training

In this paper, we propose a neural-network-based realistic channel model...
research
06/26/2019

Enhancing PIO Element Detection in Medical Text Using Contextualized Embedding

In this paper, we investigate a new approach to Population, Intervention...