Author2Vec: A Framework for Generating User Embedding
Online forums and social media platforms provide noisy but valuable data every day. In this paper, we propose a novel end-to-end neural network-based user embedding system, Author2Vec. The model incorporates sentence representations generated by BERT (Bidirectional Encoder Representations from Transformers) with a novel unsupervised pre-training objective, authorship classification, to produce better user embedding that encodes useful user-intrinsic properties. This user embedding system was pre-trained on post data of 10k Reddit users and was analyzed and evaluated on two user classification benchmarks: depression detection and personality classification, in which the model proved to outperform traditional count-based and prediction-based methods. We substantiate that Author2Vec successfully encoded useful user attributes and the generated user embedding performs well in downstream classification tasks without further finetuning.
READ FULL TEXT