The Author-Topic Model for Authors and Documents

07/11/2012
by Michal Rosen-Zvi, et al.

We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over topics, and each topic is associated with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with its authors. We apply the model to a collection of 1,700 NIPS conference papers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets, so we use Gibbs sampling to estimate the topic and author distributions. We compare its performance with that of two other generative models for documents, both of which are special cases of the author-topic model: LDA (a topic model) and a simple author model in which each author is associated with a distribution over words rather than a distribution over topics. We show topics recovered by the author-topic model, and demonstrate applications to computing similarity between authors and entropy of author output.
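The generative process described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation and does not perform Gibbs sampling; it only draws synthetic data from the model. Several details are assumptions made for illustration: the symmetric Dirichlet priors `alpha` and `beta`, the choice of an author uniformly at random for each word, the toy sizes, and all function names (`sample_parameters`, `generate_document`, `document_topic_mixture`, `author_entropy`).

```python
import numpy as np


def sample_parameters(n_authors, n_topics, vocab_size, alpha, beta, rng):
    """Draw author-topic (theta) and topic-word (phi) distributions.

    Assumption: symmetric Dirichlet priors with concentrations alpha and beta.
    """
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_authors)  # (A, T)
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)    # (T, V)
    return theta, phi


def generate_document(doc_authors, theta, phi, n_words, rng):
    """Sample the words of one document given its list of author indices."""
    n_topics, vocab_size = phi.shape
    xs, zs, ws = [], [], []
    for _ in range(n_words):
        # Assumption: each word's author is chosen uniformly among the authors.
        x = int(rng.choice(doc_authors))
        # The chosen author's topic distribution generates a topic.
        z = int(rng.choice(n_topics, p=theta[x]))
        # The chosen topic's word distribution generates the word.
        w = int(rng.choice(vocab_size, p=phi[z]))
        xs.append(x)
        zs.append(z)
        ws.append(w)
    return xs, zs, ws


def document_topic_mixture(doc_authors, theta):
    """Topic distribution of a document: equal-weight mixture over its authors."""
    return theta[doc_authors].mean(axis=0)


def author_entropy(theta):
    """Shannon entropy (in nats) of each author's topic distribution."""
    safe = np.clip(theta, 1e-12, 1.0)  # avoid log(0)
    return -(theta * np.log(safe)).sum(axis=1)


# Toy example: 4 authors, 3 topics, 20-word vocabulary (all sizes hypothetical).
rng = np.random.default_rng(0)
theta, phi = sample_parameters(4, 3, 20, alpha=0.5, beta=0.5, rng=rng)
xs, zs, ws = generate_document([0, 2], theta, phi, n_words=50, rng=rng)
mixture = document_topic_mixture([0, 2], theta)
entropies = author_entropy(theta)
```

In this sketch, an author whose topic distribution is spread across many topics gets a higher value from `author_entropy`, which is one simple way to read the "entropy of author output" mentioned above.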


