DAPPER: Scaling Dynamic Author Persona Topic Model to Billion Word Corpora

11/03/2018
by Robert Giaquinto, et al.

Extracting common narratives from multi-author dynamic text corpora requires complex models, such as the Dynamic Author Persona (DAP) topic model. However, such models can struggle to scale to large corpora, often because of challenging non-conjugate terms. To overcome these challenges, we adapt recent ideas in approximate inference to the DAP model, resulting in the DAP Performed Exceedingly Rapidly (DAPPER) topic model. Specifically, we develop a variational Expectation-Maximization (EM) algorithm based on Conjugate-Computation Variational Inference (CVI) for learning the model, yielding fast, closed-form updates for each document that replace the iterative optimization used in earlier work. Our results show significant improvements in model fit and training time without compromising the model's temporal structure or the application of Regularized Variational Inference (RVI). We demonstrate the scalability and effectiveness of the DAPPER model by extracting health journeys from the CaringBridge corpus, a collection of 9 million journals written by 200,000 authors during health crises.

