Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models

06/06/2019
by   Alexander Terenin, et al.

Nonparametric extensions of topic models such as Latent Dirichlet Allocation, including the Hierarchical Dirichlet Process (HDP), are often studied in natural language processing. Training these models generally requires serial algorithms, which limits scalability to large data sets and complicates acceleration on parallel and distributed systems. Most current approaches to scalable training of such models either do not converge to the correct target or are not data-parallel. Moreover, these approaches generally do not exploit all available sources of sparsity found in natural language, which is an important way to make computation efficient. Based upon a representation of certain conditional distributions within an HDP, we propose a doubly sparse data-parallel sampler for the HDP topic model that addresses these issues. We benchmark our method on a well-known corpus (PubMed) with 8 million documents and 768 million tokens, using a single multi-core machine in under three days.

research
04/12/2017

Polya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler

Latent Dirichlet Allocation (LDA) is a topic model widely used in natura...
research
06/11/2015

Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models

Topic models, and more specifically the class of Latent Dirichlet Alloca...
research
01/08/2023

Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method

Topic Modelling (TM) is from the research branches of natural language u...
research
10/21/2011

Kernel Topic Models

Latent Dirichlet Allocation models discrete data as a mixture of discret...
research
03/21/2021

Posterior distributions for Hierarchical Spike and Slab Indian Buffet processes

Bayesian nonparametric hierarchical priors are highly effective in provi...
research
11/27/2019

Conditional Hierarchical Bayesian Tucker Decomposition

Our research focuses on studying and developing methods for reducing the...
research
11/24/2018

Latent Dirichlet Allocation with Residual Convolutional Neural Network Applied in Evaluating Credibility of Chinese Listed Companies

This project demonstrated a methodology to estimating cooperate credibil...