Towards Big Topic Modeling

11/17/2013
by   Jian-Feng Yan, et al.
0

To solve the big topic modeling problem, we need to reduce both time and space complexities of batch latent Dirichlet allocation (LDA) algorithms. Although parallel LDA algorithms on the multi-processor architecture have low time and space complexities, their communication costs among processors often scale linearly with the vocabulary size and the number of topics, leading to a serious scalability problem. To reduce the communication complexity among processors for a better scalability, we propose a novel communication-efficient parallel topic modeling architecture based on power law, which consumes orders of magnitude less communication time when the number of topics is large. We combine the proposed communication-efficient parallel architecture with the online belief propagation (OBP) algorithm referred to as POBP for big topic modeling tasks. Extensive empirical results confirm that POBP has the following advantages to solve the big topic modeling problem: 1) high accuracy, 2) communication-efficient, 3) fast speed, and 4) constant memory usage when compared with recent state-of-the-art parallel LDA algorithms on the multi-processor architecture.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/12/2017

Polya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler

Latent Dirichlet Allocation (LDA) is a topic model widely used in natura...
research
10/08/2016

SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discre...
research
11/10/2014

Model-Parallel Inference for Big Topic Models

In real world industrial applications of topic modeling, the ability to ...
research
01/24/2013

Transfer Topic Modeling with Ease and Scalability

The increasing volume of short texts generated on social media sites, su...
research
05/24/2016

Computing Web-scale Topic Models using an Asynchronous Parameter Server

Topic models such as Latent Dirichlet Allocation (LDA) have been widely ...
research
10/21/2015

High Performance Latent Variable Models

Latent variable models have accumulated a considerable amount of interes...
research
10/22/2015

Multi-GPU Distributed Parallel Bayesian Differential Topic Modelling

There is an explosion of data, documents, and other content, and people ...

Please sign up or login with your details

Forgot password? Click here to reset