Computing Web-scale Topic Models using an Asynchronous Parameter Server

05/24/2016
by Rolf Jagerman, et al.

Topic models such as Latent Dirichlet Allocation (LDA) have been widely used in information retrieval for tasks ranging from smoothing and feedback methods to tools for exploratory search and discovery. However, classical methods for inferring topic models do not scale to the massive size of today's publicly available Web-scale data sets. State-of-the-art approaches rely on custom strategies, implementations, and hardware to support their asynchronous, communication-intensive workloads. We present APS-LDA, which integrates state-of-the-art topic modeling with cluster computing frameworks such as Spark using a novel asynchronous parameter server. Advantages of this integration include the convenient reuse of existing data processing pipelines and the elimination of disk writes, since data can be kept in memory from start to finish. Our goal is not to outperform highly customized implementations, but to propose a general high-performance topic modeling framework that can easily be used in today's data processing pipelines. We compare APS-LDA to the existing Spark LDA implementations and show that our system can, on a 480-core cluster, process up to 135 times more data and 10 times more topics without sacrificing model quality.
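The abstract does not spell out how an asynchronous parameter server is used during LDA inference. As a rough illustration only, the sketch below assumes one common design: a shared server holds the global word-topic counts, and each worker samples topics for its own shard of documents against a possibly stale snapshot of those counts, then pushes count deltas without waiting for the other workers. It uses plain collapsed Gibbs sampling for simplicity, not the sampler the paper actually uses. `ParameterServer`, `Worker`, `pull`, and `push` are hypothetical names; this is a single-process Python toy, not APS-LDA, Spark, or any real parameter-server API.

```python
import random
import threading
from collections import defaultdict


class ParameterServer:
    """Hypothetical in-process stand-in for a parameter server.

    Holds global word-topic counts n_wk and topic totals n_k; workers pull
    (possibly stale) snapshots and push additive count deltas.
    """

    def __init__(self, vocab_size, num_topics):
        self.n_wk = [[0] * num_topics for _ in range(vocab_size)]
        self.n_k = [0] * num_topics
        self._lock = threading.Lock()  # makes each pull/push atomic

    def pull(self):
        with self._lock:
            return [row[:] for row in self.n_wk], self.n_k[:]

    def push(self, deltas):
        """Apply a dict {(word, topic): change} to the global counts."""
        with self._lock:
            for (w, k), d in deltas.items():
                self.n_wk[w][k] += d
                self.n_k[k] += d


class Worker(threading.Thread):
    """Runs collapsed Gibbs sampling on its own shard of documents."""

    def __init__(self, server, docs, num_topics, vocab_size,
                 iterations=20, alpha=0.1, beta=0.01, seed=0):
        super().__init__()
        self.server, self.docs = server, docs
        self.K, self.V = num_topics, vocab_size
        self.iterations, self.alpha, self.beta = iterations, alpha, beta
        self.rng = random.Random(seed)
        # Random initial topic for every token, plus per-document counts.
        self.z = [[self.rng.randrange(self.K) for _ in doc] for doc in docs]
        self.n_dk = [[0] * self.K for _ in docs]
        init = defaultdict(int)
        for d, doc in enumerate(docs):
            for w, k in zip(doc, self.z[d]):
                self.n_dk[d][k] += 1
                init[(w, k)] += 1
        server.push(init)  # register this shard's initial counts

    def run(self):
        for _ in range(self.iterations):
            n_wk, n_k = self.server.pull()  # snapshot may already be stale
            deltas = defaultdict(int)
            for d, doc in enumerate(self.docs):
                for i, w in enumerate(doc):
                    old = self.z[d][i]
                    # Remove the token's current assignment from the counts.
                    self.n_dk[d][old] -= 1
                    n_wk[w][old] -= 1
                    n_k[old] -= 1
                    # p(z = k) ~ (n_dk + alpha)(n_wk + beta) / (n_k + V*beta)
                    weights = [
                        (self.n_dk[d][k] + self.alpha)
                        * (n_wk[w][k] + self.beta)
                        / (n_k[k] + self.V * self.beta)
                        for k in range(self.K)
                    ]
                    new = self.rng.choices(range(self.K), weights=weights)[0]
                    self.z[d][i] = new
                    self.n_dk[d][new] += 1
                    n_wk[w][new] += 1
                    n_k[new] += 1
                    if new != old:
                        deltas[(w, old)] -= 1
                        deltas[(w, new)] += 1
            # Asynchronous: no barrier, other workers keep sampling meanwhile.
            self.server.push(deltas)


K, V = 3, 6
rng = random.Random(42)
corpus = [[rng.randrange(V) for _ in range(8)] for _ in range(12)]
server = ParameterServer(V, K)
workers = [Worker(server, corpus[i::3], K, V, seed=i) for i in range(3)]
for wkr in workers:
    wkr.start()
for wkr in workers:
    wkr.join()
print(sum(server.n_k))  # 96: one count per token
```

Because workers never wait on one another, a worker may sample with counts that miss other workers' latest updates. Every push still conserves counts, so once all workers finish, the global counts match the final topic assignments exactly.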