The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies

10/03/2007
by David M. Blei, et al.

We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP lead to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics, and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning: the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
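
To make the "documents as paths down a random tree" idea concrete, here is a minimal sketch of drawing document paths from a nested Chinese restaurant process prior. It is an illustration under stated assumptions, not the paper's inference algorithm. The tree is truncated to a fixed `depth` (the process itself is defined on infinitely deep trees), and the function name `sample_ncrp_paths`, the concentration parameter `gamma`, and the `seed` argument are hypothetical names chosen for this example. At each node, a document follows an existing child with probability proportional to the number of earlier documents that took that child, and opens a new child with probability proportional to `gamma`.

```python
import random
from collections import defaultdict


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one root-to-leaf path per document from a depth-truncated nCRP.

    Assumptions for illustration: fixed truncation depth, a single
    concentration parameter `gamma` shared by every node.
    """
    rng = random.Random(seed)
    # child_counts[node][k] = number of documents that went from `node` to child k.
    # A node is identified by the tuple of child indices leading to it from the root.
    child_counts = defaultdict(list)
    paths = []
    for _ in range(num_docs):
        node = ()  # start at the root
        for _ in range(depth):
            counts = child_counts[node]
            total = sum(counts)
            # Existing child k: probability counts[k] / (total + gamma).
            # New child:        probability gamma     / (total + gamma).
            r = rng.uniform(0, total + gamma)
            choice = len(counts)  # default: open a new child
            acc = 0.0
            for k, c in enumerate(counts):
                acc += c
                if r < acc:
                    choice = k
                    break
            if choice == len(counts):
                counts.append(0)
            counts[choice] += 1
            node = node + (choice,)
        paths.append(node)
    return paths


if __name__ == "__main__":
    for doc_id, path in enumerate(sample_ncrp_paths(num_docs=8, depth=3, gamma=1.0)):
        print(doc_id, path)
```

Documents whose paths share a long prefix share the nodes (topics, in the model described above) along that prefix. Because popular children attract more documents, a few branches tend to accumulate most of the collection, which is the preferential attachment behavior the abstract refers to.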

research
10/25/2012

Nested Hierarchical Dirichlet Processes

We develop a nested hierarchical Dirichlet process (nHDP) for hierarchic...
research
01/16/2013

A Nested HDP for Hierarchical Topic Models

We develop a nested hierarchical Dirichlet process (nHDP) for hierarchic...
research
02/25/2010

Syntactic Topic Models

The syntactic topic model (STM) is a Bayesian nonparametric model of lan...
research
02/23/2017

Scalable Inference for Nested Chinese Restaurant Process Topic Models

Nested Chinese Restaurant Process (nCRP) topic models are powerful nonpa...
research
01/04/2016

Scalable Models for Computing Hierarchies in Information Networks

Information hierarchies are organizational structures that often used to...
research
04/16/2021

Hierarchical Topic Presence Models

Topic models analyze text from a set of documents. Documents are modeled...
research
08/17/2020

A Common Atom Model for the Bayesian Nonparametric Analysis of Nested Data

The use of high-dimensional data for targeted therapeutic interventions ...
