Nested Hierarchical Dirichlet Processes for Multi-Level Non-Parametric Admixture Modeling

08/26/2015
by Lavanya Sita Tekumalla, et al.

The Dirichlet Process (DP) is a Bayesian non-parametric prior for infinite mixture modeling, where the number of mixture components grows with the number of data items. The Hierarchical Dirichlet Process (HDP) is an extension of the DP for grouped data, often used for non-parametric topic modeling, where each group is a mixture over shared mixture densities. The Nested Dirichlet Process (nDP), on the other hand, is an extension of the DP for learning group-level distributions from data while simultaneously clustering the groups. It allows group-level distributions to be shared across groups in a non-parametric setting, leading to a non-parametric mixture of mixtures. The nested Chinese Restaurant Process (nCRP) extends the nDP to multi-level non-parametric mixture modeling, enabling the modeling of topic hierarchies. However, the nDP and nCRP do not allow sharing of distributions as required in many applications, motivating the need for multi-level non-parametric admixture modeling. We address this gap by proposing multi-level nested HDPs (nHDP), where the base distribution of the HDP is itself an HDP at each level, thereby leading to admixtures of admixtures at each level. Because of couplings between the various HDP levels, scaling up inference is a challenge. We propose a multi-level nested Chinese Restaurant Franchise (nCRF) representation for the nested HDP, with which we outline an inference algorithm based on Gibbs sampling. We evaluate our model with the two-level nHDP for non-parametric entity topic modeling, where an inner HDP creates a countably infinite set of topic mixtures and associates them with author entities, while an outer HDP associates documents with these author entities. In our experiments on two real-world research corpora, the nHDP is able to generalize significantly better than existing models and detect missing author entities with a reasonable level of accuracy.
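The claim that the number of DP mixture components grows with the number of data items can be illustrated with the standard Chinese restaurant process view of the DP. The sketch below is not the paper's nHDP sampler; it is a minimal single-level simulation, and the function name `crp_table_sizes`, the concentration parameter `alpha`, and the `seed` argument are assumptions introduced only for illustration.

```python
import random


def crp_table_sizes(n_customers, alpha, seed=0):
    """Seat customers by a Chinese restaurant process and return table sizes.

    Each table stands for one mixture component; alpha is the DP
    concentration parameter (hypothetical name chosen for this sketch).
    """
    rng = random.Random(seed)
    sizes = []
    for n in range(n_customers):
        # With n customers already seated, an existing table k is chosen with
        # probability sizes[k] / (n + alpha), and a new table is opened with
        # probability alpha / (n + alpha).
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if r < cumulative:
                sizes[k] += 1
                break
        else:
            # No existing table was selected, so open a new one.
            sizes.append(1)
    return sizes


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        tables = crp_table_sizes(n, alpha=1.0, seed=42)
        print(n, "items ->", len(tables), "components")
```

Running the script prints the number of occupied tables for increasing data sizes; under the CRP the expected count grows roughly logarithmically in the number of items, which is the sense in which the model's complexity adapts to the data.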


research
11/27/2012

A simple non-parametric Topic Mixture for Authors and Documents

This article reviews the Author-Topic Model and presents a new non-param...
research
01/09/2014

Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts

We present a Bayesian nonparametric framework for multilevel clustering ...
research
04/19/2015

Exploring Bayesian Models for Multi-level Clustering of Hierarchically Grouped Sequential Data

A wide range of Bayesian models have been proposed for data that is divi...
research
04/16/2016

Smoothed Hierarchical Dirichlet Process: A Non-Parametric Approach to Constraint Measures

Time-varying mixture densities occur in many scenarios, for example, the...
research
10/13/2014

Mining Block I/O Traces for Cache Preloading with Sparse Temporal Non-parametric Mixture of Multivariate Poisson

Existing caching strategies, in the storage domain, though well suited t...
research
01/16/2013

A Nested HDP for Hierarchical Topic Models

We develop a nested hierarchical Dirichlet process (nHDP) for hierarchic...
research
05/27/2022

Inference and Sampling for Archimax Copulas

Understanding multivariate dependencies in both the bulk and the tails o...
