Syntactic Topic Models

02/25/2010
by   Jordan Boyd-Graber, et al.
0

The syntactic topic model (STM) is a Bayesian nonparametric model of language that discovers latent distributions of words (topics) that are both semantically and syntactically coherent. The STM models dependency parsed corpora where sentences are grouped into documents. It assumes that each word is drawn from a latent topic chosen by combining document-level features and the local syntactic context. Each document has a distribution over latent topics, as in topic models, which provides the semantic consistency. Each element in the dependency parse tree also has a distribution over the topics of its children, as in latent-state syntax models, which provides the syntactic consistency. These distributions are convolved so that the topic of each word is likely under both its document and syntactic context. We derive a fast posterior inference algorithm based on variational methods. We report qualitative and quantitative studies on both synthetic data and hand-parsed documents. We show that the STM is a more predictive model of language than current models based only on syntax or only on topics.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/06/2021

Concentrated Document Topic Model

We propose a Concentrated Document Topic Model(CDTM) for unsupervised te...
research
03/15/2022

FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric

Syntax is a fundamental component of language, yet few metrics have been...
research
09/29/2019

Lifelong Neural Topic Learning in Contextualized Autoregressive Topic Models of Language via Informative Transfers

Topic models such as LDA, DocNADE, iDocNADEe have been popular in docume...
research
01/05/2013

A New Geometric Approach to Latent Topic Modeling and Discovery

A new geometrically-motivated algorithm for nonnegative matrix factoriza...
research
11/05/2016

TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency

In this paper, we propose TopicRNN, a recurrent neural network (RNN)-bas...
research
10/03/2007

The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies

We present the nested Chinese restaurant process (nCRP), a stochastic pr...
research
06/05/2018

Explaining Away Syntactic Structure in Semantic Document Representations

Most generative document models act on bag-of-words input in an attempt ...

Please sign up or login with your details

Forgot password? Click here to reset