Scalable Text and Link Analysis with Mixed-Topic Link Models

03/28/2013
by Yaojia Zhu, et al.

Many data sets contain rich information about objects, as well as pairwise relations between them. For instance, in networks of websites, scientific papers, and other documents, each node has content consisting of a collection of words, as well as hyperlinks or citations to other nodes. To perform inference on such data sets, and to make predictions and recommendations, it is useful to have models that capture the processes which generate the text at each node and the links between nodes. In this paper, we combine classic ideas in topic modeling with a variant of the mixed-membership block model recently developed in the statistical physics community. The resulting model has the advantage that its parameters, including the mixture of topics of each document and the resulting overlapping communities, can be inferred with a simple and scalable expectation-maximization algorithm. We test our model on three data sets, performing unsupervised topic classification and link prediction. For both tasks, our model outperforms several existing state-of-the-art methods, achieving higher accuracy with significantly less computation: it analyzes a data set with 1.3 million words and 44 thousand links in a few minutes.
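
The abstract does not give the update equations, so the following is only a rough illustration of the kind of expectation-maximization loop used in classic topic modeling. It is a minimal sketch of EM for probabilistic latent semantic analysis (pLSA) on word counts alone. This is a generic textbook algorithm, not the authors' mixed-topic link model: it ignores links entirely, and the function name `plsa_em` and its parameters are assumptions made for illustration.

```python
import random


def plsa_em(counts, num_topics, iters=50, seed=0):
    """Fit a generic pLSA model by EM (illustrative; not the paper's model).

    counts[d][w] is the number of times word w occurs in document d.
    Returns (theta, beta), where theta[d][z] is the topic mixture of
    document d and beta[z][w] is the word distribution of topic z.
    """
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])

    def normalize(row, fallback):
        # Rescale a non-negative row to sum to 1; keep the old row if it is empty.
        total = sum(row)
        return [x / total for x in row] if total > 0 else list(fallback)

    def random_dist(n):
        raw = [rng.random() + 0.1 for _ in range(n)]
        return normalize(raw, raw)

    theta = [random_dist(num_topics) for _ in range(num_docs)]
    beta = [random_dist(num_words) for _ in range(num_topics)]

    for _ in range(iters):
        new_theta = [[0.0] * num_topics for _ in range(num_docs)]
        new_beta = [[0.0] * num_words for _ in range(num_topics)]
        for d in range(num_docs):
            for w in range(num_words):
                c = counts[d][w]
                if c == 0:
                    continue
                # E-step: posterior over topics for this (document, word) pair.
                post = [theta[d][z] * beta[z][w] for z in range(num_topics)]
                total = sum(post)
                if total == 0:
                    continue
                # Accumulate expected counts for the M-step.
                for z in range(num_topics):
                    r = c * post[z] / total
                    new_theta[d][z] += r
                    new_beta[z][w] += r
        # M-step: renormalize the expected counts into distributions.
        theta = [normalize(new_theta[d], theta[d]) for d in range(num_docs)]
        beta = [normalize(new_beta[z], beta[z]) for z in range(num_topics)]
    return theta, beta


# Toy corpus: two documents use words 0-1, two documents use words 2-3.
toy_counts = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
theta, beta = plsa_em(toy_counts, num_topics=2, iters=30)
```

Each iteration touches only the nonzero word counts, which is one reason EM of this kind scales roughly with the size of the data. A model that also covers links would need additional terms for the link likelihood, which this sketch does not attempt.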

research
02/20/2018

Discovering Hidden Topical Hubs and Authorities in Online Social Networks

Finding influential users in online social networks is an important prob...
research
01/21/2020

Random-walk Based Generative Model for Classifying Document Networks

Document networks are found in various collections of real-world data, s...
research
02/09/2017

Memetic search for overlapping topics based on a local evaluation of link communities

In spite of recent advances in field delineation methods, bibliometricia...
research
02/21/2020

Struct-MMSB: Mixed Membership Stochastic Blockmodels with Interpretable Structured Priors

The mixed membership stochastic blockmodel (MMSB) is a popular framework...
research
01/17/2017

From Community Detection to Community Profiling

Most existing community-related studies focus on detection, which aim to...
research
09/13/2014

A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data

Topic modeling based on latent Dirichlet allocation (LDA) has been a fra...
research
02/02/2023

Causal Lifting and Link Prediction

Current state-of-the-art causal models for link prediction assume an und...
