Text Network Exploration via Heterogeneous Web of Topics

10/02/2016
by Junxian He, et al.

A text network is a data type in which each vertex is associated with a text document and edges represent the relationships between documents. The proliferation of text networks, such as hyperlinked webpages and academic citation networks, has led to an increasing demand for quickly developing a general sense of a new text network, a task we call text network exploration. In this paper, we address text network exploration by constructing a heterogeneous web of topics, which lets people investigate a text network in a way that links the word level with the document level. To achieve this, we propose a probabilistic generative model for text and links that quantifies three different relationships in the heterogeneous topic web. We also develop a prototype demo system named TopicAtlas to present this heterogeneous topic web, and demonstrate how the system facilitates text network exploration. Extensive qualitative analyses are included to verify the effectiveness of the heterogeneous topic web. In addition, we validate our model on real-life text networks, showing that it performs well on objective evaluation metrics.

