A Rare Topic Discovery Model for Short Texts Based on Co-occurrence Word Network

06/30/2022
by Chengjie Ma, et al.

We provide a simple and general solution for discovering scarce topics in unbalanced short-text datasets: CWIBTD, a model based on a word co-occurrence network. It simultaneously addresses the sparsity and imbalance of short-text topics and attenuates the effect of occasional pairwise occurrences of words, allowing the model to focus on discovering scarce topics. Unlike previous approaches, CWIBTD uses a word co-occurrence network to model the topic distribution of each word, which increases the semantic density of the data space. It stays sensitive to rare topics by improving how node activity is calculated and by normalizing scarce topics and large topics to some extent. In addition, because it uses the same Gibbs sampling as LDA, CWIBTD is easy to extend to various application scenarios. Extensive experiments on an unbalanced short-text dataset confirm the superiority of CWIBTD over the baseline approach in discovering rare topics. Our model can be used for early and accurate discovery of emerging topics or unexpected events on social platforms.
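
To make the idea of a word co-occurrence network concrete, here is a minimal Python sketch of how such a network might be built from tokenized short texts. It is not the CWIBTD implementation. The function names, the toy corpus, and the `min_weight` threshold are assumptions made for illustration; the threshold is only a crude stand-in for attenuating occasional word pairs and does not reproduce the node-activity calculation or the normalization the abstract mentions. The per-word pseudo-documents follow the general word-network idea of giving each word a document made of its neighbors, which a Gibbs sampler of the kind used for LDA could then consume.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_network(docs, min_weight=1):
    """Return {word: Counter({neighbor: weight})} from tokenized short texts.

    `min_weight` is a hypothetical threshold, not a parameter from the paper.
    """
    pair_counts = Counter()
    for tokens in docs:
        # Count each unordered pair of distinct words once per document.
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1

    network = defaultdict(Counter)
    for (a, b), weight in pair_counts.items():
        if weight >= min_weight:  # drop pairs seen fewer than min_weight times
            network[a][b] = weight
            network[b][a] = weight
    return network


def pseudo_documents(network):
    """Build one pseudo-document per word: its neighbors repeated by edge weight."""
    return {word: sorted(nbrs.elements()) for word, nbrs in network.items()}


# Toy corpus, invented for illustration only.
docs = [
    ["earthquake", "tokyo", "alert"],
    ["earthquake", "alert", "tsunami"],
    ["football", "final", "tokyo"],
]
net = build_cooccurrence_network(docs, min_weight=2)
pseudo = pseudo_documents(net)
```

With `min_weight=2`, only the pair that co-occurs in two documents ("alert", "earthquake") survives, so each of those two words gets a pseudo-document containing the other word twice. A realistic setting would need a weighting scheme that protects rare topics rather than a hard cutoff, which is the problem the abstract says CWIBTD targets.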


