TopicsRanksDC: Distance-based Topic Ranking applied on Two-Class Data

05/17/2021
by Malik Yousef, et al.

In this paper, we introduce TopicsRanksDC, a novel approach that ranks topics by the distance between the two clusters each topic generates. We assume that the data consists of text documents, each associated with one of two classes. Our approach ranks each topic found in these documents by its significance for separating the two classes. First, the algorithm detects topics using Latent Dirichlet Allocation (LDA). The words defining each topic are then represented as two clusters, one per class. We compute four distance metrics between the two clusters: Single Linkage, Complete Linkage, Average Linkage, and the distance between the centroids. We compare the results for LDA topics against random topics. The results show that LDA topics rank much higher than random topics. These results make TopicsRanksDC a promising basis for future work on enabling search engines to suggest related topics.
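The four distance metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how a topic's words are turned into cluster points, so the sketch assumes each cluster is simply a list of numeric vectors, uses Euclidean distance, and ranks topics by a chosen metric with larger distances first. The function names `cluster_distances` and `rank_topics` are hypothetical.

```python
from itertools import product
from math import dist
from statistics import fmean


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*points)]


def cluster_distances(cluster_a, cluster_b):
    """Compute four distances between two clusters of vectors.

    Assumption: Euclidean distance between points; the abstract does not
    state which point-to-point distance the paper uses.
    """
    # All point-to-point distances across the two clusters.
    pairwise = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    return {
        "single": min(pairwise),      # closest pair across clusters
        "complete": max(pairwise),    # farthest pair across clusters
        "average": fmean(pairwise),   # mean over all cross-cluster pairs
        "centroid": dist(centroid(cluster_a), centroid(cluster_b)),
    }


def rank_topics(topics, metric="average"):
    """Order topic names by the chosen distance, largest first.

    Assumption: a larger distance between the two class clusters means the
    topic separates the classes better and should rank higher.
    `topics` maps a topic name to a (cluster_a, cluster_b) pair.
    """
    scores = {
        name: cluster_distances(a, b)[metric] for name, (a, b) in topics.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (made up for illustration): one topic whose class clusters are
# far apart and one whose clusters overlap.
toy_topics = {
    "separating_topic": ([[0.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [3.0, 2.0]]),
    "overlapping_topic": ([[0.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]),
}
toy_ranking = rank_topics(toy_topics, metric="centroid")
```

With this toy data, the topic whose clusters are well separated is ranked first under the centroid metric, since the overlapping topic's two centroids coincide. Passing a different `metric` key switches the ranking criterion without changing the rest of the code.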


research
08/24/2018

Measuring LDA Topic Stability from Clusters of Replicated Runs

Background: Unstructured and textual data is increasing rapidly and Late...
research
07/23/2022

A Data-driven Latent Semantic Analysis for Automatic Text Summarization using LDA Topic Modelling

With the advent and popularity of big data mining and huge text analysis...
research
08/04/2017

A network approach to topic models

One of the main computational and scientific challenges in the modern ag...
research
06/02/2016

Source-LDA: Enhancing probabilistic topic models using prior knowledge sources

A popular approach to topic modeling involves extracting co-occurring n-...
research
12/20/2018

Recommendation System based on Semantic Scholar Mining and Topic modeling: A behavioral analysis of researchers from six conferences

Recommendation systems have an important place to help online users in t...
research
02/03/2014

A high-reproducibility and high-accuracy method for automated topic classification

Much of human knowledge sits in large databases of unstructured text. Le...
research
07/18/2018

Latent Dirichlet Allocation (LDA) for Topic Modeling of the CFPB Consumer Complaints

A text mining approach is proposed based on latent Dirichlet allocation ...
