Discovering topics in text datasets by visualizing relevant words

07/18/2017
by Franziska Horn, et al.

When dealing with large collections of documents, it is essential to get a quick overview of their contents. In this paper, we show how to achieve this: a clustering algorithm identifies topics in the dataset, and the contents of each topic are then summarized by selecting and visualizing relevant words, that is, words that distinguish the topic's documents from the rest of the texts. We demonstrate our approach by discovering trending topics in a collection of New York Times article snippets.
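
The word-selection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's scoring method: it assumes cluster labels have already been produced by some clustering algorithm, and it uses a simple stand-in score, namely the share of documents inside a cluster that contain a word minus the share outside the cluster that contain it. The function name `relevant_words`, the `top_n` parameter, and the toy documents are all hypothetical.

```python
from collections import Counter, defaultdict


def relevant_words(docs, labels, top_n=3):
    """Rank words per cluster by (in-cluster doc rate) - (out-of-cluster doc rate).

    Assumption: this difference of document rates is only a stand-in for
    whatever relevance score the paper actually uses.
    """
    # Represent each document as the set of its lowercase tokens.
    token_sets = [set(doc.lower().split()) for doc in docs]
    cluster_sizes = Counter(labels)

    # Document frequency of each word overall and within each cluster.
    df_total = Counter()
    df_cluster = defaultdict(Counter)
    for tokens, label in zip(token_sets, labels):
        df_total.update(tokens)
        df_cluster[label].update(tokens)

    n_docs = len(docs)
    result = {}
    for label, size in cluster_sizes.items():
        n_rest = n_docs - size
        scores = {}
        for word, count_in in df_cluster[label].items():
            rate_in = count_in / size
            count_out = df_total[word] - count_in
            rate_out = count_out / n_rest if n_rest else 0.0
            scores[word] = rate_in - rate_out
        # Sort by descending score, then alphabetically so ties are deterministic.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[label] = ranked[:top_n]
    return result


# Toy data (hypothetical): two clusters of two snippets each.
docs = [
    "election vote senate",
    "senate vote budget",
    "storm flood rain",
    "rain storm warning",
]
labels = [0, 0, 1, 1]
topics = relevant_words(docs, labels, top_n=2)
print(topics)
```

The selected words per cluster could then be passed to a visualization such as a word cloud, with font size driven by the score.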

