TopicSifter: Interactive Search Space Reduction Through Targeted Topic Modeling

07/28/2019
by Hannah Kim, et al.

Topic modeling is commonly used to analyze and understand large document collections. In practice, however, users want to focus on specific aspects, or "targets", rather than on the entire corpus. For example, given a large collection of documents, users may want only the smaller subset that most closely aligns with their interests, tasks, and domains. In particular, our paper focuses on large-scale document retrieval with high recall, where missing any relevant document can be critical. A simple keyword-matching search is generally neither effective nor efficient, because 1) it is difficult to come up with a list of keyword queries that covers the documents of interest before exploring the dataset, 2) some documents may not contain the exact keywords of interest but may still be highly relevant, and 3) some words have multiple meanings, which causes irrelevant documents to be included in the retrieved subset. In this paper, we present TopicSifter, a visual analytics system for interactive search space reduction. Our system uses targeted topic modeling based on nonnegative matrix factorization and lets users give relevance feedback to refine their target and guide the topic modeling toward the most relevant results.
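The abstract names the core technique, targeted topic modeling with nonnegative matrix factorization (NMF) steered by relevance feedback, but not its formulation. The sketch below is one hypothetical way such steering could work and is not TopicSifter's published algorithm. It assumes feedback is encoded as a per-document weight w_j in a weighted objective, the sum over documents j of w_j * ||a_j - W h_j||^2, fits it with Lee-Seung style multiplicative updates, and then ranks documents by the share of their topic mass that falls on a chosen target topic. The names weighted_nmf, weighted_error, rank_by_target, the weights, and the toy matrix are all illustrative.

```python
import random


def matmul(X, Y):
    """Plain-Python matrix product of X (a x b) and Y (b x c)."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def weighted_nmf(A, k, doc_weights, iters=500, seed=0, eps=1e-9):
    """Factor A (terms x docs) ~= W (terms x k) @ H (k x docs).

    Minimizes sum_j w_j * ||A[:, j] - W @ H[:, j]||^2 with multiplicative
    updates. The per-document weights stand in for relevance feedback
    (a hypothetical encoding, not TopicSifter's own formulation).
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    # Column-weighted data A D, where D = diag(doc_weights).
    AD = [[A[i][j] * doc_weights[j] for j in range(n)] for i in range(m)]
    for _ in range(iters):
        # H update: the weight of column j cancels, leaving the standard rule.
        Wt = transpose(W)
        num = matmul(Wt, A)
        den = matmul(matmul(Wt, W), H)
        H = [[H[r][j] * num[r][j] / (den[r][j] + eps) for j in range(n)]
             for r in range(k)]
        # W update: W <- W * (A D H^T) / (W H D H^T).
        HD = [[H[r][j] * doc_weights[j] for j in range(n)] for r in range(k)]
        Ht = transpose(H)
        num = matmul(AD, Ht)
        den = matmul(W, matmul(HD, Ht))
        W = [[W[i][r] * num[i][r] / (den[i][r] + eps) for r in range(k)]
             for i in range(m)]
    return W, H


def weighted_error(A, W, H, doc_weights):
    """Weighted squared reconstruction error of A by W @ H."""
    WH = matmul(W, H)
    return sum(doc_weights[j] * (A[i][j] - WH[i][j]) ** 2
               for i in range(len(A)) for j in range(len(A[0])))


def rank_by_target(H, target):
    """Order documents by the share of their topic weights on `target`."""
    k, n = len(H), len(H[0])
    share = [H[target][j] / (sum(H[r][j] for r in range(k)) + 1e-12)
             for j in range(n)]
    return sorted(range(n), key=lambda j: share[j], reverse=True)


# Toy term-document matrix: docs 0-2 use terms 0-2, docs 3-5 use terms 3-5.
A = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)] for i in range(6)]
# Hypothetical feedback: doc 0 marked relevant, doc 5 marked less relevant.
weights = [2.0, 1.0, 1.0, 1.0, 1.0, 0.5]
W, H = weighted_nmf(A, k=2, doc_weights=weights)
# Treat the topic that dominates the relevant document as the target.
target = max(range(2), key=lambda r: H[r][0])
ranking = rank_by_target(H, target)
print(ranking[:3])
```

Raising a document's weight pulls the topics toward explaining it, while lowering it lets the factorization spend less capacity on it. A high-recall system would still need a cutoff on the target share to decide which documents stay in the reduced search space; that choice is left out of this sketch.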


