Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ

03/02/2017
by Jason S. Kessler, et al.

Scattertext is an open-source tool for visualizing linguistic variation between document categories in a language-independent way. The tool presents a scatterplot in which each axis corresponds to the rank-frequency with which a term occurs in one category of documents. Through a tie-breaking strategy, the tool can display thousands of visible term-representing points and find space to legibly label hundreds of them. Scattertext also lends itself to a query-based visualization of how the use of terms with similar embeddings differs between document categories, as well as a visualization for comparing the importance scores of bag-of-words features to univariate metrics.
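To make the axis definition concrete, the following minimal sketch computes a rank-frequency coordinate for each term in two document categories. It is an illustration only, not Scattertext's implementation: the function names `dense_ranks` and `term_coordinates`, the whitespace tokenizer, and the use of dense ranking (tied counts share one rank) are all assumptions made here, and the tool's own tie-breaking strategy is not reproduced.

```python
from collections import Counter


def dense_ranks(counts, vocab):
    """Map each term to the dense rank of its count, scaled to [0, 1].

    Assumption: terms with equal counts share a rank, so ties land on the
    same coordinate. Scattertext's own tie-breaking is not modeled here.
    """
    distinct = sorted({counts.get(term, 0) for term in vocab})
    rank_of = {count: rank for rank, count in enumerate(distinct)}
    top = max(len(distinct) - 1, 1)  # avoid dividing by zero for one rank
    return {term: rank_of[counts.get(term, 0)] / top for term in vocab}


def term_coordinates(docs_a, docs_b):
    """Return {term: (x, y)}: x is the rank in category A, y in category B."""
    # Hypothetical tokenizer: lowercase and split on whitespace.
    counts_a = Counter(w for doc in docs_a for w in doc.lower().split())
    counts_b = Counter(w for doc in docs_b for w in doc.lower().split())
    vocab = sorted(set(counts_a) | set(counts_b))
    xs = dense_ranks(counts_a, vocab)
    ys = dense_ranks(counts_b, vocab)
    return {term: (xs[term], ys[term]) for term in vocab}


# Toy corpora, invented purely for illustration.
coords = term_coordinates(
    ["tax cuts create jobs", "jobs and tax relief"],
    ["health care for all", "jobs and health care"],
)
```

In this toy setup, a term used often in the first category and never in the second sits at the far end of one axis and the origin of the other, while terms used about equally in both fall near the diagonal.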


