VisImages: A Large-scale, High-quality Image Corpus in Visualization Publications

07/09/2020
by Dazhen Deng, et al.

Images in visualization publications contain rich information, such as novel visual designs, model details, and experiment results. A corpus of such images can benefit the community in several ways, including literature analysis from the perspective of visual representations, empirical studies on visual memorability, and machine learning research on chart detection. This study presents VisImages, a high-quality, large-scale image corpus collected from visualization publications. VisImages contains rich and diverse annotations for each image, including captions, types of visual representations, and bounding boxes. First, we algorithmically extract the images and their associated captions and manually correct the errors. Second, to categorize visualizations in publications, we extend and iteratively refine the existing taxonomy through a multi-round pilot study. Third, guided by this taxonomy, we invite senior visualization practitioners to annotate the visual representations that appear in each image. In this process, we borrow techniques such as "gold standards" and majority voting for quality control. Finally, we recruit crowd workers to draw bounding boxes for the visual representations in the images. The resulting corpus contains 35,096 annotated visualizations from 12,267 images with 12,057 captions in 1,397 papers from VAST and InfoVis. We demonstrate the usefulness of VisImages through four use cases: 1) analysis of color usage in VAST and InfoVis papers across years, 2) discussion of researchers' preferences for visualization types, 3) spatial distribution analysis of visualizations in visual analytics systems, and 4) training visualization detection models.
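
The abstract names "gold standards" and majority voting as quality-control techniques but does not spell out the procedure. The following is a minimal sketch of how the two ideas are commonly combined, not the authors' implementation. The function names, the strict-majority rule, and the image ids and labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by a strict majority of annotators, else None.

    Assumption: a label needs more than half of the votes to be accepted.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def gold_accuracy(answers, gold):
    """Fraction of gold-standard items that one annotator labeled correctly.

    `answers` and `gold` both map an item id to a label. Only items present
    in both mappings are scored.
    """
    checked = [item for item in gold if item in answers]
    if not checked:
        return 0.0
    correct = sum(answers[item] == gold[item] for item in checked)
    return correct / len(checked)


# Hypothetical votes from three annotators per image.
votes = {
    "img_001": ["bar chart", "bar chart", "line chart"],
    "img_002": ["heatmap", "scatterplot", "treemap"],
}
consensus = {image_id: majority_vote(v) for image_id, v in votes.items()}
```

In a workflow like this, images without a consensus label (here `img_002`) would typically be sent back for another round of review, and annotators whose `gold_accuracy` falls below a chosen threshold would have their labels discarded or rechecked.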

research
03/20/2022

Revisiting the Design Patterns of Composite Visualizations

Composite visualization is a popular design strategy that represents com...
research
09/15/2022

Not As Easy As You Think – Experiences and Lessons Learnt from Trying to Create a Bottom-Up Visualization Image Typology

We present and discuss the results of a two-year qualitative analysis of...
research
11/02/2018

The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale

We present Open Images V4, a dataset of 9.2M images with unified annotat...
research
08/28/2023

Eleven Years of Gender Data Visualization: A Step Towards More Inclusive Gender Representation

We present an analysis of the representation of gender as a data dimensi...
research
06/09/2023

A Qualitative Analysis of Common Practices in Annotations: A Taxonomy and Design Space

Annotations are a vital component of data externalization and collaborat...
research
12/22/2020

VIS30K: A Collection of Figures and Tables from IEEE Visualization Conference Publications

We present the VIS30K dataset, a collection of 29,689 images that repres...
research
02/11/2019

Net2Vis: Transforming Deep Convolutional Networks into Publication-Ready Visualizations

To properly convey neural network architectures in publications, appropr...
