Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models

09/19/2023
by   Hyung-Kwon Ko, et al.

We introduce a Large Language Model (LLM) framework that generates rich and diverse NL datasets using only Vega-Lite specifications as input, thereby streamlining the development of Natural Language Interfaces (NLIs) for data visualization. We propose two techniques to synthesize relevant chart semantics accurately and enhance syntactic diversity in each NL dataset, respectively: 1) a guided discovery incorporated into prompting so that LLMs can steer themselves to create varying NL datasets in a self-directed manner; 2) a score-based paraphrasing to augment NL syntax along four well-defined language axes. We also present a new chart collection of 1,981 real-world Vega-Lite specifications that have increased diversity and complexity compared to benchmarks, to demonstrate the generalizability of our framework. The experimental results show that our framework accurately extracts chart semantics and generates L1/L2 captions with 89.4 respectively, while generating and paraphrasing utterances and questions with greater diversity than benchmarks. The code and chart collection are available at https://github.com/hyungkwonko/chart-llm.
