Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models

05/19/2023
by Emily Reif, et al.

Large language models (LLMs) can be used to generate smaller, more refined datasets via few-shot prompting for benchmarking, fine-tuning, or other use cases. However, understanding and evaluating these datasets is difficult, and the failure modes of LLM-generated data are still not well understood. Specifically, the data can be repetitive in surprising ways, not only semantically but also syntactically and lexically. We present LinguisticLens, a novel interactive visualization tool for making sense of and analyzing the syntactic diversity of LLM-generated datasets. LinguisticLens clusters text along syntactic, lexical, and semantic axes. It supports hierarchical visualization of a text dataset, allowing users to quickly scan for an overview and inspect individual examples. The live demo is available at shorturl.at/zHOUV.
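
To make the idea of lexical repetition concrete, the following is a minimal sketch of grouping generated examples by word overlap. It is not LinguisticLens's method, which the abstract does not specify; the Jaccard measure, the greedy grouping, the 0.5 threshold, and all function names are assumptions chosen for illustration only.

```python
import re


def tokens(text):
    """Lowercase word tokens as a set (a crude lexical fingerprint)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_clusters(texts, threshold=0.5):
    """Assign each text to the first cluster whose seed it resembles.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    clusters = []  # each cluster is a list of indices; the first is the seed
    for i, text in enumerate(texts):
        tok = tokens(text)
        for cluster in clusters:
            if jaccard(tok, tokens(texts[cluster[0]])) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([i])
    return clusters


# Hypothetical generated examples: the first two differ by a single word.
examples = [
    "I would like to book a flight to Paris",
    "I would like to book a flight to Rome",
    "What is the weather tomorrow",
]
clusters = greedy_clusters(examples)
print(clusters)
```

Under these assumptions, near-duplicate templates collapse into one group while unrelated sentences stay separate. A syntactic or semantic variant would swap the token sets for part-of-speech sequences or embeddings, respectively.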


research
11/18/2021

How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN

Current language models can generate high-quality text. Are they simply ...
research
04/18/2021

GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation

Large-scale language models such as GPT-3 are excellent few-shot learner...
research
09/30/2019

Lexical Features Are More Vulnerable, Syntactic Features Have More Predictive Power

Understanding the vulnerability of linguistic features extracted from no...
research
06/13/2023

NoCoLA: The Norwegian Corpus of Linguistic Acceptability

While there has been a surge of large language models for Norwegian in r...
research
06/07/2023

Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions

Large language models (LLMs) can be used to generate text data for train...
research
10/12/2022

Perplexity from PLM Is Unreliable for Evaluating Text Quality

Recently, amounts of works utilize perplexity (PPL) to evaluate the qual...
research
02/07/2022

Red Teaming Language Models with Language Models

Language Models (LMs) often cannot be deployed because of their potentia...
