STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation

10/04/2020
by Nader Akoury, et al.

Systems for story generation are asked to produce plausible and enjoyable stories given an input context. This task is underspecified, as a vast number of diverse stories can originate from a single input. The large output space makes it difficult to build and evaluate story generation models, as (1) existing datasets lack rich enough contexts to meaningfully guide models, and (2) existing evaluations (both crowdsourced and automatic) are unreliable for assessing long-form creative text. To address these issues, we introduce a dataset and evaluation platform built from STORIUM, an online collaborative storytelling community. Our author-generated dataset contains 6K lengthy stories (125M tokens) with fine-grained natural language annotations (e.g., character goals and attributes) interspersed throughout each narrative, forming a robust source for guiding models. We evaluate language models fine-tuned on our dataset by integrating them onto STORIUM, where real authors can query a model for suggested story continuations and then edit them. Automatic metrics computed over these edits correlate well with both user ratings of generated stories and qualitative feedback from semi-structured user interviews. We release both the STORIUM dataset and evaluation platform to spur more principled research into story generation.
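The abstract notes that automatic metrics are computed over authors' edits of model suggestions. As an illustrative sketch only (not the paper's exact metric), one simple edit-based signal is the fraction of generated tokens an author retains, which can be measured with Python's standard `difflib`; the `retention_fraction` name and the example strings below are hypothetical:

```python
from difflib import SequenceMatcher

def retention_fraction(generated: str, edited: str) -> float:
    """Fraction of generated tokens preserved, in order, in the author's
    edited version, measured over whitespace-separated tokens."""
    gen_tokens = generated.split()
    edit_tokens = edited.split()
    if not gen_tokens:
        return 0.0
    matcher = SequenceMatcher(None, gen_tokens, edit_tokens, autojunk=False)
    # Sum sizes of all non-overlapping matching blocks (LCS-style overlap).
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(gen_tokens)

# Hypothetical model suggestion and the author's light edit of it.
suggestion = "The knight crept toward the ruined tower at dusk"
author_edit = "The knight crept slowly toward the tower as dusk fell"
print(round(retention_fraction(suggestion, author_edit), 2))  # → 0.78
```

A score near 1.0 means the author kept the suggestion nearly verbatim; a score near 0.0 means it was rewritten wholesale, so such a metric can serve as a cheap proxy for how useful a continuation was.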


Related research

- Outline to Story: Fine-grained Controllable Story Generation from Cascaded Events (01/04/2021)
- Content Planning for Neural Story Generation with Aristotelian Rescoring (09/21/2020)
- DeltaScore: Evaluating Story Generation with Differentiating Perturbations (03/15/2023)
- StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning (10/16/2022)
- Plug-and-Blend: A Framework for Controllable Story Generation with Blended Control Codes (03/23/2021)
- Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation (08/14/2023)
- Collaborative Storytelling with Large-scale Neural Language Models (11/20/2020)
