ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing

09/17/2023
by Ian Arawjo, et al.

Evaluating outputs of large language models (LLMs) is challenging because it requires making, and making sense of, many responses. Yet tools that go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains, or are closed-source. We present ChainForge, an open-source visual toolkit for prompt engineering and on-demand hypothesis testing of text generation LLMs. ChainForge provides a graphical interface for comparison of responses across models and prompt variations. Our system was designed to support three tasks: model selection, prompt template design, and hypothesis testing (e.g., auditing). We released ChainForge early in its development and iterated on its design with academics and online users. Through in-lab and interview studies, we find that a range of people could use ChainForge to investigate hypotheses that matter to them, including in real-world settings. We identify three modes of prompt engineering and LLM hypothesis testing: opportunistic exploration, limited evaluation, and iterative refinement.

