TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks

While LLMs have shown great success in understanding and generating text in traditional conversational settings, their potential for performing ill-defined complex tasks remains largely understudied. Indeed, comprehensive benchmarking studies that evaluate multiple LLMs on a single complex task have yet to be conducted. Such studies are challenging, however, because LLM performance varies widely depending on the prompt type or style used and the degree of detail the prompt provides. To address this issue, this paper proposes a general taxonomy for designing prompts with specific properties to perform a wide range of complex tasks. This taxonomy will allow future benchmarking studies to report the specific categories of prompts they used, enabling meaningful comparisons across studies. By establishing a common standard through this taxonomy, researchers will also be able to draw more accurate conclusions about LLMs' performance on a specific complex task.
