Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?

09/16/2023
by Xiangru Tang, et al.

Despite the power of Large Language Models (LLMs) such as GPT-4, they still struggle with tasks that require generating complex, structured outputs. In this study, we assess the ability of current LLMs to generate complex structured data and propose a structure-aware fine-tuning approach to improve this ability. To perform a comprehensive evaluation, we propose Struc-Bench and use it to evaluate five representative LLMs (e.g., GPT-NeoX 20B, GPT-3.5, GPT-4, and Vicuna) on carefully constructed datasets spanning raw text, HTML, and LaTeX tables. Based on our analysis of current model performance, we identify common formatting errors and areas for improvement. To address complex formatting requirements, we use FormatCoT (Chain-of-Thought) to generate format instructions from target outputs. Our experiments show that our structure-aware fine-tuning method, when applied to LLaMA-7B, significantly improves adherence to natural language constraints and outperforms the other evaluated LLMs. Based on these results, we present an ability map of model capabilities across six dimensions (i.e., coverage, formatting, reasoning, comprehension, pragmatics, and hallucination). This map highlights the weaknesses of LLMs in handling complex structured outputs and suggests promising directions for future work. Our code and models are available at https://github.com/gersteinlab/Struc-Bench.
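The abstract refers to identifying common formatting errors in generated tables. As a rough illustration only, the Python sketch below compares a generated raw-text table against a reference by row count, column count, and exact cell matches. It assumes pipe-delimited tables, and the names `parse_pipe_table`, `compare_tables`, and `StructureReport` are hypothetical; this is not the evaluation method used in Struc-Bench.

```python
from dataclasses import dataclass


@dataclass
class StructureReport:
    rows_match: bool       # same number of rows as the reference
    cols_match: bool       # both tables are rectangular with the same width
    cell_accuracy: float   # fraction of reference cells reproduced exactly at the same position


def parse_pipe_table(text: str) -> list[list[str]]:
    """Parse a pipe-delimited table (an assumed format) into rows of stripped cells.

    Separator lines made only of dashes/colons (e.g. |---|---|) are skipped.
    """
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if all(c and set(c) <= set("-:") for c in cells):
            continue  # header separator line, not data
        rows.append(cells)
    return rows


def compare_tables(generated: str, reference: str) -> StructureReport:
    """Hypothetical structural comparison of a generated table with a reference."""
    gen = parse_pipe_table(generated)
    ref = parse_pipe_table(reference)

    rows_match = len(gen) == len(ref)
    gen_widths = {len(r) for r in gen}
    ref_widths = {len(r) for r in ref}
    cols_match = len(ref_widths) == 1 and gen_widths == ref_widths

    # Count reference cells that appear unchanged at the same (row, column) position.
    total = sum(len(r) for r in ref)
    hits = 0
    for i, ref_row in enumerate(ref):
        for j, cell in enumerate(ref_row):
            if i < len(gen) and j < len(gen[i]) and gen[i][j] == cell:
                hits += 1
    accuracy = hits / total if total else 1.0
    return StructureReport(rows_match, cols_match, accuracy)


if __name__ == "__main__":
    reference = "| Team | Wins |\n|---|---|\n| A | 3 |\n| B | 5 |"
    generated = "| Team | Wins |\n| A | 3 |\n| B | 4 |"
    print(compare_tables(generated, reference))
```

In this usage example, the generated table keeps the reference's shape but changes one value, so row and column checks pass while 5 of 6 cells match. A real benchmark would also need to handle HTML and LaTeX markup, which this sketch does not attempt.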
