TABLET: Learning From Instructions For Tabular Data

04/25/2023
by Dylan Slack, et al.

Acquiring high-quality data is often a significant challenge in training machine learning (ML) models for tabular prediction, particularly in privacy-sensitive and costly domains like medicine and finance. Providing natural language instructions to large language models (LLMs) offers an alternative solution. However, it is unclear how effectively instructions leverage the knowledge in LLMs for solving tabular prediction problems. To address this gap, we introduce TABLET, a benchmark of 20 diverse tabular datasets annotated with instructions that vary in their phrasing, granularity, and technicality. Additionally, TABLET includes the instructions' logic and structured modifications to the instructions. We find in-context instructions increase zero-shot F1 performance for Flan-T5 11b by 44% points and ChatGPT by 13% points on TABLET. Also, we explore the limitations of using LLMs for tabular prediction in our benchmark by evaluating instruction faithfulness. We find LLMs often ignore instructions and fail to predict specific instances correctly, even with examples. Our analysis on TABLET shows that, while instructions help LLM performance, learning from instructions for tabular data requires new capabilities.
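As a rough illustration of the setup the abstract describes, the sketch below serializes one tabular instance together with a natural language task instruction into a single prompt string for a zero-shot LLM classifier. The serialization format, field names, and example values here are hypothetical assumptions for illustration, not TABLET's actual prompt template.

```python
def build_prompt(instruction, feature_names, feature_values, label_options):
    """Combine a task instruction with one serialized tabular row.

    The resulting string would be sent to an LLM (e.g. Flan-T5 or ChatGPT),
    which is asked to answer with one of the candidate labels.
    """
    # Serialize the row as "feature: value" lines.
    row = "\n".join(
        f"{name}: {value}" for name, value in zip(feature_names, feature_values)
    )
    options = " or ".join(label_options)
    return (
        f"{instruction}\n\n"
        f"Instance:\n{row}\n\n"
        f"Answer ({options}):"
    )


# Hypothetical medical example in the spirit of the benchmark's domains.
prompt = build_prompt(
    instruction=(
        "Predict whether the patient has heart disease. "
        "High cholesterol and chest pain are strong risk indicators."
    ),
    feature_names=["age", "cholesterol", "chest pain type"],
    feature_values=[63, 233, "typical angina"],
    label_options=["yes", "no"],
)
print(prompt)
```

Evaluating instruction faithfulness then amounts to varying the `instruction` text (phrasing, granularity, technicality) while holding the instance fixed, and checking whether the model's prediction changes in the way the instruction's logic dictates.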

