Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks

05/23/2023
by Tiedong Liu et al.

We introduce Goat, a fine-tuned LLaMA model that significantly outperforms GPT-4 on a range of arithmetic tasks. Fine-tuned on a synthetically generated dataset, Goat achieves state-of-the-art performance on the BIG-bench arithmetic sub-task. In particular, the zero-shot Goat-7B matches or even surpasses the accuracy achieved by the few-shot PaLM-540B. Surprisingly, Goat achieves near-perfect accuracy on large-number addition and subtraction through supervised fine-tuning alone, which is almost impossible for previous pretrained language models such as Bloom, OPT, and GPT-NeoX. We attribute Goat's exceptional performance to LLaMA's consistent tokenization of numbers. To tackle more challenging tasks like large-number multiplication and division, we propose an approach that classifies tasks based on their learnability, and subsequently decomposes unlearnable tasks, such as multi-digit multiplication and division, into a series of learnable tasks by leveraging basic arithmetic principles. We thoroughly examine the performance of our model, offering a comprehensive evaluation of the effectiveness of our proposed decomposition steps. Additionally, Goat-7B can be easily trained using LoRA on a single GPU with 24 GB of VRAM, facilitating reproducibility for other researchers. We release our model, dataset, and the Python script for dataset generation.
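The kind of decomposition the abstract describes can be illustrated with a minimal sketch: expand the smaller operand of a multi-digit multiplication by place value, reduce the problem to n-digit-by-1-digit-times-power-of-ten products, and sum the partial results one addition at a time, so each intermediate step is itself a learnable task. The function name and the exact step format below are illustrative assumptions, not taken from the released dataset-generation script.

```python
def decompose_multiplication(a: int, b: int) -> tuple[str, int]:
    """Render a multi-digit multiplication as a chain of simpler steps.

    Returns the step-by-step derivation (one step per line, suitable as a
    supervised training target) and the final product.
    """
    small, large = sorted((a, b))
    s = str(small)
    # Expand the smaller operand by place value, e.g. 57 -> [50, 7]
    parts = [int(d) * 10 ** (len(s) - 1 - i)
             for i, d in enumerate(s) if d != "0"]
    lines = [f"{a} * {b} = {large} * ({' + '.join(map(str, parts))})"]
    # One easier multiplication per place-value part
    products = [large * p for p in parts]
    lines.append(" + ".join(f"{large} * {p}" for p in parts)
                 + " = " + " + ".join(map(str, products)))
    # Sum the partial products pairwise, so each addition is its own step
    total = products[0]
    for q in products[1:]:
        lines.append(f"{total} + {q} = {total + q}")
        total += q
    return "\n".join(lines), total


steps, result = decompose_multiplication(1234, 57)
print(steps)
# 1234 * 57 = 1234 * (50 + 7)
# 1234 * 50 + 1234 * 7 = 61700 + 8638
# 61700 + 8638 = 70338
```

A synthetic training set in this spirit would pair randomly sampled operands with such derivations, leaving only the learnable sub-tasks (addition and n-digit-by-1-digit multiplication) for the model to execute.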


