Orca: Progressive Learning from Complex Explanation Traces of GPT-4

06/05/2023
by Subhabrata Mukherjee, et al.

Recent research has focused on enhancing the capability of smaller models through imitation learning, drawing on the outputs generated by large foundation models (LFMs). A number of issues impact the quality of these models, ranging from limited imitation signals from shallow LFM outputs; small-scale, homogeneous training data; and, most notably, a lack of rigorous evaluation, resulting in overestimation of the small model's capability, as these models tend to learn to imitate the style, but not the reasoning process, of LFMs. To address these challenges, we develop Orca (we are working with our legal team to publicly release a diff of the model weights in accordance with LLaMA's release policy, to be published at https://aka.ms/orca-lm), a 13-billion parameter model that learns to imitate the reasoning process of LFMs. Orca learns from rich signals from GPT-4, including explanation traces; step-by-step thought processes; and other complex instructions, guided by teacher assistance from ChatGPT. To promote this progressive learning, we tap into large-scale and diverse imitation data with judicious sampling and selection. Orca surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% in complex zero-shot reasoning benchmarks like Big-Bench Hard (BBH) and 42% on AGIEval. Moreover, Orca reaches parity with ChatGPT on the BBH benchmark and shows competitive performance (4 pts gap with optimized system message) in professional and academic examinations like the SAT, LSAT, GRE, and GMAT, both in zero-shot settings without CoT, while trailing behind GPT-4. Our research indicates that learning from step-by-step explanations, whether these are generated by humans or more advanced AI models, is a promising direction to improve model capabilities and skills.
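The recipe the abstract describes, prompting a teacher model with a system message that elicits step-by-step explanations and saving the resulting traces as student fine-tuning data, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's actual pipeline: the system message text, the instruction list, the output file name, and the use of the OpenAI chat API are all illustrative choices; the two-teacher loop only mirrors, in miniature, the paper's progressive schedule of collecting ChatGPT traces before the richer GPT-4 traces.

```python
# Sketch of explanation-trace collection for imitation tuning.
# Assumptions: the OpenAI chat API serves as the teacher, and tasks are
# represented as plain instruction strings. This is not Orca's actual
# pipeline; it only illustrates the general recipe from the abstract.

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# An illustrative system message that elicits step-by-step reasoning,
# in the spirit of the hand-crafted system messages used in the paper.
SYSTEM_MESSAGE = (
    "You are a helpful assistant. Think step by step and justify your answer."
)

def collect_trace(instruction: str, teacher: str) -> dict:
    """Query the teacher model and return one training record."""
    response = client.chat.completions.create(
        model=teacher,
        messages=[
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": instruction},
        ],
    )
    return {
        "system": SYSTEM_MESSAGE,
        "instruction": instruction,
        "trace": response.choices[0].message.content,  # explanation trace
    }

# Progressive learning in miniature: collect the cheaper ChatGPT traces
# first, then the richer GPT-4 traces.
instructions = [
    "If a train travels 60 miles in 90 minutes, what is its speed in mph?",
]
with open("orca_style_traces.jsonl", "w") as f:  # hypothetical file name
    for stage_teacher in ("gpt-3.5-turbo", "gpt-4"):
        for inst in instructions:
            f.write(json.dumps(collect_trace(inst, teacher=stage_teacher)) + "\n")
```

Collecting the intermediate teacher's traces before the stronger teacher's acts as a crude curriculum: the student first sees easier-to-imitate explanations, then progressively richer ones.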


Related research

WizardLM: Empowering Large Language Models to Follow Complex Instructions (04/24/2023)
Training large language models (LLM) with open-domain instruction follow...

LogiCoT: Logical Chain-of-Thought Instruction-Tuning Data Collection with GPT-4 (05/20/2023)
Generative Pre-trained Transformer 4 (GPT-4) demonstrates impressive cha...

NeuroCERIL: Robotic Imitation Learning via Hierarchical Cause-Effect Reasoning in Programmable Attractor Neural Networks (11/11/2022)
Imitation learning allows social robots to learn new skills from human t...

TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game (11/27/2020)
StarCraft, one of the most difficult esport games with long-standing his...

Prompt, Plan, Perform: LLM-based Humanoid Control via Quantized Imitation Learning (09/20/2023)
In recent years, reinforcement learning and imitation learning have show...

The False Promise of Imitating Proprietary LLMs (05/25/2023)
An emerging method to cheaply improve a weaker language model is to fine...
