Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models

08/03/2023
by Zheyu Zhang, et al.

Large Language Models (LLMs) demonstrate remarkable performance on a variety of Natural Language Understanding (NLU) tasks, primarily due to their in-context learning ability. Our proposed "CoThought" pipeline exploits this ability to efficiently train smaller "baby" language models (BabyLMs) by leveraging the Chain-of-Thought (CoT) prompting of LLMs. The pipeline restructures a dataset of fewer than 100M words using GPT-3.5-turbo, transforming it into task-oriented, human-readable texts comparable to school texts for language learners. The BabyLM is then pretrained on this restructured dataset in the manner of RoBERTa (Liu et al., 2019). In evaluations across 4 benchmarks, our BabyLM outperforms RoBERTa-base on 10 linguistic, NLU, and question-answering tasks by more than 3 points, showing a superior ability to extract contextual information. These results suggest that compact LMs pretrained on small, LLM-restructured data can better understand tasks and achieve improved performance. The code for data processing and model training is available at: https://github.com/oooranz/Baby-CoThought.
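The abstract describes a two-stage recipe: (1) restructure the raw corpus with GPT-3.5-turbo using CoT-style prompts, and (2) pretrain a compact model on the restructured texts with RoBERTa-style masked language modeling. The sketch below illustrates that recipe only in outline; the prompt wording, hyperparameters, and corpus placeholder are assumptions made for illustration, not the authors' implementation, which is in the linked repository.

```python
"""Minimal sketch of the CoThought two-stage idea (illustrative, not the authors' code)."""
from openai import OpenAI                      # pip install openai
from datasets import Dataset                   # pip install datasets
from transformers import (                     # pip install transformers
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stage 1: restructure raw corpus chunks into task-oriented, human-readable
# texts with a Chain-of-Thought-style prompt (prompt text is an assumption).
COT_PROMPT = (
    "Think step by step about the following passage, then rewrite it as a "
    "short, coherent, task-oriented text suitable for a language learner:\n\n{passage}"
)

def restructure(passage: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": COT_PROMPT.format(passage=passage)}],
        temperature=0.7,
    )
    return response.choices[0].message.content

raw_chunks = ["<chunk of the <100M-word BabyLM corpus goes here>"]  # placeholder
restructured = [restructure(chunk) for chunk in raw_chunks]

# Stage 2: pretrain a compact RoBERTa-style model from scratch with masked
# language modeling on the restructured texts (hyperparameters are assumptions).
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
dataset = Dataset.from_dict({"text": restructured}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

config = RobertaConfig(vocab_size=tokenizer.vocab_size, max_position_embeddings=514)
model = RobertaForMaskedLM(config)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="baby-cothought", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```

The masked-language-modeling objective here simply mirrors the "RoBERTa fashion" pretraining the abstract mentions; the novel ingredient is Stage 1, where the LLM's CoT prompting turns small raw data into pedagogically structured training text.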

