Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning

07/05/2023
by Deepanway Ghosal et al.

Recently, the release of INSTRUCTEVAL has provided valuable insights into the performance of large language models (LLMs) that utilize encoder-decoder or decoder-only architectures. Interestingly, despite being introduced four years ago, T5-based LLMs, such as FLAN-T5, continue to outperform the latest decoder-based LLMs, such as LLAMA and VICUNA, on tasks that require general problem-solving skills. This performance discrepancy can be attributed to three key factors: (1) pre-training data, (2) backbone architecture, and (3) instruction dataset. In this technical report, our main focus is on investigating the impact of the third factor by leveraging VICUNA, a large language model based on LLAMA that has been fine-tuned on ChatGPT conversations. To this end, we fine-tuned VICUNA on a customized instruction dataset collection called FLAN-MINI, which includes a subset of the large-scale FLAN instruction dataset as well as various code-related and conversational datasets derived from ChatGPT/GPT-4, together comprising a large number of tasks that demand problem-solving skills. Our experimental findings strongly indicate that the enhanced problem-solving abilities of our model, FLACUNA, are obtained through fine-tuning VICUNA on the FLAN collection, leading to significant improvements across numerous benchmark datasets in INSTRUCTEVAL. FLACUNA is publicly available at https://huggingface.co/declare-lab/flacuna-13b-v1.0.
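Since the checkpoint is hosted on the Hugging Face Hub, a quick way to try the model is through the transformers library. The sketch below is illustrative only: it assumes the released checkpoint loads through the standard causal-LM API, and the prompt is a hypothetical example rather than the model's official template.

```python
# Minimal sketch: loading the released FLACUNA checkpoint and generating a
# completion. Assumes the standard Hugging Face causal-LM API applies; the
# prompt below is a hypothetical example, not an official prompt template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "declare-lab/flacuna-13b-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 13B parameters: half precision keeps memory manageable
    device_map="auto",          # place layers on available GPU(s) automatically
)

prompt = "A train travels 90 km in 45 minutes. What is its average speed in km/h?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Greedy decoding (do_sample=False) is used here because the example targets a deterministic problem-solving query; sampling parameters can be substituted for open-ended generation.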

Related research

12/22/2022
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
Recent work has shown that fine-tuning large pre-trained language models...

04/24/2023
AMR Parsing with Instruction Fine-tuned Pre-trained Language Models
Instruction fine-tuned language models on a collection of instruction an...

05/18/2023
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
Multi-modal large language models are regarded as a crucial step towards...

07/14/2022
Exploration of an End-to-End Automatic Number-plate Recognition neural network for Indian datasets
Indian vehicle number plates have wide variety in terms of size, font, s...

07/08/2023
Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators
Large language models that exhibit instruction-following behaviour repre...

10/16/2019
Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models
In this work, we study how the large-scale pretrain-finetune framework c...

05/29/2023
Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning
Chain-of-thought (CoT) prompting with large language models has proven e...
