Towards Better Instruction Following Language Models for Chinese: Investigating the Impact of Training Data and Evaluation

04/16/2023
by   Yunjie Ji, et al.

Recently, significant public efforts have been directed towards developing low-cost models with capabilities akin to ChatGPT, fostering the growth of open-source conversational models. However, comprehensive and in-depth evaluations of these models' performance remain scarce. In this study, we examine the influence of training data factors, including quantity, quality, and linguistic distribution, on model performance. Our analysis is grounded in several publicly accessible, high-quality instruction datasets, as well as our own Chinese multi-turn conversations. We assess various models using an evaluation set of 1,000 samples, encompassing nine real-world scenarios. Our goal is to supplement manual evaluations with quantitative analyses, offering valuable insights for the continued advancement of open-source chat models. Furthermore, to enhance model performance, as well as training and inference efficiency, in the Chinese domain, we extend the vocabulary of LLaMA - the open-source model whose performance is closest to proprietary language models like GPT-3 - and conduct secondary pre-training on 3.4B Chinese words. We make our model, data, and code publicly available.
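The vocabulary-extension step described above can be illustrated with a minimal sketch. This is a simplified illustration, not the authors' implementation: real work would extend LLaMA's SentencePiece tokenizer with tokens learned from a Chinese corpus and then resize the model's embedding matrix, whereas the toy vocabulary and token lists here are hypothetical.

```python
# Minimal sketch of extending a tokenizer vocabulary with new (e.g. Chinese)
# tokens. The base vocabulary and token lists are toy examples; in practice
# the new tokens come from training a tokenizer on a Chinese corpus.

def extend_vocab(base_vocab, new_tokens):
    """Append unseen tokens to the vocabulary, assigning fresh ids
    after the current maximum id. Existing tokens keep their ids."""
    vocab = dict(base_vocab)
    next_id = max(vocab.values()) + 1
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = next_id
            next_id += 1
    return vocab

base = {"<unk>": 0, "hello": 1, "world": 2}
extended = extend_vocab(base, ["你好", "世界", "hello"])
# "hello" already exists and is skipped; the two Chinese tokens get new ids,
# so extended == {"<unk>": 0, "hello": 1, "world": 2, "你好": 3, "世界": 4}
```

After extending the vocabulary, the model's token-embedding matrix must be enlarged to match the new vocabulary size, with the new rows initialized (e.g. randomly or from averages of existing embeddings) before secondary pre-training updates them.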

