OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch

09/19/2023
by Juntao Li, et al.

Large language models (LLMs) with billions of parameters have demonstrated outstanding performance on various natural language processing tasks. This report presents OpenBA, an open-sourced 15B bilingual asymmetric seq2seq model, contributing an LLM variant to the Chinese-oriented open-source model community. We enhance OpenBA with effective and efficient techniques and adopt a three-stage training strategy to train the model from scratch. Our model achieves very competitive performance with only 380B training tokens, outperforming LLaMA-70B on the BELEBELE benchmark, BLOOM-176B on the MMLU benchmark, and GLM-130B on the C-Eval (hard) benchmark. This report provides the main details needed to pre-train an analogous model, including pre-training data processing, Bilingual Flan data collection, the empirical observations that inspired our model architecture design, the training objectives of each stage, and other enhancement techniques. We have refactored our code to follow the design principles of the Hugging Face Transformers library, making it more convenient for developers to use, and have released checkpoints from the different training stages at https://huggingface.co/openBA. More details of our project are available at https://github.com/OpenNLG/openBA.git.
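Since the abstract states that the code follows the design principles of the Hugging Face Transformers library and that checkpoints are published on the Hub, the released model can presumably be loaded through the standard seq2seq auto classes. The sketch below illustrates this under stated assumptions: the repository id `openBA/OpenBA-LM` is hypothetical (only the organization URL appears in the abstract), and the need for `trust_remote_code=True` is an assumption based on how models with custom modeling code are typically released.

```python
# Minimal sketch of loading an OpenBA checkpoint with the Hugging Face
# Transformers auto classes. The repository id below is hypothetical;
# consult https://huggingface.co/openBA for the actual checkpoint names.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "openBA/OpenBA-LM"  # hypothetical repo id under the openBA org

# Assumption: the refactored code ships as custom modeling code on the Hub,
# so loading it requires trusting remote code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# Encoder-decoder generation: the prompt is consumed by the encoder and the
# decoder generates the continuation token by token.
inputs = tokenizer("Translate to Chinese: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because OpenBA is an asymmetric encoder-decoder rather than a decoder-only model, inference goes through the seq2seq generation path shown above rather than plain causal-LM generation.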


Related research

11/10/2022 · LERT: A Linguistically-motivated Pre-trained Language Model
Pre-trained Language Model (PLM) has become a representative foundation ...

06/14/2022 · CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation
Code generation is a longstanding challenge, aiming to generate a code s...

12/26/2022 · TextBox 2.0: A Text Generation Library with Pre-trained Language Models
To facilitate research on text generation, this paper presents a compreh...

09/05/2023 · Data-Juicer: A One-Stop Data Processing System for Large Language Models
The immense evolution in Large Language Models (LLMs) has underscored th...

07/31/2023 · Camoscio: an Italian Instruction-tuned LLaMA
In recent years Large Language Models (LLMs) have increased the state of...

06/05/2023 · Benchmarking Middle-Trained Language Models for Neural Search
Middle training methods aim to bridge the gap between the Masked Languag...

10/05/2022 · GLM-130B: An Open Bilingual Pre-trained Model
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained lan...
