Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

07/29/2023
by Viet Dac Lai, et al.

A key technology for the development of large language models (LLMs) is instruction tuning, which helps align the models' responses with human expectations to realize impressive learning abilities. Two major approaches to instruction tuning are supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for research and development efforts, various instruction-tuned open-source LLMs, e.g., Alpaca and Vicuna, have also been introduced recently. However, existing open-source LLMs have only been instruction-tuned for English and a few popular languages, which hinders their impact and accessibility for many other languages in the world. Among the few very recent works exploring instruction tuning for LLMs in multiple languages, SFT has been the only approach used to instruction-tune LLMs multilingually. This leaves a significant gap for RLHF-based fine-tuned LLMs in diverse languages and raises important questions about how RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages. Our experiments demonstrate the advantages of RLHF over SFT for multilingual instruction tuning across different base models and datasets. Our framework and resources are released at https://github.com/nlp-uoregon/Okapi.
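Okapi's RLHF stage relies on response-ranked data: for each instruction, candidate responses are ordered by quality so that a reward model can learn to score them. As a rough illustration of how such ranked pairs are typically consumed in RLHF pipelines, the following minimal PyTorch/Transformers sketch implements the standard pairwise (Bradley-Terry style) reward-model loss. This is not Okapi's actual code; the base model name and the example prompt/responses are assumptions for demonstration only.

# Minimal sketch of the pairwise reward-model objective commonly used in RLHF
# (illustrative only; not the Okapi implementation).
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: any multilingual encoder with a scalar reward head could be used here.
model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

def reward_loss(prompt: str, chosen: str, rejected: str) -> torch.Tensor:
    """Pairwise ranking loss: the reward of the preferred response should
    exceed the reward of the rejected one."""
    inputs = tokenizer(
        [prompt + " " + chosen, prompt + " " + rejected],
        return_tensors="pt", padding=True, truncation=True,
    )
    rewards = reward_model(**inputs).logits.squeeze(-1)  # shape: (2,)
    # -log sigmoid(r_chosen - r_rejected)
    return -F.logsigmoid(rewards[0] - rewards[1])

# Hypothetical ranked pair from a multilingual instruction dataset.
loss = reward_loss(
    "Explique la photosynthèse.",
    "La photosynthèse convertit la lumière du soleil en énergie chimique.",  # ranked higher
    "Je ne sais pas.",                                                       # ranked lower
)
loss.backward()  # in practice, run inside a standard training loop with an optimizer

In a full RLHF pipeline, a reward model trained this way then guides policy optimization (e.g., with PPO) of the instruction-tuned LLM, which is the step that distinguishes RLHF from plain SFT.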

