What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

09/10/2021
by Boseop Kim, et al.

GPT-3 demonstrates the remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billions of tokens. Here we address some remaining issues less reported in the GPT-3 paper, such as a non-English LM, the performance of differently sized models, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, a Korean 82B-parameter variant of GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization, HyperCLOVA with our training configuration achieves state-of-the-art in-context zero-shot and few-shot learning performance on various downstream tasks in Korean. We also show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. We then discuss the possibility of materializing the No Code AI paradigm by introducing HyperCLOVA Studio, an interactive prompt engineering interface that provides AI prototyping capabilities to non-experts in ML. Lastly, we demonstrate the potential of our methods with three successful in-house applications.
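The in-context few-shot setting the abstract evaluates can be made concrete with a short sketch: labeled demonstrations and a query are concatenated into a single prompt, and the frozen LM supplies the label as its completion, with no gradient updates. The sentiment task, label names, and Korean examples below are hypothetical illustrations, not prompts from the paper.

```python
# Minimal sketch of k-shot in-context prompt construction.
# Task, labels, and demonstrations are hypothetical, for illustration only.

def build_few_shot_prompt(examples, query, instruction):
    """Concatenate an instruction, k labeled demonstrations, and the query.

    The frozen LM completes the final line; no parameters are updated,
    which is what distinguishes in-context learning from finetuning.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model's completion supplies the label
    return "\n".join(lines)

if __name__ == "__main__":
    demos = [
        ("정말 재미있게 봤습니다.", "positive"),  # "I really enjoyed it."
        ("시간 낭비였어요.", "negative"),        # "It was a waste of time."
    ]
    prompt = build_few_shot_prompt(
        demos,
        query="배우들의 연기가 훌륭했다.",        # "The acting was excellent."
        instruction="Classify the sentiment of each movie review.",
    )
    print(prompt)  # this string would be sent to the generative LM
```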

Related research

04/28/2022
On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model
Many recent studies on large-scale language models have reported success...

08/30/2021
On the Multilingual Capabilities of Very Large-Scale English Language Models
Generative Pre-trained Transformers (GPTs) have recently been scaled to ...

01/28/2022
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Pretrained general-purpose language models can achieve state-of-the-art ...

04/05/2022
PaLM: Scaling Language Modeling with Pathways
Large language models have been shown to achieve remarkable performance ...

02/12/2022
Semantic-Oriented Unlabeled Priming for Large-Scale Language Models
Due to the high costs associated with finetuning large language models, ...

07/09/2022
Few-shot training LLMs for project-specific code-summarization
Very large language models (LLMs), such as GPT-3 and Codex, have achieved...

02/02/2022
Co-training Improves Prompt-based Learning for Large Language Models
We demonstrate that co-training (Blum & Mitchell, 1998) can improve th...
