OPT: Open Pre-trained Transformer Language Models

05/02/2022
by Susan Zhang, et al.

Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
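The model suite and the accompanying experimentation code are the technical core of the release. As a minimal sketch of how one might try the smallest checkpoint, the snippet below loads OPT-125M through the Hugging Face transformers library; the facebook/opt-125m identifier, the prompt, and the decoding settings are illustrative assumptions and are not part of the official metaseq release code described in the paper.

# Minimal sketch (assumption): loading the smallest OPT checkpoint via the
# Hugging Face transformers library rather than the official metaseq code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # smallest model in the 125M-175B suite
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Large language models have shown remarkable capabilities for"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding of a short continuation from the prompt.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The same pattern applies to the larger checkpoints in the suite, subject to memory constraints, by swapping in a different model identifier.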

