nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources

09/05/2023
by Piotr Nawrot, et al.

State-of-the-art language models like T5 have revolutionized the NLP landscape, but their computational demands put them out of reach for a large portion of the research community. To address this challenge, we present nanoT5, a specially optimized PyTorch framework for efficient pre-training and fine-tuning of T5 models. Drawing on insights from optimizer differences and prioritizing efficiency, nanoT5 allows a T5-Base model to be pre-trained on a single GPU in just 16 hours, without any loss in performance. With this open-source framework, we hope to broaden access to language modelling research and cater to the community's demand for more user-friendly T5 (Encoder-Decoder) implementations. Our contributions, including configurations, codebase, software/hardware insights, and pre-trained models, are available to the public, aiming to strike a balance between research accessibility and resource constraints in NLP.
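For readers unfamiliar with the setup, the sketch below illustrates the kind of single-GPU pre-training step that nanoT5 wraps: a randomly initialized T5 encoder-decoder trained on span-corrupted batches with a standard PyTorch optimizer. This is a minimal sketch written against the Hugging Face transformers API rather than the nanoT5 codebase itself; the batch shapes, learning rate, optimizer choice, and dummy data are assumptions for illustration only.

# Illustrative sketch only -- not the nanoT5 codebase. It shows a single-GPU
# T5 denoising training step of the kind the framework optimizes. Batch shapes,
# learning rate, and dummy tensors are assumptions for demonstration.
import torch
from transformers import T5Config, T5ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# Randomly initialized T5-Base: pre-training starts from scratch.
config = T5Config.from_pretrained("t5-base")
model = T5ForConditionalGeneration(config).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Dummy span-corrupted batch: inputs would contain sentinel tokens and labels
# the masked-out spans; real data would come from a C4-style corpus.
input_ids = torch.randint(0, config.vocab_size, (8, 512), device=device)
labels = torch.randint(0, config.vocab_size, (8, 114), device=device)

model.train()
loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"training loss: {loss.item():.4f}")

In practice the framework handles the data pipeline, optimizer and schedule configuration, and mixed-precision settings around a loop of this shape, which is where the reported single-GPU efficiency gains come from.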
