Scaling Up Models and Data with t5x and seqio

03/31/2022
by Adam Roberts, et al.

Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: t5x simplifies the process of building and training large language models at scale while maintaining ease of use, and seqio provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data. Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures. t5x and seqio are open source and available at https://github.com/google-research/t5x and https://github.com/google/seqio, respectively.


