Generative AI systems across modalities, ranging from text, image, audio...
The growing need for accountability of the people behind AI systems can ...
The BigCode community, an open-scientific collaboration working on the
r...
The advancement of speech technologies has been remarkable, yet its
inte...
As machine learning-enabled Text-to-Image (TTI) systems are becoming
inc...
As language models grow ever larger, the need for large-scale high-quali...
ROOTS is a 1.6TB multilingual text corpus developed for the training of
...
Open Artificial Intelligence (Open source AI) collaboratives offer
alter...
The BigCode project is an open-scientific collaboration working on the
r...
We identify the task of measuring data to quantitatively characterize th...
The BigScience Workshop was a value-driven initiative that spanned one a...
Large Language Models (LLMs) play an ever-increasing role in the field o...
The infrastructure necessary for training state-of-the-art models is bec...
The recent emergence and adoption of Machine Learning technology, and
sp...
In recent years, large-scale data collection efforts have prioritized th...
The scale, variety, and quantity of publicly-available NLP datasets has ...
Developing documentation guidelines and easy-to-use templates for datase...
Modern deep learning applications require increasingly more compute to t...
With the success of large-scale pre-training and multilingual modeling i...
We introduce GEM, a living benchmark for natural language Generation (NL...
Challenging problems such as open-domain question answering, fact checki...
Neural sequence to sequence models are well established for applications...
Back-translation based approaches have recently lead to significant prog...
In this document we describe a rationale for a research program aimed at...
We introduce the first large-scale corpus for long-form question answeri...
This paper describes an implementation of a bot assistant in Minecraft, ...
We propose a large scale semantic parsing dataset focused on
instruction...
In this work, we present the Grounded Recurrent Neural Network (GRNN), a...
This work presents a novel objective function for the unsupervised train...
Recurrent neural networks (RNNs) have been used extensively and with
inc...
We consider multi-class classification where the predictor has a hierarc...
We describe a simple neural language model that relies only on
character...