Currently, most machine learning models are trained by centralized teams...
Sparsely activated neural networks with conditional computation learn to...
Transfer learning - i.e., further fine-tuning a pre-trained model on a d...
The current trend of scaling language models involves increasing both pa...
Research on neural networks has largely focused on understanding a singl...
Pretraining has been shown to scale well with compute, data size and dat...
While large language models (LLMs) have proven to be effective on a larg...
The internet contains a wealth of knowledge – from the birthdays of hist...
Multitask prompted finetuning (MTF) has been shown to help large languag...
The crystallization of modeling methods around the Transformer architect...
Deep learning models struggle with compositional generalization, i.e. th...
The NP-hard problem of optimizing a shallow ReLU network can be characte...
Large language models such as GPT-3 (Brown et al., 2020) can perform arb...
Many NLP tasks benefit from using large language models (LLMs) that ofte...
Getting the most out of limited resources allows advances in natural lan...
Few-shot in-context learning (ICL) enables pre-trained language models t...
Large pretrained Transformer language models have been shown to exhibit...
Recent neural network-based language models have benefited greatly from...
Past work has shown that large language models are susceptible to privac...
PromptSource is a system for creating, sharing, and using natural langua...
What are the units of text that we want to model? From bytes to multi-wo...
During typical gradient-based training of deep neural networks, all of t...
Transfer learning provides a way of leveraging knowledge from one task w...
Large language models have recently been shown to attain reasonable zero...
NLP has achieved great progress in the past decade through the use of ne...
Many recent developments on generative models for natural images have re...
Most widely-used pre-trained language models operate on sequences of tok...
Recently, pre-trained language models (LMs) have achieved strong perform...
The research community has proposed copious modifications to the Transfo...
We review the EfficientQA competition from NeurIPS 2020. The competition...
It has become common to publish large (billion parameter) language model...
The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified
...
Neural networks have recently achieved human-level performance on variou...
There has been an ongoing cycle where stronger defenses against adversar...
We introduce a simple (one line of code) modification to the Generative...
It has recently been observed that neural language models trained on uns...
Semi-supervised learning (SSL) provides an effective means of leveraging...
For many evaluation metrics commonly used as benchmarks for unconditiona...
We improve the recently-proposed "MixMatch" semi-supervised learning alg...
Transfer learning, where a model is first pre-trained on a data-rich tas...
Adversarial examples raise questions about whether neural network models...
Simultaneous machine translation begins to translate each source sentenc...
Semi-supervised learning has proven to be a powerful paradigm for levera...
Adversarial examples are inputs to machine learning models designed by a...
Lingvo is a Tensorflow framework offering a complete solution for collab...
Autoencoders provide a powerful framework for learning compressed repres...
Discovering and exploring the underlying structure of multi-instrumental...
Semi-supervised learning (SSL) provides a powerful framework for leverag...
The Variational Autoencoder (VAE) has proven to be an effective model fo...
Recent work (Pennington et al., 2017) suggests that controlling the entir...