
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
  Pre-trained representations are becoming crucial for many NLP and percep...
- PyGlove: Symbolic Programming for Automated Machine Learning
  Neural networks are sensitive to hyper-parameter and architecture choice...
- Evolving Reinforcement Learning Algorithms
  We propose a method for meta-learning reinforcement learning algorithms ...
- AutoDropout: Learning Dropout Patterns to Regularize Deep Networks
  Neural networks are often over-parameterized and hence benefit from aggr...
- Pre-Training Transformers as Energy-Based Cloze Models
  We introduce Electric, an energy-based cloze model for representation le...
- Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
  Building instance segmentation models that are data-efficient and can ha...
- Towards Domain-Agnostic Contrastive Learning
  Despite recent success, most contrastive self-supervised learning method...
- Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
  We employ a combination of recent developments in semi-supervised learni...
- Smooth Adversarial Training
  It is commonly believed that networks cannot be both accurate and robust...
- Rethinking Pre-training and Self-training
  Pre-training is a dominant paradigm in computer vision. For example, sup...
- AutoHAS: Differentiable Hyper-parameter and Architecture Search
  Neural Architecture Search (NAS) has achieved significant progress in pu...
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
  With the success of language pretraining, it is highly desirable to deve...
- Improved Noisy Student Training for Automatic Speech Recognition
  Recently, a semi-supervised learning method known as "noisy student trai...
- Chip Placement with Deep Reinforcement Learning
  In this work, we present a learning-based approach to chip placement, on...
- Evolving Normalization-Activation Layers
  Normalization layers and activation functions are critical components in...
- Improving 3D Object Detection through Progressive Population Based Augmentation
  Data augmentation has been widely adopted for object detection in 3D poi...
- Meta Pseudo Labels
  Many training algorithms of a deep neural network can be interpreted as ...
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
  Masked language modeling (MLM) pre-training methods such as BERT corrupt...
- AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
  Machine learning research has advanced in multiple aspects, including mo...
- Towards a Human-like Open-Domain Chatbot
  We present Meena, a multi-turn open-domain chatbot trained end-to-end on...
- SpecAugment on Large Scale Datasets
  Recently, SpecAugment, an augmentation scheme for automatic speech recog...
- SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
  Convolutional neural networks typically encode an input image into a ser...
- MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices
  Despite the blooming success of architecture search for vision tasks in ...
- Adversarial Examples Improve Image Recognition
  Adversarial examples are commonly viewed as a threat to ConvNets. Here w...
- EfficientDet: Scalable and Efficient Object Detection
  Model efficiency has become increasingly important in computer vision. I...
- Self-training with Noisy Student improves ImageNet classification
  We present a simple self-training method that achieves 87.4% top-1 accuracy on ImageNet,...
- High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks
  Predicting future video frames is extremely challenging, as there are ma...
- RandAugment: Practical data augmentation with no separate search
  Recent work has shown that data augmentation has the potential to signif...
- Saccader: Improving Accuracy of Hard Attention Models for Vision
  Although deep convolutional neural networks achieve state-of-the-art per...
- MixNet: Mixed Depthwise Convolutional Kernels
  Depthwise convolution is becoming increasingly popular in modern efficie...
- BAM! Born-Again Multi-Task Networks for Natural Language Understanding
  It can be challenging to train multi-task neural networks that outperfor...
- Neural Input Search for Large Scale Recommendation Models
  Recommendation problems with large numbers of discrete items, such as pr...
- Learning Data Augmentation Strategies for Object Detection
  Data augmentation is a critical component of training deep learning mode...
- XLNet: Generalized Autoregressive Pretraining for Language Understanding
  With the capability of modeling bidirectional contexts, denoising autoen...
- Selfie: Self-supervised Pretraining for Image Embedding
  We introduce a pretraining technique called Selfie, which stands for SEL...
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  Convolutional Neural Networks (ConvNets) are commonly developed at a fix...
- The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
  We investigate how the final parameters found by stochastic gradient des...
- Searching for MobileNetV3
  We present the next generation of MobileNets based on a combination of c...
- Unsupervised Data Augmentation
  Despite its success, deep learning still needs large labeled datasets to...
- Attention Augmented Convolutional Networks
  Convolutional networks have been the paradigm of choice in many computer...
- SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
  We present SpecAugment, a simple data augmentation method for speech rec...
- NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection
  Current state-of-the-art convolutional architectures for object detectio...
- Soft Conditional Computation
  Conditional computation aims to increase the size and accuracy of a netw...
- The Evolved Transformer
  Recent works have highlighted the strengths of the Transformer architect...
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
  Transformer networks have the potential to learn longer-term dependency...
- Domain Adaptive Transfer Learning with Specialist Models
  Transfer learning is a widely used method to build high performing compu...
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
  GPipe is a scalable pipeline parallelism library that enables learning o...
- DropBlock: A regularization method for convolutional networks
  Deep neural networks often work well when they are over-parameterized an...
- Semi-Supervised Sequence Modeling with Cross-View Training
  Unsupervised representation learning algorithms such as word2vec and ELM...
- MnasNet: Platform-Aware Neural Architecture Search for Mobile
  Designing convolutional neural networks (CNN) models for mobile devices ...