Sharan Narang

research

∙ 07/18/2023

Llama 2: Open Foundation and Fine-Tuned Chat Models

In this work, we develop and release Llama 2, a collection of pretrained...

0 Hugo Touvron, et al. ∙

research

∙ 04/19/2023

A Theory on Adam Instability in Large-Scale Machine Learning

We present a theory for the previously unexplained divergent behavior no...

0 Igor Molybog, et al. ∙

research

∙ 04/18/2023

UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining

Pretrained multilingual large language models have typically used heuris...

0 Hyung Won Chung, et al. ∙

research

∙ 12/20/2022

Character-Aware Models Improve Visual Text Rendering

Current image generation models struggle to reliably produce well-formed...

0 Rosanne Liu, et al. ∙

research

∙ 10/24/2022

FCM: Forgetful Causal Masking Makes Causal Language Models Better Zero-Shot Learners

Large language models (LLM) trained using the next-token-prediction obje...

0 Hao Liu, et al. ∙

research

∙ 10/08/2022

Understanding HTML with Large Language Models

Large language models (LLMs) have shown exceptional performance on a var...

0 Izzeddin Gür, et al. ∙

research

∙ 07/21/2022

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?

There has been a lot of interest in the scaling properties of Transform...

0 Yi Tay, et al. ∙

research

∙ 04/05/2022

PaLM: Scaling Language Modeling with Pathways

Large language models have been shown to achieve remarkable performance ...

6 Aakanksha Chowdhery, et al. ∙

research

∙ 03/31/2022

Scaling Up Models and Data with t5x and seqio

Recent neural network-based language models have benefited greatly from ...

8 Adam Roberts, et al. ∙

research

∙ 09/22/2021

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

There remain many open questions pertaining to the scaling behaviour of ...

3 Yi Tay, et al. ∙

research

∙ 05/28/2021

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Most widely-used pre-trained language models operate on sequences of tok...

0 Linting Xue, et al. ∙

research

∙ 02/23/2021

Do Transformer Modifications Transfer Across Implementations and Applications?

The research community has proposed copious modifications to the Transfo...

10 Sharan Narang, et al. ∙

research

∙ 10/09/2020

On Task-Level Dialogue Composition of Generative Transformer Model

Task-oriented dialogue systems help users accomplish tasks such as booki...

0 Prasanna Parthasarathi, et al. ∙

research

∙ 04/30/2020

WT5?! Training Text-to-Text Models to Explain their Predictions

Neural networks have recently achieved human-level performance on variou...

0 Sharan Narang, et al. ∙

research

∙ 10/31/2019

Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning

Task-oriented dialog presents a difficult challenge encompassing multipl...

0 Arvind Neelakantan, et al. ∙

research

∙ 10/23/2019

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Transfer learning, where a model is first pre-trained on a data-rich tas...

0 Colin Raffel, et al. ∙

research

∙ 12/01/2017

Deep Learning Scaling is Predictable, Empirically

Deep learning (DL) creates impactful advances following a virtuous recip...

0 Joel Hestness, et al. ∙

research

∙ 11/08/2017

Block-Sparse Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are used in state-of-the-art models in ...

0 Sharan Narang, et al. ∙

research

∙ 10/20/2017

Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

We present Deep Voice 3, a fully-convolutional attention-based neural te...

0 Wei Ping, et al. ∙

research

∙ 10/20/2017

Deep Voice 3: 2000-Speaker Neural Text-to-Speech

We present Deep Voice 3, a fully-convolutional attention-based neural te...

0 Wei Ping, et al. ∙

research

∙ 10/10/2017

Mixed Precision Training

Deep neural networks have enabled progress in a wide variety of applicat...

0 Paulius Micikevicius, et al. ∙

research

∙ 04/17/2017

Exploring Sparsity in Recurrent Neural Networks

Recurrent Neural Networks (RNN) are widely used to solve a variety of pr...

0 Sharan Narang, et al. ∙

research

∙ 07/15/2016

DSD: Dense-Sparse-Dense Training for Deep Neural Networks

Modern deep neural networks have a large number of parameters, making th...

0 Song Han, et al. ∙

research

∙ 12/08/2015

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

We show that an end-to-end deep learning approach can be used to recogni...

0 Dario Amodei, et al. ∙
