Over recent years, an increasing amount of compute and data has been pou...
Autoregressive transformers are spectacular models for short sequences b...
Generative language models define distributions over sequences of tokens...
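As a minimal illustration of that statement (a sketch, not code from any paper listed here), the standard autoregressive factorization scores a sequence by chaining per-token conditionals, log p(x) = sum_t log p(x_t | x_<t); the toy model and function names below are assumptions for demonstration only.

    import math

    def sequence_log_prob(tokens, next_token_dist):
        """Chain rule: log p(x) = sum over t of log p(x_t | x_<t).
        `next_token_dist(prefix)` is any callable returning a dict of
        token -> probability (a stand-in for a real language model head)."""
        total = 0.0
        for t, tok in enumerate(tokens):
            probs = next_token_dist(tokens[:t])
            total += math.log(probs[tok])
        return total

    # Toy usage: a hypothetical uniform "model" over a four-token vocabulary.
    vocab = ["a", "b", "c", "<eos>"]
    uniform = lambda prefix: {tok: 1.0 / len(vocab) for tok in vocab}
    print(sequence_log_prob(["a", "b", "<eos>"], uniform))  # 3 * log(0.25)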
We discover a robust self-supervised strategy tailored towards molecular...
Recent multimodal models such as DALL-E and CM3 have achieved remarkable...
Despite their wide adoption, the underlying training and memorization dy...
We propose a simple and effective re-ranking method for improving passag...
Code is seldom written in a single left-to-right pass and is instead rep...
We introduce CM3, a family of causally masked generative models trained ...
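For context on the term "causally masked" (a rough sketch under assumed details, not the authors' implementation): a document is rewritten by cutting out spans, leaving sentinel tokens in their place, and appending the removed spans at the end, so a left-to-right decoder still learns to generate the masked content with context from both sides. Span selection and token names below are illustrative assumptions.

    import random

    def causally_mask(tokens, n_masks=1, seed=0):
        """Illustrative causal-masking transform: remove a random span,
        leave a <mask:i> sentinel behind, and append the span after the
        document so it is generated last, conditioned on both sides."""
        rng = random.Random(seed)
        tokens = list(tokens)
        tail = []
        for i in range(n_masks):
            start = rng.randrange(len(tokens))
            end = rng.randrange(start + 1, len(tokens) + 1)
            span = tokens[start:end]
            tokens[start:end] = [f"<mask:{i}>"]
            tail += [f"<mask:{i}>"] + span
        return tokens + tail

    print(causally_mask("the cat sat on the mat".split()))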
With the rise of large-scale pre-trained language models, open-domain qu...
We present VideoCLIP, a contrastive approach to pre-train a unified mode...
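As a generic sketch of that kind of contrastive setup (not VideoCLIP's exact objective, sampling, or architecture): given a batch of paired video and text embeddings, a symmetric InfoNCE-style loss pulls matched pairs together and pushes mismatched pairs apart. The function name and temperature value are assumptions.

    import torch
    import torch.nn.functional as F

    def symmetric_contrastive_loss(video_emb, text_emb, temperature=0.07):
        """InfoNCE over a batch of paired (video, text) embeddings,
        each of shape [batch, dim]; matched pairs lie on the diagonal."""
        video_emb = F.normalize(video_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = video_emb @ text_emb.t() / temperature  # [batch, batch]
        targets = torch.arange(logits.size(0), device=logits.device)
        # Average the video-to-text and text-to-video directions.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    # Toy usage with random embeddings from two hypothetical encoders.
    loss = symmetric_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))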
We introduce HTLM, a hyper-text language model trained on a large-scale ...
Semantic parsing using sequence-to-sequence models allows parsing of dee...
We propose pre-finetuning, an additional large-scale learning stage betw...
Although pretrained language models can be fine-tuned to produce state-o...
The structured representation for semantic parsing in task-oriented assi...
Although widely adopted, existing approaches for fine-tuning pre-trained...
We introduce MARGE, a pre-trained sequence-to-sequence model learned wit...
When a bilingual student learns to solve word problems in math, we expec...
Initialization of parameters in deep neural networks has been shown to h...