
-
BASE Layers: Simplifying Training of Large, Sparse Models
We introduce a new balanced assignment of experts (BASE) layer for large...
-
Multilingual Autoregressive Entity Linking
We present mGENRE, a sequence-to-sequence system for the Multilingual En...
-
FEWS: Large-Scale, Low-Shot Word Sense Disambiguation with the Dictionary
Current models for Word Sense Disambiguation (WSD) struggle to disambigu...
-
Muppet: Massive Multi-task Representations with Pre-Finetuning
We propose pre-finetuning, an additional large-scale learning stage betw...
-
Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment
Bilingual lexicons map words in one language to their translations in an...
-
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
Although pretrained language models can be fine-tuned to produce state-o...
-
Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles
Many datasets have been shown to contain incidental correlations created...
-
Detecting Hallucinated Content in Conditional Neural Sequence Generation
Neural sequence models can generate highly fluent sentences but recent s...
-
Low-Resource Domain Adaptation for Compositional Task-Oriented Semantic Parsing
Task-oriented semantic parsing is a critical component of virtual assist...
-
Nearest Neighbor Machine Translation
We introduce k-nearest-neighbor machine translation (kNN-MT), which pred...
-
Grounded Adaptation for Zero-shot Executable Semantic Parsing
We propose Grounded Adaptation for Zero-shot Executable Semantic Parsing...
-
Better Fine-Tuning by Reducing Representational Collapse
Although widely adopted, existing approaches for fine-tuning pre-trained...
-
DeLighT: Very Deep and Light-weight Transformer
We introduce a very deep and light-weight transformer, DeLighT, that del...
-
Pre-training via Paraphrasing
We introduce MARGE, a pre-trained sequence-to-sequence model learned wit...
-
Moving Down the Long Tail of Word Sense Disambiguation with Gloss-Informed Biencoders
A major obstacle in Word Sense Disambiguation (WSD) is that word senses ...
-
An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction
Decisions of complex language understanding models can be rationalized b...
-
Active Learning for Coreference Resolution using Discrete Annotation
We improve upon pairwise annotation for active learning in coreference r...
-
AmbigQA: Answering Ambiguous Open-domain Questions
Ambiguity is inherent to open-domain question answering; especially when...
-
Aligned Cross Entropy for Non-Autoregressive Machine Translation
Non-autoregressive machine translation models significantly speed up dec...
-
Semi-Autoregressive Training Improves Mask-Predict Decoding
The recently proposed mask-predict decoding algorithm has narrowed the p...
-
Multilingual Denoising Pre-training for Neural Machine Translation
This paper demonstrates that multilingual denoising pre-training produce...
-
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
We present ALFRED (Action Learning From Realistic Environments and Direc...
-
Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering
This paper presents a general approach for open-domain question answerin...
-
Zero-shot Entity Linking with Dense Entity Retrieval
We consider the zero-shot entity-linking challenge where each entity is ...
-
Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models
Inspired by modular software design principles of independence, intercha...
-
Crowdsourcing a High-Quality Gold Standard for QA-SRL
Question-answer driven Semantic Role Labeling (QA-SRL) has been proposed...
-
Unsupervised Cross-lingual Representation Learning at Scale
This paper shows that pretraining multilingual language models at scale ...
-
Emerging Cross-lingual Structure in Pretrained Language Models
We study the problem of multilingual masked language modeling, i.e. the ...
-
Generalization through Memorization: Nearest Neighbor Language Models
We introduce kNN-LMs, which extend a pre-trained neural language model (...
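The kNN-LM mechanism named in this entry interpolates the base model's next-token distribution with a distribution built from retrieved nearest neighbors. A minimal sketch of that interpolation (the real system retrieves neighbors from a large datastore with an approximate-nearest-neighbor index; `knn_lm_prob` and its arguments are hypothetical names for illustration):

```python
import math

def knn_lm_prob(p_lm, neighbor_dists, neighbor_tokens, lam=0.25):
    """Interpolate a base LM next-token distribution with a kNN distribution.

    p_lm: base model's probabilities over the vocabulary (list of floats)
    neighbor_dists: distances from the query context to retrieved datastore keys
    neighbor_tokens: token id stored alongside each retrieved key
    lam: interpolation weight placed on the kNN distribution
    """
    # Softmax over negative distances gives a weight per retrieved neighbor.
    weights = [math.exp(-d) for d in neighbor_dists]
    total = sum(weights)
    # Aggregate neighbor mass by target token to form p_kNN.
    p_knn = [0.0] * len(p_lm)
    for token, w in zip(neighbor_tokens, weights):
        p_knn[token] += w / total
    # Final distribution: lam * p_kNN + (1 - lam) * p_LM.
    return [lam * pk + (1 - lam) * pl for pk, pl in zip(p_knn, p_lm)]
```

Tokens that appear among the retrieved neighbors gain probability mass even if the base model assigns them little, which is the memorization effect the title refers to.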
-
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
We present BART, a denoising autoencoder for pretraining sequence-to-seq...
-
JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation
Interactive programming with interleaved code snippet cells and natural ...
-
A Discrete Hard EM Approach for Weakly Supervised Question Answering
Many question answering (QA) tasks only provide weak supervision for how...
-
Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases
State-of-the-art models often make use of superficial patterns in the da...
-
BERT for Coreference Resolution: Baselines and Analysis
We apply BERT to coreference resolution, achieving strong improvements o...
-
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Language model pretraining has led to significant performance gains but ...
-
SpanBERT: Improving Pre-training by Representing and Predicting Spans
We present SpanBERT, a pre-training method that is designed to better re...
-
Vision-and-Dialog Navigation
Robots navigating in human environments should use language to ask for a...
-
Sparse Networks from Scratch: Faster Training without Losing Performance
We demonstrate the possibility of what we call sparse learning: accelera...
-
E3: Entailment-driven Extracting and Editing for Conversational Machine Reading
Conversational machine reading systems help users answer high-level ques...
-
Multi-hop Reading Comprehension through Question Decomposition and Rescoring
Multi-hop Reading Comprehension (RC) requires reasoning and aggregation ...
-
Compositional Questions Do Not Necessitate Multi-hop Reasoning
Multi-hop reading comprehension (RC) questions are challenging because t...
-
Better Character Language Modeling Through Morphology
We incorporate morphological supervision into character language models ...
-
Evaluating Gender Bias in Machine Translation
We present the first challenge set and evaluation protocol for the analy...
-
Transformers with convolutional context for ASR
The recent success of transformer networks for neural machine translatio...
-
Constant-Time Machine Translation with Conditional Masked Language Models
Most machine translation systems generate text autoregressively, by sequ...
-
Learning Programmatic Idioms for Scalable Semantic Parsing
Programmers typically organize executable source code using high-level c...
-
Cloze-driven Pretraining of Self-attention Networks
We present a new approach for pretraining a bi-directional transformer m...
-
Improving Semantic Parsing for Task Oriented Dialog
Semantic parsing using hierarchical representations has recently been pr...
-
The Referential Reader: A Recurrent Entity Network for Anaphora Resolution
We present a new architecture for storing and accessing entity mentions ...
-
pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
Reasoning about implied relationships (e.g. paraphrastic, common sense, ...