
-
Few-Shot Question Answering by Pretraining Span Selection
In a number of question answering (QA) benchmarks, pretrained models hav...
-
Coreference Resolution without Span Representations
Since the introduction of deep pretrained language models, most task-spe...
-
Transformer Feed-Forward Layers Are Key-Value Memories
Feed-forward layers constitute two-thirds of a transformer model's param...
-
The Turking Test: Can Language Models Understand Instructions?
Supervised machine learning provides the learner with a set of input-out...
-
Neural Machine Translation without Embeddings
Many NLP models follow the embed-contextualize-predict paradigm, in whic...
-
Aligned Cross Entropy for Non-Autoregressive Machine Translation
Non-autoregressive machine translation models significantly speed up dec...
-
Semi-Autoregressive Training Improves Mask-Predict Decoding
The recently proposed mask-predict decoding algorithm has narrowed the p...
-
Improving Transformer Models by Reordering their Sublayers
Multilayer transformer networks consist of interleaved self-attention an...
-
Blockwise Self-Attention for Long Document Understanding
We present BlockBERT, a lightweight and efficient BERT model that is des...
-
Generalization through Memorization: Nearest Neighbor Language Models
We introduce kNN-LMs, which extend a pre-trained neural language model (...
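(a minimal sketch of the kNN interpolation step appears at the end of this list)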
-
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
We present BART, a denoising autoencoder for pretraining sequence-to-seq...
-
Structural Language Models of Code
We address the problem of any-code completion - generating a missing pie...
-
Structural Language Models for Any-Code Generation
We address the problem of Any-Code Generation (AnyGen) - generating code...
-
BERT for Coreference Resolution: Baselines and Analysis
We apply BERT to coreference resolution, achieving strong improvements o...
-
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Language model pretraining has led to significant performance gains but ...
-
SpanBERT: Improving Pre-training by Representing and Predicting Spans
We present SpanBERT, a pre-training method that is designed to better re...
-
What Does BERT Look At? An Analysis of BERT's Attention
Large pre-trained neural networks such as BERT have had great recent suc...
-
Are Sixteen Heads Really Better than One?
Attention is a powerful and ubiquitous mechanism for allowing neural mod...
-
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
In the last year, new models and methods for pretraining and transfer le...
-
Constant-Time Machine Translation with Conditional Masked Language Models
Most machine translation systems generate text autoregressively, by sequ...
-
Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation
We consider the problem of making machine translation more robust to cha...
-
pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
Reasoning about implied relationships (e.g. paraphrastic, common sense, ...
-
code2seq: Generating Sequences from Structured Representations of Code
The ability to generate natural language sequences from source code snip...
-
Ultra-Fine Entity Typing
We introduce a new entity typing task: given a sentence with an entity m...
-
LSTMs Exploit Linguistic Attributes of Data
While recurrent neural networks have found success in a variety of natur...
-
Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling
Recent BIO-tagging-based neural semantic role labeling models are very h...
-
Deep RNNs Encode Soft Hierarchical Syntax
We present a set of experiments to demonstrate that deep recurrent neura...
-
Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum
LSTMs were introduced to combat vanishing gradients in simple RNNs by au...
-
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
For natural language understanding (NLU) technology to be maximally usef...
-
A General Path-Based Representation for Predicting Program Properties
Predicting program properties such as names or expression types has a wi...
-
code2vec: Learning Distributed Representations of Code
We present a neural model for representing snippets of code as continuou...
-
Annotation Artifacts in Natural Language Inference Data
Large-scale datasets for natural language inference are created by prese...
-
Simulating Action Dynamics with Neural Process Networks
Understanding procedural language requires anticipating the causal effec...
-
Named Entity Disambiguation for Noisy Text
We address the task of Named Entity Disambiguation (NED) for noisy text....
-
Zero-Shot Relation Extraction via Reading Comprehension
We show that relation extraction can be reduced to answering simple read...
-
Recurrent Additive Networks
We introduce recurrent additive networks (RANs), a new gated RNN which i...
-
A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments
While cross-lingual word embeddings have been studied extensively in rec...
-
word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method
The word2vec software of Tomas Mikolov and colleagues (https://code.goog...
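For reference, the objective that the final entry derives can be stated compactly. This is a standard formulation of skip-gram with negative sampling (SGNS) for a single word-context pair (w, c), with v_w and v_c the word and context vectors, k the number of negative samples, and P_n the noise distribution; the notation here is our own shorthand, not text copied from the paper.

```latex
\ell(w, c) = \log \sigma\!\left(\vec{v}_c \cdot \vec{v}_w\right)
           + \sum_{i=1}^{k} \mathbb{E}_{c_N \sim P_n}\!\left[\log \sigma\!\left(-\vec{v}_{c_N} \cdot \vec{v}_w\right)\right],
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}.
```

Maximizing the first term pulls observed word-context pairs together in embedding space, while the expectation over sampled negatives pushes randomly paired words and contexts apart.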
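The kNN-LM entry above describes augmenting a pretrained language model with a nearest-neighbor component over a datastore of (context representation, next token) pairs. Below is a minimal NumPy sketch of the interpolation step only; the function name, the squared-L2 distance, and the default values of k and lam are illustrative assumptions, and the paper's datastore construction and hyperparameter tuning are not shown.

```python
import numpy as np

def knn_lm_interpolate(p_lm, query, keys, values, vocab_size, k=8, lam=0.25):
    """Blend a base LM's next-token distribution with a k-nearest-neighbor distribution.

    p_lm   : (vocab_size,) next-token probabilities from the base language model
    query  : (d,) context representation at the current position
    keys   : (N, d) stored context representations (datastore keys)
    values : (N,) token id observed after each stored context (datastore values)
    """
    # distance from the query to every datastore key
    dists = np.sum((keys - query) ** 2, axis=1)
    nn = np.argsort(dists)[:k]                  # indices of the k nearest neighbors
    weights = np.exp(-dists[nn])                # softmax over negative distances
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nn], weights)       # neighbors sharing a token pool their weight
    return lam * p_knn + (1.0 - lam) * p_lm     # linear interpolation of the two distributions
```

In the paper, the keys are hidden states of the pretrained model computed over its training corpus and the interpolation weight is tuned on held-out data; the sketch only illustrates how the two distributions are combined at prediction time.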