We present a scalable method to build a high quality instruction followi...
We introduce ZeroSCROLLS, a zero-shot benchmark for natural language und...
Large language models are trained in two stages: (1) unsupervised pretra...
Vision Transformer models process input images by dividing them into a s...
We introduce X&Fuse, a general approach for conditioning on visual inf...
Generative language models define distributions over sequences of tokens...
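For reference only (standard background, not something taken from the entry above): an autoregressive language model factors the probability of a token sequence as

    p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1})

so each token is predicted from the tokens that precede it, and left-to-right sampling draws from exactly this distribution.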
Instruction tuning enables pretrained language models to perform new tas...
Reranking methods in machine translation aim to close the gap between co...
Multilingual machine translation models can benefit from synergy between...
As the performance of large language models rapidly improves, benchmarks...
Large language models are able to perform a task by conditioning on a fe...
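As background on conditioning on a few examples (in-context learning), here is a minimal sketch of how a few-shot prompt is assembled; the task, examples, and formatting are hypothetical placeholders, not drawn from the entry above.

    # Build a few-shot prompt: a handful of input-output demonstrations
    # followed by the new input. The model sees this as plain text and is
    # expected to continue it; no parameters are updated.
    demonstrations = [
        ("great movie, loved it", "positive"),        # hypothetical examples
        ("a total waste of two hours", "negative"),
    ]
    query = "surprisingly touching and well acted"

    prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in demonstrations)
    prompt += f"Review: {query}\nSentiment:"
    # `prompt` would be fed to the language model, whose continuation is
    # read off as the prediction for `query`.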
Large pretrained language models (PLMs) typically tokenize the input str...
Transformers typically require some form of positional encoding, such as...
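For context, one widely used positional encoding is the fixed sinusoidal scheme of the original Transformer; the sketch below is that generic construction (assuming an even model dimension), not anything specific to the entry above.

    import numpy as np

    def sinusoidal_positions(seq_len, d_model):
        # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
        # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
        # Assumes d_model is even; the result is added to token embeddings.
        positions = np.arange(seq_len)[:, None]                # (seq_len, 1)
        dims = np.arange(0, d_model, 2)[None, :]               # (1, d_model / 2)
        angles = positions / np.power(10000.0, dims / d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe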
Two languages are considered mutually intelligible if their native speak...
NLP benchmarks have largely focused on short texts, such as sentences an...
Dense retrievers for open-domain question answering (ODQA) have been sho...
Many NLP tasks require processing long contexts beyond the length limit ...
We investigate the dynamics of increasing the number of model parameters...
NLP research in Hebrew has largely focused on morphology and syntax, whe...
Standard pretrained language models operate on sequences of subword toke...
Fine-tuned language models use greedy decoding to answer reading compreh...
We combine beam search with the probabilistic pruning technique of nucle...
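As background on the pruning side of this combination, nucleus (top-p) filtering keeps the smallest set of tokens whose cumulative probability reaches p and renormalizes over it; the sketch below shows only that filtering step for a plain 1-D probability vector, and is not the full decoding procedure described in the entry.

    import numpy as np

    def nucleus_prune(probs, p=0.9):
        # Keep the smallest set of tokens whose cumulative probability >= p,
        # zero out the rest, and renormalize. A beam-search expansion step
        # could then restrict candidate continuations to this pruned set.
        order = np.argsort(probs)[::-1]               # tokens by descending prob
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, p) + 1   # number of tokens kept
        kept = order[:cutoff]
        pruned = np.zeros_like(probs)
        pruned[kept] = probs[kept]
        return pruned / pruned.sum()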
Latent alignment objectives such as CTC and AXE significantly improve no...
While large language models à la BERT are used ubiquitously in NLP, pret...
Current NLP datasets targeting ambiguity can be solved by a native speak...
In a number of question answering (QA) benchmarks, pretrained models hav...
Since the introduction of deep pretrained language models, most task-spe...
Feed-forward layers constitute two-thirds of a transformer model's param...
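For reference, a transformer feed-forward sublayer is two linear maps with a pointwise nonlinearity between them; the sketch below (toy dimensions and ReLU are assumptions) shows that structure, reading the first matrix's rows as keys and the second's rows as values. With the common choice of an inner dimension 4x the model dimension, these two matrices hold 8*d_model^2 parameters per layer against 4*d_model^2 for the attention projections, which is where the two-thirds figure comes from.

    import numpy as np

    def feed_forward(x, K, V):
        # One feed-forward sublayer: FF(x) = relu(x @ K.T) @ V.
        # Each row of K acts as a "key" matched against the input, and the
        # resulting activations weight the corresponding rows of V ("values").
        activations = np.maximum(x @ K.T, 0.0)   # shape (d_mem,)
        return activations @ V                   # shape (d_model,)

    d_model, d_mem = 8, 32                       # toy sizes, illustrative only
    rng = np.random.default_rng(0)
    x = rng.normal(size=d_model)
    K = rng.normal(size=(d_mem, d_model))
    V = rng.normal(size=(d_mem, d_model))
    print(feed_forward(x, K, V).shape)           # -> (8,)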
Supervised machine learning provides the learner with a set of input-out...
Many NLP models follow the embed-contextualize-predict paradigm, in whic...
Non-autoregressive machine translation models significantly speed up dec...
The recently proposed mask-predict decoding algorithm has narrowed the p...
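As background, the basic mask-predict loop starts from a fully masked target and repeatedly re-masks and re-predicts the least confident positions, with the number of masked tokens decaying linearly across iterations; the sketch below is a simplified schematic (fixed target length, a hypothetical predict_fn standing in for the conditional masked language model), not the improved decoder this entry is about.

    import numpy as np

    MASK = -1  # hypothetical mask id

    def mask_predict(predict_fn, length, iterations=10):
        # predict_fn(tokens) is a stand-in for the conditional masked LM:
        # it returns (token_ids, confidences) for every target position.
        tokens = np.full(length, MASK)
        confidence = np.zeros(length)
        for t in range(1, iterations + 1):
            new_tokens, new_conf = predict_fn(tokens)
            masked = tokens == MASK
            tokens = np.where(masked, new_tokens, tokens)       # fill masked slots
            confidence = np.where(masked, new_conf, confidence)
            n_mask = int(length * (iterations - t) / iterations)
            if n_mask == 0:
                break
            remask = np.argsort(confidence)[:n_mask]            # least confident
            tokens[remask] = MASK
        return tokens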
Multilayer transformer networks consist of interleaved self-attention an...
We present BlockBERT, a lightweight and efficient BERT model that is des...
We introduce kNN-LMs, which extend a pre-trained neural language model (...
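For background on the extension, a kNN-LM interpolates the base model's next-token distribution with a distribution built from nearest neighbors retrieved from a datastore of (context representation, next token) pairs; the sketch below is a schematic with toy in-memory retrieval, and the hyperparameters (k, lam, temp) are illustrative assumptions.

    import numpy as np

    def knn_lm_probs(p_lm, query, keys, next_tokens, vocab_size,
                     k=4, lam=0.25, temp=1.0):
        # keys[i] is a stored context representation and next_tokens[i] the
        # token that followed it. Retrieve the k closest keys to the current
        # context `query`, turn their distances into a distribution over the
        # vocabulary, and interpolate with the base LM distribution.
        dists = np.linalg.norm(keys - query, axis=1)
        nearest = np.argsort(dists)[:k]
        weights = np.exp(-dists[nearest] / temp)
        weights /= weights.sum()
        p_knn = np.zeros(vocab_size)
        for idx, w in zip(nearest, weights):
            p_knn[next_tokens[idx]] += w
        return lam * p_knn + (1.0 - lam) * p_lm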
We present BART, a denoising autoencoder for pretraining sequence-to-seq...
We address the problem of any-code completion - generating a missing pie...
We address the problem of Any-Code Generation (AnyGen) - generating code...
We apply BERT to coreference resolution, achieving strong improvements o...
Language model pretraining has led to significant performance gains but ...
We present SpanBERT, a pre-training method that is designed to better re...
Large pre-trained neural networks such as BERT have had great recent suc...
Attention is a powerful and ubiquitous mechanism for allowing neural mod...
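As a point of reference, the scaled dot-product attention this line alludes to fits in a few lines; the sketch is the generic single-head, unmasked form, not anything particular to the entry above.

    import numpy as np

    def attention(Q, K, V):
        # softmax(Q K^T / sqrt(d)) V: each output row is a weighted average of
        # the rows of V, weighted by how well its query matches each key.
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                     # (n_queries, n_keys)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
        return weights @ V                                # (n_queries, d_v)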
In the last year, new models and methods for pretraining and transfer le...
Most machine translation systems generate text autoregressively, by sequ...
We consider the problem of making machine translation more robust to cha...
Reasoning about implied relationships (e.g. paraphrastic, common sense, ...
The ability to generate natural language sequences from source code snip...
We introduce a new entity typing task: given a sentence with an entity m...
While recurrent neural networks have found success in a variety of natur...