
-
Few-Shot Question Answering by Pretraining Span Selection
In a number of question answering (QA) benchmarks, pretrained models hav...
-
Coreference Resolution without Span Representations
Since the introduction of deep pretrained language models, most task-spe...
-
Transformer Feed-Forward Layers Are Key-Value Memories
Feed-forward layers constitute two-thirds of a transformer model's param...
-
The Turking Test: Can Language Models Understand Instructions?
Supervised machine learning provides the learner with a set of input-out...
-
Neural Machine Translation without Embeddings
Many NLP models follow the embed-contextualize-predict paradigm, in whic...
-
Aligned Cross Entropy for Non-Autoregressive Machine Translation
Non-autoregressive machine translation models significantly speed up dec...
-
Semi-Autoregressive Training Improves Mask-Predict Decoding
The recently proposed mask-predict decoding algorithm has narrowed the p...
-
Improving Transformer Models by Reordering their Sublayers
Multilayer transformer networks consist of interleaved self-attention an...
-
Blockwise Self-Attention for Long Document Understanding
We present BlockBERT, a lightweight and efficient BERT model that is des...
-
Generalization through Memorization: Nearest Neighbor Language Models
We introduce kNN-LMs, which extend a pre-trained neural language model (...
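(a minimal sketch of the kNN interpolation step appears at the end of this list)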
-
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
We present BART, a denoising autoencoder for pretraining sequence-to-seq...
-
Structural Language Models of Code
We address the problem of any-code completion - generating a missing pie...
-
Structural Language Models for Any-Code Generation
We address the problem of Any-Code Generation (AnyGen) - generating code...
-
BERT for Coreference Resolution: Baselines and Analysis
We apply BERT to coreference resolution, achieving strong improvements o...
-
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Language model pretraining has led to significant performance gains but ...
-
SpanBERT: Improving Pre-training by Representing and Predicting Spans
We present SpanBERT, a pre-training method that is designed to better re...
-
What Does BERT Look At? An Analysis of BERT's Attention
Large pre-trained neural networks such as BERT have had great recent suc...
-
Are Sixteen Heads Really Better than One?
Attention is a powerful and ubiquitous mechanism for allowing neural mod...
-
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
In the last year, new models and methods for pretraining and transfer le...
-
Constant-Time Machine Translation with Conditional Masked Language Models
Most machine translation systems generate text autoregressively, by sequ...
-
Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation
We consider the problem of making machine translation more robust to cha...
-
pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
Reasoning about implied relationships (e.g. paraphrastic, common sense, ...
-
code2seq: Generating Sequences from Structured Representations of Code
The ability to generate natural language sequences from source code snip...
-
Ultra-Fine Entity Typing
We introduce a new entity typing task: given a sentence with an entity m...
-
LSTMs Exploit Linguistic Attributes of Data
While recurrent neural networks have found success in a variety of natur...
-
Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling
Recent BIO-tagging-based neural semantic role labeling models are very h...
-
Deep RNNs Encode Soft Hierarchical Syntax
We present a set of experiments to demonstrate that deep recurrent neura...
-
Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum
LSTMs were introduced to combat vanishing gradients in simple RNNs by au...
-
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
For natural language understanding (NLU) technology to be maximally usef...
-
A General Path-Based Representation for Predicting Program Properties
Predicting program properties such as names or expression types has a wi...
-
code2vec: Learning Distributed Representations of Code
We present a neural model for representing snippets of code as continuou...
-
Annotation Artifacts in Natural Language Inference Data
Large-scale datasets for natural language inference are created by prese...
-
Simulating Action Dynamics with Neural Process Networks
Understanding procedural language requires anticipating the causal effec...
-
Named Entity Disambiguation for Noisy Text
We address the task of Named Entity Disambiguation (NED) for noisy text....
-
Zero-Shot Relation Extraction via Reading Comprehension
We show that relation extraction can be reduced to answering simple read...
-
Recurrent Additive Networks
We introduce recurrent additive networks (RANs), a new gated RNN which i...
-
A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments
While cross-lingual word embeddings have been studied extensively in rec...
-
word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method
The word2vec software of Tomas Mikolov and colleagues (https://code.goog...
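For reference, the objective that the final entry derives can be stated compactly. This is a standard formulation of skip-gram with negative sampling (SGNS) for a single word-context pair (w, c), with v_w and v_c the word and context vectors, k the number of negative samples, and P_n the noise distribution; the notation here is our own shorthand, not text copied from the paper.

```latex
\ell(w, c) = \log \sigma\!\left(\vec{v}_c \cdot \vec{v}_w\right)
           + \sum_{i=1}^{k} \mathbb{E}_{c_N \sim P_n}\!\left[\log \sigma\!\left(-\vec{v}_{c_N} \cdot \vec{v}_w\right)\right],
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}.
```

Maximizing the first term pulls observed word-context pairs together in embedding space, while the expectation over sampled negatives pushes randomly paired words and contexts apart.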
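The kNN-LM entry above describes augmenting a pretrained language model with a nearest-neighbor component over a datastore of (context representation, next token) pairs. Below is a minimal NumPy sketch of the interpolation step only; the function name, the squared-L2 distance, and the default values of k and lam are illustrative assumptions, and the paper's datastore construction and hyperparameter tuning are not shown.

```python
import numpy as np

def knn_lm_interpolate(p_lm, query, keys, values, vocab_size, k=8, lam=0.25):
    """Blend a base LM's next-token distribution with a k-nearest-neighbor distribution.

    p_lm   : (vocab_size,) next-token probabilities from the base language model
    query  : (d,) context representation at the current position
    keys   : (N, d) stored context representations (datastore keys)
    values : (N,) token id observed after each stored context (datastore values)
    """
    # distance from the query to every datastore key
    dists = np.sum((keys - query) ** 2, axis=1)
    nn = np.argsort(dists)[:k]                  # indices of the k nearest neighbors
    weights = np.exp(-dists[nn])                # softmax over negative distances
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nn], weights)       # neighbors sharing a token pool their weight
    return lam * p_knn + (1.0 - lam) * p_lm     # linear interpolation of the two distributions
```

In the paper, the keys are hidden states of the pretrained model computed over its training corpus and the interpolation weight is tuned on held-out data; the sketch only illustrates how the two distributions are combined at prediction time.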