
-
BASE Layers: Simplifying Training of Large, Sparse Models
We introduce a new balanced assignment of experts (BASE) layer for large...
-
Multilingual Autoregressive Entity Linking
We present mGENRE, a sequence-to-sequence system for the Multilingual En...
-
FEWS: Large-Scale, Low-Shot Word Sense Disambiguation with the Dictionary
Current models for Word Sense Disambiguation (WSD) struggle to disambigu...
-
Muppet: Massive Multi-task Representations with Pre-Finetuning
We propose pre-finetuning, an additional large-scale learning stage betw...
-
Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment
Bilingual lexicons map words in one language to their translations in an...
-
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
Although pretrained language models can be fine-tuned to produce state-o...
-
Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles
Many datasets have been shown to contain incidental correlations created...
-
Detecting Hallucinated Content in Conditional Neural Sequence Generation
Neural sequence models can generate highly fluent sentences but recent s...
-
Low-Resource Domain Adaptation for Compositional Task-Oriented Semantic Parsing
Task-oriented semantic parsing is a critical component of virtual assist...
-
Nearest Neighbor Machine Translation
We introduce k-nearest-neighbor machine translation (kNN-MT), which pred...
-
Grounded Adaptation for Zero-shot Executable Semantic Parsing
We propose Grounded Adaptation for Zero-shot Executable Semantic Parsing...
-
Better Fine-Tuning by Reducing Representational Collapse
Although widely adopted, existing approaches for fine-tuning pre-trained...
-
DeLighT: Very Deep and Light-weight Transformer
We introduce a very deep and light-weight transformer, DeLighT, that del...
-
Pre-training via Paraphrasing
We introduce MARGE, a pre-trained sequence-to-sequence model learned wit...
-
Moving Down the Long Tail of Word Sense Disambiguation with Gloss-Informed Biencoders
A major obstacle in Word Sense Disambiguation (WSD) is that word senses ...
-
An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction
Decisions of complex language understanding models can be rationalized b...
-
Active Learning for Coreference Resolution using Discrete Annotation
We improve upon pairwise annotation for active learning in coreference r...
-
AmbigQA: Answering Ambiguous Open-domain Questions
Ambiguity is inherent to open-domain question answering; especially when...
-
Aligned Cross Entropy for Non-Autoregressive Machine Translation
Non-autoregressive machine translation models significantly speed up dec...
-
Semi-Autoregressive Training Improves Mask-Predict Decoding
The recently proposed mask-predict decoding algorithm has narrowed the p...
-
Multilingual Denoising Pre-training for Neural Machine Translation
This paper demonstrates that multilingual denoising pre-training produce...
-
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
We present ALFRED (Action Learning From Realistic Environments and Direc...
-
Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering
This paper presents a general approach for open-domain question answerin...
-
Zero-shot Entity Linking with Dense Entity Retrieval
We consider the zero-shot entity-linking challenge where each entity is ...
-
Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models
Inspired by modular software design principles of independence, intercha...
-
Crowdsourcing a High-Quality Gold Standard for QA-SRL
Question-answer driven Semantic Role Labeling (QA-SRL) has been proposed...
-
Unsupervised Cross-lingual Representation Learning at Scale
This paper shows that pretraining multilingual language models at scale ...
-
Emerging Cross-lingual Structure in Pretrained Language Models
We study the problem of multilingual masked language modeling, i.e. the ...
-
Generalization through Memorization: Nearest Neighbor Language Models
We introduce kNN-LMs, which extend a pre-trained neural language model (...
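The kNN-LM mechanism named in this entry interpolates the base model's next-token distribution with a distribution built from retrieved nearest neighbors. A minimal sketch of that interpolation (the real system retrieves neighbors from a large datastore with an approximate-nearest-neighbor index; `knn_lm_prob` and its arguments are hypothetical names for illustration):

```python
import math

def knn_lm_prob(p_lm, neighbor_dists, neighbor_tokens, lam=0.25):
    """Interpolate a base LM next-token distribution with a kNN distribution.

    p_lm: base model's probabilities over the vocabulary (list of floats)
    neighbor_dists: distances from the query context to retrieved datastore keys
    neighbor_tokens: token id stored alongside each retrieved key
    lam: interpolation weight placed on the kNN distribution
    """
    # Softmax over negative distances gives a weight per retrieved neighbor.
    weights = [math.exp(-d) for d in neighbor_dists]
    total = sum(weights)
    # Aggregate neighbor mass by target token to form p_kNN.
    p_knn = [0.0] * len(p_lm)
    for token, w in zip(neighbor_tokens, weights):
        p_knn[token] += w / total
    # Final distribution: lam * p_kNN + (1 - lam) * p_LM.
    return [lam * pk + (1 - lam) * pl for pk, pl in zip(p_knn, p_lm)]
```

Tokens that appear among the retrieved neighbors gain probability mass even if the base model assigns them little, which is the memorization effect the title refers to.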
-
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
We present BART, a denoising autoencoder for pretraining sequence-to-seq...
-
JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation
Interactive programming with interleaved code snippet cells and natural ...
-
A Discrete Hard EM Approach for Weakly Supervised Question Answering
Many question answering (QA) tasks only provide weak supervision for how...
-
Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases
State-of-the-art models often make use of superficial patterns in the da...
-
BERT for Coreference Resolution: Baselines and Analysis
We apply BERT to coreference resolution, achieving strong improvements o...
-
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Language model pretraining has led to significant performance gains but ...
-
SpanBERT: Improving Pre-training by Representing and Predicting Spans
We present SpanBERT, a pre-training method that is designed to better re...
-
Vision-and-Dialog Navigation
Robots navigating in human environments should use language to ask for a...
-
Sparse Networks from Scratch: Faster Training without Losing Performance
We demonstrate the possibility of what we call sparse learning: accelera...
-
E3: Entailment-driven Extracting and Editing for Conversational Machine Reading
Conversational machine reading systems help users answer high-level ques...
-
Multi-hop Reading Comprehension through Question Decomposition and Rescoring
Multi-hop Reading Comprehension (RC) requires reasoning and aggregation ...
-
Compositional Questions Do Not Necessitate Multi-hop Reasoning
Multi-hop reading comprehension (RC) questions are challenging because t...
-
Better Character Language Modeling Through Morphology
We incorporate morphological supervision into character language models ...
-
Evaluating Gender Bias in Machine Translation
We present the first challenge set and evaluation protocol for the analy...
-
Transformers with convolutional context for ASR
The recent success of transformer networks for neural machine translatio...
-
Constant-Time Machine Translation with Conditional Masked Language Models
Most machine translation systems generate text autoregressively, by sequ...
-
Learning Programmatic Idioms for Scalable Semantic Parsing
Programmers typically organize executable source code using high-level c...
-
Cloze-driven Pretraining of Self-attention Networks
We present a new approach for pretraining a bi-directional transformer m...
-
Improving Semantic Parsing for Task Oriented Dialog
Semantic parsing using hierarchical representations has recently been pr...
-
The Referential Reader: A Recurrent Entity Network for Anaphora Resolution
We present a new architecture for storing and accessing entity mentions ...
-
pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
Reasoning about implied relationships (e.g. paraphrastic, common sense, ...