
-
Robustness Gym: Unifying the NLP Evaluation Landscape
Despite impressive performance on standard benchmarks, deep neural netwo...
-
FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging
Influence functions approximate the 'influences' of training data-points...
-
I like fish, especially dolphins: Addressing Contradictions in Dialogue Modeling
To quantify how well natural language understanding models can capture c...
-
To what extent do human explanations of model behavior align with actual model behavior?
Given the increasingly prominent role NLP models (will) play in our live...
-
ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments
For embodied agents, navigation is an important ability but not an isola...
-
DORB: Dynamically Optimizing Multiple Rewards with Bandits
Policy gradients-based reinforcement learning has proven to be a promisi...
-
HoVer: A Dataset for Many-Hop Fact Extraction And Claim Verification
We introduce HoVer (HOppy VERification), a dataset for many-hop evidence...
-
ConjNLI: Natural Language Inference Over Conjunctive Sentences
Reasoning about conjuncts in conjunctive sentences is important for a de...
-
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Given a video with aligned dialogue, people can often infer what is more...
-
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Humans learn language by listening, speaking, writing, reading, and also...
-
ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization
Cherokee is a highly endangered Native American language spoken by the C...
-
Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?
Data collection for natural language (NL) understanding tasks has increa...
-
What Can We Learn from Collective Human Opinions on Natural Language Inference Data?
Despite the subjective nature of many NLP tasks, most NLU evaluations ha...
-
Evaluating Interactive Summarization: an Expansion-Based Framework
Allowing users to interact with multi-document summarizers is a promisin...
-
SuperPAL: Supervised Proposition ALignment for Multi-Document Summarization and Derivative Sub-Tasks
Multi-document summarization (MDS) is a challenging task, often decompos...
-
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA
Videos convey rich information. Dynamic spatio-temporal relationships be...
-
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
Generating multi-sentence descriptions for videos is one of the most cha...
-
Towards Robustifying NLI Models Against Lexical Dataset Biases
While deep learning models are making fast progress on the task of Natur...
-
Diagnosing the Environment Bias in Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires an agent to follow natural...
-
Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?
Algorithmic approaches to interpreting machine learning models have prol...
-
The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions
We find that the performance of state-of-the-art models on Natural Langu...
-
Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension
Reading comprehension models often overfit to nuances of training datase...
-
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
We introduce a new multimodal retrieval task - TV show Retrieval (TVR), ...
-
ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
We present a new multimodal question answering challenge, ManyModalQA, i...
-
Modality-Balanced Models for Visual Dialogue
The Visual Dialog task requires a model to exploit both image and conver...
-
AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses
Many sequence-to-sequence dialogue models tend to generate safe, uninfor...
-
Multi-Source Domain Adaptation for Text Classification via DistanceNet-Bandits
Domain adaptation performance of a learning algorithm on a target domain...
-
Adversarial NLI: A New Benchmark for Natural Language Understanding
We introduce a new large-scale NLI benchmark dataset, collected via an i...
-
Automatically Learning Data Augmentation Policies for Dialogue Tasks
Automatic data augmentation (AutoAugment) (Cubuk et al., 2019) searches ...
-
Revealing the Importance of Semantic Retrieval for Machine Reading at Scale
Machine Reading at Scale (MRS) is a challenging task in which a system i...
-
Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering
Text-based Question Generation (QG) aims at generating natural and relev...
-
Self-Assembling Modular Networks for Interpretable Multi-Hop Reasoning
Multi-hop QA requires a model to connect multiple pieces of evidence sca...
-
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Vision-and-language reasoning requires an understanding of visual concep...
-
Expressing Visual Relationships via Language
Describing images with text is a fundamental problem in vision-language ...
-
Avoiding Reasoning Shortcuts: Adversarial Evaluation, Training, and Model Development for Multi-Hop QA
Multi-hop question answering requires a model to connect multiple pieces...
-
Improving Visual Question Answering by Referring to Generated Paragraph Captions
Paragraph-style image captions describe diverse aspects of an image as o...
-
Continual and Multi-Task Architecture Search
Architecture search is the process of automatically learning the neural ...
-
Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension
Multi-hop reading comprehension requires the model to explore and connec...
-
PaperRobot: Incremental Draft Generation of Scientific Ideas
We present a PaperRobot who performs as an automatic research assistant ...
-
Enabling Robots to Understand Incomplete Natural Language Instructions Using Commonsense Reasoning
Enabling robots to understand instructions provided via spoken natural l...
-
TVQA+: Spatio-Temporal Grounding for Video Question Answering
We present the task of Spatio-Temporal Video Question Answering, which r...
-
Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation
Conducting a manual evaluation is considered an essential part of summar...
-
Multi-Target Embodied Question Answering
Embodied Question Answering (EQA) is a relatively new task where an agen...
-
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
A grand goal in AI is to build a robot that can accurately navigate base...
-
AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning
Multi-task learning (MTL) has achieved success over a wide range of prob...
-
Combining Fact Extraction and Verification with Neural Semantic Matching Networks
The increasing concern with misinformation has stimulated research effor...
-
Analyzing Compositionality-Sensitivity of NLI Models
Success in natural language inference (NLI) should require a model to un...
-
Commonsense for Generative Multi-Hop Question Answering Tasks
Reading comprehension QA tasks have seen a recent surge in popularity, y...
-
SafeCity: Understanding Diverse Forms of Sexual Harassment Personal Stories
With the recent rise of #MeToo, an increasing number of personal stories...
-
Closed-Book Training to Improve Summarization Encoder Memory
A good neural sequence-to-sequence summarization model should have a str...