
-
Memory-Augmented Reinforcement Learning for Image-Goal Navigation
In this work, we address the problem of image-goal navigation in the con...
-
How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget
PointGoal navigation has seen significant recent interest and progress, ...
-
Bi-directional Domain Adaptation for Sim2Real Transfer of Embodied Navigation Agents
Deep reinforcement learning models are notoriously data hungry, yet real...
-
Learning Navigation Skills for Legged Robots with Learned Robot Embeddings
Navigation policies are commonly learned on idealized cylinder agents in...
-
Where Are You? Localization from Embodied Dialog
We present Where Are You? (WAY), a dataset of 6k dialogs in which two h...
-
Sim-to-Real Transfer for Vision-and-Language Navigation
We study the challenging problem of releasing a robot in a previously un...
-
Rearrangement: A Challenge for Embodied AI
We describe a framework for research and evaluation in Embodied AI. Our ...
-
SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency
Recent research in Visual Question Answering (VQA) has revealed state-of...
-
Contrast and Classify: Alternate Training for Robust VQA
Recent Visual Question Answering (VQA) models have shown impressive perf...
-
Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views
We study the task of semantic mapping - specifically, an embodied agent ...
-
Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents
Recent work has presented embodied agents that can navigate to point-goa...
-
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
Can we develop visually grounded dialog agents that can efficiently adap...
-
Spatially Aware Multimodal Transformers for TextVQA
Textual cues are essential for everyday tasks like buying groceries and ...
-
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation
We introduce a learning-based approach for room navigation using semanti...
-
Auxiliary Tasks Speed Up Learning PointGoal Navigation
PointGoal Navigation is an embodied task that requires agents to navigat...
-
ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects
We revisit the problem of Object-Goal Navigation (ObjectNav). In its sim...
-
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Following a navigation instruction such as 'Walk down the stairs and sto...
-
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
We develop a language-guided navigation task set in a continuous 3D envi...
-
Analyzing Visual Representations in Embodied Navigation Tasks
Recent advances in deep reinforcement learning require a large amount of...
-
Are We Making Real Progress in Simulated Environments? Measuring the Sim2Real Gap in Embodied Visual Navigation
Does progress in simulation translate to progress in robotics? Specifica...
-
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
Prior work in visual dialog has focused on training deep neural models o...
-
Decentralized Distributed PPO: Solving PointGoal Navigation
We present Decentralized Distributed Proximal Policy Optimization (DD-PP...
-
Improving Generative Visual Dialog by Answering Diverse Questions
Prior work on training generative Visual Dialog models with reinforcemen...
-
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
Diverse and accurate vision+language modeling is an important goal to re...
-
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
We present ViLBERT (short for Vision-and-Language BERT), a model for lea...
-
Unsupervised Discovery of Decision States for Transfer in Reinforcement Learning
We present a hierarchical reinforcement learning (HRL) or options framew...
-
Chasing Ghosts: Instruction Following as Bayesian State Tracking
A visually-grounded navigation instruction can be interpreted as a seque...
-
The Replica Dataset: A Digital Replica of Indoor Spaces
We introduce Replica, a dataset of 18 highly photo-realistic 3D indoor s...
-
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation
We propose SplitNet, a method for decoupling visual perception and polic...
-
Emergence of Compositional Language with Deep Generational Transmission
Consider a collaborative task that requires communication. Two agents ar...
-
Towards VQA Models that can Read
Studies have shown that a dominant class of questions asked by visually ...
-
Counterfactual Visual Explanations
A counterfactual query is typically of the form 'For situation X, why wa...
-
Multi-Target Embodied Question Answering
Embodied Question Answering (EQA) is a relatively new task where an agen...
-
Embodied Visual Recognition
Passive visual systems typically fail to recognize objects in the amodal...
-
Embodied Question Answering in Photorealistic Environments with Point Cloud Perception
To help bridge the gap between internet vision-style problems and the go...
-
Habitat: A Platform for Embodied AI Research
We present Habitat, a new platform for research in embodied artificial i...
-
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
Visual Dialog is a multimodal task of answering a sequence of questions ...
-
Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future
In model-based reinforcement learning, the agent interleaves between mod...
-
Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering
We propose a new class of probabilistic neural-symbolic models, that hav...
-
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
Many vision and language models suffer from poor visual grounding - ofte...
-
EvalAI: Towards Better Evaluation Systems for AI Agents
We introduce EvalAI, an open source platform for evaluating and comparin...
-
Embodied Multimodal Multitask Learning
Recent efforts on training visual navigation agents conditioned on langu...
-
Audio-Visual Scene-Aware Dialog
We introduce the task of scene-aware dialog. Given a follow-up question ...
-
Response to "Visual Dialogue without Vision or Dialogue" (Massiceti et al., 2018)
In a recent workshop paper, Massiceti et al. presented a baseline model ...
-
Dialog System Technology Challenge 7
This paper introduces the Seventh Dialog System Technology Challenges (D...
-
nocaps: novel object captioning at scale
Image captioning models have achieved impressive results on datasets con...
-
Fabrik: An Online Collaborative Neural Network Editor
We present Fabrik, an online neural network editor that provides tools t...
-
TarMAC: Targeted Multi-Agent Communication
We explore a collaborative multi-agent reinforcement learning setting wh...
-
Neural Modular Control for Embodied Question Answering
We present a modular approach for learning policies for navigation over ...
-
Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition
In an open-world setting, it is inevitable that an intelligent agent (e....