Recent work in vision-and-language demonstrates that large-scale pretrai...
To be successful, Vision-and-Language Navigation (VLN) agents must be ab...
Images are a convenient way to specify which particular object instance ...
Animal navigation research posits that organisms build and maintain inte...
We consider the problem of embodied visual navigation given an image-goa...
Recent work in Vision-and-Language Navigation (VLN) has presented two en...
A growing number of service providers are exploring methods to improve s...
Natural language instructions for visual navigation often use scene desc...
Little inquiry has explicitly addressed the role of action spaces in lan...
Multilingual Neural Machine Translation (NMT) enables one model to serve...
Neural networks are a popular tool for modeling sequential data but they...
We consider the problem of modeling the dynamics of continuous spatial-t...
We present Where Are You? (WAY), a dataset of 6k dialogs in which two h...
We study the challenging problem of releasing a robot in a previously un...
Imitation learning is a popular approach for teaching motor skills to ro...
We study an approach to offline reinforcement learning (RL) based on opt...
There have been significant efforts to interpret the encoder of Transfor...
We study the task of semantic mapping - specifically, an embodied agent ...
Recent work has presented embodied agents that can navigate to point-goa...
Can we develop visually grounded dialog agents that can efficiently adap...
Following a navigation instruction such as 'Walk down the stairs and sto...
We develop a language-guided navigation task set in a continuous 3D envi...
Does progress in simulation translate to progress in robotics? Specifica...
Much of vision-and-language research focuses on a small but diverse set ...
While Visual Question Answering (VQA) models continue to push the state-...
We present Decentralized Distributed Proximal Policy Optimization (DD-PP...
While models for Visual Question Answering (VQA) have steadily improved ...
We present ViLBERT (short for Vision-and-Language BERT), a model for lea...
A visually-grounded navigation instruction can be interpreted as a seque...
Consider a collaborative task that requires communication. Two agents ar...
A counterfactual query is typically of the form 'For situation X, why wa...
To help bridge the gap between internet vision-style problems and the go...
We propose a new class of probabilistic neural-symbolic models that hav...
Many vision and language models suffer from poor visual grounding - ofte...
We introduce EvalAI, an open source platform for evaluating and comparin...
We introduce the task of scene-aware dialog. Given a follow-up question ...
Image captioning models have achieved impressive results on datasets con...
We present a modular approach for learning policies for navigation over ...
Modern Visual Question Answering (VQA) models have been shown to rely he...
In an open-world setting, it is inevitable that an intelligent agent (e....
Individual neurons in convolutional neural networks supervised for image...
We propose a novel scene graph generation model called Graph R-CNN, that...
Many structured prediction problems (particularly in vision and language...
We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- ...
As AI continues to advance, human-AI teams are inevitable. However, prog...
A number of recent works have proposed techniques for end-to-end learnin...
We develop the first approximate inference algorithm for 1-Best (and M-B...
In this paper, we make a simple observation that questions about images ...
We introduce the first goal-driven training for visual question answerin...
Neural sequence models are widely used to model time-series data in many...