
-
Identity-Aware Multi-Sentence Video Description
Standard video and movie description tasks abstract away from person ide...
-
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires grounding instructions, su...
-
Language-Conditioned Graph Networks for Relational Reasoning
Solving grounded language tasks often requires reasoning about relations...
-
Viewpoint Invariant Change Captioning
The ability to detect that something has changed in an environment is va...
-
Adversarial Inference for Multi-Sentence Video Description
While significant progress has been made in the image captioning task, v...
-
Object Hallucination in Image Captioning
Despite continuously improving performance, contemporary image captionin...
-
Textual Explanations for Self-Driving Vehicles
Deep neural perception and control networks have become key components o...
-
Women also Snowboard: Overcoming Bias in Captioning Models (Extended Abstract)
Most machine learning methods are known to capture and exploit biases of...
-
Speaker-Follower Models for Vision-and-Language Navigation
Navigation guided by natural language instructions presents a challengin...
-
Women also Snowboard: Overcoming Bias in Captioning Models
Most machine learning methods are known to capture and exploit biases of...
-
Video Object Segmentation with Language Referring Expressions
Most state-of-the-art semi-supervised video object segmentation methods ...
-
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
Deep models that are both effective and explainable are desirable in man...
-
Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract)
Deep models are the de facto standard in visual decision problems due to ...
-
Gradient-free Policy Architecture Search and Adaptation
We develop a method for policy architecture search and adaptation via gr...
-
Can you fool AI with adversarial examples on a visual Turing test?
Deep learning has achieved impressive results in many areas of Computer ...
-
Generating Descriptions with Grounded and Co-Referenced People
Learning how to generate descriptions of images or videos received major...
-
A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering
While deep convolutional neural networks frequently approach or exceed h...
-
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Modeling textual or visual information with vector representations train...
-
Movie Description
Audio Description (AD) provides linguistic descriptions of movies and al...
-
Grounding of Textual Phrases in Images by Reconstruction
Grounding (i.e. localizing) arbitrary, free-form textual phrases in visu...
-
The Long-Short Story of Movie Description
Generating descriptions for videos has many applications including assis...
-
Recognizing Fine-Grained and Composite Activities using Hand-Centric Features and Script Data
Activity recognition has shown impressive progress in recent years. Howe...
-
A Dataset for Movie Description
Descriptive video service (DVS) provides linguistic descriptions of movi...