
-
Learning Neural Network Subspaces
Recent observations have advanced our understanding of the neural networ...
-
Layer-Wise Data-Free CNN Compression
We present an efficient method for compressing a trained neural network ...
-
What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions
Learning effective representations of visual data that generalize to a v...
-
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
Autonomous agents must learn to collaborate. It is not scalable to devel...
-
In the Wild: From ML Models to Pragmatic ML Systems
Enabling robust intelligence in the wild entails learning systems that o...
-
Supermasks in Superposition
We present the Supermasks in Superposition (SupSup) model, capable of se...
-
Probing Text Models for Common Ground with Visual Representations
Vision, as a central component of human perception, plays a fundamental ...
-
Visual Commonsense Graphs: Reasoning about the Dynamic Context of a Still Image
Even from a single frame of a still image, people can reason about the d...
-
RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
Visual recognition ecosystems (e.g. ImageNet, Pascal, COCO) have undenia...
-
Evaluating Machines by their Real-World Language Use
There is a fundamental gap between how humans understand and use languag...
-
Grounded Situation Recognition
We introduce Grounded Situation Recognition (GSR), a task that requires ...
-
Watching the World Go By: Representation Learning from Unlabeled Videos
Recent single image unsupervised representation learning techniques show...
-
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Fine-tuning pretrained contextual word embedding models to supervised do...
-
Soft Threshold Weight Reparameterization for Learnable Sparsity
Sparsity in Deep Neural Networks (DNNs) is studied extensively with the ...
-
Artificial Agents Learn Flexible Visual Representations by Playing a Hiding Game
The ubiquity of embodied gameplay, observed in a wide variety of animal ...
-
Visual Reaction: Learning to Play Catch with Your Drone
In this paper we address the problem of visual reaction: the task of int...
-
What's Hidden in a Randomly Weighted Neural Network?
Training a neural network is synonymous with learning the values of the ...
-
Conditional Driving from Natural Language Instructions
Widespread adoption of self-driving cars will depend not only on their s...
-
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index
Existing open-domain question answering (QA) models are not suitable for...
-
Butterfly Transform: An Efficient FFT Based Neural Architecture Design
In this paper, we introduce the Butterfly Transform (BFT), a light weigh...
-
Discovering Neural Wirings
The success of neural networks has driven a shift in focus from feature ...
-
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Visual Question Answering (VQA) in its ideal form lets us study reasonin...
-
Defending Against Neural Fake News
Recent progress in natural language generation has raised dual-use conce...
-
HellaSwag: Can a Machine Really Finish Your Sentence?
Recent work by Zellers et al. (2018) introduced a new task of commonsens...
-
Two Body Problem: Collaborative Visual Task Completion
Collaboration is a necessary skill to perform tasks that are beyond one ...
-
Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph
Visual relationship reasoning is a crucial yet challenging task for unde...
-
What Should I Do Now? Marrying Reinforcement Learning and Symbolic Planning
Long-term planning poses a major difficulty to many reinforcement learni...
-
ELASTIC: Improving CNNs with Instance Specific Scaling Policies
Scale variation has been a challenge from traditional to modern approach...
-
Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning
Learning is an inherently continuous phenomenon. When humans learn a new...
-
From Recognition to Cognition: Visual Commonsense Reasoning
Visual understanding goes well beyond object recognition. With one glanc...
-
Visual Semantic Navigation using Scene Priors
How do humans navigate to target objects in novel scenes? Do we use the ...
-
PhotoShape: Photorealistic Materials for Large-Scale Shape Collections
Existing online 3D shape repositories contain thousands of 3D models but...
-
Label Refinery: Improving ImageNet Classification through Label Progression
Among the three main components (data, labels, and models) of any superv...
-
Actor and Observer: Joint Modeling of First and Third-Person Videos
Several theories in cognitive neuroscience suggest that when people inte...
-
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos
In Actor and Observer we introduced a dataset linking the first and thir...
-
Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension
The current trend of extractive question answering (QA) heavily relies o...
-
Imagine This! Scripts to Compositions to Videos
Imagining a scene described in natural language with realistic layout an...
-
YOLOv3: An Incremental Improvement
We present some updates to YOLO! We made a bunch of little design change...
-
Transferring Common-Sense Knowledge for Object Detection
We propose the idea of transferring common-sense knowledge from source c...
-
Who Let The Dogs Out? Modeling Dog Behavior From Visual Data
We introduce the task of directly modeling a visually intelligent agent....
-
AI2-THOR: An Interactive 3D Environment for Visual AI
We introduce The House Of inteRactions (THOR), a framework for visual AI...
-
IQA: Visual Question Answering in Interactive Environments
We introduce Interactive Question Answering (IQA), the task of answering...
-
Structured Set Matching Networks for One-Shot Part Labeling
Diagrams often depict complex phenomena and serve as a good test bed for...
-
Neural Speed Reading via Skim-RNN
Inspired by the principles of speed reading, we introduce Skim-RNN, a re...
-
AJILE Movement Prediction: Multimodal Deep Learning for Natural Human Neural Recordings and Video
Developing useful interfaces between brains and machines is a grand chal...
-
Visual Semantic Planning using Deep Successor Representations
A crucial capability of real-world intelligent agents is their ability t...
-
Re3 : Real-Time Recurrent Regression Networks for Visual Tracking of Generic Objects
Robust object tracking requires knowledge and understanding of the objec...
-
SeGAN: Segmenting and Generating the Invisible
Objects often occlude each other in scenes; Inferring their appearance b...
-
See the Glass Half Full: Reasoning about Liquid Containers, their Volume and Content
Humans have rich understanding of liquid containers and their contents; ...
-
YOLO9000: Better, Faster, Stronger
We introduce YOLO9000, a state-of-the-art, real-time object detection sy...