
-
The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI
We introduce a visually-guided and physics-driven task-and-motion planni...
-
Paint by Word
We investigate the problem of zero-shot semantic image painting. Instead...
-
Deep Feedback Inverse Problem Solver
We present an efficient, effective, and generic approach towards solving...
-
Energy-Based Models for Continual Learning
We motivate Energy-Based Models (EBMs) as a promising model class for co...
-
Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration
In this paper, we introduce Watch-And-Help (WAH), a challenge for testin...
-
Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering
Differentiable rendering has paved the way to training neural networks t...
-
LID 2020: The Learning from Imperfect Data Challenge Results
Learning from imperfect data becomes an issue in many industrial applica...
-
Improving Inversion and Generation Diversity in StyleGAN using a Gaussianized Latent Space
Modern Generative Adversarial Networks are capable of creating artificia...
-
Understanding the Role of Individual Units in a Deep Neural Network
Deep neural networks excel at finding hierarchical representations that ...
-
The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement
Existing disentanglement methods for deep generative models rely on hand...
-
Detecting natural disasters, damage, and incidents in the wild
Responding to natural disasters, such as earthquakes, floods, and wildfi...
-
Rewriting a Deep Generative Model
A deep generative model such as a GAN learns to model a rich set of sema...
-
Noisy Agents: Self-supervised Exploration by Predicting Auditory Events
Humans integrate multiple sensory modalities (e.g. visual and audio) to ...
-
Foley Music: Learning to Generate Music from Videos
In this paper, we introduce Foley Music, a system that can synthesize pl...
-
Estimating Generalization under Distribution Shifts via Domain-Invariant Representations
When machine learning models are deployed on a test distribution differe...
-
Causal Discovery in Physical Systems from Videos
Causal discovery is at the core of human cognition. It enables us to rea...
-
Debiased Contrastive Learning
A prominent technique for self-supervised representation learning has be...
-
Diverse Image Generation via Self-Conditioned GANs
We introduce a simple but effective unsupervised method for generating r...
-
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Current methods for learning visually grounded language from videos ofte...
-
Learning to Simulate Dynamic Environments with GameGAN
Simulation is a crucial component of any robotic system. In order to sim...
-
Semantic Photo Manipulation with a Generative Image Prior
Despite the recent success of GANs in synthesizing images conditioned on...
-
Visual Grounding of Learned Physical Models
Humans intuitively recognize objects' physical properties and predict th...
-
Music Gesture for Visual Sound Separation
Recent deep learning approaches have achieved impressive performance on ...
-
Self-supervised Moving Vehicle Tracking with Stereo Sound
Humans are able to localize objects in the environment using both visual...
-
Seeing What a GAN Cannot Generate
Despite the success of Generative Adversarial Networks (GANs), mode coll...
-
Gaze360: Physically Unconstrained Gaze Estimation in the Wild
Understanding where people are looking is an informative social cue. In ...
-
Learning Compositional Koopman Operators for Model-Based Control
Finding an embedding space for a linear approximation of a nonlinear dyn...
-
The Role of Embedding Complexity in Domain-invariant Representations
Unsupervised domain adaptation aims to generalize the hypothesis trained...
-
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
The ability to reason about temporal and causal events from videos lies ...
-
Connecting Touch and Vision via Cross-Modal Prediction
Humans perceive the world using multi-modal sensory inputs such as visio...
-
How to make a pizza: Learning a compositional layer-based GAN model
A food recipe is an ordered set of instructions for preparing a particul...
-
Meta-Sim: Learning to Generate Synthetic Datasets
Training models to high-end performance requires availability of large l...
-
Self-Supervised Audio-Visual Co-Segmentation
Segmenting objects in images and separating sound sources in audio are c...
-
The Sound of Motions
Sounds originate from object motions and vibrations of surrounding air. ...
-
Visualizing and Understanding Generative Adversarial Networks (Extended Abstract)
Generative Adversarial Networks (GANs) have achieved impressive results ...
-
Visual Object Networks: Image Generation with Disentangled 3D Representation
Recent progress in deep generative models has led to tremendous breakthr...
-
Dataset Distillation
Model distillation aims to distill the knowledge of a complex model into...
-
GAN Dissection: Visualizing and Understanding Generative Adversarial Networks
Generative Adversarial Networks (GANs) have recently achieved impressive...
-
Recipe1M: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images
In this paper, we introduce Recipe1M, a new large-scale, structured corp...
-
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
We marry two powerful ideas: deep representation learning for visual rec...
-
Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids
Real-life control tasks involve matter of various substances---rigid or ...
-
Propagation Networks for Model-Based Control Under Partial Observation
There has been an increasing interest in learning dynamics simulators fo...
-
Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks
We introduce a saliency-based distortion layer for convolutional neural ...
-
3D-Aware Scene Manipulation via Inverse Graphics
We aim to obtain an interpretable, expressive and disentangled scene rep...
-
Real-Time Object Pose Estimation with Pose Interpreter Networks
In this work, we introduce pose interpreter networks for 6-DoF object po...
-
VirtualHome: Simulating Household Activities via Programs
In this paper, we are interested in modeling complex activities that occ...
-
Revisiting the Importance of Individual Units in CNNs via Ablation
We revisit the importance of the individual units in Convolutional Neura...
-
The Sound of Pixels
We introduce PixelPlayer, a system that, by leveraging large amounts of ...
-
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input
In this paper, we explore neural network models that learn to associate ...
-
3D Interpreter Networks for Viewer-Centered Wireframe Modeling
Understanding 3D object structure from a single image is an important bu...