
-
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Deep learning frameworks have often focused on either usability or speed...
-
Feel The Music: Automatically Generating A Dance For An Input Song
We present a general computational approach that enables a machine to ge...
-
Learning Causal State Representations of Partially Observable Environments
Intelligent agents can cope with sensory-rich environments by learning t...
-
A Survey of Reinforcement Learning Informed by Natural Language
To be successful in real-world tasks, Reinforcement Learning (RL) needs ...
-
Streamlining Tensor and Network Pruning in PyTorch
In order to contrast the explosion in size of state-of-the-art machine l...
-
nocaps: novel object captioning at scale
Image captioning models have achieved impressive results on datasets con...
-
Exploratory Combinatorial Optimization with Reinforcement Learning
Many real-world problems can be reduced to combinatorial optimization on...
-
Learning with Random Learning Rates
Hyperparameter tuning is a bothersome step in the training of deep learn...
-
Holistic Large Scale Video Understanding
Action recognition has been advanced in recent years by benchmarks with ...
-
Training with Quantization Noise for Extreme Model Compression
We tackle the problem of producing compact models, maximizing their accu...
-
Generating Interactive Worlds with Text
Procedurally generating cohesive and interesting game environments is ch...
-
Monte-Carlo Tree Search for Efficient Visually Guided Rearrangement Planning
In this paper, we address the problem of visually guided rearrangement p...
-
Bayesian Relational Memory for Semantic Visual Navigation
We introduce a new memory architecture, Bayesian Relational Memory (BRM)...
-
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
The visual and audio modalities are highly correlated yet they contain d...
-
DONeRF: Towards Real-Time Rendering of Neural Radiance Fields using Depth Oracle Networks
The recent research explosion around implicit neural representations, su...
-
Convolutional Networks with Dense Connectivity
Recent work has shown that convolutional networks can be substantially d...
-
Single-Network Whole-Body Pose Estimation
We present the first single-network approach for 2D whole-body pose esti...
-
Visual Transformers: Token-based Image Representation and Processing for Computer Vision
Computer vision has achieved great success using standardized image repr...
-
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA
Many visual scenes contain text that carries crucial information, and it...
-
Learning with AMIGo: Adversarially Motivated Intrinsic Goals
A key challenge for reinforcement learning (RL) consists of learning in ...
-
Emergent Linguistic Phenomena in Multi-Agent Communication Games
In this work, we propose a computational framework in which agents equip...
-
Supervised Multimodal Bitransformers for Classifying Images and Text
Self-supervised bidirectional transformer models such as BERT have led t...
-
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions
We present our view of what is necessary to build an engaging open-domai...
-
Live Face De-Identification in Video
We propose a method for face de-identification that enables fully automa...
-
Unsupervised Discovery of Decision States for Transfer in Reinforcement Learning
We present a hierarchical reinforcement learning (HRL) or options framew...
-
Linformer: Self-Attention with Linear Complexity
Large transformer models have shown extraordinary success in achieving s...
-
Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance
Individual neurons in convolutional neural networks supervised for image...
-
Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization
Bayesian optimization (BO) is a popular approach to optimize expensive-t...
-
TACTO: A Fast, Flexible and Open-source Simulator for High-Resolution Vision-based Tactile Sensors
Simulators perform an important role in prototyping, debugging and bench...
-
Unbiased Teacher for Semi-Supervised Object Detection
Semi-supervised learning, i.e., training networks with both labeled and ...
-
Learning to Follow Language Instructions with Adversarial Reward Induction
Recent work has shown that deep reinforcement-learning agents can learn ...
-
Talk the Walk: Navigating New York City through Grounded Dialogue
We introduce "Talk The Walk", the first large-scale dialogue dataset gro...
-
2.5D Visual Sound
Binaural audio provides a listener with 3D sound sensation, allowing a r...
-
Cycle-Consistency for Robust Visual Question Answering
Despite significant progress in Visual Question Answering over the years...
-
Analysing Mathematical Reasoning Abilities of Neural Models
Mathematical reasoning---a core ability within human intelligence---pres...
-
Data-efficient Learning of Morphology and Controller for a Microrobot
Robot design is often a slow and difficult process requiring the iterati...
-
Data-efficient Co-Adaptation of Morphology and Behaviour with Deep Reinforcement Learning
Humans and animals are capable of quickly learning new behaviours to sol...
-
ContactPose: A Dataset of Grasps with Object Contact and Hand Pose
Grasping is natural for humans. However, it involves complex hand config...
-
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets
Ideally Open-Domain Question Answering models should exhibit a number of...
-
DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare
We present DenseRaC, a novel end-to-end framework for jointly estimating...
-
EGO-TOPO: Environment Affordances from Egocentric Video
First-person video naturally brings the use of a physical environment to...
-
An Imitation Game for Learning Semantic Parsers from User Interaction
Despite the widely successful applications, bootstrapping and fine-tunin...
-
Unsupervised Video Object Segmentation with Distractor-Aware Online Adaptation
Unsupervised video object segmentation is a crucial application in video...
-
Embodied Question Answering in Photorealistic Environments with Point Cloud Perception
To help bridge the gap between internet vision-style problems and the go...
-
Generalized Inner Loop Meta-Learning
Many (but not all) approaches self-qualifying as "meta-learning" in deep...
-
TIES: Temporal Interaction Embeddings For Enhancing Social Media Integrity At Facebook
Since its inception, Facebook has become an integral part of the online ...
-
Video Object Grounding using Semantic Roles in Language Description
We explore the task of Video Object Grounding (VOG), which grounds objec...
-
3D Photography using Context-aware Layered Depth Inpainting
We propose a method for converting a single RGB-D input image into a 3D ...
-
Overcoming Statistical Shortcuts for Open-ended Visual Counting
Machine learning models tend to over-rely on statistical shortcuts. Thes...
-
DIGIT: A Novel Design for a Low-Cost Compact High-Resolution Tactile Sensor with Application to In-Hand Manipulation
Despite decades of research, general purpose in-hand manipulation remain...