
-
Semantics for Robotic Mapping, Perception and Interaction: A Survey
For robots to navigate and interact more richly with the world around th...
-
Fine-grained Classification via Categorical Memory Networks
Motivated by the desire to exploit patterns shared across classes, we pr...
-
Spatio-attentive Graphs for Human-Object Interaction Detection
We address the problem of detecting human–object interactions in images ...
-
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
Despite the recent advances in multiple object tracking (MOT), achieved ...
-
A Recurrent Vision-and-Language BERT for Navigation
Accuracy of many visiolinguistic tasks has benefited significantly from ...
-
How to train your conditional GAN: An approach using geometrically structured latent manifolds
Conditional generative modeling typically requires capturing one-to-many...
-
Language and Visual Entity Relationship Graph for Agent Navigation
Vision-and-Language Navigation (VLN) requires an agent to navigate in a ...
-
DORi: Discovering Object Relationship for Moment Localization of a Natural-Language Query in Video
This paper studies the task of temporal moment localization in a long un...
-
Conditional Generative Modeling via Learning the Latent Space
Although deep learning has achieved appealing results on several machine...
-
Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization
Blind Perspective-n-Point (PnP) is the problem of estimating the positio...
-
The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose
The availability of a large labeled dataset is a key requirement for app...
-
Bidirectional Self-Normalizing Neural Networks
The problem of exploding and vanishing gradients has been a long-standin...
-
A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews
Despite the recent advances in opinion mining for written reviews, few w...
-
Inferring Temporal Compositions of Actions Using Probabilistic Automata
This paper presents a framework to recognize temporal compositions of at...
-
ArTIST: Autoregressive Trajectory Inpainting and Scoring for Tracking
One of the core components in online multiple object tracking (MOT) fram...
-
Sub-Instruction Aware Vision-and-Language Navigation
Vision-and-language navigation requires an agent to navigate through a r...
-
DeepFit: 3D Surface Fitting via Neural Network Weighted Least Squares
We propose a surface fitting method for unstructured 3D point clouds. Th...
-
Learning to Structure an Image with Few Colors
Color and structure are the two pillars that construct an image. Usually...
-
Joint Unsupervised Learning of Optical Flow and Egomotion with Bi-Level Optimization
We address the problem of joint optical flow and camera motion estimatio...
-
Sampling Good Latent Variables via CPP-VAEs: VAEs with Condition Posterior as Prior
In practice, conditional variational autoencoders (CVAEs) perform condit...
-
Spectral-GANs for High-Resolution 3D Point-cloud Generation
Point-clouds are a popular choice for vision and graphics tasks due to t...
-
Representation Learning on Unit Ball with 3D Roto-Translational Equivariance
Convolution is an integral operation that defines how the shape of one f...
-
Deep Declarative Networks: A New Hope
We introduce a new class of end-to-end learnable models wherein data pro...
-
Blended Convolution and Synthesis for Efficient Discrimination of 3D Shapes
Existing networks directly learn feature representations on 3D point clo...
-
Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention
This paper studies the problem of temporal moment localization in a long...
-
Learning Variations in Human Motion via Mix-and-Match Perturbation
Human motion prediction is a stochastic process: Given an observed seque...
-
A Signal Propagation Perspective for Pruning Neural Networks at Initialization
Network pruning is a promising avenue for compressing deep neural networ...
-
Learning to Find Common Objects Across Image Collections
We address the problem of finding a set of images containing a common, b...
-
The Alignment of the Spheres: Globally-Optimal Spherical Mixture Alignment for Camera Pose Estimation
Determining the position and orientation of a calibrated camera from a s...
-
Partially-Supervised Image Captioning
Image captioning models are becoming increasingly successful at describi...
-
Non-Linear Temporal Subspace Representations for Activity Recognition
Representations that can compactly and effectively capture the temporal ...
-
Video Representation Learning Using Discriminative Pooling
Popular deep models for action recognition in videos generate independen...
-
Neural Algebra of Classifiers
The world is fundamentally compositional, so it is natural to think of v...
-
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
A robot that can carry out a natural-language instruction has been a dre...
-
Human Action Forecasting by Learning Task Grammars
For effective human-robot interaction, it is important that a robotic as...
-
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Top-down visual attention mechanisms have been used extensively in image...
-
Human Pose Forecasting via Deep Markov Models
Human pose forecasting is an important problem in computer vision with a...
-
Incorporating Network Built-in Priors in Weakly-supervised Semantic Segmentation
Pixel-level annotations are expensive and time consuming to obtain. Henc...
-
Discriminatively Learned Hierarchical Rank Pooling Networks
In this work, we present novel temporal encoding methods for action and ...
-
DeepPermNet: Visual Permutation Learning
We present a principled approach to uncover the structure of visual data...
-
Generalized Rank Pooling for Activity Recognition
Most popular deep models for action recognition split video sequences in...
-
Action Representation Using Classifier Decision Boundaries
Most popular deep learning based models for action recognition are desig...
-
Higher-order Pooling of CNN Features via Kernel Linearization for Action Recognition
Most successful deep learning algorithms for action recognition extend m...
-
Guided Open Vocabulary Image Captioning with Constrained Beam Search
Existing image captioning models do not generalize well to out-of-domain...
-
Unsupervised Human Action Detection by Action Matching
We propose a new task of unsupervised action detection by action matchin...
-
Self-Supervised Video Representation Learning With Odd-One-Out Networks
We propose a new self-supervised CNN pre-training technique based on a n...
-
Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation
Pixel-level annotations are expensive and time consuming to obtain. Henc...
-
SPICE: Semantic Propositional Image Caption Evaluation
There is considerable interest in the task of automatically generating i...
-
Multi-Target Tracking with Time-Varying Clutter Rate and Detection Profile: Application to Time-lapse Cell Microscopy Sequences
Quantitative analysis of the dynamics of tiny cellular and sub-cellular ...
-
Deep CNN Ensemble with Data Augmentation for Object Detection
We report on the methods used in our recent DeepEnsembleCoco submission ...