
-
An Empirical Study of Training Self-Supervised Vision Transformers
This paper does not describe a novel method. Instead, it studies a strai...
-
Understanding self-supervised Learning Dynamics without Contrastive Pairs
Contrastive approaches to self-supervised learning (SSL) learn represent...
-
Exploring Simple Siamese Representation Learning
Siamese networks have become a common structure in various recent models...
-
Understanding Self-supervised Learning with Dual Deep Networks
We propose a novel theoretical framework to understand self-supervised l...
-
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation
We introduce a learning-based approach for room navigation using semanti...
-
Overcoming Statistical Shortcuts for Open-ended Visual Counting
Machine learning models tend to over-rely on statistical shortcuts. Thes...
-
Revisiting Modulated Convolutions for Visual Counting and Beyond
This paper targets at visual counting, where the setup is to estimate th...
-
Improved Baselines with Momentum Contrastive Learning
Contrastive unsupervised learning has recently shown encouraging progres...
-
ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes
3D object detection has seen quick progress thanks to advances in deep l...
-
In Defense of Grid Features for Visual Question Answering
Popularized as 'bottom-up' attention, bounding box (or region) based vis...
-
Towards VQA Models that can Read
Studies have shown that a dominant class of questions asked by visually ...
-
Prior-aware Neural Network for Partially-Supervised Multi-Organ Segmentation
Accurate multi-organ abdominal CT segmentation is essential to many clin...
-
Multi-Target Embodied Question Answering
Embodied Question Answering (EQA) is a relatively new task where an agen...
-
Embodied Visual Recognition
Passive visual systems typically fail to recognize objects in the amodal...
-
TensorMask: A Foundation for Dense Object Segmentation
Sliding-window object detectors that generate bounding-box object predic...
-
Cycle-Consistency for Robust Visual Question Answering
Despite significant progress in Visual Question Answering over the years...
-
Relay-Assisted and QoS Aware Scheduling to Overcome Blockage in mmWave Backhaul Networks
In the scenario where small cells are densely deployed, the millimeter w...
-
nocaps: novel object captioning at scale
Image captioning models have achieved impressive results on datasets con...
-
Grounded Video Description
Video description is one of the most challenging problems in vision and ...
-
Pythia v0.1: the Winning Entry to the VQA Challenge 2018
This document describes Pythia v0.1, the winning entry from Facebook AI ...
-
Iterative Visual Reasoning Beyond Convolutions
We present a novel framework for iterative visual reasoning. Our framewo...
-
Device-to-Device Communications Enabled Energy Efficient Multicast Scheduling in mmWave Small Cells
To keep pace with the rapid growth of mobile traffic demands, dense depl...
-
PixelNet: Representation of the pixels, by the pixels, and for the pixels
We explore design principles for general pixel-level prediction problems...
-
An Implementation of Faster RCNN with Study for Region Sampling
We adapted the joint-training scheme of Faster RCNN framework from Caffe ...
-
PixelNet: Towards a General Pixel-level Architecture
We explore architectures for general pixel-level prediction problems, fr...
-
Learning Visual Storylines with Skipping Recurrent Neural Networks
What does a typical visit to Paris look like? Do people first take photo...
-
Webly Supervised Learning of Convolutional Networks
We present an approach to utilize large amounts of web data for learning...
-
Microsoft COCO Captions: Data Collection and Evaluation Server
In this paper we describe the Microsoft COCO Caption dataset and evaluat...
-
Learning a Recurrent Visual Representation for Image Caption Generation
In this paper we explore the bi-directional mapping between images and t...