
-
An Improved Attention for Visual Question Answering
We consider the problem of Visual Question Answering (VQA). Given an ima...
read it
-
Person-in-Context Synthesiswith Compositional Structural Space
Despite significant progress, controlled generation of complex images wi...
read it
-
Attribute-guided image generation from layout
Recent approaches have achieved great success in image generation from s...
read it
-
Unsupervised Video Decomposition using Spatio-temporal Iterative Inference
Unsupervised multi-object scene decomposition is a fast-emerging problem...
read it
-
Discriminative Feature Alignment: Improving Transferability of Unsupervised Domain Adaptation by Gaussian-guided Latent Alignment
In this study, we focus on the unsupervised domain adaptation problem wh...
read it
-
Discriminative Feature Alignment: ImprovingTransferability of Unsupervised DomainAdaptation by Gaussian-guided LatentAlignment
In this study, we focus on the unsupervised domain adaptation problem wh...
read it
-
Weakly-supervised Any-shot Object Detection
Methods for object detection and segmentation rely on large scale instan...
read it
-
Consistent Multiple Sequence Decoding
Sequence decoding is one of the core components of most visual-lingual m...
read it
-
Variational Hyper RNN for Sequence Modeling
In this work, we propose a novel probabilistic sequence model that excel...
read it
-
Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction
Reconstruction of a 3D shape from a single 2D image is a classical compu...
read it
-
Improved Few-Shot Visual Classification
Few-shot learning is a fundamental task in computer vision that carries ...
read it
-
Zero-Shot Generation of Human-Object Interaction Videos
Generation of videos of complex scenes is an important open problem in c...
read it
-
OptiBox: Breaking the Limits of Proposals for Visual Grounding
The problem of language grounding has attracted much attention in recent...
read it
-
DwNet: Dense warp-based network for pose-guided human video generation
Generation of realistic high-resolution videos of human subjects is a ch...
read it
-
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Multi-modal learning, particularly among imaging and linguistic modaliti...
read it
-
LayoutVAE: Stochastic Scene Layout Generation from a Label Set
Recently there is an increasing interest in scene generation within the ...
read it
-
AttentionRNN: A Structured Spatial Attention Mechanism
Visual attention mechanisms have proven to be integrally important const...
read it
-
A Variational Auto-Encoder Model for Stochastic Point Processes
We propose a novel probabilistic generative model for action sequences. ...
read it
-
Neural Sequential Phrase Grounding (SeqGROUND)
We propose an end-to-end approach for phrase grounding in images. Unlike...
read it
-
Walking on Thin Air: Environment-Free Physics-based Markerless Motion Capture
We propose a generative approach to physics-based motion capture. Unlike...
read it
-
Towards Traversing the Continuous Spectrum of Image Retrieval
Image retrieval is one of the most popular tasks in computer vision. How...
read it
-
Image Generation from Layout
Despite significant recent progress on generative models, controlled gen...
read it
-
Middle-Out Decoding
Despite being virtually ubiquitous, sequence-to-sequence models are chal...
read it
-
Learning to Separate Domains in Generalized Zero-Shot and Open Set Learning: a probabilistic perspective
This paper studies the problem of domain division problem which aims to ...
read it
-
Where and When to Look? Spatio-temporal Attention for Action Recognition in Videos
Inspired by the observation that humans are able to process videos effic...
read it
-
Semantic Feature Augmentation in Few-shot Learning
A fundamental problem with few-shot learning is the scarcity of data in ...
read it
-
Text-to-Clip Video Retrieval with Early Fusion and Re-Captioning
We propose a novel method capable of retrieving clips from untrimmed vid...
read it
-
Modular Generative Adversarial Networks
Existing methods for multi-domain image-to-image translation (or generat...
read it
-
Probabilistic Video Generation using Holistic Attribute Control
Videos express highly structured spatio-temporal patterns of visual data...
read it
-
Joint Event Detection and Description in Continuous Video Streams
As a fine-grained video understanding task, dense video captioning invol...
read it
-
LSTM stack-based Neural Multi-sequence Alignment TeCHnique (NeuMATCH)
The alignment of heterogeneous sequential data (video to text) is an imp...
read it
-
Recent Advances in Zero-shot Recognition
With the recent renaissance of deep convolution neural networks, encoura...
read it
-
Visual Reference Resolution using Attention Memory for Visual Dialog
Visual dialog is a task of answering a series of inter-dependent questio...
read it
-
Predicting Personality from Book Preferences with User-Generated Content Labels
Psychological studies have shown that personality traits are associated ...
read it
-
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
We propose a weakly-supervised approach that takes image-sentence pairs ...
read it
-
Weakly-Supervised Spatial Context Networks
We explore the power of spatial context as a self-supervisory signal for...
read it
-
Semi-Latent GAN: Learning to generate and modify facial images from attributes
Generating and manipulating human facial images using high-level attribu...
read it
-
Learning to Generate Posters of Scientific Papers by Probabilistic Graphical Models
Researchers often summarize their work in the form of scientific posters...
read it
-
Learning Language-Visual Embedding for Movie Understanding with Natural-Language
Learning a joint language-visual embedding has a number of very appealin...
read it
-
Semi-supervised Vocabulary-informed Learning
Despite significant progress in object categorization, in recent years, ...
read it
-
Learning to Generate Posters of Scientific Papers
Researchers often summarize their work in the form of posters. Posters p...
read it
-
Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web
Recently, attempts have been made to collect millions of videos to train...
read it
-
Robust Classification by Pre-conditioned LASSO and Transductive Diffusion Component Analysis
Modern machine learning-based recognition approaches require large-scale...
read it
-
Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization
Emotional content is a key element in user-generated videos. However, it...
read it
-
Learning from Synthetic Data Using a Stacked Multichannel Autoencoder
Learning from synthetic data has many important and practical applicatio...
read it
-
Learning Classifiers from Synthetic Data Using a Multichannel Autoencoder
We propose a method for using synthetic data to help learning classifier...
read it
-
Hierarchical Maximum-Margin Clustering
We present a hierarchical maximum-margin clustering method for unsupervi...
read it
-
A Unified Semantic Embedding: Relating Taxonomies and Attributes
We propose a method that learns a discriminative yet semantic space for ...
read it
-
Dependence Maximizing Temporal Alignment via Squared-Loss Mutual Information
The goal of temporal alignment is to establish time correspondence betwe...
read it
-
High-Dimensional Feature Selection by Feature-Wise Non-Linear Lasso
The goal of supervised feature selection is to find a subset of input fe...
read it