
-
SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering
Medical visual question answering (Med-VQA) has tremendous potential in ...
read it
-
Similarity Reasoning and Filtration for Image-Text Matching
Image-text matching plays a critical role in bridging the vision and lan...
read it
-
Unpaired Image Enhancement with Quality-Attention Generative Adversarial Network
In this work, we aim to learn an unpaired image enhancement model, which...
read it
-
Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network
Improving the aesthetic quality of images is challenging and eager for t...
read it
-
Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis
We tackle human image synthesis, including human motion imitation, appea...
read it
-
Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning
We propose a parsimonious quantile regression framework to learn the dyn...
read it
-
Intrinsic Relationship Reasoning for Small Object Detection
The small objects in images and videos are usually not independent indiv...
read it
-
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching
Image-text matching plays a central role in bridging vision and language...
read it
-
Deblurring by Realistic Blurring
Existing deep learning methods for image deblurring typically train mode...
read it
-
STH: Spatio-Temporal Hybrid Convolution for Efficient Action Recognition
Effective and Efficient spatio-temporal modeling is essential for action...
read it
-
Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos
The task of temporally grounding textual queries in videos is to localiz...
read it
-
Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension
Referring expression comprehension (REF) aims at identifying a particula...
read it
-
Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video
In this paper, we study the problem of weakly-supervised temporal ground...
read it
-
Fine-grained Image-to-Image Transformation towards Visual Recognition
Existing image-to-image transformation approaches primarily focus on syn...
read it
-
Coupled Network for Robust Pedestrian Detection with Gated Multi-Layer Feature Extraction and Deformable Occlusion Handling
Pedestrian detection methods have been significantly improved with the d...
read it
-
LaFIn: Generative Landmark Guided Face Inpainting
It is challenging to inpaint face images in the wild, due to the large v...
read it
-
EDIT: Exemplar-Domain Aware Image-to-Image Translation
Image-to-image translation is to convert an image of the certain style t...
read it
-
Exploiting Local and Global Structure for Point Cloud Semantic Segmentation with Contextual Point Representations
In this paper, we propose one novel model for point cloud semantic segme...
read it
-
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos
Temporal sentence grounding in videos aims to detect and localize one ta...
read it
-
Context-Gated Convolution
As the basic building block of Convolutional Neural Networks (CNNs), the...
read it
-
Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis
We tackle the human motion imitation, appearance transfer, and novel vie...
read it
-
Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction
The task of temporally grounding language queries in videos is to tempor...
read it
-
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
In this paper, we propose to guide the video caption generation with Par...
read it
-
MemeFaceGenerator: Adversarial Synthesis of Chinese Meme-face from Natural Sentences
Chinese meme-face is a special kind of internet subculture widely spread...
read it
-
Sentence Specified Dynamic Video Thumbnail Generation
With the tremendous growth of videos over the Internet, video thumbnails...
read it
-
Position Focused Attention Network for Image-Text Matching
Image-text matching tasks have recently attracted a lot of attention in ...
read it
-
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
In this paper, we address a novel task, namely weakly-supervised spatio-...
read it
-
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning
In this paper, the problem of describing visual contents of a video sequ...
read it
-
Hallucinating Optical Flow Features for Video Classification
Appearance and motion are two key components to depict and characterize ...
read it
-
Image Deformation Meta-Networks for One-Shot Learning
Humans can robustly learn novel visual concepts even when images undergo...
read it
-
Spatio-temporal Video Re-localization by Warp LSTM
The need for efficiently finding the video content a user wants is incre...
read it
-
PFLD: A Practical Facial Landmark Detector
Being accurate, efficient, and compact is essential to a facial landmark...
read it
-
Hierarchical Photo-Scene Encoder for Album Storytelling
In this paper, we propose a novel model with a hierarchical photo-scene ...
read it
-
Predictive Indexing
There has been considerable research on automated index tuning in databa...
read it
-
Real-Time Referring Expression Comprehension by Single-Stage Grounding Network
In this paper, we propose a novel end-to-end model, namely Single-Stage ...
read it
-
Multi-granularity Generator for Temporal Action Proposal
Temporal action proposal generation is an important task, aiming to loca...
read it
-
Unsupervised Image Captioning
Deep neural networks have achieved great successes on the image captioni...
read it
-
Non-local NetVLAD Encoding for Video Classification
This paper describes our solution for the 2^nd YouTube-8M video understa...
read it
-
Baidu Apollo Auto-Calibration System - An Industry-Level Data-Driven and Learning based Vehicle Longitude Dynamic Calibrating Algorithm
For any autonomous driving vehicle, control module determines its road p...
read it
-
Video Re-localization
Many methods have been developed to help people find the video contents ...
read it
-
Recurrent Fusion Network for Image Captioning
Recently, much advance has been made in image captioning, and an encoder...
read it
-
Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks
Recent studies on unsupervised image-to-image translation have made rema...
read it
-
Hierarchical Attention-Based Recurrent Highway Networks for Time Series Prediction
Time series prediction has been studied in a variety of domains. However...
read it
-
Safe Element Screening for Submodular Function Minimization
Submodular functions are discrete analogs of convex functions, which hav...
read it
-
Long-Term Human Motion Prediction by Modeling Motion Context and Enhancing Motion Dynamic
Human motion prediction aims at generating future frames of human motion...
read it
-
Fine-grained Video Attractiveness Prediction Using Multimodal Deep Learning on a Large Real-world Dataset
Nowadays, billions of videos are online ready to be viewed and shared. A...
read it
-
Learning to Guide Decoding for Image Captioning
Recently, much advance has been made in image captioning, and an encoder...
read it
-
Gated Fusion Network for Single Image Dehazing
In this paper, we propose an efficient algorithm to directly restore a c...
read it
-
Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
Dense video captioning is a newly emerging task that aims at both locali...
read it
-
Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present
Recently, caption generation with an encoder-decoder framework has been ...
read it