
-
Attributes Aware Face Generation with Generative Adversarial Networks
Recent studies have shown remarkable success in face image generation. ...
-
Learn an Effective Lip Reading Model without Pains
Lip reading, also known as visual speech recognition, aims to recognize ...
-
IAUnet: Global Context-Aware Feature Learning for Person Re-Identification
Person re-identification (reID) by CNNs based networks has achieved favo...
-
Temporal Complementary Learning for Video Person Re-Identification
This paper proposes a Temporal Complementary Learning Network that extra...
-
Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation
Scene graph aims to faithfully reveal humans' perception of image conten...
-
Appearance-Preserving 3D Convolution for Video-based Person Re-identification
Due to the imperfect person detection results and posture changes, tempo...
-
Synchronous Bidirectional Learning for Multilingual Lip Reading
Lip reading has received increasing attention in recent years. This pape...
-
Single-Side Domain Generalization for Face Anti-Spoofing
Existing domain generalization methods for face anti-spoofing endeavor t...
-
Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training
Although two-stage object detectors have continuously advanced the state...
-
Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation
Image-level weakly supervised semantic segmentation is a challenging pro...
-
Cross-domain Face Presentation Attack Detection via Multi-domain Disentangled Representation Learning
Face presentation attack detection (PAD) has been an urgent problem to b...
-
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Answering questions that require reading texts in an image is challengin...
-
Mutual Information Maximization for Effective Lip Reading
Lip reading has received increasing research interest in recent years...
-
Deformation Flow Based Two-Stream Network for Lip Reading
Lip reading is the task of recognizing the speech content by analyzing m...
-
Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading
Lip-reading aims to infer the speech content from the lip movement seque...
-
Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
Recent advances in deep learning have heightened interest among research...
-
UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
We propose UniViLM: a Unified Video and Language pre-training Model for ...
-
Emotion Recognition for In-the-wild Videos
This paper is a brief introduction to our submission to the seven basic ...
-
M^3T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild
This report describes a multi-modal multi-task (M^3T) approach underlyin...
-
Deep Heterogeneous Hashing for Face Video Retrieval
Retrieving videos of a particular person with face image as a query via ...
-
FCSR-GAN: Joint Face Completion and Super-resolution via Multi-task Learning
Combined variations containing low-resolution and occlusion often presen...
-
Learning-based Real-time Detection of Intrinsic Reflectional Symmetry
Reflectional symmetry is ubiquitous in nature. While extrinsic reflectio...
-
LaplacianNet: Learning on 3D Meshes with Laplacian Encoding and Pooling
3D models are commonly used in computer vision and graphics. With the wi...
-
RhythmNet: End-to-end Heart Rate Estimation from Face via Spatial-temporal Representation
Heart rate (HR) is an important physiological signal that reflects the p...
-
Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition
Facial action units (AUs) recognition is essential for emotion analysis ...
-
Cross Attention Network for Few-shot Classification
Few-shot classification aims to recognize unlabeled samples from unseen ...
-
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
Image-text retrieval of natural scenes has been a popular research topic...
-
Object-Contextual Representations for Semantic Segmentation
In this paper, we address the problem of semantic segmentation and focus...
-
Self-supervised Scale Equivariant Network for Weakly Supervised Semantic Segmentation
Weakly supervised semantic segmentation has attracted much research inte...
-
Transferable Contrastive Network for Generalized Zero-Shot Learning
Zero-shot learning (ZSL) is a challenging problem that aims to recognize...
-
Temporal Knowledge Propagation for Image-to-Video Person Re-identification
In many scenarios of Person Re-identification (Re-ID), the gallery set c...
-
From Two Graphs to N Questions: A VQA Dataset for Compositional Reasoning on Vision and Commonsense
Visual Question Answering (VQA) is a challenging task for evaluating the...
-
Interlaced Sparse Self-Attention for Semantic Segmentation
In this paper, we present a so-called interlaced sparse self-attention a...
-
Interaction-and-Aggregation Network for Person Re-identification
Person re-identification (reID) benefits greatly from deep convolutional...
-
VRSTC: Occlusion-Free Video Person Re-Identification
Video person re-identification (re-ID) plays an important role in survei...
-
Cascade RetinaNet: Maintaining Consistency for Single-Stage Object Detection
Recent research attempts to improve the detection performance by adopti...
-
Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation
Non-Autoregressive Transformer (NAT) aims to accelerate the Transformer ...
-
Weakly Supervised Object Detection with Segmentation Collaboration
Weakly supervised object detection aims at learning precise object detec...
-
Fully Learnable Group Convolution for Acceleration of Deep Neural Networks
Benefiting from its great success on many tasks, deep learning is increa...
-
WIDER Face and Pedestrian Challenge 2018: Methods and Results
This paper presents a review of the 2018 WIDER Challenge on Face and Ped...
-
Tattoo Image Search at Scale: Joint Detection and Compact Representation Learning
The explosive growth of digital images in video surveillance and social ...
-
LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild
Large-scale datasets have successively proven their fundamental importan...
-
VIPL-HR: A Multi-modal Database for Pulse Estimation from Less-constrained Face Video
Heart rate (HR) is an important physiological signal that reflects the p...
-
Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation
Neural machine translation (NMT) models are usually trained with the wor...
-
Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition
Zero-shot learning (ZSL) aims to recognize objects of novel classes with...
-
Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships
Context is important for accurate visual recognition. In this work we pr...
-
Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks
Rotation-invariant face detection, i.e. detecting faces with arbitrary r...
-
Texture Classification in Extreme Scale Variations using GANet
Research in texture recognition often concentrates on recognizing textur...
-
Arbitrary Facial Attribute Editing: Only Change What You Want
Facial attribute editing aims to modify either single or multiple attrib...
-
Hyperspectral Light Field Stereo Matching
In this paper, we describe how scene depth can be extracted using a hype...