
-
M6: A Chinese Multimodal Pretrainer
In this work, we construct the largest dataset for multimodal pretrainin...
-
A Collaborative Visual SLAM Framework for Service Robots
With the rapid deployment of service robots, a method should be establis...
-
Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation
Holistically understanding an object and its 3D movable parts through vi...
-
Fully-Automated Liver Tumor Localization and Characterization from Multi-Phase MR Volumes Using Key-Slice ROI Parsing: A Physician-Inspired Approach
Using radiological scans to identify liver tumors is crucial for proper ...
-
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps
Texts appearing in daily scenes that can be recognized by OCR (Optical C...
-
TP-TIO: A Robust Thermal-Inertial Odometry with Deep ThermalPoint
To achieve robust motion estimation in visually degraded environments, t...
-
Hyperspectral Classification Based on Lightweight 3-D-CNN With Transfer Learning
Recently, hyperspectral image (HSI) classification approaches based on d...
-
Quantum Dynamics of Optimization Problems
In this letter, by establishing the Schrödinger equation of the optimiza...
-
WikiAsp: A Dataset for Multi-domain Aspect-based Summarization
Aspect-based summarization is the task of generating focused summaries b...
-
Where to Look and How to Describe: Fashion Image Retrieval with an Attentional Heterogeneous Bilinear Network
Fashion products typically feature in compositions of a variety of style...
-
Few-shot Action Recognition with Implicit Temporal Alignment and Pair Similarity Optimization
Few-shot learning aims to recognize instances from novel classes with fe...
-
Disentangled Neural Architecture Search
Neural architecture search has shown its great potential in various area...
-
Personalized Speech2Video with 3D Skeleton Regularization and Expressive Body Poses
In this paper, we propose a novel approach to convert given speech audio...
-
Zero Correlation Zone Sequences With Flexible Block-Repetitive Spectral Constraints
A general construction of a set of time-domain sequences with sparse per...
-
Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks
Most existing crowd counting systems rely on the availability of the obj...
-
ODE-CNN: Omnidirectional Depth Extension Networks
Omnidirectional 360 camera proliferates rapidly for autonomous robots si...
-
Non-Convex Exact Community Recovery in Stochastic Block Model
Learning community structures in graphs that are randomly generated by s...
-
A Robust Attentional Framework for License Plate Recognition in the Wild
Recognizing car license plates in natural scene images is an important y...
-
Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge
Conventional referring expression comprehension (REF) assumes people to ...
-
Structured Multimodal Attentions for TextVQA
Text based Visual Question Answering (TextVQA) is a recently raised chal...
-
Non-equilibrium transport of inhomogeneous shale gas under ultra-tight confinement
The non-equilibrium transport of inhomogeneous and dense gases highly co...
-
Vid2Curve: Simultaneous Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video
Thin structures, such as wire-frame sculptures, fences, cables, power li...
-
Challenge Closed-book Science Exam: A Meta-learning Based Question Answering System
Prior work in standardized science exams requires support from large tex...
-
Anisotropic Convolutional Networks for 3D Semantic Scene Completion
As a voxel-wise labeling task, semantic scene completion (SSC) tries to ...
-
Toward Interpretability of Dual-Encoder Models for Dialogue Response Suggestions
This work shows how to improve and interpret the commonly used dual enco...
-
Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension
Referring expression comprehension (REF) aims at identifying a particula...
-
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Humans are able to describe image contents with coarse to fine details a...
-
Deep Domain Adaptive Object Detection: a Survey
Deep learning (DL) based object detection has achieved great progress. T...
-
Unsupervised Image-generation Enhanced Adaptation for Object Detection in Thermal images
Object detection in thermal images is an important computer vision task ...
-
Using Sampled Network Data With The Autologistic Actor Attribute Model
Social science research increasingly benefits from statistical methods f...
-
Real-time Segmentation and Facial Skin Tones Grading
Modern approaches for semantic segmentation usually pay too much attention...
-
To Balance or Not to Balance: An Embarrassingly Simple Approach for Learning with Long-Tailed Distributions
Real-world visual data often exhibits a long-tailed distribution, where ...
-
CSPN++: Learning Context and Resource Aware Convolutional Spatial Propagation Networks for Depth Completion
Depth Completion deals with the problem of converting a sparse depth map...
-
Resilient Load Restoration in Microgrids Considering Mobile Energy Storage Fleets: A Deep Reinforcement Learning Approach
Mobile energy storage systems (MESSs) provide mobility and flexibility t...
-
Attend to the Difference: Cross-Modality Person Re-identification via Contrastive Correlation
The problem of cross-modality person re-identification has been receivin...
-
Discriminative and Robust Online Learning for Siamese Visual Tracking
The problem of visual object tracking has traditionally been handled by ...
-
Efficient Automatic Meta Optimization Search for Few-Shot Learning
Previous works on meta-learning either relied on elaborately hand-design...
-
PrTransH: Embedding Probabilistic Medical Knowledge from Real World EMR Data
This paper proposes an algorithm named as PrTransH to learn embedding ve...
-
Person Re-identification in Aerial Imagery
Nowadays, with the rapid development of consumer Unmanned Aerial Vehicle...
-
V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices
One of the primary challenges faced by deep learning is the degree to wh...
-
EPNAS: Efficient Progressive Neural Architecture Search
In this paper, we propose Efficient Progressive Neural Architecture Sear...
-
A Performance Evaluation of Correspondence Grouping Methods for 3D Rigid Data Matching
Seeking consistent point-to-point correspondences between 3D rigid data ...
-
Evaluating Local Geometric Feature Representations for 3D Rigid Data Matching
Local geometric descriptors remain an essential component for 3D rigid d...
-
Towards End-to-End Text Spotting in Natural Scenes
Text spotting in natural scene images is of great importance for many im...
-
NAS-FCOS: Fast Neural Architecture Search for Object Detection
The success of deep neural networks relies on significant architecture e...
-
Multi-Label Image Recognition with Graph Convolutional Networks
The task of multi-label image recognition is to predict a set of object ...
-
Vehicle Re-identification in Aerial Imagery: Dataset and Approach
In this work, we construct a large-scale dataset for vehicle re-identifi...
-
A Simple and Robust Convolutional-Attention Network for Irregular Text Recognition
Reading irregular text of arbitrary shape in natural scene images is sti...
-
Pixel-aware Deep Function-mixture Network for Spectral Super-Resolution
Spectral super-resolution (SSR) aims at generating a hyperspectral image...