The integration of different modalities, such as audio and visual
inform...
Semi-supervised semantic segmentation involves assigning pixel-wise labe...
Recent works have proposed to craft adversarial clothes for evading pers...
We propose Audio-Visual Lightweight ITerative model (AVLIT), an effectiv...
Recent works found that deep neural networks (DNNs) can be fooled by
adv...
Object detection is a critical component of various security-sensitive
a...
Natural language understanding is one of the most challenging topics in
...
Automatic speech recognition (ASR) systems based on deep neural networks...
Multi-scale features are essential for dense prediction tasks, including...
Semi-supervised learning (SSL) has been widely explored in recent years,...
Deep neural networks are vulnerable to adversarial attacks. We consider
...
Audio-visual approaches involving visual inputs have laid the foundation...
Adversarial attacks can easily fool object recognition systems based on ...
Recently, unsupervised learning has made impressive progress on various
...
Deep neural networks have shown excellent prospects in speech separation...
Machine learning poses severe privacy concerns as it is shown that the
l...
In this paper, we present a novel protocol of annotation and evaluation ...
The requirement of expensive annotations is a major burden for training ...
Semi-supervised learning (SSL) has been widely explored in recent years,...
Most of the recent neural source separation systems rely on a masking-ba...
Thermal infrared imaging is widely used in body temperature measurement,...
The adversarial risk of a machine learning model has been widely studied...
Compared with rate-based artificial neural networks, Spiking Neural Netw...
Nowadays, cameras equipped with AI systems can capture and analyze image...
Extracting cultivated land accurately from high-resolution remote images...
Recent advances in the design of neural network architectures, in partic...
Goal-conditioned Hierarchical Reinforcement Learning (HRL) is a promisin...
Imbalanced datasets widely exist in practice and area great challenge fo...
The convolutional neural network (CNN) has become a basic model for solv...
In authentication scenarios, applications of practical speaker verificat...
The two-stage methods for instance segmentation, e.g. Mask R-CNN, have
a...
Tremendous efforts have been made on instance segmentation but the mask
...
It is often desired to train 6D pose estimation systems on synthetic dat...
Recently, it was found that many real-world examples without intentional...
The video captioning task is to describe the video contents with natural...
Neural style transfer models have been used to stylize an ordinary video...
Thermal infrared detection systems play an important role in many areas ...
Goal-conditioned hierarchical reinforcement learning (HRL) is a promisin...
Face parsing is an important computer vision task that requires accurate...
This paper addresses the task of estimating the 6 degrees of freedom pos...
Video captioning is an advanced multi-modal task which aims to describe ...
Pedestrian attribute recognition has been an emerging research topic in ...
We propose a novel perspective to understand deep neural networks in an
...
Network pruning is an important research field aiming at reducing
comput...
Given the features of a video, recurrent neural network can be used to
a...
The Convolutional Neural Networks (CNNs) generate the feature representa...
Distillation-based learning boosts the performance of the miniaturized n...
In standard Convolutional Neural Networks (CNNs), the receptive fields o...
Human-object interactions (HOI) recognition and pose estimation are two
...
Face parsing is a basic task in face image analysis. It amounts to label...