Camouflaged objects that blend into natural scenes pose significant
chal...
Cross-modal Unsupervised Domain Adaptation (UDA) aims to exploit the
com...
Text-to-image diffusion models can create stunning images from natural
l...
Achieving machine autonomy and human control often represent divergent
o...
Trajectory prediction is a crucial undertaking in understanding entity
m...
Existing video recognition algorithms always conduct different training
...
Contrastive vision-language models (e.g. CLIP) are typically created by
...
Text-to-image (T2I) models based on diffusion processes have achieved
re...
The field of image super-resolution (SR) has witnessed extensive neural
...
What is an image and how to extract latent features? Convolutional Netwo...
Anomaly detection and localization of visual data, including images and
...
Anomaly detection in videos is a significant yet challenging problem.
Pr...
The state of neural network pruning has been noted to be unclear and e...
Vision Transformers have shown great promise recently for many vision ta...
Recent efforts in Neural Radiance Fields (NeRF) have shown impressive
r...
Existing action recognition methods typically sample a few frames to
rep...
A deeper network structure generally handles more complicated non-linear...
Several recent works empirically find that the finetuning learning rate is critic...
Image rasterization is a mature technique in computer graphics, while im...
The topic of generalizing machine learning models learned on a collectio...
The recent research explosion on Neural Radiance Fields (NeRF) shows the
enco...
Convolutional neural networks (CNNs) have achieved great success on image
s...
Pedestrian trajectory prediction is an essential component in a wide ran...
Fully exploiting the learning capacity of neural networks requires
overp...
Point cloud analysis is challenging due to irregularity and unordered da...
Action prediction aims to infer the forthcoming human action with
partia...
Semi-supervised domain adaptation (SSDA) is quite a challenging problem
...
Anomaly detection is a fundamental yet challenging problem in machine
le...
Recognizing Families In the Wild (RFIW), held as a data challenge in
con...
Sign language is commonly used by deaf or mute people to communicate but...
The study of 3D hyperspectral image (HSI) reconstruction refers to the
i...
Multimodal target/aspect sentiment classification combines multimodal
se...
Adaptive gradient methods, such as Adam, have achieved tremendous
succes...
Several recent works [40, 24] observed an interesting phenomenon in neur...
Domain adaptation enhances generalizability of a model across domains wi...
Cross-Domain Detection (XDD) aims to train an object detector using labe...
In this paper, we address space-time video super-resolution, which a...
There are demographic biases in the SOTA CNN used for FR. Our BFW datase...
Sign language is a visual language that is used by deaf or speech impair...
Over-parameterization of neural networks benefits the optimization and
g...
Searching for relevant mobile user interface (UI) design examples can ai...
Computer vision is widely deployed, has highly visible, society-altering...
Learning visual knowledge from massive weakly-labeled web videos has
att...
Regularization has long been utilized to learn sparsity in deep neural
n...
Humans spend vast hours in bed – about one-third of the lifetime on aver...
Advances in face rotation, along with other face-based generative tasks,...
Several methods of knowledge distillation have been developed for neural...
Knowledge distillation (KD) is a general deep neural network training
fr...
Multi-view action recognition (MVAR) leverages complementary temporal
in...
In this paper, we aim to develop an efficient and compact deep network f...