In this work, we propose a new transformer-based regularization to bette...
Large Language Models (LLMs) have shown excellent generalization capabil...
The Segment Anything Model (SAM) has demonstrated exceptional performanc...
Self-supervised audio-visual source localization aims to locate sound-so...
Point cloud analysis is receiving increasing attention; however, most ex...
Linear transformers aim to reduce the quadratic space-time complexity of...
Zero-Shot Learning (ZSL) models aim to classify object classes that are ...
Vision transformers have shown great success on numerous computer vision...
With the human pursuit of knowledge, open-set object detection (OSOD) ha...
Existing deep learning-based unsupervised video object segmentation meth...
Given the rapid development of 3D scanners, point clouds are becoming po...
Existing RGB-D saliency detection models do not explicitly encourage RGB...
With the help of the deep learning paradigm, many point cloud networks h...
Conventional saliency prediction models typically learn a deterministic ...
Confidence-aware learning is proven as an effective solution to prevent ...
The transformer networks, which originate from machine translation, are ...
Compared with expensive pixel-wise annotations, image-level labels make ...
Significant performance improvement has been achieved for fully-supervis...
Given the prominence of current 3D sensors, a fine-grained analysis on t...
Camouflage is a key defence mechanism across species that is critical to...
General purpose semantic segmentation relies on a backbone CNN network t...
Normalizing flows (NFs) are a class of generative models that allows exa...
Existing deep neural network-based salient object detection (SOD) method...
Pixel-wise clean annotation is necessary for fully-supervised semantic s...
Conditional generative modeling typically requires capturing one-to-many...
Although deep learning has achieved appealing results on several machine...
Humans explain inter-object relationships with semantic labels that demo...
We propose the first stochastic framework to employ uncertainty for RGB-...
In this paper, we propose a noise-aware encoder-decoder framework to dis...
Point cloud analysis is attracting attention from Artificial Intelligenc...
Deep convolutional neural networks perform better on images containing s...
To make the best use of the underlying minute and subtle differences, fi...
Event cameras are paradigm-shifting novel sensors that report asynchrono...
Previous work on novel object detection considers zero or few-shot setti...
Visual Question Answering (VQA) has emerged as a Visual Turing Test to v...
Point clouds are a popular choice for vision and graphics tasks due to t...
Convolution is an integral operation that defines how the shape of one f...
As the basic task of point cloud learning, classification is fundamental...
Existing networks directly learn feature representations on 3D point clo...
Visual Question Answering (VQA) models employ attention mechanisms to di...
Super-Resolution convolutional neural networks have recently demonstrate...
3D shape generation is a challenging problem due to the high-dimensional...
Event cameras are novel, bio-inspired visual sensors, whose pixels outpu...
Deep convolutional network-based super-resolution is a fast-growing fie...
Deep convolutional neural networks perform better on images containing s...
Convolution is an efficient technique to obtain abstract feature represe...
Spatial convolution is arguably the most fundamental of 2D image process...
Current Visual Question Answering (VQA) systems can answer intelligent q...
Zero-shot object detection is an emerging research topic that aims to re...
Event cameras provide asynchronous, data-driven measurements of local te...