Semi-supervised semantic segmentation involves assigning pixel-wise labe...
In this paper, we study the denoising diffusion probabilistic model (DDP...
Despite the promising progress in multi-modal tasks, current large multi...
Multimodal summarization with multimodal output (MSMO) has emerged as a ...
Network Function Virtualization (NFV) seeks to replace hardware middlebo...
We propose MM-REACT, a system paradigm that integrates ChatGPT with a po...
3D photography renders a static image into a video with appealing 3D vis...
Semi-supervised learning (SSL) has been widely explored in recent years,...
We present X-Decoder, a generalized decoding model that can predict pixe...
This paper presents a Generative RegIon-to-Text transformer, GRiT, for o...
The image captioning task is typically realized by an auto-regressive me...
Large language models (LLMs) show impressive abilities via few-shot prom...
In this paper, we present NUWA-Infinity, a generative model for infinite...
Semi-supervised learning (SSL) has been widely explored in recent years,...
Recently, several Bayesian deep learning methods have been proposed for ...
Intellectual property (IP) piracy has become a non-negligible problem as...
Vision-language (VL) pre-training has recently received considerable att...
Leveraging large-scale data can introduce performance gains on many comp...
In this paper, we design and train a Generative Image-to-text Transforme...
This study investigated the climate effect under consecutive winters on ...
Human-Object Interaction (HOI) recognition is challenging due to two fac...
We propose DEFR, a DEtection-FRee method to recognize Human-Object Inter...
Tremendous progress has been made in recent years in developing better i...
In recent years, we have witnessed significant performance boost in the ...
In this paper, we propose UNICORN, a vision-language (VL) model that uni...
Automated visual understanding of our diverse and open world demands com...
In this paper, we propose a single UniFied transfOrmer (UFO), which is c...
Vision-and-language (VL) pre-training has proven to be highly effective ...
The spatial panel regression model has shown great success in modelling ...
Motion deblurring has witnessed rapid development in recent years, and m...
Knowledge-based visual question answering (VQA) involves answering quest...
This study investigated the effect of harsh winter climate on the perfor...
This paper revisits human-object interaction (HOI) recognition at image ...
Imbalanced datasets widely exist in practice and are a great challenge fo...
This paper presents an end-to-end semi-supervised object detection appro...
The convolutional neural network (CNN) has become a basic model for solv...
Despite exciting progress in pre-training for visual-linguistic (VL) rep...
This paper presents a detection-aware pre-training (DAP) approach, which...
Recent advances in computer vision take advantage of adversarial data au...
Software is often used for Network Functions (NFs) – such as firewalls, ...
This paper is concerned with self-supervised learning for small models. ...
Label assignment has been widely studied in general object detection bec...
Recent vision-language (VL) studies have shown remarkable progress by le...
In this paper, we propose Text-Aware Pre-training (TAP) for Text-VQA and...
Mainstream object detectors based on the fully convolutional network have...
Harsh winter climate can cause various problems for both public and priv...
In this paper, we propose an effective knowledge transfer framework to b...
In this paper, we propose an anchor-free object detector with a fully di...
In this paper, we propose an algorithm, named hashing-based non-maximum ...
In this paper we introduce the q-ratio block constrained minimal singula...