Audio-visual video segmentation (AVVS) aims to generate pixel-level maps...
Video-based scene graph generation (VidSGG) is an approach that aims to
...
Since the advent of Neural Radiance Fields, novel view synthesis has rec...
In this paper, we focus on an under-explored issue of biased activation ...
Current video-based scene graph generation (VidSGG) methods have been fo...
Weakly-supervised object localization aims to indicate the category as w...
In recent years, supervised or unsupervised learning-based MVS methods
a...
High dynamic range (HDR) imaging is a fundamental problem in image
proce...
Recently, increasing efforts have been focused on Weakly Supervised Scen...
Solid results from Transformers have made them prevailing architectures ...
Nearly all existing scene graph generation (SGG) models have overlooked ...
Distinctive Image Captioning (DIC) – generating distinctive captions tha...
Given an image and a reference caption, the image caption editing task a...
Data Augmentation (DA) – generating extra training samples beyond origin...
Reconstructing ghosting-free high dynamic range (HDR) images of dynamic
...
Structured sentiment analysis, which aims to extract the complex semanti...
Reasoning about causal and temporal event relations in videos is a new
d...
A thriving trend for domain adaptive segmentation endeavors to generate ...
Limited by the computational efficiency and accuracy, generating complex...
The expensive annotation cost is notoriously known as a main constraint ...
Depth estimation is a crucial step for 3D reconstruction with panorama i...
Today's VidSGG models are all proposal-based methods, i.e., they first
g...
The contemporary visual captioning models frequently hallucinate objects...
Grounded video description (GVD) encourages captioning models to attend ...
Federated learning (FL) has emerged as an important machine learning par...
Today's VQA models still tend to capture superficial linguistic correlat...
Given an untrimmed video and a natural language query, Natural Language ...
Video Visual Relation Detection (VidVRD), has received significant atten...
This paper considers the problem of generating an HDR image of a scene f...
Centralized Training with Decentralized Execution (CTDE) has been a popu...
Weakly-Supervised Object Detection (WSOD) and Localization (WSOL), i.e.,...
The prevailing framework for matching multimodal inputs is based on a
tw...
The recent emerged weakly supervised object localization (WSOL) methods ...
By leveraging deep learning based technologies, the data-driven based
ap...
Controllable Image Captioning (CIC) – generating image descriptions
foll...
We aim to address the problem of Natural Language Video Localization
(NL...
With the successful application of deep learning models in many real-wor...
Due to people's emerging concern about data privacy, federated learning(...
To leverage enormous unlabeled data on distributed edge devices, we form...
The prevailing framework for solving referring expression grounding is b...
Visual Storytelling (VIST) is a task to tell a narrative story about a
c...
Accurate 2D lung nodules segmentation from medical Computed Tomography (...
Multi-task learning is an effective learning strategy for deep-learning-...
Fashion outfit recommendation has attracted increasing attentions from o...
Despite Visual Question Answering (VQA) has realized impressive progress...
Federated learning is proposed as a machine learning setting to enable
d...
Automatic question generation according to an answer within the given pa...
The recent rapid development of artificial intelligence (AI, mainly driv...
Scene graphs -- objects as nodes and visual relationships as edges --
de...
Retweet prediction is a challenging problem in social media sites (SMS)....