This paper proposes a novel transformer-based framework that aims to enh...
Implicit neural representations have shown powerful capacity in modeling...
Learning discriminative task-specific features simultaneously for multip...
Talking head video generation aims to animate a human face in a still im...
This paper targets the problem of multi-task dense prediction which aims...
Multi-task scene understanding aims to design models that can simultaneo...
Predominant techniques on talking head generation largely depend on 2D
i...
This paper presents DetCLIPv2, an efficient and scalable training framew...
This report serves as a supplementary document for TaskPrompter, detaili...
We introduce You Only Train Once (YOTO), a dynamic human generation
fram...
Deep neural networks are vulnerable to adversarial attacks. Most white-b...
Open-world object detection, as a more general and challenging goal, aim...
Relying on the premise that the performance of a binary neural network c...
A counter-intuitive property of convolutional neural networks (CNNs) is ...
Neural network binarization accelerates deep models by quantizing their
...
A fundamental and challenging problem in deep learning is catastrophic
f...
Topology Optimization (TO) provides a systematic approach for obtaining
...
Multi-task dense scene understanding is a thriving research domain that
...
Talking head video generation aims to produce a synthetic human face vid...
This paper proposes a new transformer-based framework to learn class-spe...
Over the past years, semantic segmentation, as many other tasks in compu...
Multi-view Stereo (MVS) with known camera parameters is essentially a 1D...
Existing domain adaptation methods for crowd counting view each crowd im...
Labeling is onerous for crowd counting as it should annotate each indivi...
As a crucial task of autonomous driving, 3D object detection has made gr...
Weakly supervised temporal action localization (WS-TAL) is a challenging...
Surface reconstruction from point clouds is a fundamental problem in the...
We propose a method to train deep networks to decompose videos into 3D
g...
This paper focuses on the task of 4D shape reconstruction from a sequenc...
Estimating 3D bounding boxes from monocular images is an essential compo...
Convolutional neural networks have enabled major progress in addressing
...
Although Transformer has made breakthrough success in widespread domains...
Multi-scale representations deeply learned via convolutional neural netw...
Online action detection is a task with the aim of identifying ongoing ac...
Existing anchor-based and anchor-free object detectors in multi-stage or...
Existing anchor-based and anchor-free object detectors in multi-stage or...
We propose a novel Edge guided Generative Adversarial Network (EdgeGAN) ...
We propose a novel model named Multi-Channel Attention Selection Generat...
In order to predict the development trend of the 2019 coronavirus
(2019-...
In this paper, we address the task of semantic-guided scene generation. ...
State-of-the-art models for unpaired image-to-image translation with
Gen...
State-of-the-art methods in the unpaired image-to-image translation are
...
Recent deep monocular depth estimation approaches based on supervised
re...
Recent saliency models extensively explore to incorporate multi-scale
co...
In this paper we propose a geometry-aware model for video object detecti...
Modelling long-range dependencies is critical for complex scene understa...
Inspired by the success of adversarial learning, we propose a new end-to...
In this work, we propose a novel Cycle In Cycle Generative Adversarial
N...
In this paper, we focus on the facial expression translation task and pr...
Cross-view image translation is challenging because it involves images w...