Data-driven research in Additive Manufacturing (AM) has gained significa...
In-context learning, i.e., learning from in-context samples, is an impre...
Recent work on Neural Radiance Fields (NeRF) has demonstrated significan...
Diffusion-based methods can generate realistic images and videos, but th...
Recently, integrating video foundation models and large language models ...
Recent vision transformers, large-kernel CNNs and MLPs have attained
rem...
Generating realistic human motion from given action descriptions has
exp...
Transformer is popular in recent 3D human pose estimation, which utilize...
The recent success of Large Language Models (LLMs) signifies an impressi...
Recognizing elementary underlying concepts from observations
(disentangl...
Because of the ambiguous and subjective property of the facial expressio...
Noise suppression (NS) models have been widely applied to enhance speech...
Clothes-invariant feature extraction is critical to the clothes-changing...
The aim of this paper is to investigate the connection between learning
...
Talking head generation is to generate video based on a given source ide...
Most Neural Radiance Fields (NeRFs) have poor generalization ability,
li...
The Multiplane Image (MPI), containing a set of fronto-parallel RGBA lay...
Layout generation aims to synthesize realistic graphic scenes consisting...
For any video codecs, the coding efficiency highly relies on whether the...
Packet loss concealment (PLC) is challenging in concealing missing conte...
For the task of speech separation, previous study usually treats
multi-c...
Existing deep learning based speech enhancement (SE) methods either use ...
Neural image compression has surpassed state-of-the-art traditional code...
Representing a signal as a continuous function parameterized by neural
n...
We present a new method for estimating the Neural Reflectance Field (NRe...
Temporal modeling is crucial for various video learning tasks. Most rece...
Graph-based models have achieved great success in person re-identificati...
Video frame synthesis, which consists of interpolation and extrapolation...
For neural video codec, it is critical, yet challenging, to design an
ef...
In this paper we propose a multi-modal multi-correlation learning framew...
Neural audio coding has shown very promising results recently in the
lit...
In the booming video era, video segmentation attracts increasing researc...
Edge computing is being widely used for video analytics. To alleviate th...
Deep neural networks often suffer the data distribution shift between
tr...
Obtaining the human-like perception ability of abstracting visual concep...
We propose a method for self-supervised image representation learning un...
Improving the generalization capability of Deep Neural Networks (DNNs) i...
How to efficiently utilize the temporal features is crucial, yet challen...
One-shot object detection aims at detecting novel objects according to m...
Contrastive learning between different views of the data achieves outsta...
This paper presents ActiveMLP, a general MLP-like backbone for computer
...
Distribution forecast can quantify forecast uncertainty and provide vari...
For deep reinforcement learning (RL) from pixels, learning effective sta...
Error propagation is a general but crucial problem in online semi-superv...
We address end-to-end learned video compression with a special focus on
...
Most of the existing neural video compression methods adopt the predicti...
Self-supervised learning has been successfully applied to pre-train vide...
Geometry Projection is a powerful depth estimation method in monocular 3...
Self-attention has been successfully applied to video representation lea...
Detecting and localizing objects in the real 3D space, which plays a cru...