Audio-visual video segmentation (AVVS) aims to generate pixel-level maps...
Audio-driven talking-head synthesis is a popular research topic for virt...
Tracking any given object(s) spatially and temporally is a common purpos...
In this study, we focus on the problem of 3D human mesh recovery from a
...
In this paper, we focus on the task of generalizable neural human render...
Large-scale pre-trained vision-language models allow for the zero-shot
t...
The Associating Objects with Transformers (AOT) framework has exhibited
...
The Associating Objects with Transformers (AOT) framework has exhibited
...
In real-world scenarios, collected and annotated data often exhibit the
...
This work aims to provide a deep-learning solution for the motion
interp...
Recovering noise-covered details from low-light images is challenging, a...
This report presents a framework called Segment And Track Anything (SAMT...
In this paper, we introduce semi-supervised video object segmentation (V...
Video-based 3D human pose and shape estimations are evaluated by intra-f...
This paper focuses on developing a more effective method of hierarchical...
Product retrieval is of great importance in the ecommerce domain. This p...
In this paper, we focus on the unsupervised Video Object Segmentation (V...
This paper investigates how to realize better and more efficient embeddi...
This paper investigates how to realize better and more efficient embeddi...
Referring video object segmentation (RVOS) aims to segment video objects...
Compared to 2D object bounding-box labeling, it is very difficult for hu...
This paper investigates the principles of embedding learning to tackle t...
In this paper, we investigate the principles of embedding learning betwe...
Comparing to image inpainting, image outpainting receives less attention...
In this work, we propose a generally applicable transformation unit for
...