In 3D Referring Expression Segmentation (3D-RES), the earlier approach a...
In recent years, 3D representation learning has turned to 2D vision-lang...
Text-driven 3D stylization is a complex and crucial task in the fields o...
In this paper, we study the local visual modeling with grid features for...
Video-text retrieval has been a crucial and fundamental task in multi-mo...