3D vision-language grounding (3D-VL) is an emerging field that aims to
c...
Recent advances in Scene Graph Generation (SGG) typically model the
rela...
This paper shows work on better use of LLMs with SelfzCoT, a self-prompt...
Dynamic scene graphs generated from video clips could help enhance the
s...
Existing permutation-invariant methods can be divided into two categorie...
In a complex road traffic scene, illegal lane intrusion of pedestrians o...
Graph convolutional neural networks provide good solutions for node
clas...
Pedestrian detection in crowd scenes poses a challenging problem due to ...
Object detection in videos has drawn increasing attention recently since...
Semantic image segmentation, which has become one of the key applications i...
Disparity estimation for binocular stereo images finds a wide range of
a...