Text-based Person Retrieval aims to retrieve the target person images gi...
The recent video grounding works attempt to introduce vanilla contrastiv...
Large-scale pre-training has brought unimodal fields such as computer vi...
Automatic radiology report generation has attracted enormous research in...
With the rise of short videos, the demand for selecting appropriate back...
Video grounding aims to locate a moment of interest matching the given q...