We investigate knowledge retrieval with multi-modal queries, i.e. querie...
Incremental or continual learning has been extensively studied for image...
In videos that contain actions performed unintentionally, agents do not...
Tremendous progress has been made in recent years in developing better i...
Despite exciting progress in pre-training for visual-linguistic (VL) rep...
This paper is concerned with self-supervised learning for small models. ...
A system capturing the association between video frames and textual quer...
Small object detection is challenging because small objects do not conta...
Person search by natural language aims at retrieving a specific person i...
Captioning is a crucial and challenging task for video understanding. In...
The process of identifying changes or transformations in a scene along w...
Computer Vision applications often require a textual grounding module wi...
Grounding textual phrases in visual content is a meaningful yet challeng...
Convolutional neural networks have achieved great improvement on face re...