In this work, we investigate performing semantic segmentation solely thr...
Image-text pretrained models, e.g., CLIP, have shown impressive general
...
Video-Language Pre-training models have recently significantly improved
...
Semi-supervised learning based methods are current SOTA solutions to the...
Human Object Interaction (HOI) detection is a challenging task that requ...
Building a universal video-language model for solving various video
unde...
In recent years, memory-augmented neural networks(MANNs) have shown prom...
Existing action detection algorithms usually generate action proposals
t...