Most recent works focus on answering first-order logical queries to expl...
Recent compositional zero-shot learning (CZSL) methods adapt pre-trained...
Existing audio analysis methods generally first transform the audio stre...
Video temporal grounding aims to pinpoint a video segment that matches t...
Foundation models are pre-trained on massive data and transferred to dow...
Parameter-efficient transfer learning (PETL) based on large-scale pre-tr...
This work presents a unified knowledge protocol, called UKnow, which fac...
Many recent studies leverage the pre-trained CLIP for text-video cross-m...
Standard approaches for video recognition usually operate on the full in...
Existing video copy detection methods generally measure video similarity...
Content-based video retrieval aims to find videos from a large video dat...