research ∙ 06/08/2023
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
Conversation agents fueled by Large Language Models (LLMs) are providing...
research ∙ 12/06/2022
Fine-tuned CLIP Models are Efficient Video Learners
Large-scale multi-modal training with image-text pairs imparts strong ge...
research ∙ 10/06/2022
MaPLe: Multi-modal Prompt Learning
Pre-trained vision-language (V-L) models such as CLIP have shown excelle...
research ∙ 07/07/2022