FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks

03/24/2022
by   Santiago Castro, et al.
2

Large-scale pretrained image-text models have shown incredible zero-shot performance in a handful of tasks, including video ones such as action recognition and text-to-video retrieval. However, these models haven't been adapted to video, mainly because they don't account for the time dimension but also because video frames are different from the typical images (e.g., containing motion blur, less sharpness). In this paper, we present a fine-tuning strategy to refine these large-scale pretrained image-text models for zero-shot video understanding tasks. We show that by carefully adapting these models we obtain considerable improvements on two zero-shot Action Recognition tasks and three zero-shot Text-to-video Retrieval tasks. The code is available at https://github.com/bryant1410/fitclip

READ FULL TEXT
research
03/15/2023

MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

Large scale Vision-Language (VL) models have shown tremendous success in...
research
08/03/2020

RareAct: A video dataset of unusual interactions

This paper introduces a manually annotated video dataset of unusual acti...
research
09/28/2021

VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding

We present VideoCLIP, a contrastive approach to pre-train a unified mode...
research
04/06/2023

Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

Adopting contrastive image-text pretrained models like CLIP towards vide...
research
12/14/2022

Understanding Zero-Shot Adversarial Robustness for Large-Scale Models

Pretrained large-scale vision-language models like CLIP have exhibited s...
research
08/12/2023

TongueSAM: An Universal Tongue Segmentation Model Based on SAM with Zero-Shot

Tongue segmentation serves as the primary step in automated TCM tongue d...
research
11/03/2022

Zero-shot Video Moment Retrieval With Off-the-Shelf Models

For the majority of the machine learning community, the expensive nature...

Please sign up or login with your details

Forgot password? Click here to reset