The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation

08/17/2023
by Giacomo Zara, et al.

Source-Free Video Unsupervised Domain Adaptation (SFVUDA) is the task of adapting an action recognition model, trained on a labelled source dataset, to an unlabelled target dataset without accessing the actual source data. Previous approaches have attempted to address SFVUDA by leveraging self-supervision (e.g., enforcing temporal consistency) derived from the target data itself. In this work, we take an orthogonal approach by exploiting "web-supervision" from Large Language-Vision Models (LLVMs), driven by the rationale that LLVMs contain a rich world prior that is surprisingly robust to domain shift. We showcase the unreasonable effectiveness of integrating LLVMs for SFVUDA by devising an intuitive and parameter-efficient method, which we name Domain Adaptation with Large Language-Vision models (DALL-V), that distills the world prior and complementary source model information into a student network tailored for the target. Despite its simplicity, DALL-V achieves significant improvement over state-of-the-art SFVUDA methods.

research
03/09/2022

Learning Temporal Consistency for Source-Free Video Domain Adaptation

Video-based Unsupervised Domain Adaptation (VUDA) methods improve the ro...
research
09/02/2021

On-target Adaptation

Domain adaptation seeks to mitigate the shift between training on the so...
research
08/10/2022

EXTERN: Leveraging Endo-Temporal Regularization for Black-box Video Domain Adaptation

To enable video models to be applied seamlessly across video tasks in di...
research
02/13/2023

In Search for a Generalizable Method for Source Free Domain Adaptation

Source-free domain adaptation (SFDA) is compelling because it allows ada...
research
06/29/2023

Prompt Ensemble Self-training for Open-Vocabulary Domain Adaptation

Traditional domain adaptation assumes the same vocabulary across source ...
research
08/17/2021

Channel-Temporal Attention for First-Person Video Domain Adaptation

Unsupervised Domain Adaptation (UDA) can transfer knowledge from labeled...
research
03/30/2022

CycDA: Unsupervised Cycle Domain Adaptation from Image to Video

Although action recognition has achieved impressive results over recent ...
