MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models

03/23/2023
by   Dohwan Ko, et al.

Foundation models have shown outstanding performance and generalization capabilities across domains. Since most studies on foundation models mainly focus on the pretraining phase, a naive strategy of minimizing a single task-specific loss is adopted for fine-tuning. However, such fine-tuning methods do not fully leverage other losses that are potentially beneficial for the target task. Therefore, we propose MEta Loss TRansformer (MELTR), a plug-in module that automatically and non-linearly combines various loss functions to aid learning of the target task via auxiliary learning. We formulate the auxiliary learning as a bi-level optimization problem and present an efficient optimization algorithm based on Approximate Implicit Differentiation (AID). For evaluation, we apply our framework to various video foundation models (UniVL, Violet and All-in-one), and show significant performance gains on all four downstream tasks: text-to-video retrieval, video question answering, video captioning, and multi-modal sentiment analysis. Our qualitative analyses demonstrate that MELTR adequately 'transforms' individual loss functions and 'melts' them into an effective unified loss. Code is available at https://github.com/mlvlab/MELTR.
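As a rough illustration of the core idea in the abstract (not the released implementation; see the linked repository for that), the sketch below shows how a small Transformer encoder could take the values of several task and auxiliary losses as tokens and fuse them non-linearly into a single scalar training loss. All names here (MetaLossTransformer, hidden_dim, etc.) are hypothetical, and the bi-level AID-based meta-optimization of the module's own parameters is omitted.

```python
# Minimal, hypothetical sketch of a transformer-based loss combiner in PyTorch.
# It embeds each individual loss value as a token, mixes the tokens with a small
# Transformer encoder, and reads out one scalar "unified" loss.

import torch
import torch.nn as nn


class MetaLossTransformer(nn.Module):
    """Non-linearly combines K loss values into one scalar (illustrative only)."""

    def __init__(self, num_losses: int, hidden_dim: int = 32, num_heads: int = 4):
        super().__init__()
        # Each loss value becomes a token; a learned per-loss embedding tells the
        # encoder which loss the token came from.
        self.value_proj = nn.Linear(1, hidden_dim)
        self.loss_embed = nn.Parameter(torch.randn(num_losses, hidden_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, losses: torch.Tensor) -> torch.Tensor:
        # losses: shape (K,) -- the K individual loss values for the current batch.
        tokens = self.value_proj(losses.unsqueeze(-1)) + self.loss_embed  # (K, hidden)
        fused = self.encoder(tokens.unsqueeze(0))                         # (1, K, hidden)
        return self.head(fused.mean(dim=1)).squeeze()                     # scalar loss


if __name__ == "__main__":
    meltr = MetaLossTransformer(num_losses=3)
    # Pretend these are, e.g., a retrieval loss, a captioning loss, and an alignment loss.
    individual_losses = torch.tensor([0.73, 1.21, 0.48])
    unified_loss = meltr(individual_losses)
    print(unified_loss)  # single scalar to backpropagate through the foundation model
```

In the paper's actual setup, the foundation model would be updated on this unified loss in an inner loop, while the loss-combiner's parameters would be updated in an outer loop on the target-task objective, with the implicit gradient of the bi-level problem approximated via AID rather than unrolled differentiation.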


Related research:

Offsite-Tuning: Transfer Learning without Full Model (02/09/2023)
Foundation Model is Efficient Multimodal Multitask Model Selector (08/11/2023)
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks (09/15/2022)
LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks (06/14/2022)
Recycling diverse models for out-of-distribution generalization (12/20/2022)
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale (05/23/2023)
On the power of foundation models (11/29/2022)
