An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

09/18/2023
by Yadong Lu, et al.

Visual instruction tuning has recently shown encouraging progress with open-source large multimodal models (LMMs) such as LLaVA and MiniGPT-4. However, most existing studies of open-source LMMs use models with 13B parameters or fewer. In this paper, we present an empirical study of scaling LLaVA up to 33B and 65B/70B parameters, and share findings from our explorations of image resolution, data mixing, and parameter-efficient training methods such as LoRA and QLoRA. We evaluate each factor by its impact on multimodal and language capabilities when completing real-world tasks in the wild. We find that scaling LMMs consistently enhances model performance and improves language capabilities, and that LoRA/QLoRA tuning of LMMs performs comparably to full-model fine-tuning. The study also highlights the importance of higher image resolution and of mixing multimodal and language data for improving LMM performance, and shows that visual instruction tuning can sometimes improve an LMM's pure language capability. We hope this study makes state-of-the-art LMM research at larger scale more accessible, thereby helping establish stronger baselines for future research. Code and checkpoints will be made public.


research
06/26/2023

Large Multimodal Models: Notes on CVPR 2023 Tutorial

This tutorial note summarizes the presentation on “Large Multimodal Mode...
research
09/13/2023

Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics

Multi-modal large language models (MLLMs) are trained based on large lan...
research
07/31/2023

FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis

In this paper, we propose FinVis-GPT, a novel multimodal large language ...
research
08/20/2023

StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data

The remarkable multimodal capabilities demonstrated by OpenAI's GPT-4 ha...
research
08/25/2023

SoTaNa: The Open-Source Software Development Assistant

Software development plays a crucial role in driving innovation and effi...
research
06/14/2023

Revealing the structure of language model capabilities

Building a theoretical understanding of the capabilities of large langua...
