VideoLLM: Modeling Video Sequence with Large Language Models

05/22/2023
by Guo Chen, et al.

With the exponential growth of video data, there is an urgent need for automated technology to analyze and comprehend video content. However, existing video understanding models are often task-specific and lack the comprehensive capability to handle diverse tasks. The success of large language models (LLMs) such as GPT has demonstrated their impressive abilities in sequence causal reasoning. Building upon this insight, we propose a novel framework called VideoLLM that leverages the sequence reasoning capabilities of pre-trained LLMs from natural language processing (NLP) for video sequence understanding. VideoLLM incorporates a carefully designed Modality Encoder and Semantic Translator, which convert inputs from various modalities into a unified token sequence. This token sequence is then fed into a decoder-only LLM. Finally, with the aid of a simple task head, VideoLLM provides an effective unified framework for diverse video understanding tasks. To evaluate the efficacy of VideoLLM, we conduct extensive experiments using multiple LLMs and fine-tuning methods, covering eight tasks sourced from four different datasets. The experimental results demonstrate that the understanding and reasoning capabilities of LLMs can be effectively transferred to video understanding tasks. We release the code at https://github.com/cg1177/VideoLLM.
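To make the described pipeline concrete, below is a minimal PyTorch sketch of the stated architecture: a Modality Encoder producing visual features, a Semantic Translator projecting them into the LLM's token space, a decoder-only (causally masked) transformer, and a simple task head. All module choices here are illustrative assumptions (a linear translator, a two-layer causal transformer standing in for a pre-trained GPT-style LLM, a per-timestep classification head), not the authors' released implementation; see the repository above for the actual code.

```python
import torch
import torch.nn as nn

class VideoLLMSketch(nn.Module):
    """Hypothetical sketch of the VideoLLM pipeline from the abstract:
    Modality Encoder -> Semantic Translator -> decoder-only LLM -> task head.
    Every concrete module below is an illustrative assumption."""

    def __init__(self, visual_dim=768, llm_dim=1024, num_classes=10):
        super().__init__()
        # Modality Encoder: placeholder for a frozen pre-trained visual backbone
        # (e.g. a ViT) that maps each frame/clip to a visual_dim feature vector.
        self.modality_encoder = nn.Identity()
        # Semantic Translator: projects visual features into the LLM token space.
        self.semantic_translator = nn.Linear(visual_dim, llm_dim)
        # Stand-in for a pre-trained decoder-only LLM: a small transformer
        # applied with a causal mask so it behaves autoregressively.
        self.llm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=llm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Simple task head: here, per-timestep classification (e.g. action labels).
        self.task_head = nn.Linear(llm_dim, num_classes)

    def forward(self, frames):
        # frames: (batch, time, visual_dim) pre-extracted frame features
        feats = self.modality_encoder(frames)
        tokens = self.semantic_translator(feats)  # unified token sequence
        # Causal mask: True above the diagonal blocks attention to future tokens.
        t = tokens.size(1)
        causal_mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        hidden = self.llm(tokens, mask=causal_mask)
        return self.task_head(hidden)  # (batch, time, num_classes)

# Usage: per-frame predictions for 2 videos of 16 frames each.
logits = VideoLLMSketch()(torch.randn(2, 16, 768))
print(logits.shape)  # torch.Size([2, 16, 10])
```

Swapping the task head (e.g. regression for temporal grounding, a single pooled logit for video-level classification) is what lets one token-sequence backbone serve the different video understanding tasks the abstract mentions.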
