ArtGPT-4: Artistic Vision-Language Understanding with Adapter-enhanced MiniGPT-4

05/12/2023
by Zhengqing Yuan, et al.

In recent years, large language models (LLMs) have made significant progress in natural language processing (NLP), with models such as ChatGPT and GPT-4 achieving impressive capabilities across a wide range of linguistic tasks. However, training models at such a large scale is challenging, and finding datasets that match the model's scale is often difficult. Fine-tuning, and training models with fewer parameters using novel methods, have emerged as promising ways to overcome these challenges. One such model is MiniGPT-4, which achieves vision-language understanding comparable to GPT-4 by leveraging novel pre-trained models and innovative training strategies. However, MiniGPT-4 still struggles with image understanding, particularly for artistic pictures. To address these limitations, we propose a novel multimodal model called ArtGPT-4. ArtGPT-4 was trained on image-text pairs using a single Tesla A100 device in just 2 hours, using only about 200 GB of data. The model can depict images with an artistic flair and generate visual code, including aesthetically pleasing HTML/CSS web pages. Furthermore, this article proposes novel benchmarks for evaluating the performance of vision-language models. In the subsequent evaluation, ArtGPT-4 scored more than 1 point higher than the current state-of-the-art model and only 0.25 points lower than human artists on a 6-point scale. Our code and pre-trained model are available at <https://huggingface.co/Tyrannosaurus/ArtGPT-4>.
