Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding

06/09/2023
by Mu Cai, et al.

Large language models (LLMs) have recently made significant advances in natural language understanding and generation, but their potential in computer vision remains largely unexplored. In this paper, we introduce a new, exploratory approach that enables LLMs to process images using the Scalable Vector Graphics (SVG) format. By working with the XML-based textual descriptions of SVG representations instead of raster images, we aim to bridge the gap between the visual and textual modalities, allowing LLMs to directly understand and manipulate images without the need for parameterized visual components. Our method supports simple image classification, generation, and in-context learning using only LLM capabilities. We demonstrate the promise of our approach across discriminative and generative tasks, highlighting (i) its robustness against distribution shift, (ii) the substantial improvements achieved by tapping into the in-context learning abilities of LLMs, and (iii) its image understanding and generation capabilities with human guidance. Our code, data, and models are available at https://github.com/mu-cai/svg-llm.
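
To make the idea of treating an image as XML text concrete, here is a minimal sketch of how SVG markup might be placed into a few-shot classification prompt. It is an illustration only, not the authors' pipeline: the helper names (make_circle_svg, make_square_svg, build_prompt), the toy shapes, and the prompt wording are all assumptions, and no LLM is actually called.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_circle_svg(radius: int) -> str:
    """Return a tiny SVG document containing one circle (toy data)."""
    return (
        f'<svg xmlns="{SVG_NS}" viewBox="0 0 32 32">'
        f'<circle cx="16" cy="16" r="{radius}" fill="black"/></svg>'
    )


def make_square_svg(side: int) -> str:
    """Return a tiny SVG document containing one square (toy data)."""
    return (
        f'<svg xmlns="{SVG_NS}" viewBox="0 0 32 32">'
        f'<rect x="4" y="4" width="{side}" height="{side}" fill="black"/></svg>'
    )


def build_prompt(examples, query_svg):
    """Assemble a few-shot prompt from (svg, label) pairs plus one query.

    Hypothetical prompt layout; the labelled pairs play the role of
    in-context examples, and the query is left for the model to label.
    """
    # Check that every SVG is well-formed XML before it enters the prompt;
    # ET.fromstring raises ParseError otherwise.
    for svg in [s for s, _ in examples] + [query_svg]:
        ET.fromstring(svg)

    parts = ["Classify the shape drawn by each SVG."]
    for svg, label in examples:
        parts.append(f"SVG: {svg}\nLabel: {label}")
    parts.append(f"SVG: {query_svg}\nLabel:")
    return "\n\n".join(parts)


prompt = build_prompt(
    [(make_circle_svg(10), "circle"), (make_square_svg(20), "square")],
    make_circle_svg(6),
)
print(prompt)
```

The resulting string is plain text, so it could be passed to any text-only LLM; the two labelled SVGs act as in-context examples and the final unlabelled SVG is the image to classify.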

research
08/31/2023

TouchStone: Evaluating Vision-Language Models by Language Models

Large vision-language models (LVLMs) have recently witnessed rapid advan...
research
08/29/2023

Where Would I Go Next? Large Language Models as Human Mobility Predictors

Accurate human mobility prediction underpins many important applications...
research
05/24/2023

Cream: Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models

Advances in Large Language Models (LLMs) have inspired a surge of resear...
research
07/22/2020

DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation

Scalable Vector Graphics (SVG) are ubiquitous in modern 2D interfaces du...
research
07/27/2023

How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges

Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT...
research
04/27/2023

IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers

Scalable Vector Graphics (SVG) is a popular vector image format that off...
research
08/16/2023

Painter: Teaching Auto-regressive Language Models to Draw Sketches

Large language models (LLMs) have made tremendous progress in natural la...
