Transferring General Multimodal Pretrained Models to Text Recognition

12/19/2022
by Junyang Lin, et al.

This paper proposes a new method, OFA-OCR, for transferring multimodal pretrained models to text recognition. Specifically, we recast text recognition as image captioning and directly transfer a unified vision-language pretrained model to the end task. Without pretraining on large-scale annotated or synthetic text recognition data, OFA-OCR outperforms the baselines and achieves state-of-the-art performance on the Chinese text recognition benchmark. Additionally, we construct an OCR pipeline with OFA-OCR and demonstrate that it achieves performance competitive with a product-level API. The code (https://github.com/OFA-Sys/OFA) and demo (https://modelscope.cn/studios/damo/ofa_ocr_pipeline/summary) are publicly available.
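For readers who want to try the released demo, a minimal sketch of invoking the recognition model through ModelScope follows. The pipeline call and task constant come from the ModelScope library; the specific model identifier and the output field name are assumptions based on the linked demo page rather than details stated in this abstract.

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Text recognition recast as image captioning: the pipeline feeds a cropped
# text-line image to the OFA encoder-decoder, which generates the transcription.
ocr_recognition = pipeline(
    Tasks.ocr_recognition,
    model='damo/ofa_ocr-recognition_scene_base_zh',  # assumed model id; see the demo page
)

result = ocr_recognition('text_line.jpg')  # path or URL to a cropped text-line image
print(result)  # the recognized string is expected under a 'text'-like key

The full OCR pipeline mentioned in the abstract additionally needs a text detection stage to crop text regions from a page image before recognition; that stage is part of the linked demo rather than this sketch.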
