OCR-VQGAN: Taming Text-within-Image Generation

10/19/2022
by Juan A. Rodriguez, et al.

Synthetic image generation has recently improved significantly in domains such as natural images and art. However, the problem of figure and diagram generation remains unexplored. A challenging aspect of generating figures and diagrams is rendering readable text within the images. To address this problem, we present OCR-VQGAN, an image encoder and decoder that leverages OCR pre-trained features to optimize a text perceptual loss, encouraging the architecture to preserve high-fidelity text and diagram structure. To explore our approach, we introduce the Paper2Fig100k dataset, with over 100k images of figures and texts from research papers. The figures show architecture diagrams and methodologies from articles available at arXiv.org in fields such as artificial intelligence and computer vision. Figures usually include text and discrete objects, e.g., boxes in a diagram, with lines and arrows that connect them. We demonstrate the effectiveness of OCR-VQGAN through several experiments on the task of figure reconstruction. Additionally, we explore the qualitative and quantitative impact of weighting different perceptual metrics in the overall loss function. We release code, models, and dataset at https://github.com/joanrod/ocr-vqgan.
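As a rough illustration of what weighting different perceptual metrics in an overall loss can look like, the sketch below combines a pixel term, a generic perceptual term, and a text (OCR-feature) perceptual term. It is not the released implementation: the function names, the weights `w_pixel`, `w_perc`, `w_ocr`, and the toy feature extractors are all assumptions made for this example, and any other terms a full training objective would contain are omitted.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a feature extractor maps a flat image to one
# activation vector per layer. Real extractors would be pre-trained networks.
Extractor = Callable[[Sequence[float]], List[List[float]]]


def feature_distance(feats_a: List[List[float]], feats_b: List[List[float]]) -> float:
    """Sum over layers of the mean squared difference between activations."""
    total = 0.0
    for layer_a, layer_b in zip(feats_a, feats_b):
        if len(layer_a) != len(layer_b):
            raise ValueError("feature maps must have matching sizes")
        total += sum((a - b) ** 2 for a, b in zip(layer_a, layer_b)) / len(layer_a)
    return total


def reconstruction_loss(
    image: Sequence[float],
    recon: Sequence[float],
    image_features: Extractor,
    ocr_features: Extractor,
    w_pixel: float = 1.0,
    w_perc: float = 1.0,
    w_ocr: float = 1.0,
) -> float:
    """Weighted sum of pixel, perceptual, and text-perceptual terms (assumed form)."""
    pixel = sum(abs(a - b) for a, b in zip(image, recon)) / len(image)
    perc = feature_distance(image_features(image), image_features(recon))
    ocr = feature_distance(ocr_features(image), ocr_features(recon))
    return w_pixel * pixel + w_perc * perc + w_ocr * ocr


# Toy stand-ins for pre-trained networks (assumptions, for illustration only):
# the "image" extractor returns the raw pixels, the "OCR" extractor returns
# neighbouring-pixel differences as a crude proxy for stroke edges.
def toy_image_features(x: Sequence[float]) -> List[List[float]]:
    return [list(x)]


def toy_ocr_features(x: Sequence[float]) -> List[List[float]]:
    return [[x[i + 1] - x[i] for i in range(len(x) - 1)]]


image = [0.0, 1.0, 0.0, 1.0]
recon = [0.0, 0.5, 0.0, 1.0]
loss_without_ocr = reconstruction_loss(
    image, recon, toy_image_features, toy_ocr_features, w_ocr=0.0
)
loss_with_ocr = reconstruction_loss(
    image, recon, toy_image_features, toy_ocr_features, w_ocr=1.0
)
```

In this toy setup, raising `w_ocr` increases the penalty on reconstructions that blur sharp transitions, which is the kind of trade-off that weighting perceptual terms is meant to expose.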


research
06/29/2023

DreamDiffusion: Generating High-Quality Images from Brain EEG Signals

This paper introduces DreamDiffusion, a novel method for generating high...
research
09/27/2018

Semantically Invariant Text-to-Image Generation

Image captioning has demonstrated models that are capable of generating ...
research
05/23/2023

Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach

Recent text-to-image generation models have demonstrated impressive capa...
research
05/19/2023

RxnScribe: A Sequence Generation Model for Reaction Diagram Parsing

Reaction diagram parsing is the task of extracting reaction schemes from...
research
12/29/2022

GPTR: Gestalt-Perception Transformer for Diagram Object Detection

Diagram object detection is the key basis of practical applications such...
research
06/01/2023

FigGen: Text to Scientific Figure Generation

The generative modeling landscape has experienced tremendous growth in r...
research
06/15/2017

Extracting Formal Models from Normative Texts

We are concerned with the analysis of normative texts - documents based ...
