Towards Scene-Text to Scene-Text Translation

08/06/2023
by   Onkar Susladkar, et al.

In this work, we study the task of "visually" translating scene text from a source language (e.g., English) to a target language (e.g., Chinese). Visual translation involves not just the recognition and translation of scene text but also the generation of a translated image that preserves the visual features of the text, such as font, size, and background. This task poses several challenges, such as interpolating a font to unseen characters and preserving the text size and background. To address these, we introduce VTNet, a novel conditional diffusion-based method. To train VTNet, we create a synthetic cross-lingual dataset of 600K scene-text images in six popular languages: English, Hindi, Tamil, Chinese, Bengali, and German. We evaluate the performance of VTNet through extensive experiments and comparisons to related methods. Our model also surpasses the previous state-of-the-art results on conventional scene-text editing benchmarks. Further, we present rigorous qualitative studies to understand the strengths and shortcomings of our model. Results show that our approach generalizes well to unseen words and fonts. We firmly believe our work can benefit real-world applications, such as text translation using a phone camera and translating educational materials. Code and data will be made publicly available.


