A Multiplexed Network for End-to-End, Multilingual OCR

by Jing Huang, et al.

Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results. However, many existing methods focus primarily on Latin-alphabet languages, and often only on case-insensitive English characters. In this paper, we propose an E2E approach, Multiplexed Multilingual Mask TextSpotter, that performs script identification at the word level and handles different scripts with different recognition heads, all while maintaining a unified loss that simultaneously optimizes script identification and the multiple recognition heads. Experiments show that our method outperforms a single-head model with a similar number of parameters in end-to-end recognition tasks, and achieves state-of-the-art results on the MLT17 and MLT19 joint text detection and script identification benchmarks. We believe that our work is a step towards an end-to-end trainable and scalable multilingual, multi-purpose OCR system. Our code and model will be released.
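To make the multiplexing idea concrete, the following is a minimal sketch in plain Python rather than the authors' implementation. It assumes per-word script scores are available as a dictionary, that each script has its own recognition head producing a scalar loss, and that the unified loss is simply the script-identification cross-entropy plus the loss of the head matching the ground-truth script. The names `route_word`, `unified_loss`, and `rec_weight` are hypothetical, and the abstract does not specify the exact form of the loss.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_word(script_logits):
    """Hard routing: return the script whose score is highest.

    script_logits maps a script name to its raw score for one word crop.
    The returned name selects which recognition head would decode the word.
    """
    return max(script_logits, key=script_logits.get)


def unified_loss(script_logits, true_script, head_losses, rec_weight=1.0):
    """Assumed form of a joint loss for one word (illustrative only).

    Adds the cross-entropy of the script classifier to the recognition
    loss of the head that matches the ground-truth script.
    """
    scripts = list(script_logits)
    probs = softmax([script_logits[s] for s in scripts])
    id_loss = -math.log(probs[scripts.index(true_script)])
    return id_loss + rec_weight * head_losses[true_script]


if __name__ == "__main__":
    logits = {"latin": 2.0, "arabic": 0.1, "chinese": -1.0}
    losses = {"latin": 0.4, "arabic": 1.3, "chinese": 2.2}  # made-up values
    print(route_word(logits))
    print(unified_loss(logits, "latin", losses))
```

Because both terms are summed into one scalar, a single backward pass in a real framework would update the script classifier and the selected recognition head together, which is the property the abstract attributes to the unified loss.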



