Hamming OCR: A Locality Sensitive Hashing Neural Network for Scene Text Recognition

09/23/2020
by Bingcong Li, et al.

Recently, inspired by the Transformer, self-attention-based scene text recognition approaches have achieved outstanding performance. However, we find that model size grows rapidly as the lexicon grows: the number of parameters in the softmax classification layer and in the output embedding layer is proportional to the vocabulary size. This hinders the development of lightweight text recognition models, especially for Chinese and multilingual text. We therefore propose a lightweight scene text recognition model named Hamming OCR. In this model, a novel Hamming classifier, which adopts a locality sensitive hashing (LSH) algorithm to encode each character, replaces the softmax regression, and the generated LSH code is used directly in place of the output embedding. We also present a simplified Transformer decoder that reduces the number of parameters by removing the feed-forward network and using cross-layer parameter sharing. Compared with traditional methods, the number of parameters in both the classification and embedding layers is independent of the vocabulary size, which significantly reduces storage requirements without loss of accuracy. Experimental results on several datasets, including four public benchmarks and a Chinese text dataset synthesized by SynthText with more than 20,000 characters, show that Hamming OCR achieves competitive results.
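The abstract's parameter-saving ideas can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses random-hyperplane LSH (SimHash) over hypothetical per-character feature vectors to build a fixed binary codebook, decodes a predicted bit vector by nearest Hamming distance, and counts parameters for a V-way softmax head versus a k-bit head, and for a decoder with or without feed-forward blocks and cross-layer sharing. The function names (`simhash_codebook`, `decode`, `head_params`, `decoder_params`), the standard Transformer layer layout used for counting, and the sizes (d_model = 512, 32 bits, 6 layers, d_ff = 2048) are illustrative choices, not details taken from the paper.

```python
import random


def simhash_codebook(features, n_bits, seed=0):
    """Assign each character an n_bits binary code with random-hyperplane LSH (SimHash).

    Bit j is 1 when the feature vector lies on the non-negative side of random hyperplane j,
    so characters with similar (hypothetical) features tend to get codes at small Hamming distance.
    """
    rng = random.Random(seed)
    dim = len(next(iter(features.values())))
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    return {
        ch: tuple(int(sum(p * x for p, x in zip(plane, vec)) >= 0.0) for plane in planes)
        for ch, vec in features.items()
    }


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bit_probs, codebook):
    """Threshold per-bit probabilities at 0.5 and return the character whose code is nearest."""
    predicted = tuple(int(p >= 0.5) for p in bit_probs)
    # min() keeps the first character in codebook order when distances tie.
    return min(codebook, key=lambda ch: hamming(predicted, codebook[ch]))


def head_params(d_model, n_out):
    """Weights plus biases of a linear output layer with n_out outputs."""
    return d_model * n_out + n_out


def decoder_layer_params(d_model, d_ff, with_ffn=True):
    """Parameters of one assumed standard Transformer decoder layer (biases and LayerNorms included)."""
    attn = 4 * (d_model * d_model + d_model)  # Q, K, V and output projections
    norm = 2 * d_model                        # LayerNorm gain and bias
    params = 2 * (attn + norm)                # self-attention and cross-attention sublayers
    if with_ffn:
        params += 2 * d_model * d_ff + d_ff + d_model + norm  # two linear layers plus a LayerNorm
    return params


def decoder_params(n_layers, d_model, d_ff, with_ffn=True, share_layers=False):
    """With cross-layer sharing, all n_layers reuse a single set of layer weights."""
    per_layer = decoder_layer_params(d_model, d_ff, with_ffn)
    return per_layer if share_layers else n_layers * per_layer


if __name__ == "__main__":
    rng = random.Random(1)
    chars = ["我", "们", "的", "A", "B"]
    # Hypothetical 16-dimensional feature vectors; what the paper actually hashes is not stated here.
    feats = {c: [rng.gauss(0.0, 1.0) for _ in range(16)] for c in chars}
    book = simhash_codebook(feats, n_bits=32)
    noisy = [0.9 if bit else 0.1 for bit in book["我"]]  # confident but not saturated bit outputs
    print(decode(noisy, book))                            # prints 我
    print(head_params(512, 20000), head_params(512, 32))  # prints 10260000 16416
    print(decoder_params(6, 512, 2048),
          decoder_params(6, 512, 2048, with_ffn=False, share_layers=True))  # prints 25224192 2103296
```

Under these assumed sizes, a 20,000-way softmax head needs about 10.3M parameters, while a 32-bit head needs 16,416 no matter how many characters the lexicon holds. The V × k codebook in this sketch is a fixed bit table produced by hashing rather than trained weights, which is one way to read the abstract's claim that the classification layer no longer scales with vocabulary size.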

research
08/04/2021

TextCNN with Attention for Text Classification

The vast majority of textual content is unstructured, making automated c...
research
05/18/2023

Less is More! A slim architecture for optimal language translation

The softmax attention mechanism has emerged as a noteworthy development ...
research
04/14/2021

Efficient conformer-based speech recognition with linear attention

Recently, conformer-based end-to-end automatic speech recognition, which...
research
09/04/2023

One Wide Feedforward is All You Need

The Transformer architecture has two main non-embedding components: Atte...
research
11/16/2021

TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance

Scene text recognition (STR) is an important bridge between images and t...
research
10/25/2021

Ultra Light OCR Competition Technical Report

Ultra Light OCR Competition is a Chinese scene text recognition competit...
research
10/31/2019

Parameter Sharing Decoder Pair for Auto Composing

Auto Composing is an active and appealing research area in the past few ...
