TransDocs: Optical Character Recognition with word to word translation

04/15/2023
by Abhishek Bamotra, et al.

While optical character recognition (OCR) has been used in various applications, its output is not always accurate, leading to misfit words. This research work focuses on improving OCR with machine learning techniques by integrating it with long short-term memory (LSTM) based sequence-to-sequence deep learning models to perform document translation. The work is based on the ANKI dataset for English-to-Spanish translation. I present a comparative study of pre-trained OCR combined with an LSTM-based seq2seq model with attention for machine translation. End-to-end performance of the model is reported as a BLEU-4 score. This paper is aimed at researchers and practitioners interested in OCR and its applications in document translation.
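
For readers unfamiliar with the BLEU-4 metric named in the abstract, the following is a minimal sketch of a sentence-level BLEU-4 computation: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. It is an illustration only and not the authors' evaluation code. The function names `ngrams` and `bleu4`, the whitespace tokenization, the single reference, and the absence of smoothing are all assumptions made here for brevity.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against one reference, without smoothing.

    Assumption: both inputs are plain strings tokenized on whitespace.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: a candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            # Without smoothing, any zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / 4)


if __name__ == "__main__":
    # Hypothetical OCR-then-translate output compared with a reference.
    print(bleu4("el gato está en la silla", "el gato está en la mesa"))
```

In practice, corpus-level BLEU with smoothing is usually preferred for reporting, since unsmoothed sentence-level scores drop to zero whenever any n-gram order has no match.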

research
04/10/2018

French Word Recognition through a Quick Survey on Recurrent Neural Networks Using Long-Short Term Memory RNN-LSTM

Optical character recognition (OCR) is a fundamental problem in computer...
research
09/10/2014

Sequence to Sequence Learning with Neural Networks

Deep Neural Networks (DNNs) are powerful models that have achieved excel...
research
07/02/2017

DAG-based Long Short-Term Memory for Neural Word Segmentation

Neural word segmentation has attracted more and more research interests ...
research
12/27/2017

CNN Is All You Need

The Convolution Neural Network (CNN) has demonstrated the unique advanta...
research
07/07/2022

Part-of-Speech Tagging of Odia Language Using statistical and Deep Learning-Based Approaches

Automatic Part-of-speech (POS) tagging is a preprocessing step of many n...
research
04/18/2018

NTUA-SLP at SemEval-2018 Task 2: Predicting Emojis using RNNs with Context-aware Attention

In this paper we present a deep-learning model that competed at SemEval-...
research
12/22/2016

Handwriting recognition using Cohort of LSTM and lexicon verification with extremely large lexicon

State-of-the-art methods for handwriting recognition are based on Long S...
