A Transformer-based Math Language Model for Handwritten Math Expression Recognition

08/11/2021
by Huy Quang Ung, et al.

Handwritten mathematical expressions (HMEs) can be ambiguous to interpret, sometimes even for humans. Several math symbols look very similar when handwritten, such as a dot and a comma, or 0, O, and o, and HME recognition systems struggle to tell them apart without contextual information. To address this problem, this paper presents a Transformer-based Math Language Model (TMLM). Through the self-attention mechanism, the high-level representation of each input token in a sequence is computed from how it relates to the preceding tokens. As a result, TMLM can capture long-range dependencies and correlations among the symbols and relations in a mathematical expression (ME). We trained the proposed language model on a corpus of approximately 70,000 LaTeX sequences provided in CROHME 2016. TMLM achieved a perplexity of 4.42, outperforming previous math language models, namely the N-gram and recurrent neural network-based language models. In addition, we integrated TMLM into a stochastic context-free grammar-based HME recognition system, using a weighting parameter to re-rank the top-10 candidates. This improved the expression rates on the CROHME 2016 and CROHME 2019 test sets by 2.97 and 0.83 percentage points, respectively.
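The abstract reports a perplexity score and describes re-ranking the recognizer's top-10 candidates with a weighting parameter, but it does not give the scoring formula. The Python sketch below illustrates both ideas under stated assumptions. Perplexity is computed the standard way, as the exponential of the mean negative log-probability per token. The re-ranking step assumes a simple linear interpolation of a recognizer log-score and a language-model log-probability controlled by a weight; this combination rule, the names `Candidate`, `combined_score`, and `rerank`, and all example scores are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


@dataclass
class Candidate:
    latex: str               # candidate LaTeX string from the recognizer
    recognizer_score: float  # hypothetical: recognizer log-score (higher is better)
    lm_log_prob: float       # hypothetical: summed LM log-probability of the tokens


def combined_score(c, weight):
    # Assumption: linear interpolation of the two log-scores; the paper's
    # exact combination formula may differ.
    return (1.0 - weight) * c.recognizer_score + weight * c.lm_log_prob


def rerank(candidates, weight):
    """Return candidates (e.g. the recognizer's top-10) sorted best-first."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return sorted(candidates, key=lambda c: combined_score(c, weight), reverse=True)


if __name__ == "__main__":
    # Four tokens, each predicted with probability 0.25.
    print(perplexity([math.log(0.25)] * 4))  # roughly 4.0

    # Made-up scores: the recognizer slightly prefers "a_{O}", while the
    # LM finds the digit subscript "a_{0}" far more plausible.
    cands = [
        Candidate("a_{O}", recognizer_score=-1.0, lm_log_prob=-8.0),
        Candidate("a_{0}", recognizer_score=-1.2, lm_log_prob=-3.0),
    ]
    print(rerank(cands, weight=0.0)[0].latex)  # a_{O}
    print(rerank(cands, weight=0.5)[0].latex)  # a_{0}
```

With the weight at 0 the recognizer's choice wins. As the weight grows, the language model's context can overturn a visually ambiguous decision such as 0 versus O. In practice the weight would be tuned on held-out data.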
