Latexify Math: Mathematical Formula Markup Revision to Assist Collaborative Editing in Math Q&A Sites

by   Suyu Ma, et al.
Monash University

Collaborative editing of questions and answers plays an important role in the quality control of Mathematics Stack Exchange, a math Q&A site. Our study of post edits in Mathematics Stack Exchange shows that a large number of math-related edits concern latexifying formulas, revising LaTeX, and converting blurred screenshots of math formulas to LaTeX sequences. Despite its importance, manually editing a math-related post, especially one with complex mathematical formulas, is time-consuming and error-prone even for experienced users. To assist post owners and editors with this editing, we have developed an edit-assistance tool, MathLatexEdit, for formula latexification, LaTeX revision and screenshot transcription. We formulate this formula editing task as a translation problem, in which an original post is translated to a revised post. MathLatexEdit implements a deep learning based approach including two encoder-decoder models for textual and visual LaTeX edit recommendation with math-specific inference. The two models are trained on large-scale historical original-edited post pairs and synthesized screenshot-formula pairs. Our evaluation of MathLatexEdit demonstrates not only the accuracy of our model, but also the usefulness of MathLatexEdit in editing real-world posts, with the resulting edits accepted in Mathematics Stack Exchange.




1. Introduction

As the foundation of science and engineering, mathematics has always been one of the most important subjects for students (JenlinkKarenEmbry2006ME) at all levels of education. It can help reshape students’ reasoning, creativity, critical thinking and problem-solving abilities. On the other hand, mathematics is also one of the most challenging subjects for students (JenlinkKarenEmbry2006ME; MiddendorfJessica2018IRtM; DanesiMarcel.author2016LaTM). Various community Question and Answer (Q&A) sites for mathematics have been developed. They provide a platform for students to ask about anything related to mathematics and receive fast feedback from peers and experts. Mathematics Stack Exchange (math) is an example of such a Q&A site for people studying mathematics at any level and for professionals in related fields. Since its launch in 2010, it has accumulated 1.3 million questions and 1.7 million answers from 661 thousand users (math2).

However, the dramatic growth of posts and users on such Q&A sites poses a severe challenge to the quality assurance for site content. For example, some novice users may post questions and answers with grammar errors, misuse of abbreviations, blurred screenshots of formulas, and the lack of important information (e.g., context or referenced resources) for understanding questions and answers. Such quality deterioration can negatively influence the readability and understandability of posts, and may further discourage the participation of users (li2015good). To avoid the quality decay of the sites, Q&A sites like Mathematics Stack Exchange provide official recommendations for effective question writing (help3) and answer writing (help1).

Although all users are encouraged to follow the quality assurance policies, many users may still violate them carelessly or unintentionally, and some users may not even be aware of the existence of such policies. To ensure the site quality (MamykinaLena2011Dlft), Mathematics Stack Exchange encourages users, especially experienced users (help2), to collaboratively edit the posts to make them comply with the site quality standards. According to our analysis in Section 3, among 2,886,174 posts including questions and answers (as of Mar 2020), 1,318,753 (45.69%) have been edited at least once. Post edits involve not only minor corrections of misspellings and grammar errors, but also math-related changes such as formatting math formulas for better readability and fixing mathematical mistakes.

Although collaborative editing is beneficial for the community (Li2016PredictingCE), there are still three problems with such a mechanism. First, it requires significant community effort, especially from high-reputation users, to edit the posts directly and/or approve the edits made by other users. Second, some errors in the original posts, especially relatively complicated ones such as math formula errors or formatting issues, are difficult to spot, as they may require a good understanding of the question or answer content. Third, all these collaborative edits are reactive to existing errors, which may have already harmed readers of the posts before the edits, or made it difficult for those who want to help answer the questions.

Therefore, in addition to collaborative editing for reactive quality assurance, we also need a more proactive mechanism of quality assurance which could check a post before it is posted, spot the potential issues in the post, and remind the post owner to fix any issues found. Towards that target, some research works have investigated the automated revision of Q&A posts (chen2017community; chen2018data; Li2016PredictingCE; vargo2016editing). However, unlike other Q&A sites (e.g., Quora and Stack Overflow), Mathematics Stack Exchange requires some domain-specific edits, arising from the characteristics of mathematics, which cannot be addressed by these automated methods. By analysing the topics in the comments of post edits and empirically studying historical math-related edits in Section 3, three editing types emerge: formula latexification, LaTeX formula revision, and formula screenshot transcription, together involving 516,010 related edits. Considering the wide range of post content and formats involved in post edits (see Table 2 for examples), it would require significant manual effort to develop a complete set of rules for representing editing patterns.

This challenge motivates us to develop a data-driven, deep-learning based janitor that recommends potential math-related edits to post owners or other users by automatically learning from historical edits. Note that Mathematics Stack Exchange encourages users to use LaTeX, a computer programming language used in typesetting technical documents (math_latex), for writing mathematical formulas. It can be rendered into clear and nice-looking formulas by MathJax (CervoneDavide2012MAPf). In this work, we formulate post editing recommendation as a translation task, in which an original post is “translated” into the edited post. This includes translating erroneous or unformatted math formulas to correct, formatted ones (textual LaTeX edit) and math formula screenshots to more readable formulas in LaTeX (visual LaTeX edit). We design two models for these two tasks and define them as the textual edit model and the visual edit model. We adopt a transformer-based model for textual edits and use DenseNet (huang2017densely) and Long Short-term Memory (LSTM) (hochreiter1997long-lstm) models for visual edits. To improve the performance of our tool, we adopt sentence normalization, which shortens the input by directly copying the non-math content when fixing formula errors and formula formatting (Section 4.3). We also improve the basic screenshot transcription method by adding image similarity to the inference phase to boost the quality of the generated LaTeX. We name our tool Mathematical Latex Editor (MathLatexEdit).

The trained MathLatexEdit automates the math-specific edits, thus removing the need for prior manual editing rule development. Our work is the first one to explore how to support collaborative editing in math-related question and answer sites. Our work fills in the gap between math-related post editing and cooperative work. This tool will not only help post owners reduce minor math-related issues before posting their questions and answers, but also help post editors improve their editing efficiency. Furthermore, the identified issues together with the recommended corrections will help novice post editors learn community-adopted editing patterns.

To train and test our model, we develop a text differencing algorithm to collect a large dataset of original-edited sentence pairs from post edits in Mathematics Stack Exchange for the textual edit model, and prepare a large dataset of LaTeX formula and image pairs for the visual edit model. Our results show that our approach outperforms other rule-based or deep learning based baselines for post edit recommendation. Apart from the model performance, we also conducted a user study to confirm that participants using our tool can finish the editing in less time but with higher precision, recall and satisfactoriness. A field study also shows its usefulness: the first author acted as a novice post editor with little post editing experience. Based on the post editing recommendations of our tool, he edited 80 posts and submitted the edits to Mathematics Stack Exchange, and 78 of them were accepted. That is, for each accepted post edit, at least two trusted contributors considered that the edit had significantly improved the post quality.

We make the following key contributions in this paper:

  • We conducted an empirical study of collaborative post editing in Mathematics Stack Exchange including the editing types and editing content, identifying the need for an edit assistance tool for mathematics formulae.

  • We developed MathLatexEdit, a deep learning based edit assistance tool that can latexify math expressions, revise LaTeX formulas, and convert formula screenshots in Mathematics Stack Exchange to LaTeX sequences for further MathJax rendering, improving post quality and readability. Although the current work studies Mathematics Stack Exchange, our data analysis method and deep learning approach could be applied to other social systems, as discussed in Section 7.1.

  • We evaluated MathLatexEdit from two perspectives, including the quality of post edit recommendation based on a large-scale dataset and the usefulness to assist a novice post editor in editing unfamiliar new posts.

2. Related Work

2.1. Mathematics Q&A Sites

Online question and answer (Q&A) sites are platforms for participants to ask and answer questions. With the power of crowd-sourced answers, Q&A sites such as Quora and Stack Overflow are increasingly popular, accumulating millions of questions and answers in different domains (ma2019easy; chen2017unsupervised; chen2019sethesaurus; cao2021automated). Due to the importance and challenges of learning math, math-specific Q&A sites have also been launched, such as Mathematics Stack Exchange for students and MathOverflow for professional and expert researchers. MathOverflow has attracted great attention from researchers. Montoya et al. (montoya2013social) model MathOverflow as a social network for analyzing its social achievement and centrality. Tausczik et al. (tausczik2014collaborative) investigate the collaboration patterns when solving a research-level mathematical problem in MathOverflow.

However, few researchers explore the Mathematics Stack Exchange. As one of the most popular mathematics Q&A sites with 635 new questions and 200K visits per day (math2), Mathematics Stack Exchange deserves more attention from our research community. To enhance the post quality in Mathematics Stack Exchange, our study is the first work to explore its collaborative editing patterns and propose a deep learning method to assist post owners and editors to revise low-quality posts.

2.2. Collaborative Editing in Q&A Sites

Although collaborative editing is widely studied in some user-generated content (UGC) communities (e.g., Wikipedia (Wiki), wikiHow (wiki:how)), there are relatively few works focusing on Q&A sites. Collaborative editing can help improve the quality control in Q&A sites by converting low-quality posts to higher-quality ones (MamykinaLena2011Dlft). Li et al. (li2015good) demonstrate that it improves quality without hurting user engagement. Ford et al. (ford2018we) show that collaborative editing with mentors can further improve engagement in Q&A sites. Choi and Yla (choi2018will) find that the moderators in collaborative editing can help resolve conflicts over which tags describe a question, conflicts caused by differences in users’ backgrounds and understanding.

Although collaborative editing is important for the site quality, it takes much human effort to realise. To assist collaborative editing in online communities, some machine learning based methods have been proposed. Li et al. (Li2016PredictingCE) and Chen et al. (chen2018data) developed classifiers to automatically predict whether a post needs edits or what type of edits are required, e.g., adding links, adding images, or updating the format. Chen et al. (chen2017community) proposed a deep learning based model to automate some minor revisions such as fixing misspellings, grammar errors and keyword formatting in sentences of the post.

Our MathLatexEdit tool differs in two key aspects. First, most related works above focus on the most popular programming Q&A site, Stack Overflow (MamykinaLena2011Dlft; li2015good; vargo2016editing; chen2017community), while we focus on the Mathematics Stack Exchange. We carried out an empirical study to explore the domain-specific edits in this math-related Q&A site. Second, in addition to their models based on the text of posts, we are also working on a more challenging type of collaborative editing, i.e., converting formula screenshots to LaTeX sequence that bridges the gap between visual and textual information.

2.3. Grammar Error Correction

We formulate the post editing recommendation in this work as a translation task, i.e., translating the text, buggy LaTeX commands and formula screenshots to the corresponding correct LaTeX commands. There are many research works on automated textual grammar error correction (GEC) with machine learning (junczys2016phrase; mizumoto2016discriminative; yuan2016candidate), i.e., detecting and fixing grammar errors in the original natural-language text. For example, Junczys-Dowmunt and Grundkiewicz design an approach that can automatically correct grammar errors with a phrase-based Statistical Machine Translation (SMT) method (junczys2016phrase). Mizumoto and Matsumoto (mizumoto2016discriminative) and Yuan et al. (yuan2016candidate) recommend ranked grammar error corrections by combining an SMT method with a ranking method. Unlike traditional SMT methods, Neural Machine Translation (NMT), such as RNN-based methods and transformer-based methods, utilizes sentence context information and trains all the components jointly. CNN-based Seq2Seq and quality estimation methods are used by Chollampatt and Ng (chollampatt-ng-2018-neural) to automatically estimate the quality of GEC sentences. Grundkiewicz and Junczys-Dowmunt combine SMT and neural machine translation for automated Grammatical Error Correction (grundkiewicz-junczys-dowmunt-2018-near). Different from their works, we are the first to target mathematics-specific revisions, i.e., changes to the content of LaTeX commands, by taking the characteristics of mathematics into consideration.

2.4. Mathematics Formula Accessibility

It is crucial for the community to make mathematical formulas accessible (MiddendorfJessica2018IRtM; DanesiMarcel.author2016LaTM), as many users just take a screenshot of a mathematics formula and put it online, which can significantly reduce readability for normal users and engagement of users with vision impairment. To assist the conversion of mathematics formula screenshots to the corresponding representation in LaTeX, researchers have proposed many different algorithms. A commercial OCR tool is used by Garain et al. (garain2004identification) to classify text, and the unrecognized patterns are further analyzed to detect mathematics formulas. Based on larger symbols and blank spaces, Twaakyondo et al. (twaakyondo1995structure) divide the formulas into sub-expressions and represent them as a tree. Suzuki et al. (garain2004identification) use a similar approach with a minimum-cost spanning-tree algorithm. The commercial software InftyReader (SuzukiMasakazu2003IaiO-infty) is based on this work. Inspired by image captioning tasks (xu2015show; chen2018ui; chen2020unblind), Deng et al. (deng2016you) designed a more advanced deep learning based model.

Compared with these general works, our image transcribing algorithm is the first to target collaborative editing in the real Mathematics Stack Exchange. Our approach not only takes the textual LaTeX edit into consideration but is also more advanced, incorporating DenseNet and inference with additional visual similarity for converting screenshots into LaTeX representation, as described in Section 4. Different from these theoretical works, which focus only on model performance on a testing dataset, we also carried out a user study and a field study in Section 6 to verify the usefulness of our tool in assisting with real-world post edits.


3. Empirical Study of Post Edits in Mathematics Stack Exchange

Unlike other Q&A sites (e.g., Quora and Stack Overflow), Mathematics Stack Exchange requires some domain-specific edits, arising from the characteristics of mathematics, which cannot be addressed by existing automated methods. We downloaded the latest data dump of Mathematics Stack Exchange, which contains 2,886,174 posts (including 1,216,368 questions and 1,669,806 answers) and all post edits from its launch on July 20, 2010 to March 1, 2020. Based on this large dataset, we carried out an empirical study of post edits in Mathematics Stack Exchange to understand the characteristics of post editing on the site and to motivate the required tool support. This empirical study allows us to re-frame our research from the perspectives of quality control and beneficial aspects of our tool.

3.1. What are the edits about?

In Mathematics Stack Exchange, there are three kinds of post information which can be edited – question tags (chen2016mining; chen2016techland), question title, and post (question and answer) body (MamykinaLena2011Dlft). Question-title and post-body editing are of the same nature (i.e., sentence editing), while question-tags editing is to add and/or remove the set of tags of a question.

As of March 1, 2020, there have been in total 2,696,115 post edits. Among them, 334,136 (12.39%) are question-title edits, 286,715 (10.63%) are question-tag edits, and the majority of post edits (2,075,264 (76.97%)) are post-body edits. The tags of 268,620 (22.08%) questions, the titles of 243,184 (20.00%) questions, and the body of 1,075,420 (37.26%) posts have been edited at least once. 63.21% of these edits are self-edits by the post owners and the other 36.79% are collaborative edits by other post editors. Overall, post-body edits make up the majority of post edits, and post-body editing is more complex than revising titles or question tags. Therefore, we focus on post-body edits in this work. Hereafter, post edits refer to post-body edits, unless otherwise stated.

3.2. Who edited posts?

Among all 2,075,264 post edits, 1,228,763 (59.21%) are self-edits by the post owners, 846,501 (40.79%) are edits by other users. This data suggests that an edit assistance tool may benefit the Mathematics Stack Exchange community from three perspectives.

First, the tool can highlight issues in posts that post owners are creating and proactively remind them to fix the issues. This reduces the need for editing after creation and ensures the post quality in the first place. Second, an edit assistance tool that recommends edits can improve the efficiency of editing others’ posts by providing a reliable starting point. Third, post editors can use our tool to learn to correct minor domain-specific issues. Such small editing tasks can provide a mechanism of legitimate peripheral participation (lave1999legitimate) for novice users who have no experience in Mathematics Stack Exchange. This may help to on-board novice users in Mathematics Stack Exchange and improve the quality of their post edits.

3.3. What has been edited?

To understand what post edits are about, we analyzed the comments attached to the post edits. In Mathematics Stack Exchange, when users finish editing a post, they can add a brief comment to explain the editing reasons. We collected all post-edit comments and applied standard text processing steps to post-edit comments such as removing punctuations, lower-casing all characters, and excluding stop words.
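The text processing steps described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `preprocess_comment` and the tiny stop-word list are assumptions made for the example, and a real pipeline would likely use a fuller stop-word list.

```python
import string

# Hypothetical, deliberately small stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "and", "for", "is", "it"}

def preprocess_comment(comment):
    """Lower-case one edit comment, strip punctuation, and drop stop words."""
    lowered = comment.lower()
    # Map every punctuation character to a space so "typo;fixed" still splits.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = lowered.translate(table).split()
    return [tok for tok in tokens if tok not in STOP_WORDS]
```

For example, `preprocess_comment("Fixed a typo; improved the LaTeX formatting.")` yields the tokens `fixed`, `typo`, `improved`, `latex`, `formatting`.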

No.  Topic Name                      Keywords
1    spelling & grammar              grammatical error, spelling, typo
2    adding links                    wikipedia link, broken link, linked
3    explaining the question         descriptive, explanation, definition
4    converting image                replace image, latexifying image, transcribed image
5    revising mathematical content   algebraic, mathematical, mathematics
6    changing format                 adjust format, formatting, formatted
7    improving readability           improve readability, easier read, make readable
8    modifying LaTeX formula         tex, improving latex, tex improvement, applied mathjax

Table 1. Topics of post-edit comments

We extracted the topics from users’ editing comments to understand their editing content. To extract common editing types, we adopted the Latent Dirichlet Allocation (LDA) (blei2003latent-lda) model to analyze the post-edit comments. LDA is a statistical model for discovering abstract topics that occur in a collection of documents in which each topic consists of a set of keywords. A significant limitation of LDA is that it considers only single words (i.e., unigrams). However, a single word may not capture the exact semantics of the post-edit comments. In contrast, phrases that are composed of several words are more intuitive to understand the intention behind post edits, such as “latexifying image” instead of “image”, “improving latex” rather than “latex”. Therefore, these multi-word phrases must be recognized and treated as a whole in LDA model.

We adopted a simple data-driven and memory-efficient approach (mikolov2013distributed) to detect multi-word phrases in post-edit comments. In this approach, phrases are formed iteratively based on the unigram and bigram counts, using the following formula:

$score(w_i, w_j) = \frac{(count(w_i w_j) - \delta) \times N}{count(w_i) \times count(w_j)}$

Here $w_i$ and $w_j$ are two consecutive words, and $\delta$ is a discounting coefficient that prevents infrequent bigrams from being formed. That is, two consecutive words will not form a bigram phrase if they appear as a phrase fewer than $\delta$ times in the corpus. $N$ is the vocabulary size of the corpus. In this work, we experimentally set $\delta$ to 10 and the threshold for the score to 10 to achieve a good balance between the coverage and accuracy of the detected multi-word phrases.

Our method can find bigram phrases that appear frequently enough in post-edit comments compared with the frequency of each unigram, such as “improve readability”. However, bigram phrases like “it is” will not be formed because each unigram also appears very frequently in the text. All these phrases are then concatenated with underline like “improve_readability” in the corpus of post-edit comments, and then we use the LDA model to extract the topics.
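A single pass of this phrase detection can be sketched as below. The sketch assumes the score takes the form (count(a b) − δ) × N / (count(a) × count(b)), with N the vocabulary size, which is one common variant of the cited approach; the function name `detect_phrases` and its parameters are illustrative and not taken from the paper.

```python
from collections import Counter

def detect_phrases(sentences, delta=10, threshold=10.0):
    """Merge adjacent tokens into 'a_b' when their bigram score exceeds threshold.

    sentences: list of token lists. Returns a new list of token lists.
    The scoring form is an assumption of this sketch (see the lead-in).
    """
    unigrams = Counter(tok for sent in sentences for tok in sent)
    bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
    vocab_size = len(unigrams)

    def score(a, b):
        # The discount delta pushes rare bigrams to a non-positive score.
        return (bigrams[(a, b)] - delta) * vocab_size / (unigrams[a] * unigrams[b])

    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and score(sent[i], sent[i + 1]) > threshold:
                out.append(sent[i] + "_" + sent[i + 1])  # e.g. improve_readability
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged
```

Running the function again on its own output would form longer phrases, which is how an iterative variant could be built on top of this single pass.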

We extracted 8 topics with corresponding keywords, shown in Table 1. Note that we annotated the topic names with our own summarization based on the topic keywords. Different from individual frequent keywords, these topics provide finer-grained information. Apart from common editing types such as fixing spelling and grammar or adding links, there are also some domain-specific editing patterns related to mathematics formulas (#5 revising mathematical content, #8 modifying LaTeX formula) and readability (#4 converting image, #6 changing format, #7 improving readability). These two groups of editing types represent the community norms that Mathematics Stack Exchange commits its effort to maintain.

3.4. What efforts have been committed to the domain-specific edits?

No.  Type of edit
1    changing formula format
2    correcting mistakes in formula
3    replacing image with LaTeX formula

Table 2. Types of domain-specific accessible edits

As mentioned in Section 3.3, many post edits are domain-specific and highly related to mathematics, especially mathematics formulas and readability, according to the analysis of the editing comments. To further explore what the edits are about, we extracted the edits whose editing comments include domain-specific words such as “formula”, “LaTeX”, “math”, etc. By traversing the commented editing history in Mathematics Stack Exchange, we collected 155,369 domain-specific edits in total (note that only 726,342 edits contain comments, so this number is a considerable underestimate). We then randomly selected 100 of them for manual inspection. According to our observation, there are three common mathematics domain-specific editing patterns, as seen in Table 2.

First, 29 edits change a plain formula into LaTeX format with MathJax rendering for a clearer view, resulting in better readability. Some users may be joining Mathematics Stack Exchange for the first time, so they write a plain math formula in the post. Editing it into LaTeX distinguishes the mathematics formula from the plain text so that other users can easily understand the meaning of the post, leading to a higher possibility of responses. For instance, “x - root2” is converted into “$x - \sqrt{2}$” in the first example of Table 2. Compared with natural English words, a LaTeX mathematical expression highlights these numerical tokens and helps readers easily capture the key point of the post.

Second, 35 edits focus on correcting mistakes in formulas in order to clarify the content. Some tokens in formulas are missing or misspelled. Even experienced users may make mistakes when writing math formulas, especially complicated ones. Since the math formula may be the most important content, many edits target it for more accurate information. For instance, an extra symbol is removed in the second example of Table 2, which improves the quality of the post.

Third, 8 edits convert blurred formula images into LaTeX, which is further rendered to a vector graphic so that users can zoom in or out for a better view, as in the third example of Table 2. There are mainly two kinds of blurred formula images: screenshots and pictures captured by cameras. Compared with blurred formula images, the LaTeX format is also easy for search engines to index, making the text searchable.

The remaining 28 edits are not related to mathematics, such as adding or deleting content in posts, correcting grammar errors in posts, or changing the display format.

3.5. How many mathematic-specific edits are there?

In Mathematics Stack Exchange, all mathematical formulas should be written in LaTeX format with further MathJax rendering for clear visualization. For each of the three types of mathematics-specific edits identified in the last section (formula latexification, LaTeX revision, and screenshot transcription), we examine the corresponding text changes in order to detect instances of that type and count their frequencies, as follows:

  • To improve the readability of mathematical formulas, Mathematics Stack Exchange encourages users to annotate their formulas in LaTeX with $ at the beginning and end (Table 2.(1)). Users also need to follow LaTeX grammar, such as its special symbols. The revision of a plain math formula in text into LaTeX is called formula latexification.

  • To improve the accuracy of formulas, users are encouraged to revise others’ LaTeX formulas in posts, which is called LaTeX revision (Table 2.(2)) in this work.

  • To improve the readability and accessibility of math formulas, embedded images/screenshots, which require a special link (Table 2.(3)) ending with an extension such as .png, .jpg, .gif, .bmp or .tiff, need to be converted to formulas in LaTeX with $ at the beginning and end. We refer to this kind of revision of screenshot links as screenshot transcription.

By differencing the original post body and the edited post body and applying the three patterns above, we count the number of edits of each type. There are 627,251 math-specific edits in total, of which 516,010 (24.86% of all post-body edits) fall into the following three types.

  • 169,006 (169,006 / 516,010 = 32.75%) post edits include formula latexification.

  • 288,947 (55.99%) post edits include LaTeX revision, and 548,230 formulas are revised in total.

  • For screenshot transcription, 59,141 (11.46%) post-body edits include that conversion.
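The pattern-based counting above can be illustrated with a rough heuristic. This is only a sketch under simplifying assumptions, not the differencing procedure used in the study: it looks only at `$...$` formulas and image links with the listed extensions, and the function name `classify_edit`, the two regular expressions, and the decision rules are all hypothetical.

```python
import re

# Image links ending in one of the extensions listed in the text.
IMAGE_LINK = re.compile(r"https?://\S+?\.(?:png|jpg|gif|bmp|tiff)\b", re.IGNORECASE)
# Formulas delimited by single dollar signs.
FORMULA = re.compile(r"\$([^$]+)\$")

def classify_edit(original, edited):
    """Return the set of math-specific edit types suggested by one post edit."""
    types = set()
    old_formulas = FORMULA.findall(original)
    new_formulas = FORMULA.findall(edited)
    old_images = IMAGE_LINK.findall(original)
    new_images = IMAGE_LINK.findall(edited)

    gained_formula = len(new_formulas) > len(old_formulas)
    if len(old_images) > len(new_images) and gained_formula:
        # An image link disappeared and a formula appeared in its place.
        types.add("screenshot transcription")
    elif gained_formula:
        # A formula appeared where there was only plain text before.
        types.add("formula latexification")
    if any(f not in new_formulas for f in old_formulas):
        # An existing formula no longer appears verbatim after the edit.
        types.add("LaTeX revision")
    return types
```

An edit may fall into more than one type, which is why the function returns a set rather than a single label.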

Considering the diversity of text, formula, pixels and context involved in math-related edits, it would require significant manual effort to develop and validate a complete set of rules for representing editing patterns. For example, “root” should be converted to “\sqrt” in latex, but will not be changed in many other contexts. So, it is impractical to enumerate all such cases. An advanced approach is highly needed for automated editing.

Summary: 45.69% of posts in Mathematics Stack Exchange have been edited, involving a variety of editing types, including fixing grammatical errors, clarifying the meaning of a post, formatting the post and adding related resources or hyperlinks. In addition, there are also many domain-specific edits such as formula latexification, LaTeX revision and screenshot transcription for better readability and accessibility. These 516,010 math-specific edits (24.86% of all post-body edits) require much more human effort from the community due to the complexity of math formulas. Therefore, a post edit recommendation algorithm is needed to help users of Mathematics Stack Exchange correct math-specific errors and carry out these three types of math-related edits to ensure post quality.

4. Recommending Latex Edits by Deep Neural Network

The three types of post edits related to math formulas in our empirical study highlight the community efforts for ensuring the post quality in Mathematics Stack Exchange. Unfortunately, these efforts and revisions are implicit knowledge buried in millions of post edits. Considering the diversity of post editing types and contexts, it would require significant human effort to build a complete set of rules to deal with all the different situations. Therefore, we developed a deep-learning based approach which can automatically learn the post editing patterns from historical post edits, and recommend edits to new posts based on the learned editing knowledge.

4.1. Overview of our MathLatexEdit Approach

Figure 1. Workflow of our approach

The workflow of our approach is shown in Fig. 1. Given the three types of math-specific edits, we separate them into two tasks, i.e., textual LaTeX edit (formula latexification, LaTeX revision) and visual LaTeX edit (screenshot transcription). Our approach first collects a large corpus of original-edited sentence pairs that modify math formulas for the textual LaTeX edit, and synthesizes a large corpus of image-formula pairs for training the visual LaTeX edit model (Section 4.2). For textual LaTeX edits, our approach trains a transformer based model on the large parallel corpus of original-edited sentence pairs (Section 4.3). For visual LaTeX edits, our approach adopts an encoder-decoder model for converting the formula screenshot to LaTeX representation based on the synthesized image-formula pairs (Section 4.4). We specify the implementation details in Section 4.5.

4.2. Data collection

4.2.1. For formula latexification and LaTeX revision tasks

A post may have been edited several times. Assume a post has $N$ versions, i.e., it has undergone $N-1$ post edits. For each post edit, we collect a pair of the original and edited content. The original content is from the version of the post before the edit, and the edited content is from the version of the post after the edit. Then, we split the content into sentences.

We then align the sentence list from the original content with the sentence list from the edited content. For a sentence in the original list, if the similarity score of the most similar sentence in the edited list is above a threshold, the two sentences are aligned as a pair of original-edited sentences. To calculate the similarity between an original sentence and an edited sentence, we calculate the Levenshtein distance (levenshtein1966binary) between the two sentences. The similarity threshold should be set to achieve balanced precision and recall for sentence alignment. Therefore, we experimentally set the threshold at 0.9 in this work. Note that some small edits may not influence readability much; however, the aggregate effect of several small issues in one post is not tolerable for readers. Therefore, we still take them into consideration during data collection.
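The alignment step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the similarity score is one minus the Levenshtein distance divided by the longer sentence length (the same normalization that Section 4.4 describes for images), and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (unit cost per insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x -> y
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer sentence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def align(original_sentences, edited_sentences, threshold=0.9):
    """Pair each original sentence with its most similar edited sentence."""
    pairs = []
    if not edited_sentences:
        return pairs
    for src in original_sentences:
        best = max(edited_sentences, key=lambda tgt: similarity(src, tgt))
        if similarity(src, best) > threshold:
            pairs.append((src, best))
    return pairs
```

With a threshold of 0.9, only sentences that differ in a small fraction of their characters are paired, which matches the intent of capturing local formula edits rather than rewritten sentences.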

We also filter out non-mathematical and noisy edits using the following rules. We remove the sentence pairs that do not include LaTeX formulas in the original and edited sentences, and sentence pairs in which the LaTeX formula does not change. We also remove sentences that are too long or too short, measured by character count. In total, we collect 220,093 sentence pairs before March 1, 2020.

4.2.2. For formula screenshot transcription task

In Mathematics Stack Exchange, most users follow the guidelines by posting mathematics equations in LaTeX rather than as an image. To collect the data for training our visual LaTeX edit model, we collected all LaTeX mathematics equations from Mathematics Stack Exchange and then rendered them to images with automated scripts. First, we extract mathematics formulas using regular expressions, i.e., extracting text within special annotations like “begin\{equation\}(.*?)end\{equation\}” and “\$([^\$]*?)\$”. By matching the raw post content with regular expressions, we collected 5,505,098 raw LaTeX snippets.
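The extraction step can be illustrated with Python's `re` module. This is a sketch under assumptions: the patterns are the two quoted above with a leading backslash added before `begin` and `end` (as LaTeX environments require), and `extract_formulas` is a hypothetical helper name.

```python
import re

# Patterns adapted from the two quoted above; the leading backslashes on
# \begin and \end are an assumption about the original expressions.
EQUATION_ENV = re.compile(r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)
INLINE_MATH = re.compile(r"\$([^\$]*?)\$")


def extract_formulas(post_body):
    """Return the LaTeX snippets found in a raw post body."""
    formulas = [m.strip() for m in EQUATION_ENV.findall(post_body)]
    formulas += [m.strip() for m in INLINE_MATH.findall(post_body)]
    return [f for f in formulas if f]  # drop empty matches


post = r"Let $x^2 + 1$ and \begin{equation}\frac{a}{b}\end{equation} hold for $y$."
print(extract_formulas(post))
```

Note that the inline pattern matches single-dollar delimiters only, so display math written with doubled dollar signs would need an additional pattern.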

Second, to filter out some noisy data, we removed the sequences without any mathematics characters such as “+” or “\frac”. We also removed duplicate LaTeX sequences and sequences that are too long or too short, measured by character count.

Third, we tokenized the 2,068,744 remaining LaTeX sequences into separate words, keeping each domain-specific token (e.g., a LaTeX command such as “\frac”) as one word. To avoid ambiguity, i.e., the same image being generated by different LaTeX sequences, we developed a LaTeX parser to enforce consistent LaTeX formatting; for instance, it replaces some commands with an equivalent canonical form and deletes others. We rendered these LaTeX formulas to images and excluded any formulas that failed to compile. Note that these synthesized images are different from real-world screenshots. The synthesized images are generated following the same rules, while real-world screenshots differ greatly from user to user. To bridge the gap between the two sources of images, we applied an image augmentation method to make the synthesized images mimic real-world ones. To do this, we randomly applied different sizes and resolutions (i.e., DPI, dots per inch) when rendering the synthesized images from the collected LaTeX formulas. Some formula screenshot examples produced by the image augmentation can be seen in Fig 2.
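To illustrate the tokenization step, a LaTeX sequence can be split so that each command stays a single token. The regular expression below is a hypothetical stand-in for the authors' tokenizer and does not reproduce their parser's normalization rules.

```python
import re

# A backslash followed by letters is one command token; a backslash
# followed by any other character (e.g. \{) is one escaped-symbol token;
# everything else is split into single non-space characters.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")


def tokenize_latex(formula):
    """Split a LaTeX formula into tokens, keeping commands intact."""
    return TOKEN.findall(formula)


print(tokenize_latex(r"\frac{x_1}{2} + \alpha"))
```

Keeping commands whole keeps the target vocabulary small and prevents the decoder from having to spell out a command character by character.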

Figure 2. Examples of synthesized data

4.3. Textual Latex Edit Recommendation

Figure 3. Textual LaTeX edit structure of MathLatexEdit

The textual LaTeX edit recommendation can be treated as a machine translation problem by treating the original post sentence as input and the edited sentence as output. Therefore, we adopt a neural machine translation model (gao2019neural; wang2019domain) to learn the mapping from the source sentence to the target sentence.

As shown in Fig 3, an attention-based transformer architecture is used for formula latexification and LaTeX revision. Given the source word tokens, the goal is to predict the target word tokens. The source word tokens are the original post sentence, and the predicted output word tokens are the edited post sentence.

The performance of deep learning models heavily depends on the quality of the training data. In particular, our transformer model is sensitive to the input length, i.e., longer input sequences generally result in worse performance, as it is hard for the model to capture long-range semantics, especially for long, complex math formulas. To mitigate that issue, we develop a normalization step to preprocess the input before feeding it to the model. Many sentences within our dataset contain both a mathematical formula and a natural-language part. According to our observation, the non-math content is rarely changed, and math-related content usually consists of special symbols such as punctuation (e.g., “$+-=*”), numbers (e.g., 2, 3.14), variables (e.g., a, b, x, y) or LaTeX commands. Therefore, we manually construct a list of rules for detecting the potential mathematical content within the sentence, and replace the non-math part with the same placeholder symbol in the input. To further shorten the input sequence, we also normalize numbers with another placeholder symbol, since they are rarely changed during the edit. For example, the original sentence ”my first though was to factor by doing ( 2 + e x - e x ) / ( e ( - x ) + 1 ) but that negative in the denominator is not letting me solve the problem ?” is preprocessed into ”COMMON_WORDS ( 2 + e x - e x ) / ( e ( - x ) + 1 ) COMMON_WORDS”, which greatly shortens the input.

The normalized sentence pairs are used to train or test our transformer model. Given an original sentence to be edited, the trained model will change it into an edited sentence. The special symbols are then mapped back to the original domain-specific words in a post-processing step. For instance, the edited sentence ”COMMON_WORDS $ \frac{ 2 + e x - e x }{ e ( - x ) + 1} $ COMMON_WORDS” is mapped back to ”my first though was to factor by doing $ \frac{ 2 + e x - e x }{ e ( - x ) + 1} $ but that negative in the denominator is not letting me solve the problem ?”.
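The normalization and mapping-back steps can be sketched as below. The rule for what counts as math (a LaTeX command, a single letter, a number, or a run of math punctuation) is an assumed simplification of the authors' manually constructed rules, the number placeholder is omitted, and all names except `COMMON_WORDS` are hypothetical.

```python
import re

PLACEHOLDER = "COMMON_WORDS"

# Assumed rule set: a token is "math" if it is a LaTeX command, a single
# letter (variable), a number, or a run of math punctuation.
MATH_TOKEN = re.compile(r"\\[A-Za-z]+|[A-Za-z]|\d+(?:\.\d+)?|[$+\-=*/^_(){}\[\]<>|]+")


def normalize(sentence):
    """Replace each run of non-math tokens with one placeholder."""
    normalized, removed, run = [], [], []
    for token in sentence.split():
        if MATH_TOKEN.fullmatch(token):
            if run:
                removed.append(" ".join(run))
                normalized.append(PLACEHOLDER)
                run = []
            normalized.append(token)
        else:
            run.append(token)
    if run:
        removed.append(" ".join(run))
        normalized.append(PLACEHOLDER)
    return " ".join(normalized), removed


def denormalize(edited, removed):
    """Map placeholders in the model output back to the removed text, in order."""
    spans = iter(removed)
    tokens = [next(spans, PLACEHOLDER) if t == PLACEHOLDER else t
              for t in edited.split()]
    return " ".join(tokens)
```

A rule this simple also treats one-letter words such as "a" as variables; a production rule list would need context to tell them apart.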

4.4. Visual Latex Edit Recommendation

Our MathLatexEdit tool applies an encoder-decoder structure for screenshot transcription, in which the encoder uses DenseNet, and the decoder uses LSTM with attention mechanism. The overall structure is shown in Fig 4.

Figure 4. Visual LaTeX edit structure of MathLatexEdit

To get the feature maps of the input images, DenseNet is first used in the encoder to extract a feature map. Unlike a traditional Convolutional Neural Network (CNN), DenseNet connects each layer to every subsequent layer (huang2017densely; wang2019image). The output features of DenseNet contain sequential order information. Thus we use another LSTM encoder to re-encode each row of DenseNet’s output feature map. Based on the re-encoded feature map, we use an LSTM with an attention mechanism (vaswani2017attention) as the decoder to generate a sequence of predicted LaTeX tokens.

Figure 5. Overview of inference with image similarity

After training the visual LaTeX edit model, we use it to generate the LaTeX sequence for a mathematics screenshot. As seen in Fig 5, we first adopt a beam search approach to select the top-5 candidates. We then render each candidate into a formula image, compare it with the original screenshot, and select the most similar one as the generated result.

Given the input screenshot, the generated LaTeX sequence should have the maximum log probability among all possible sequences. However, finding a globally optimal LaTeX sequence involves an immense search space. Therefore, we adopt a beam search to expand only a limited set of the most promising nodes in the search space.
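Beam search can be sketched independently of the neural decoder. In the toy version below, `step_fn` is a hypothetical callable that stands in for the trained decoder: given the tokens generated so far, it returns a dictionary mapping each possible next token to its log probability.

```python
def beam_search(step_fn, beam_size=5, max_len=20, eos="<eos>"):
    """Return up to `beam_size` (sequence, log probability) pairs, best first."""
    beams = [((), 0.0)]   # unfinished hypotheses
    finished = []         # hypotheses that have emitted the end token
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in step_fn(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        # Keep only the most promising expansions at this step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_size]
```

Because only `beam_size` hypotheses survive each step, the cost grows linearly with the sequence length instead of exponentially, at the price of possibly missing the globally best sequence.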

After selecting the top-5 candidates with the highest probability, we further render them into mathematical formula images. We compare these rendered images with the original input screenshot, and re-rank the 5 candidates by image similarity. To calculate image similarity, we first crop and resize the two images to the same size. We then convert each image to a binary image with pixel values of either 0 or 1, where 0 is black and 1 is white. Each column is thus a sequence of 0s and 1s, which we convert into a single value by reading it as a binary number. Let $A_o$ be the image array converted from the original image $I_o$, and let $A_r$ be the image array converted from the rendered image $I_r$. We calculate the Levenshtein distance between $A_o$ and $A_r$ as $d$. The image similarity is then $1 - d/L$, where $L$ is the maximum of the lengths of $A_o$ and $A_r$. Based on the image similarity of the 5 candidates, we select the one with the highest similarity and its corresponding LaTeX sequence as the generated result of MathLatexEdit.
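The re-ranking step can be sketched on small binary images. The code assumes cropping, resizing and binarization have already been done, and that the similarity is one minus the Levenshtein distance divided by the longer array length; the function names are hypothetical.

```python
def column_codes(image):
    """Encode each column of a binary image (rows of 0/1) as one integer."""
    codes = []
    for column in zip(*image):
        value = 0
        for bit in column:  # read the column top to bottom as a binary number
            value = value * 2 + bit
        codes.append(value)
    return codes


def levenshtein(a, b):
    """Edit distance between two sequences with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def image_similarity(original, rendered):
    """Assumed form: 1 - distance / max(array lengths)."""
    a, b = column_codes(original), column_codes(rendered)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def rerank(original, candidates):
    """Pick the LaTeX string whose rendered image is closest to the original.

    `candidates` is a list of (latex, rendered_binary_image) pairs.
    """
    return max(candidates, key=lambda c: image_similarity(original, c[1]))[0]
```

Comparing column codes with an edit distance, rather than pixel by pixel, tolerates small horizontal shifts between the screenshot and the rendered candidate.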

4.5. Implementation


MathLatexEdit is based on PyTorch (textual LaTeX edit recommendation model) and Torch (visual LaTeX edit recommendation model), and all experiments were run on an 11GB NVidia TITAN Xp. The following settings are the same for the two models. Minibatch stochastic gradient descent is used to optimize the parameters. The initial learning rate is set to 0.1. The number of training epochs is set to 25, and perplexity is used to select the best model during the validation step. Once the validation score does not decrease, we halve the learning rate. Beam search is used during the test step, and the beam size is 5. We also release the source code of this project.
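The learning-rate rule can be illustrated without any deep learning framework. The sketch below assumes that "does not decrease" means the validation perplexity fails to improve on the best value seen so far; `train_schedule` is a hypothetical helper that only replays a list of validation perplexities.

```python
def train_schedule(validation_perplexities, initial_lr=0.1):
    """Replay validation results and return (learning rate per epoch, best epoch)."""
    lr = initial_lr
    best = float("inf")
    best_epoch = None
    rates = []
    for epoch, perplexity in enumerate(validation_perplexities):
        rates.append(lr)          # rate used during this epoch
        if perplexity < best:     # improvement: remember this model
            best, best_epoch = perplexity, epoch
        else:                     # no improvement: halve the rate
            lr /= 2
    return rates, best_epoch


rates, best_epoch = train_schedule([10.0, 8.0, 8.5, 7.0, 7.2])
print(rates, best_epoch)
```

In a real training loop the same decision would be made after each validation pass, and the checkpoint from `best_epoch` would be the one kept for testing.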

5. Evaluating the Quality of MathLatexEdit

Our MathLatexEdit tool aims to help Mathematics Stack Exchange Q&A site post owners and editors to more effectively and collaboratively edit the math-related content in the post. The quality of the generated recommendation will hence greatly affect the utilization of MathLatexEdit by the community.

5.1. Dataset

For textual LaTeX edit recommendation, from 8,530,558 posts in Math Stack Exchange, we collected 219,420 original-edited sentence pairs about formula latexification and LaTeX revision. We randomly took 186,507 (85%) of these sentence pairs as the training data, 10,971 (5%) as the validation data and 21,942 (10%) as the testing data to evaluate the quality of recommended edits by our tool.

For visual LaTeX edit recommendation, we collected 2,068,744 LaTeX sequences. Based on these extracted formulas, we generated 1,000,000 image-formula pairs. We randomly selected 900,000 (90%) of these pairs as the training data, 50,000 (5%) as the validation data to tune model hyperparameters, and 50,000 (5%) as the synthesized testing set to evaluate the quality of LaTeX sequences converted by MathLatexEdit. (We only use 5% for testing as the image processing is much slower than text processing.) Apart from these synthesized data, we also collected the real-world post editing history where human editors have converted a posted math formula screenshot to LaTeX. For each post edit, we compared the original and edited posts to check whether some images in the original post are replaced by a LaTeX sequence in the corresponding position of the edited post. From 12,463 image-formula pairs in which the images were manually converted to LaTeX sequences by editors, we randomly selected 100 as a second, real-world testing set to evaluate the performance of MathLatexEdit.

5.2. Baselines

Apart from our own MathLatexEdit, we selected four other methods as baselines for comparison. For textual LaTeX edit recommendation, the first baseline is a grammar error correction sequence-to-sequence (Seq2seq) model that contains a bidirectional RNN as an encoder and an attention-based decoder (yuan2016grammatical). The other baseline is the phrase-based Statistical Machine Translation (SMT) model (ortiz2005thot) specifically designed for sentence correction. To check the influence of sentence normalization, we also take a variant of our approach as a baseline, i.e., our own MathLatexEdit without sentence normalization. For all deep-learning models, we use the same training data to train the model.

For visual LaTeX edit recommendation, the first baseline is InftyReader, a commercial mathematical expression recognition system. This tool is an OCR-based system, which combines symbol recognition and structural analysis phases (SuzukiMasakazu2003IaiO-infty). The second baseline is the deep learning based model WYGIWYS, which is specially designed to convert formula images to LaTeX sequences (deng2016you). We also add a variant of our approach, i.e., MathLatexEdit without image similarity, as another baseline.

5.3. Evaluation metrics

We adopt BLEU (BiLingual Evaluation Understudy) (papineni-etal-2002-bleu) for evaluating the quality of textual and visual LaTeX edit recommendation. BLEU is an automatic evaluation metric widely used in machine translation studies. It calculates the similarity of machine-generated translations and human-created reference translations (i.e., ground truth).
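For readers unfamiliar with the metric, a bare-bones corpus-level BLEU can be written in a few lines. This sketch assumes one reference per hypothesis, uniform weights over 1- to 4-grams and no smoothing; it returns a value in [0, 1], whereas the scores in this section are reported on a 0 to 100 scale. It is not the evaluation script used for the reported results.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU with a single reference per hypothesis."""
    matched = [0] * max_n
    total = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts, hyp_counts = ngrams(ref, n), ngrams(hyp, n)
            # Clipped counts: an n-gram is credited at most as often as
            # it occurs in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


ref = "$ f ( x , y ) $".split()
print(corpus_bleu([ref], ["$ f ( x y ) $".split()]))
```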

GLEU (Generalized Language Evaluation Understanding) is also used for evaluating textual LaTeX edit recommendation. In the grammatical error correction (GEC) field, recently released shared tasks have prompted the development of GLEU for evaluating GEC approaches (napoles2016gleu). GLEU is a metric customized from the BLEU score, which is widely used to evaluate machine-translation quality (yuan2016grammatical). It is independent of any manual-annotation scheme and requires only reference sentences (without annotations of gold-standard edits). A recent study shows that GLEU correlates with human judgments of GEC quality and effort (napoles2016gleu). Since it requires the source sentence as input, we do not use it for evaluating visual LaTeX edit recommendation.

Apart from the BLEU score, we also adopt image similarity to check the quality of visual LaTeX edit recommendation, by measuring the similarity between the given screenshot and the image rendered from the generated LaTeX. As the generated LaTeX can also be used to render a math image, we use the predicted LaTeX sequences and ground-truth LaTeX sequences to generate pairs of images with the same resolution. We binarize each pair of images so that the pixel values are all 0 or 1, where 0 means black and 1 means white. Then we convert each image into a one-dimensional array and remove the elements that contain only the value 1. The remaining steps for image similarity are the same as those we use in Section 4.4.


5.4. Evaluation Results

We report the evaluation results of MathLatexEdit from two aspects i.e., the performance of MathLatexEdit in recommending edits to formula latexification and LaTeX revision, and how accurate MathLatexEdit can be to generate the LaTeX sequence for a given math screenshot.

5.4.1. Performance of textual LaTeX edit recommendation

Models BLEU score GLEU score
SMT 59.22 52.64
Seq2seq 78.60 72.91
MathLatexEdit without sentence normalization 80.41 74.69
MathLatexEdit 82.30 76.57

Table 3. The Performance of different methods for textual LaTeX edit models

Example 1
Original sentence: formula: y + py = px - 2p for which value ( s ) of p 1
Our model: formula: $ y + py = px - 2p $ for which value ( s ) of $ p $ 1
Seq2seq: formula: $ y + py = px - 2p $ for which value ( s ) of p 1
SMT: formula : $ y + py = px - 2p $ for which value ( s ) of p 1

Example 2
Original sentence: ’ i ’ is part of the ratio
Our model: $ i $ is part of the ratio
Seq2seq: ’$ i $’ is part of the ratio
SMT: ’ i ’ is part of the ratio

Example 3
Original sentence: we have seen also that the primitives f ( x , y )
Our model: we have seen also that the primitives $ f ( x , y ) $
Seq2seq: we have seen also that the starts $ f ( x, y ) $
SMT: we have seen also that the primitives $ f ( x , y ) $

Example 4
Original sentence: can some one explain $ f ( n ) = 10 * log ( n ) $
Our model: can some one explain $ f ( n ) = 10 log ( n ) $
Seq2seq: can some one explain $ f ( n ) = 10 * log ( n ) $
SMT: can some one explain $ f ( n ) = 10 * log ( n ) $

Table 4. Examples of textual LaTeX edit with different methods

Models (testing set) BLEU score Image Similarity
INFTY (synthesized) 67.72 50.21
INFTY (real-world) 47.24 43.12
WYGIWYS (synthesized) 89.92 90.22
WYGIWYS (real-world) 70.21 77.21
MathLatexEdit without image similarity (synthesized) 90.32 91.45
MathLatexEdit without image similarity (real-world) 73.21 79.36
MathLatexEdit (synthesized) 91.78 92.23
MathLatexEdit (real-world) 74.71 81.61

Table 5. The performance of different methods on the synthesized and real-world testing sets for visual LaTeX edit

Table 3 presents the two metric scores of different methods for modifying post sentences. Our MathLatexEdit achieves the best overall result, with an average BLEU score of 82.30 and GLEU score of 76.57, which are 4.7% and 5.0% higher than the Seq2seq model and 38.97% and 45.46% higher than SMT, respectively. This represents a significant improvement over the two baseline methods. In more detail, our model achieves a BLEU score of 83.74 and a GLEU score of 77.41 in formula latexification, and a BLEU score of 81.63 and a GLEU score of 76.03 in LaTeX revision.
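The percentages quoted for Table 3 are relative gains over each baseline, which can be checked directly from the table values. The helper name below is hypothetical.

```python
def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100


# (BLEU, GLEU) scores taken from Table 3.
table3 = {
    "SMT": (59.22, 52.64),
    "Seq2seq": (78.60, 72.91),
    "MathLatexEdit": (82.30, 76.57),
}

ours_bleu, ours_gleu = table3["MathLatexEdit"]
for name in ("Seq2seq", "SMT"):
    bleu, gleu = table3[name]
    print(name,
          round(relative_gain(ours_bleu, bleu), 2),
          round(relative_gain(ours_gleu, gleu), 2))
```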

To qualitatively understand the strengths and weaknesses of different methods, we analyzed and compared the test results from the three methods. Table 4 lists some representative examples in which MathLatexEdit outperforms the two baseline methods. Each example contains an original sentence and three edited sentences produced by the different methods. SMT can edit some domain-specific words (e.g., f(x,y) to $f(x,y)$ in the 3rd example), but it often leaves unchanged sentences that should be edited. For instance, SMT fails to convert ”p” into ”$ p $” in the 1st example, fails to convert ”’i’” into ”$ i $” in the 2nd example, and fails to convert ”*” and ”log” into ”\cdot” and ”\log” in the 4th example. Therefore, SMT does not work well for minor math-related changes in post edits because it cannot fully utilize the context information.

Seq2seq works better than SMT, but still fails to edit some tokens. For example, ”’i’” is incorrectly converted to ”’$ i $’” with an additional ”’” in the 2nd example, ”primitives” is incorrectly converted into ”starts” in the 3rd example, and it fails to convert ”*” into ”\cdot” in the 4th example. The reason may be that Seq2seq learns some biased knowledge from the training dataset and cannot handle rare patterns. In contrast, with sentence normalization our MathLatexEdit avoids incorrect modification of sentence tokens and provides better results in all the examples.

The ablation study of our model without sentence normalization shows that the domain-specific normalization contributes a 2.3% and 2.5% improvement over the vanilla model in BLEU and GLEU score, respectively. By analyzing low-quality recommendations from our model, we find four main reasons why our recommendation does not match the ground truth. First, some sentences are edited to add information that is beyond the context of the sentence, such as editing ”i need to find weight for x and y ” to ”I need to find weight $ w _ 1 $ and $ w _ 2 $ ”. Our current model considers only the local context of a sentence. To support such complicated edits, the broader context of the sentence (i.e., previous and subsequent sentences) needs to be considered in the future.

Second, our model may provide better results than the ground truth. For example, our model edits the sentence ”if f f is differentiable, then f f f is differentiable ? ” to ”if $ f \circ f $ is differentiable, then $ f \circ f \circ f $ is differentiable?”, while the ground truth is ”if f f is differentiable, then $ f f f $ is differentiable?”. Compared with the ground truth, our model not only inserts ”\circ” between the function symbols, but also correctly adds ”$” around ”f f”.

Third, different users may have different opinions regarding what should or should not be edited. For example, some users will edit the sentence ”what i know so far is $ m = ( y _ 2 - y _ 1 ) / ( x _ 2 - x _ 1 ) $” to ”what I know so far is $ m = \frac { y _ 2 - y _ 1 } / { x _ 2 - x _ 1 } $”. However, many other users will not do that. The many revert edits we saw when collecting original-edited sentences are evidence of such differing opinions. Different editing opinions often result in non-obvious editing patterns, which machine learning techniques cannot effectively encode.

Fourth, the sentence length is a crucial factor influencing the performance of the model. Our model sometimes cannot fully correct all the errors in a long LaTeX sequence. For example, ”so $ k = 0 . 5 * sqrt ( - 16 ) = 2i $ $ f _ 1 ( x ) = e ( 2ix ) = cos ( 2x ) + isin ( 2x ) $ $ f _ 2 ( x ) = e ( - 2ix ) = cos ( 2x ) - isin ( 2x ) $ is the solution” is converted into ”so $ k = 0 . 5 sqrt ( - 16 ) = 2i $ $ f _ 1 ( x ) = e ( 2ix ) = cos ( 2x ) + isin ( 2x ) $ $ f _ 2 ( x ) = e ( - 2ix ) = cos ( 2x ) - i sin ( 2x ) $ is the solution”. Here, our model fails to convert ”0 . 5 *” to ”0 . 5 cdot”, ”e ( 2ix )” to ”e { 2ix }” and ”e ( -2ix )” to ”e { -2ix }”. To support such long sequence edits, splitting long post sentences needs to be considered in the future.

5.4.2. Performance of visual LaTeX edit recommendation

Table 5 presents the two metric scores of different methods for converting formula images to LaTeX formulas on the synthesized testing set of 50,000 image-formula pairs and the real-world testing set of 100 image-formula pairs.

On the synthesized testing set, our MathLatexEdit achieves the best overall result, with an average BLEU score of 91.78 and image similarity of 92.23. Due to the limitations of conventional image processing, INFTY performs worst among the three models. The deep learning based model, WYGIWYS, performs much better than INFTY, but MathLatexEdit still achieves a 2.0% and 2.2% boost over it in BLEU score and image similarity. On the real-world testing set, all three models perform worse than on the synthesized testing set, as it is a more challenging task. MathLatexEdit still outperforms the other two baselines with reasonably good performance in all metrics. Adding re-ranking based on image similarity during inference improves our model's BLEU score by 1.6% on synthesized data and its image similarity score by 2.8% on real data.

Figure 6. Examples of converted LaTeX formulas by different models

To qualitatively understand the strengths and weaknesses of different methods, we randomly selected 600 results from the three methods for a detailed comparison. Fig 6 lists some representative examples in which MathLatexEdit outperforms the two baseline methods. Each row contains an original formula image and three formula images rendered from the LaTeX sequences predicted by the three models. INFTY does not work well in most cases, as its rule-based methods do not scale well. It first segments the characters in the image and then recognizes each character by comparing it with candidates in its database. However, there might be exceptions in each step, and the errors in one step are further amplified in the following steps. For example, it fails to segment adjacent symbols in Fig 6(a). The deep learning model, WYGIWYS, behaves well in most cases, but it still makes mistakes, especially for formulas with very fine-grained information. For example, it mispredicts symbols in Fig 6(b) and Fig 6(c), as these characters are too close to each other and relatively small compared with the whole screenshot.

Figure 7. Examples of wrongly predicted LaTeX formulas by MathLatexEdit

To analyze the low-quality recommendations from our model, we randomly selected some predictions which do not match the ground truth in both the synthesized and the real-world testing sets. Based on our observation, we summarize three reasons for those erroneous predictions, as illustrated in Fig 7.

First, some input images have long or complex formulas, which makes it difficult to precisely predict the whole formula. For example, Fig 7(a) is a formula image whose formula is long and has a complex structure. Although our predictions are right for most tokens, a few tokens are still misclassified. The longer the formula is, the more difficult it is to predict an exact match.

Second, some tokens in the input formula images are rare. For example, the special combination of symbols in Fig 7(b) is rarely used. In the generated formula image, all tokens except this combination are correctly predicted. In our collected 2,068,744 math LaTeX sequences from Mathematics Stack Exchange, this combination appears only once, resulting in the miss by MathLatexEdit.

Third, some input formula screenshots are too blurred or have a noisy background. For example, Fig 7(c) is a screenshot taken from a book with low resolution and a gray background. Most of the generated LaTeX formula matches the ground truth, but there are still some minor misclassification errors.

6. Real-world Evaluation of Edits

Having established confidence in the quality of modified posts by MathLatexEdit, we further investigated whether the recommended edits by MathLatexEdit can actually assist post editors especially those with little editing experience or expertise to successfully improve the quality of mathematics posts in practice. We conducted a user study and a field study in this section to further evaluate the usefulness of our model for assisting domain-specific edits.

6.1. Usefulness Evaluation

6.1.1. Procedures for User Study

We randomly selected 50 posts and manually checked which edits they needed. We asked an experienced research staff member (not involved in the study) to edit these posts and collected the corresponding edits as the ground truth. Note that these posts were selected from outside our training corpus (i.e., dated after May 1, 2020) to avoid potential bias. Among the 50 posts, 14 require math-related edits, and we further selected 6 of them needing different kinds of edits for this user study.

We recruited ten PhD students and research assistants from our school. According to a pre-study background survey, all participants knew Mathematics Stack Exchange and were familiar with using LaTeX. The study involves two groups of five participants: the experimental group, who edit posts with our tool, and the control group, who start from scratch. Participants were matched in pairs with comparable experience in using Mathematics Stack Exchange and LaTeX, so that the experimental group has similar overall expertise to the control group. Note that, to avoid potential tool bias, we did not ask each participant to edit half of the posts with our tool and the other half without it.

We gave participants a detailed explanation of our tool to help them understand its results. Participants were required to edit posts with or without our tool and had up to 10 minutes for each post. We recorded the time each participant used to edit each post. After editing each post, participants were asked to rate how satisfied they were with their edits on a five-point Likert scale (likert1932technique) (1: not satisfied at all and 5: highly satisfied). We then calculated the precision and recall of these post edits against the ground truth.

6.1.2. Results of User Study

Post ID Time cost (s) Precision Recall Satisfactoriness
(each metric is listed as control group / experimental group)

1 345.6 / 56.4 62.41 / 90.24 74.21 / 91.62 3.33 / 4.50
2 373.6 / 60.2 70.21 / 92.92 70.53 / 89.48 3.50 / 5.00
3 424.8 / 70.2 61.42 / 94.75 75.47 / 93.70 3.83 / 4.83
4 483.4 / 88.6 71.23 / 93.27 78.64 / 94.57 3.50 / 4.50
5 262.2 / 60.6 63.42 / 92.88 69.83 / 90.29 3.67 / 4.83
6 382.4 / 47.8 60.74 / 89.22 71.42 / 90.58 3.67 / 5.00
Average 378.6 / 64.0* 68.70 / 92.21* 73.35 / 91.70* 3.58 / 4.78*

Table 6. The comparison of the experimental and control groups. * denotes a statistically significant difference

Table 6 shows that participants in the experimental group spent less time editing posts than the control group (on average 64s versus 378.6s). This indicates that our tool helps editors save 82.85% of editing time, averaged over the six posts. In fact, the average time of the control group is underestimated, as 2 participants failed to complete at least one post within 10 minutes, which means that they may need more time in real editing. In contrast, all participants in the experimental group finished every post within 3 minutes. With the help of our MathLatexEdit, the experimental group also obtained relatively higher precision and recall (by 34.22% and 25.02%, respectively). The experimental group rated 80% of their post edits as highly satisfactory (5 points), as opposed to 10% for the control group. On average, the satisfactoriness scores for the experimental and control groups are 4.78 versus 3.58.
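The 82.85% saving is consistent with averaging the per-post time savings in Table 6. The quick check below assumes that this is how the figure was computed, using the per-post mean times from the table.

```python
# Per-post mean editing times in seconds, taken from Table 6.
control = [345.6, 373.6, 424.8, 483.4, 262.2, 382.4]
experimental = [56.4, 60.2, 70.2, 88.6, 60.6, 47.8]

# Fraction of time saved on each post, then averaged over the posts.
per_post_saving = [1 - e / c for c, e in zip(control, experimental)]
mean_saving = sum(per_post_saving) / len(per_post_saving) * 100

print(round(mean_saving, 2))
```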

We carried out the Mann-Whitney U test (mwtest), which is suitable for small samples, to understand the significance of the differences between the two groups. It suggests that our tool can significantly help the experimental group edit posts faster, with higher precision, recall and satisfactoriness scores.

We believe that the better performance of the experimental group is due to the assistance of our MathLatexEdit, which gives participants a reliable starting point for editing. Guided by the recommended post edits, participants can easily identify the issues and finish the edit. 83.3% of the recommendations provided by our tool are directly accepted by the editors. Although there may be some mistakes in our tool’s recommendations, participants can easily revise them. Without the help of edit recommendations, the control group has to read the post thoroughly, determine where the issues are and solve them from scratch, which results in longer edit time and less satisfactory edit results. For example, some latexification edits within text, like replacing x-y with $x-y$, are frequently missed by the control group. The revision of complex LaTeX formulas is also more error-prone in the control group.

6.2. Real-world Assistance for Post Editors

We also conducted a small-scale field study, in which the first author, who has no experience in Mathematics Stack Exchange, acted as a novice post editor. We randomly selected 600 posts (106 with images) created after May 1, 2020, which were never used in our training dataset, to avoid potential bias. Our model found that 112 posts need at least one LaTeX edit (including 95 posts with textual LaTeX edits and 22 posts with visual LaTeX edits). Among those 112 posts, we selected 80 posts (60 with only textual LaTeX edits, 20 with only visual LaTeX edits), a sample of reasonable size for which submitting post edits manually is manageable. In Mathematics Stack Exchange, each question can have up to 5 tags to describe its topic. The 80 selected posts contain 203 tags in total (if the post is an answer, we took tags from its corresponding question) and 127 of these tags are unique. This indicates that the 80 selected posts cover a diverse set of mathematical topics. In fact, these posts contain many mathematical terms that are beyond the expertise of the first author. Of the 80 selected posts, 69 (86.25%) involve just one edit while 11 (13.75%) involve multiple edits.

Figure 8. Example of MathLatexEdit being used to assist post editors
Figure 9. Example of rejected MathLatexEdit generated revision

After proofreading the 80 posts edited by our MathLatexEdit, the first author submitted the post edits to the community for approval. Of the 80 submitted post edits, 78 (97.5%) were accepted and 2 (2.5%) were rejected. The two rejected ones are both single edits. The results show that, whether a post contains a single edit or multiple edits, most edits are accepted by the community. Records of some edits are shown in Fig 8. For the two rejected edits, which are visual LaTeX edits, the original images have special notations in the formula. Although MathLatexEdit recommended the correct LaTeX sequences, the notations in the formula images were missed. Figure 9 shows one example, in which yellow highlighting on some parts of the formula illustrates the reason for the rejection. Since our revised LaTeX sequence misses the notation, which is important for this post, the revision was not accepted. For the 78 accepted post edits, however, the trusted contributors believed that they contained sufficient correct edits that significantly improved the post quality, and thus approved them. This real-world acceptance of our MathLatexEdit’s proposed edits demonstrates the usefulness of MathLatexEdit in formula latexification, LaTeX revision and screenshot transcription for the Mathematics Stack Exchange Q&A collaborative site.

7. Discussion and Future Work

In this section, we discuss the possible generalization of our MathLatexEdit approach to support collaboration on other Q&A sites, and its potential to improve the accessibility of online mathematics-related information.

7.1. Generalization of MathLatexEdit

This work examined collaborative editing patterns in Mathematics Stack Exchange, and we developed MathLatexEdit, a deep learning-based approach for latexifying formulas, revising LaTeX and transcribing screenshots to assist post owners and editors. Note that our data analysis method and deep learning approach are entirely data driven and are not tied to any specific collaborative editing or quality control process used by a particular community Q&A site. In this work, we study only Mathematics Stack Exchange post edits. However, the input to our approach is essentially just a parallel corpus of original and edited text (see Figure 1). Therefore, we would expect that our data analysis method and deep learning approach could be applied to other mathematics-related Q&A sites.
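To make the notion of a parallel corpus concrete, the following is a minimal sketch of how original-edited pairs could be assembled from a post's revision history. It is an illustration only, not the preprocessing used for MathLatexEdit: the `Revision` record, its field names and the `build_parallel_corpus` helper are hypothetical and are introduced here purely for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Revision:
    # Hypothetical record of one stored version of a post.
    post_id: int
    revision_number: int
    body: str


def build_parallel_corpus(revisions: List[Revision]) -> List[Tuple[str, str]]:
    """Pair each revision with the next revision of the same post."""
    ordered = sorted(revisions, key=lambda r: (r.post_id, r.revision_number))
    pairs = []
    for before, after in zip(ordered, ordered[1:]):
        # Keep only consecutive revisions of one post whose text actually changed.
        if before.post_id == after.post_id and before.body != after.body:
            pairs.append((before.body, after.body))
    return pairs


history = [
    Revision(1, 1, "Solve x^2 = 4"),
    Revision(1, 2, "Solve $x^2 = 4$"),
    Revision(2, 1, "A post that was never edited"),
]
corpus = build_parallel_corpus(history)
```

Any site that keeps a revision history in a comparable form could, under this assumption, supply training pairs in the same way, which is why the approach is not tied to one site's editing process.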

Figure 10. Example of MathLatexEdit generated revision from other sites

There are other sites that contain mathematical formulas and support collaborative editing, such as MathOverflow (a math Q&A site for professional mathematicians in the Stack Exchange network), Quora (the largest general Q&A site) and BaiduBaike (an online encyclopedia created and edited by volunteers in China). We randomly selected 15 posts that require formula latexification, LaTeX revision or screenshot transcription from each site and edited them with our tool, as shown in Fig 10. Because BaiduBaike requires a certain reputation score for submission, which we did not reach, we submitted our revisions only to MathOverflow and Quora, on 5 October 2020. MathOverflow accepted all 15 edits, and Quora accepted 10 edits, with the other 5 still pending. These results indicate that our MathLatexEdit approach generalizes beyond Mathematics Stack Exchange and could be adapted for use on these sites.

However, it is still an open question whether our approach can perform well in a large-scale, live deployment on mathematics Q&A sites. Several issues require further investigation: how to integrate our recommendations into the Q&A and collaborative editing processes; whether post owners and editors can easily fix errors in our generated LaTeX; how the use of MathLatexEdit affects the behavior of post owners and post editors; and how the edits suggested by MathLatexEdit change people's perception of the math formulas.

7.2. Implications of MathLatexEdit to Accessible Math

Converting a posted mathematics formula screenshot to LaTeX brings many benefits to the site. First, as shown in Fig 6, transcribing the screenshot makes the post easier to read for everyone, as users can zoom in or out for a clearer view without blurring the formula. Such readability helps other users better understand the question, which increases the likelihood of responses.

Second, a transcribed formula is machine-readable text, which is easier for search-engine crawlers to register and index than an image. This in turn makes the post easier to search for and to find, and therefore more useful for a broader cross-section of internet users.

The screenshot transcription feature would also make questions more accessible to users with vision impairment, because a transcribed formula can be read by a screen reader. Users with visual impairments, ranging from difficulty reading all the way to complete blindness, need to be able to use the internet and study mathematics (AlajarmehNancy2012Dmma). When visiting a website, they mainly rely on assistive technology such as screen readers, which can read text but not images. Proper typesetting (including good use of Markdown and MathJax) provides additional HTML markup that these assistive technologies can use to give a more meaningful account of the content on the page.

Although MathLatexEdit makes it easier for users to convert an image to a LaTeX sequence, not all users will use it or follow the guidelines of Mathematics Stack Exchange. Indeed, outside Mathematics Stack Exchange there are still many screenshots of mathematics formulas on different sites. These inaccessible screenshots pose a major barrier for users with vision impairment in learning and using mathematics online, which undermines their equal access to education. MathLatexEdit can help by converting a digital image into a textual LaTeX mathematics formula. Physical books are another important learning resource for these students, and our model has the potential to be adapted for recognizing mathematics formulas in pictures of physical books or in image-based electronic books.

However, note that what we generate is only a LaTeX sequence, which is not very user friendly for most end users. For example, a simple division is written as “\frac{numerator}{denominator}” in LaTeX, which is much more complicated than how the same expression is spoken aloud. Therefore, in future work we want to develop a set of rules that further converts the LaTeX generated by MathLatexEdit into a natural-language rendering of the formula, to make it more accessible for novice users.
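As an illustration of what such a rule set might look like, the sketch below rewrites a few LaTeX constructs into spoken English using regular expressions. It is not part of MathLatexEdit: the rule table, the wording of each rule and the `latex_to_speech` function are assumptions made for this example, and only a handful of constructs (`\frac`, `\sqrt`, superscripts and a few symbols) are covered.

```python
import re

# Hypothetical structural rules. Each pattern only matches arguments
# without nested braces, so nested formulas are resolved innermost first.
STRUCTURE_RULES = [
    (re.compile(r"\\frac\{([^{}]*)\}\{([^{}]*)\}"), r" \1 over \2 "),
    (re.compile(r"\\sqrt\{([^{}]*)\}"), r" the square root of \1 "),
    (re.compile(r"\^\{([^{}]*)\}"), r" to the power of \1 "),
    (re.compile(r"\^(\w)"), r" to the power of \1 "),
]

# Hypothetical symbol table; commands not listed are left unchanged.
SYMBOL_WORDS = {
    "\\times": "times",
    "\\cdot": "times",
    "\\pi": "pi",
    "+": "plus",
    "-": "minus",
    "=": "equals",
}
TOKEN = re.compile(r"\\[A-Za-z]+|[+\-=]")


def latex_to_speech(latex: str) -> str:
    """Convert a small subset of LaTeX into a spoken-style phrase."""
    text = latex.strip().strip("$")
    changed = True
    while changed:  # repeat until no structural rule applies any more
        changed = False
        for pattern, template in STRUCTURE_RULES:
            text, count = pattern.subn(template, text)
            changed = changed or count > 0
    # Replace known symbols and commands; keep unknown commands verbatim.
    text = TOKEN.sub(
        lambda m: " " + SYMBOL_WORDS.get(m.group(0), m.group(0)) + " ", text
    )
    return " ".join(text.split())


print(latex_to_speech(r"\frac{numerator}{denominator}"))
```

A rule set this simple loses grouping information: a fraction with a compound numerator is read out flat, so a practical converter would also need spoken cues for where a numerator or a root begins and ends.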

8. Conclusion

In this paper, we carried out an empirical study of historical collaborative editing data on Mathematics Stack Exchange. Our results showed that collaborative editing is widely used on the site and includes three domain-specific editing use cases: formula latexification, LaTeX revision and screenshot transcription. Because these conversions are difficult to perform manually, we designed MathLatexEdit, a deep learning-based approach that automatically revises the math-related content in a post. MathLatexEdit's recommendations can assist post owners and editors in improving the dissemination of mathematics knowledge via the Q&A site. Our evaluation on large-scale datasets demonstrates the quality of the LaTeX sequences edited by MathLatexEdit. The edit recommendations from MathLatexEdit for a selection of real-world posts were accepted by experienced users of Mathematics Stack Exchange, further showing the usefulness of our tool. We discussed the potential benefits of our post-edit recommendation approach for post owners and editors as well as for different platforms. However, deploying our approach on these sites may have complicated impacts on social processes and collaborative editing, which need further study in the future.


Ma is supported by a Faculty of IT PhD scholarship. Grundy and Khalajzadeh are supported by ARC Laureate Fellowship FL190100035.