1 Introduction
At first glance, searching for math formulae does not look like a task for search engines. Text retrieval dominates among search engines, whereas math-aware retrieval is a specialized area of information retrieval. Springer’s LaTeX Search, the MathWebSearch of zbMATH Open (formerly known as Zentralblatt MATH), and the Math Indexer and Searcher (MIaS) of the European Digital Mathematics Library (EuDML) are all examples of math-aware search systems deployed in production. Our MIaS search engine [mir:MIaSNTCIR-11short] runs on Apache Lucene, an industry-grade, robust, and highly scalable full-text search engine, with our own preprocessing of mathematical formulae.

The text is tokenized and stemmed to unify inflected word forms, whereas math is expected to be in the MathML format and is then canonicalized, ordered, tokenized, and unified (see Figure 1).

To provide a web user interface for MIaS, we have developed and open-sourced the WebMIaS [mir:webmias2014short, mir:MIaSNTCIR-11short] search engine. In WebMIaS, users can enter mixed queries that combine text and math, with native support for LaTeX and MathML. Matches are conveniently highlighted in the search results. The user interface of WebMIaS is shown in Figure 2.
Although the (Web)MIaS system has already been deployed in the European Digital Mathematics Library (EuDML), its complicated deployment process might be an obstacle to wider deployment in other digital mathematics libraries that already use MathML markup or can be extended to use it. To solve this problem, we describe the virtualization of WebMIaS using Docker [boettiger2015introduction], which allows anyone to deploy WebMIaS in a single line of code. Whether you have an open-access repository such as DSpace, or just a number of mathematical documents, you can benefit from the math-aware search provided by WebMIaS. For testing, we also provide the MREC dataset [dml:liska2011short].
2 Deployment process description
All modules of the MIaS system are Java projects, so users first need to 1) install the Java environment prerequisites and then 2) build the respective system modules. The next step in the process is to 3) index a dataset of mathematical documents using the command-line interface of MIaS. Finally, the users can 4) run Apache Tomcat with the WebMIaS servlet as a user interface.
Over the years, we have attempted to automate the above steps into running a single Makefile or Jupyter Notebook. However, these solutions were slow, fragile, and hard to maintain. We propose a better solution using lightweight virtualization via Docker with instant deployment, a short but powerful Dockerfile configuration, and a complete workflow that automates all the steps of the deployment process. Moreover, GitHub Actions provide continuous integration and automate the publishing of Docker images to Docker Hub.

Both MIaS and WebMIaS are containerized into separate Docker images named miratmu/mias and miratmu/webmias, respectively. This allows users to run both the indexing and the retrieval without a specific configuration of the environment. Resolving the dependencies and building all modules is up to the continuous integration workflow (see Figure 3), and users receive Docker images with everything prebuilt. After downloading a dataset to the working directory, users can index the dataset directory into the index directory using MIaS, see Listing 2.

    $ wget https://mir.fi.muni.cz/MREC/MREC2011.4.439.tar.bz2
    $ mkdir dataset ; tar xj -f MREC2011.4.439.tar.bz2 -C dataset
    $ docker run -v "$PWD"/dataset:/dataset:ro -v "$PWD"/index:/index:rw --rm miratmu/mias
    $ docker run -v "$PWD"/dataset:/dataset:ro -v "$PWD"/index:/index:ro --rm --name webmias -d -p 127.0.0.1:8888:8080 miratmu/webmias

Downloading and indexing the MREC2011.4 dataset for WebMIaS (lines 1–3), and deploying WebMIaS in a single line (line 4) of code.
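Before the containers are started, it can help to check the layout of the working directory. The following is a minimal sketch and not part of MIaS or WebMIaS: the function name `prepare_workdir` is hypothetical, and the `dataset` and `index` directory names are assumed to match the bind mounts used above. Creating the index directory in advance matters because Docker creates a missing bind-mount source itself, typically owned by root.

```shell
#!/bin/sh
# Hypothetical helper, not part of MIaS or WebMIaS.
# Checks the working directory layout before the containers are started.
prepare_workdir() {
    workdir=${1:-$PWD}

    # The dataset directory must exist and contain at least one entry.
    if [ ! -d "$workdir/dataset" ] || [ -z "$(ls -A "$workdir/dataset")" ]; then
        echo "error: $workdir/dataset is missing or empty" >&2
        return 1
    fi

    # Create the index directory up front so that it belongs to the
    # current user rather than being created by the Docker daemon.
    mkdir -p "$workdir/index" || return 1

    echo "ready: $workdir"
}
```

Calling `prepare_workdir "$PWD"` after unpacking the dataset and before the first `docker run` would then report either a ready working directory or the missing dataset.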

Documents | Input (sub)formulae | Indexed (sub)formulae | Real (wall-clock) indexing time (min) | CPU indexing time (min) |
---|---|---|---|---|
10,000 (2.28 %) | 3,406,068 | 64,008,762 | 35.75 (2.05 %) | 35.05 |
100,000 (22.76 %) | 36,328,126 | 670,335,243 | 384.44 (22.00 %) | 366.54 |
439,423 (100 %) | 158,106,118 | 2,910,314,146 | 1,747.16 (100 %) | 1,623.22 |

Measure | Level | PMath | CMath | PCMath | LaTeX |
---|---|---|---|---|---|
MAP | 3 | 0.3073 | 0.3630 /1/ | 0.3594 | 0.3357 |
P@10 | 3 | 0.3040 | 0.3520 /1/ | 0.3480 | 0.3380 |
P@5 | 3 | 0.5120 | 0.5680 /1/ | 0.5560 | 0.5400 |
P@10 | 1 | 0.5020 | 0.5440 | 0.5520 /1/ | 0.5400 |
Finally, users can deploy WebMIaS in a single line of code that mounts the dataset and index directories into a container named webmias, which listens on TCP port 8888 of the localhost. The WebMIaS system will then be running at http://localhost:8888/WebMIaS.
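Since the container is started in the background, the web application may need a moment before it answers requests. The sketch below polls the URL given above until it responds. The function name `wait_for_webmias` and the default of 30 attempts are hypothetical choices, and following redirects with `-L` is an assumption about how the servlet container answers a request for the application root.

```shell
#!/bin/sh
# Hypothetical helper: polls a URL until it responds or the attempts run out.
wait_for_webmias() {
    url=${1:-http://localhost:8888/WebMIaS}
    attempts=${2:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f: treat HTTP errors as failures, -s: silent,
        # -L: follow redirects, -o /dev/null: discard the response body.
        if curl -fsL -o /dev/null "$url"; then
            echo "up: $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "timeout: $url" >&2
    return 1
}
```

Running `wait_for_webmias` without arguments after the deployment command would wait up to roughly half a minute for the default URL.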
3 Evaluation
We performed a speed evaluation of MIaS on the MREC dataset [dml:liska2011short] (see Table 1), and a quality evaluation on the NTCIR-10 Math [mir:NTCIR-10-Overview, MIR:MIRMUshort], NTCIR-11 Math-2 [NTCIR11Math2overviewshort, mir:MIaSNTCIR-11short] (see Table 2), NTCIR-12 MathIR [ZanibbiEtAl16NTCIR, RuzickaSojkaLiska16Mathshort], and ARQMath 2020 [zanibbi2020overview, novotny2020three] datasets. We also measured the time to deploy WebMIaS without Docker (see Figure 3).
The speed evaluation shows that the indexing time of our system is linear in the number of indexed documents and that the average query time is 469 ms. Additionally, the dockerization of WebMIaS reduces the deployment time from about 10 minutes to a matter of seconds. With respect to quality evaluation, MIaS has notably won the NTCIR-11 Math-2 task.
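As a back-of-the-envelope check of the linearity claim, the wall-clock indexing times from the speed table can be converted to seconds per document. The sketch below only restates the three reported measurements; the function name `per_document_seconds` is hypothetical.

```shell
#!/bin/sh
# Wall-clock indexing times from the speed table:
# number of documents, indexing time in minutes.
per_document_seconds() {
    awk '{ printf "%d documents: %.4f s per document\n", $1, $2 * 60 / $1 }' <<'EOF'
10000 35.75
100000 384.44
439423 1747.16
EOF
}

per_document_seconds
```

The per-document cost stays between roughly 0.21 s and 0.24 s across a more than 40-fold increase in collection size, which is consistent with a near-linear indexing time.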
4 Conclusion
An open-source environment brings reproducibility and the possibility of trying out projects of interest without limitations. However, installation instructions are often hard to follow, with many prerequisites and possible conflicts with the operating environment that is already running. Automation tools, continuous integration, and package virtualization ease the development process. With this motivation and in the hope of helping the math community, we have dockerized our math-aware web search engine WebMIaS. As a result, anyone can now deploy WebMIaS in a single line of code. The software is accessible and at the fingertips of the math community, see https://github.com/MIR-MU/WebMIaS.