Introduction
Technical scientific articles, such as those from physics and computer science, rely on both mathematics and text to communicate ideas. Most existing work in natural language processing (NLP) and machine learning studies these two components separately. For instance, text-based topic models have been used widely on scientific articles to uncover their semantic structure [Blei, Ng, and Jordan2003, Blei and Lafferty2006, Newman et al.2010a]. For mathematics, recent work [Lan et al.2015, Zanibbi et al.2016, Deng et al.2017] has studied methods to model and generate mathematical equations, for example using RNNs. However, these two components should ultimately be processed together. Algorithms for automated understanding of scientific documents should extract the information encoded not only by words but also by mathematical equations. At the same time, equations should ideally be modeled with the help of the surrounding text, as the meaning of an equation depends not only on its constituent symbols and syntax, but also on the context in which it appears [Wang et al.2015, Krstovski and Blei2018].
To this end, this paper proposes a topic-equation model that jointly generates equations and their surrounding text in scientific documents (TopicEq), and demonstrates that the model can effectively achieve the two aforementioned goals. The intuition behind the model is illustrated in the sample passages in Figure 1, which shows how the topic of the word context is often indicative of the distinctive types of equations used, and vice versa. For instance, equations appearing in the topic of relativity (with context words like “black hole”, “Einstein”) tend to involve a series of tensors, while equations used in the topic of optimization (context words “gradient”, “optimal”) may use norms, optimization operators, and often their combinations. Ideally, the strings of mathematical symbols in the equations should aid the training of topic models, and the context words should aid the modeling and understanding of the equations.
Our model formalizes this intuition for scientific texts by generating each equation and its context passage using a shared latent topic. Specifically, we apply a topic model to the context passage, and use the same latent topic proportion vector in a recurrent neural network (RNN) to generate the equation as a sequence of symbols. To develop and experiment with this model, we construct a large corpus of context-equation pairs, extracted from the LaTeX source of arXiv articles across a range of scientific domains (ContextEq400K). We fit the model on this corpus using approximate inference based on a variational autoencoder approach.
Our evaluation shows that this joint model significantly outperforms alternative topic models and RNN equation models for scientific texts. We further show that the model enables novel applications that bridge topics and mathematical equations. Concretely, the paper makes the following contributions.


The first study of jointly modeling topics and mathematics in scientific texts.

Better topic models for scientific texts: Joint training with the RNN equation model boosts the quality of topic modeling. This greatly outperforms the topic model that includes equations simply as bags of tokens, suggesting that the syntax-level information of equations captured by the RNN is useful for topic modeling.

Better equation models: Joint topic modeling provides the narrative context for equation prediction, and improves the quality/grammaticality of the RNN equation model.

Our model successfully captures the relationship between mathematical equations and topics (words), enabling interpretable handling of equations. For instance, we illustrate that the model enables topic-aware equation generation and equation topic inference. We also present a variant of this model that learns topic-aware associations between mathematical symbols and words.

The model is unsupervised, and enables the aforementioned tasks and applications without manual labels.
Related Work
Our work is connected to a wide range of recent research, from topic models to mathematical equation processing.
Topic models.
Topic models provide a powerful tool to extract the semantic structure of texts in the form of the latent topics—usually multinomial distributions over words. Starting from LDA [Blei, Ng, and Jordan2003], topic models have been studied extensively [Teh et al.2005, Blei and Lafferty2006, Blei and Lafferty2007, Hall, Jurafsky, and Manning2008], especially for scientific articles. However, while mathematical equations play an essential role in scientific documents, topic models capable of processing equations besides word texts are yet to be studied. This work shows that incorporating joint modeling of equations via an RNN boosts the performance of topic modeling for scientific texts.
Recent work [Cao et al.2015, Larochelle and Lauly2012] has proposed neural topic models, leveraging the flexibility and representation power of neural networks. In particular, [Miao, Yu, and Blunsom2016, Miao, Grefenstette, and Blunsom2017, Srivastava and Sutton2017] employ neural variational inference to train topic models; we will apply their technique to fit our model.
Language models & equation models.
Language modeling aims to learn a probability distribution over a sequence of words. It is a fundamental task in NLP, with a plethora of applications including text generation. RNN-based language models have been shown to be effective for sequences with long-term dependencies [Mikolov et al.2010, Jozefowicz et al.2016].
Similar to language models, equation models are useful for various tasks involving equation generation, such as semantic parsing [Roy, Upadhyay, and Roth2016] and handwriting / optical character recognition [Deng et al.2017]. The use of RNNs to model LaTeX was illustrated by [Karpathy2015] for an algebraic geometry text. This work also employs an RNN to model each equation as a sequence of LaTeX tokens (or “symbols,” interchangeably).
Neural topic-language models.
Our model architecture is motivated by joint topic-language models. Such models typically extract latent topics of a given document via a topic model, and utilize the topic knowledge to improve an RNN language model. [Mikolov and Zweig2012] incorporate the topic vector of a pretrained LDA model into an RNN language model; recent work [Dieng et al.2017, Lau, Baldwin, and Cohn2017, Wang et al.2018] trains neural topic and language models jointly, as we will do here.
Key distinctions can be made between our work and these models. First, while previous work uses topic models to improve language modeling on the same word text, our task models two different modalities: word text and equations. In this sense, our work is related to [Blei and Jordan2003], which extends LDA to model image-text pairs. Moreover, taking advantage of these two modalities, we also present a variant of the TopicEq model that learns topic-aware association between mathematical symbols and words.
The second difference lies in the RNN equation model we propose. While [Dieng et al.2017, Ahn et al.2016, Lau, Baldwin, and Cohn2017] integrate the topic knowledge into either the output layer of the LSTM or the word predictions of the language model, we embed the topic proportion vector inside the LSTM, to enable the topic knowledge to have deeper influence on equation generation. Experimental results show that this method of incorporating topic information is more effective than the existing methods for improving the quality of equation modeling.
Mathematical equation processing.
Some work has processed equations as bags of math symbols to extract their features for searching [Sojka and Líška2011] and clustering [Lan et al.2015]. [Zanibbi et al.2016] introduce tree-based representations of equations for mathematical information retrieval tasks. Most recently, [Deng et al.2017] propose RNN-based models to generate equations. We will show that RNN-based equation processing can capture syntactic features of equations, and provides more effective help for topic modeling than bag-of-tokens-based equation processing does.
Finally, our work of modeling equations with contexts is related to [Krstovski and Blei2018], which fits equation embeddings using surrounding words. While they limit the equation domains (i.e., ML, AI), this work aims to uncover topics for texts and equations from a range of scientific domains. This work also models each equation itself as a sequence of symbols, which is not studied in their work.
The TopicEq Model
Our starting point is the correlated topic model [Blei and Lafferty2007], which models the topic proportion vector through a latent Gaussian vector. We extend this model to the setting where each “document” consists of a displayed equation eq and its surrounding text, which we call the equation’s context. Our joint model assumes that each equation and its context are generated from the same latent topic vector; see Figure 2. Concretely, the generative process for a given context-equation pair has three steps:
(1) draw the latent Gaussian vector, which determines the topic proportion vector;
(2) draw the context words from the topic model;
(3) draw the equation eq from the topic-dependent LSTM.
Note that this is equivalent to placing a logistic normal distribution on the topic proportion vector, with the mean and covariance of the latent Gaussian as parameters. These parameters, the topics, and the weights in the LSTM are to be estimated from data. Expressing the model as shown in Figure 2 emphasizes the connection with neural topic models such as [Miao, Grefenstette, and Blunsom2017]; we will apply their model training technique.
Both the words and the equation are generated in a way that depends on the topic proportion vector. The topics are distributions over a word vocabulary; the context words are then drawn from the mixture of these topics weighted by the topic proportions, similar to [Wang et al.2018]. We employ an RNN to generate eq as a sequence of mathematical tokens, where the vocabulary is extracted from the set of LaTeX tokens. Specifically, to generate an equation conditioned on the latent topic proportion vector, we consider a Topic-Embedded LSTM (TE-LSTM), an extension of the LSTM [Hochreiter and Schmidhuber1997] whose update at each step takes the topic proportion vector as an additional input.
Here the input to each gate is the concatenation of the current input, previous state and topic proportion vector; the gates use the sigmoid function and are combined through the Hadamard product, as in the standard LSTM. The probability of the next token in the equation is computed from the resulting hidden state. Thus, the TE-LSTM embeds the topic proportion vector inside the LSTM cell to reflect the topic knowledge for equation generation. As a joint topic-equation model, it is similar to the topic-language model of [Wang et al.2018].
Writing the equation as a sequence of tokens, the training objective is the marginal likelihood of the context and eq (Eq 4), obtained by integrating out the latent topic vector.
Since its direct optimization is intractable, we employ variational inference [Jordan et al.1999]. Introducing a variational distribution over the latent topic vector, we maximize the variational lower bound (ELBO) on the log-likelihood (Eq 5). Following recent approaches to neural topic-language models [Miao, Grefenstette, and Blunsom2017, Dieng et al.2017, Wang et al.2018], we compute the variational distribution as a function of the context using the variational autoencoder technique [Kingma and Welling2014]. Specifically, we use a feedforward neural network (FFNN) as an inference network to parameterize the mean and variance vectors of the (diagonal) Gaussian variational distribution. We then use samples from this distribution to optimize Eq 5. The parameters of the inference network, the topic model, and the equation model are jointly trained by stochastic gradient descent.
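The sampling step can be illustrated with a minimal sketch, assuming a standard normal prior on the latent vector (an assumption of this example, as are the function names and toy values). It draws a reparameterized sample from a diagonal Gaussian and evaluates the KL term that appears in a lower bound of this kind; in the model, the mean and log-variance would be produced by the inference network.

```python
import math
import random

def sample_latent(mean, log_var, rng):
    """Reparameterized draw: z = mean + std * eps, with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]

def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mean, log_var))

# Toy values standing in for the outputs of an inference network (hypothetical).
rng = random.Random(0)
mean, log_var = [0.5, -1.0], [0.0, -2.0]
z = sample_latent(mean, log_var, rng)
kl = kl_to_standard_normal(mean, log_var)
```

Because the noise is drawn separately from the mean and variance, gradients can flow through both, which is what allows joint training by stochastic gradient descent.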
We also add a topic diversity regularization term to Eq 5, following [Xie, Deng, and Xing2015]. We observed that this technique prevents the model from learning generic, redundant topics.
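To make the TE-LSTM update of this section more concrete, here is a small pure-Python sketch of one cell step in which every gate sees the concatenation of the current input, the previous state and the topic proportion vector. The gate layout follows a standard LSTM, and the parameter names, dimensions and initialization are assumptions for illustration, not the exact parameterization used in the experiments.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def affine(W, b, v):
    """Return W v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def te_lstm_step(params, x_t, h_prev, c_prev, theta):
    """One LSTM update whose gates all see [x_t; h_prev; theta]."""
    v = x_t + h_prev + theta                                  # list concatenation
    i = [sigmoid(a) for a in affine(*params["i"], v)]         # input gate
    f = [sigmoid(a) for a in affine(*params["f"], v)]         # forget gate
    o = [sigmoid(a) for a in affine(*params["o"], v)]         # output gate
    g = [math.tanh(a) for a in affine(*params["g"], v)]       # candidate cell
    # Element-wise (Hadamard) products combine gates and states.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

def init_params(input_dim, state_dim, num_topics, rng):
    """Random small weights and zero biases for the four gates (hypothetical init)."""
    in_dim = input_dim + state_dim + num_topics
    def gate():
        W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
             for _ in range(state_dim)]
        return W, [0.0] * state_dim
    return {name: gate() for name in "ifog"}

rng = random.Random(0)
params = init_params(input_dim=3, state_dim=4, num_topics=2, rng=rng)
x = [0.1, 0.2, 0.3]
h, c = te_lstm_step(params, x, [0.0] * 4, [0.0] * 4, theta=[1.0, 0.0])
```

In this sketch the topic vector influences all four gates, in contrast to designs that only add topic information at the output layer.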
Experiments
We study the performance of the proposed model on a corpus of context-equation pairs constructed from arXiv articles. We quantitatively show that our joint topic-equation model provides better fits than alternative topic models and equation models. We further demonstrate its efficacy through qualitative analyses and novel applications, such as equation generation and equation topic inference.
Dataset Construction (ContextEq400K)
To obtain a dataset of context-equation pairs, we used scientific articles published on arXiv.org. We sampled 100K articles from all domains in the past 5 years, and split them into train, validation and test sets (80%, 10%, 10%). For each article, we parsed its LaTeX source and extracted single-line display equations that have five consecutive sentences both before and after the equation, which are used to define the word context. Following [Deng et al.2017], we further tokenized each equation into a sequence of LaTeX tokens (e.g., \sigma, ^, {, 2, }) and kept those of length 20–150, yielding the final corpus of 400K equation-context pairs.
An equation has 63 tokens on average. The context size of 10 sentences is similar to the document size used in recent work on topic-language models [Dieng et al.2017, Wang et al.2018].
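As a rough illustration of the tokenization and length filter, the sketch below splits a LaTeX string into commands, escaped characters and single characters, and keeps equations of 20 to 150 tokens. The regular expression and function names are assumptions of this example; the tokenizer actually used follows [Deng et al.2017] and may differ in detail.

```python
import re

# A LaTeX command (\sigma), an escaped character (\{), or any other
# single non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|\S")

def tokenize_equation(latex):
    """Split a LaTeX equation string into a list of tokens."""
    return TOKEN_RE.findall(latex)

def keep_equation(tokens, min_len=20, max_len=150):
    """Length filter: keep equations with min_len..max_len tokens (inclusive)."""
    return min_len <= len(tokens) <= max_len

tokens = tokenize_equation(r"\sigma^{2}")
```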
Experimental Setup
We fit the TopicEq model end-to-end on the train set and evaluate its performance on the test set.
Preprocessing.
For the topic modeling of context passages, we first removed all the inline math expressions in the text. We then followed the preprocessing steps in [Wang et al.2018] to tokenize and lowercase all words, exclude stopwords and words appearing in fewer than 100 documents; this resulted in a vocabulary size of 8,660. For equations, we use the 1,000 most frequent LaTeX tokens as our vocabulary.
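The two vocabulary rules above can be sketched as follows. This is a simplified illustration with hypothetical function names; it assumes documents and equations are already tokenized, and it ignores details such as tie-breaking among equally frequent tokens.

```python
from collections import Counter

def build_word_vocab(docs, stopwords, min_doc_freq=100):
    """Keep lowercased non-stopwords that occur in at least min_doc_freq documents."""
    doc_freq = Counter()
    for doc in docs:
        # Count each word at most once per document.
        doc_freq.update({w.lower() for w in doc} - stopwords)
    return sorted(w for w, n in doc_freq.items() if n >= min_doc_freq)

def build_token_vocab(equations, size=1000):
    """Keep the `size` most frequent LaTeX tokens over all equations."""
    counts = Counter(tok for eq in equations for tok in eq)
    return [tok for tok, _ in counts.most_common(size)]
```

Tokens outside the kept vocabulary would typically be mapped to a single unknown-token symbol, although the text does not specify how they are handled.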


Table 1: Topic coherence (normalized PMI) on the held-out test set.
Topic Model (# Topics)  50  100
LDA (context only)  .085  .083
Ours (context only)  .085  .084
Ours (context + Eq BOW)  .087  .086
Ours (context + Eq LSTM)  .097  .094
Ours (context + Eq LSTM shuffled)  .086  .085



Table 2: Ten topics learned by TopicEq, with their top words.
Quantum physics  spin energy field electron magnetic state states hamiltonian
Particle physics  higgs neutrino coupling decay scale masses mixing quark
Astrophysics  mass gas star stellar galaxies disk halo radius luminosity
Relativity  black metric hole schwarzschild gravity holes einstein
Number theory  prime integer numbers conjecture integers degree modulo
Graph theory  graph vertex vertices edges node edge number set tree
Linear algebra  matrix matrices vector basis vectors diagonal rank linear
Optimization  problem optimization algorithm function solution gradient
Probability  random probability distribution process measure time
Machine learning  layer word image feature sentence model cnn lstm training

Model setting.
For the inference network, we use a 2-layer FFNN with 300 units, similar to [Miao, Yu, and Blunsom2016, Miao, Grefenstette, and Blunsom2017]. The equation TE-LSTM architecture has two layers and state size 500, with dropout rate 0.5 applied to each layer [Srivastava et al.2014]. The parameters of the TopicEq model are jointly optimized by Adam [Kingma and Ba2015], with batch size 200, learning rate 0.002, and gradient clipping 1.0 [Pascanu, Mikolov, and Bengio2012].
Topic Model Evaluation
We first study the topic modeling performance of TopicEq, by evaluating the coherence of the learned topics [Chang et al.2009, Newman et al.2010b, Mimno et al.2011]. Specifically, following [Lau, Newman, and Baldwin2014], we compute the normalized PMI metric on the held-out test set. Since our TopicEq model incorporates a joint, RNN-based equation model, we analyze its effect by comparing the full TopicEq model with the following baseline topic models:


LDA (context only): we apply LDA to the word text

Ours (context only): TopicEq without the equation model

Ours (context + Eq BOW): TopicEq’s joint LSTM equation model (Eq 3) is replaced by a baseline bag-of-tokens model similar to that for context words.
The evaluation results are summarized in Table 1. The full TopicEq model is shown as “Ours (context + Eq LSTM)” in the table. We observe that TopicEq’s topic model component (2nd row) performs on a par with LDA (1st row), but it achieves a significant boost (+0.01) when trained together with the LSTM equation model (4th row). Adding equations as bags of tokens (3rd row) does improve the topic model marginally (+0.002), but the improvement from the joint LSTM equation model is about 5 times greater. These results show that a joint RNN equation model provides significant information to aid topic modeling of scientific texts.
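For reference, a normalized PMI coherence score of the kind used in this evaluation can be computed along the following lines. This is a simplified sketch: it estimates probabilities from document-level co-occurrence and scores never-co-occurring pairs as -1, both of which are assumptions of this example rather than details given in the text.

```python
import math
from itertools import combinations

def npmi_coherence(top_words, documents):
    """Mean normalized PMI over all pairs of a topic's top words."""
    doc_sets = [set(doc) for doc in documents]
    n = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij = prob(wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)      # words never co-occur
        elif p_ij == 1.0:
            scores.append(1.0)       # words co-occur in every document
        else:
            pmi = math.log(p_ij / (prob(wi) * prob(wj)))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)

docs = [["black", "hole", "metric"], ["black", "hole"],
        ["graph", "edge"], ["graph", "vertex"]]
coherent = npmi_coherence(["black", "hole"], docs)
incoherent = npmi_coherence(["black", "graph"], docs)
```

The score of a whole topic model is then typically the average of this quantity over its topics.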


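Table 3: Equation model evaluation: test set perplexity (50 and 100 topics) and syntax error rate (100 topics).
Equation Model  Perplexity (50)  Perplexity (100)  Error (%) (100)
No joint training
LSTM (no topic)  5.81  5.81  15.3
LSTM + LDA  5.54  5.52  13.4
Joint training with topic model
TD-LSTM (Lau et al. 2017)  5.44  5.41  12.5
TE-LSTM (Ours)  5.36  5.34  11.7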

Why is the RNN helpful?
We hypothesize that one reason why the joint RNN equation model is more helpful than the bag-of-tokens equation model is that the RNN also captures syntax-level information in equations. But one might argue that the introduction of the RNN itself was useful for topic modeling (e.g., as a form of regularization). To test our hypothesis, we retrained TopicEq with each equation’s token order randomly shuffled in the training data, thus corrupting the syntactic information of each equation. The result is shown in Table 1 as “Ours (context + Eq LSTM shuffled).” This time, the topic model performance degrades severely and falls to the level of the baseline topic model, “Ours (context only)”. This result supports the claim that the original TopicEq’s joint RNN actually captured syntactic features of equations, providing more effective help for topic modeling than a bag-of-tokens equation model does.
This idea also makes intuitive sense. Mathematical equations use a much smaller vocabulary (symbols / variables) than word texts, and thus often need phrase- or syntax-level information to aid topic modeling. For example, in the equations in Figure 1, phrases such as a tensor written with superscripts and subscripts, or a regularization term, provide rich information to identify the topics (relativity and optimization), while the corresponding bags of tokens by themselves do not provide as much help.
Learned topics.
To visualize the topic modeling performance, we sampled 10 topics learned by TopicEq (Table 2). They intuitively reflect the scientific topics of arXiv articles.
Equation Model Evaluation
Next, we evaluate the equation model component of TopicEq by measuring the test set perplexity. Additionally, as the grammaticality of equations can be measured using the LaTeX compiler, we also evaluate the syntax error rate of generated equations. We compare our TE-LSTM with

a generic LSTM (no topic knowledge)

LSTM + LDA: the topic vector obtained from a pretrained LDA is concatenated to the output of the LSTM

and a recent topic-dependent LSTM applied to our task

TD-LSTM [Lau, Baldwin, and Cohn2017]: the topic vector is added to the output of the LSTM via a dense layer.
TD-LSTM and our TE-LSTM are jointly trained with our topic model component. As Table 3 shows, all the topic-dependent LSTMs are superior to the vanilla LSTM in both the perplexity metric and the syntax error metric. Moreover, our TE-LSTM outperforms TD-LSTM, suggesting that the model better incorporates topic knowledge by embedding the topic vector inside the LSTM. We also find that compared to [Wang et al.2018]’s Mixture-of-Expert LSTM, our model achieves similar performance in this task while requiring fewer parameters and much less training time (40% reduction). In total, compared to the generic LSTM, our TE-LSTM equation model reduces test perplexity by 8% (relative) and syntax error rate by 3.5% (absolute). This result suggests that incorporating context/topic information can improve the quality and grammaticality of equation modeling.
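As a small worked example of the perplexity metric, the sketch below computes perplexity as the exponential of the average negative log-likelihood per token and reproduces the relative reduction from the Table 3 values at 100 topics. The per-token normalization and the function names are assumptions of this example.

```python
import math

def perplexity(token_log_probs):
    """exp of the average negative log-likelihood per token (natural log).

    token_log_probs: one list of per-token log-probabilities per equation.
    """
    total = sum(sum(lps) for lps in token_log_probs)
    count = sum(len(lps) for lps in token_log_probs)
    return math.exp(-total / count)

def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline

# A model that assigns probability 1/4 to every token has perplexity 4.
uniform_ppl = perplexity([[math.log(0.25)] * 4, [math.log(0.25)] * 2])

# Generic LSTM vs. TE-LSTM perplexity at 100 topics (values from Table 3).
reduction = relative_reduction(5.81, 5.34)
```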
Qualitative Analysis & Applications



[Table 7: example equations (1st column), with the inferred topic (showing top 5 words) by our TopicEq (2nd column) and by the bag-of-tokens baseline (3rd column).]
Topic-aware Equation Generation
The TopicEq model can generate meaningful equations from specified topics, using Eq 3 (TE-LSTM). For example, given a topic, we let the topic proportion vector be the one-hot vector representing that topic; conditioned on this vector, and starting from the <START> token, we keep sampling the next LaTeX token until the <END> token is generated. Table 4 shows several topics picked from Table 2 (left), and equations generated from each of these topics (right). We see that the artificial equations generated by the model clearly reflect the distinctive characteristics of the given topics. For instance, derivatives and numbers with units are generally used for physics; electron configurations for quantum physics; series of tensors for relativity; prime numbers for number theory; and topic-specific clauses for probability. We also note that the equations generated by our TE-LSTM use not only topic-specific symbols but also topic-specific phrases and syntax (e.g., a set definition is used for linear algebra; a “subject to” clause for optimization). These qualitative results support the claim that TopicEq is capable of incorporating topic information for equation modeling.
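The sampling procedure just described can be sketched as a simple loop. Here `next_token_probs` is a hypothetical stand-in for a trained TE-LSTM that returns a distribution over the next token given the prefix and the topic vector; the toy model and the length cap are assumptions added only to make the example runnable.

```python
import random

def one_hot(k, num_topics):
    """One-hot topic proportion vector for topic k."""
    return [1.0 if j == k else 0.0 for j in range(num_topics)]

def sample_equation(next_token_probs, theta, rng, max_len=150):
    """Sample tokens after <START> until <END> is drawn or max_len is reached."""
    tokens = ["<START>"]
    while len(tokens) <= max_len:
        probs = next_token_probs(tokens, theta)      # dict: token -> probability
        choices, weights = zip(*probs.items())
        tok = rng.choices(choices, weights=weights, k=1)[0]
        if tok == "<END>":
            break
        tokens.append(tok)
    return tokens[1:]

def toy_model(prefix, theta):
    # Hypothetical model: topic 0 emits "x", topic 1 emits "y", three tokens long.
    if len(prefix) > 3:
        return {"<END>": 1.0}
    return {"x": theta[0], "y": theta[1]}

equation = sample_equation(toy_model, one_hot(0, 2), random.Random(0))
```

Replacing the random draw with an argmax over `probs` would give greedy decoding instead of sampling.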
Mixtures of topics.
The model can also generate equations from a mixture of topics by setting the topic proportion vector accordingly. To qualitatively analyze the space of the topic vector in terms of equation generation, we let the model generate equations while smoothly changing the topic vector between two topics (i.e., between their one-hot vectors) via linear interpolation. In Table 5, for two examples we show the given topic pair and its interpolation (left), and the equation greedily decoded from each interpolated vector (right). We let the model start all equations from a fixed initial symbol, one for the first example (astrophysics and graph theory) and another for the second example (optimization and statistics). In both cases we observe that the generated equations make a smooth transition from one topic to the other: in the first example, from astrophysics notation to a linear algebraic term, and finally a set notation (graph theory). In the second example, where the two topics optimization and statistics are closely related, the generated equations make a very intuitive transition: from an optimization objective with norms and regularization terms (top), to using summation terms (middle) and finally expectations (bottom; statistics topic). These observations suggest that TopicEq learns smooth representations for the latent topic vector (especially for a mixture of closely related topics) with regard to equation generation.
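The interpolation between two topics amounts to a convex combination of their one-hot vectors. A minimal sketch, with a hypothetical function name and an evenly spaced grid of mixing weights chosen for illustration:

```python
def interpolate_topics(i, j, num_topics, steps=5):
    """Return theta = (1 - a) * e_i + a * e_j for a evenly spaced in [0, 1]."""
    thetas = []
    for s in range(steps):
        a = s / (steps - 1)
        theta = [0.0] * num_topics
        theta[i] += 1.0 - a      # weight on the first topic
        theta[j] += a            # weight on the second topic
        thetas.append(theta)
    return thetas

path = interpolate_topics(0, 2, num_topics=3, steps=3)
```

Each vector on the path still sums to one, so it remains a valid topic proportion vector that can be fed to the equation model.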
Finally, we illustrate that the model can generate equations from a given set of context words. Specifically, we let the model infer the topic proportion vector of the context words via the inference network, and then generate equations from this vector via Eq 3 (TE-LSTM). As Table 6 shows, the model is able to infer the right topic mixture (2nd column) and generate equations that reflect those topics (e.g., solar mass and radius are used for the top example; a loss function for the bottom example).
Equation Topic Inference
Identifying the topic of equations is an important task that allows readers to obtain semantic descriptions for equations unfamiliar to them. However, while some work [Schubotz et al.2016, Stathopoulos et al.2018] has studied the task of identifying the meaning of individual mathematical symbols, no prior work has succeeded in providing descriptions to entire equations from various domains.
Our TopicEq model can be utilized to identify the topic of given equations. Specifically, with a trained TopicEq model, for a given equation eq, we find the topic (so that the topic proportion vector is a one-hot vector) that maximizes the likelihood in Eq 3, which is parametrized by our topic-dependent LSTM. Table 7 shows examples of equations across different domains (1st column), and the most likely topic inferred by our model for each equation (2nd column). We used topics in this task. We observe that the TopicEq model correctly identifies the domains or even finer topics (e.g., note the distinction between #5 and #6) for most of the given equations.
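The inference rule above is an argmax over one-hot topic vectors. In the sketch below, `log_likelihood` is a hypothetical stand-in for the log of the equation likelihood under a trained topic-dependent LSTM; the toy emission table exists only to make the example runnable.

```python
import math

def infer_topic(equation_tokens, log_likelihood, num_topics):
    """Return the topic k whose one-hot vector maximizes log p(eq | theta = e_k)."""
    def score(k):
        theta = [1.0 if j == k else 0.0 for j in range(num_topics)]
        return log_likelihood(equation_tokens, theta)
    return max(range(num_topics), key=score)

def toy_log_likelihood(tokens, theta):
    # Hypothetical model: topic 0 favors \nabla, topic 1 favors \sum.
    emit = [{"\\nabla": 0.8, "\\sum": 0.2}, {"\\nabla": 0.2, "\\sum": 0.8}]
    k = theta.index(1.0)
    return sum(math.log(emit[k][t]) for t in tokens)

topic = infer_topic(["\\sum", "\\sum", "\\nabla"], toy_log_likelihood, num_topics=2)
```

This requires one likelihood evaluation per topic, which is inexpensive for the numbers of topics considered here (50 or 100).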



[Table 8: top phrases predicted by the alignment models for several math symbols, with columns No Topic, Probability, Quantum physics and Graph theory.]
Is an RNN necessary for this task?
We repeated this experiment using a bag-of-tokens model for equations in Eq 3 (instead of the LSTM), to analyze whether the RNN equation model provides an advantage over the bag-of-tokens approach in this task. As can be seen in Table 7, 3rd column, this bag-of-tokens baseline performs equally well on #1 and #2, which contain topic-specific variables, but fails on #3 and #4, which consist of a relatively generic set of symbols and require recognizing phrases (such as the expressions for work and for a neural network layer) to identify the correct topic. Indeed, the topics predicted for #3 and #4 are very generic and similar. Similarly, the bag-of-tokens baseline fails to distinguish #5 and #6, most likely because it does not recognize the phrase- and syntax-level differences between these two equations. Finally, for #7 (Taylor expansion), we also experimented with #7’, where we changed only some variable names without altering the equation’s meaning and syntax. While our TopicEq still recognizes this to be the same topic as #7, the bag-of-tokens baseline is fooled by the changed variable names and predicts a wrong topic. These observations suggest that the RNN equation model can capture phrase- and syntax-level information, and can consistently infer the correct topics for equations from various domains. The TopicEq model could be used to help readers interpret equations unfamiliar to them.
Extension: Topic-aware alignment between mathematical tokens and words
Mathematical symbols (including variables) carry different meanings in different contexts or topics. Prior work [Pagael and Schubotz2014, Schubotz et al.2016, Stathopoulos et al.2018] has studied the task of identifying meanings of math variables using surrounding words, but its topic dependence has not been modeled explicitly. Here we present a variant of the TopicEq model that captures topic-dependent alignment between mathematical tokens and words from scientific document data. Specifically, we aim to learn the most probable descriptions (word phrases) associated with a given math symbol under a given topic or topic mixture.
Baseline alignment model.
We use the equations and context texts from our ContextEq corpus. Similar to [Pagael and Schubotz2014], we consider that the descriptions of math symbols often appear in the sentence immediately before or after the given equation (immediate context). We then consider a simple alignment model between symbols in the equation and phrases in the immediate context (Eq 6), which predicts the word descriptions from the bag-of-tokens representation of the equation through an alignment matrix. We estimate the alignment matrix from the data by maximizing the likelihood; its dimensions are the vocabulary sizes of symbols and word descriptions. For the vocabulary of word descriptions, we collect the titles of Wikipedia pages that contain mathematical equations. We then use the top 2,000 phrases that appear in our arXiv dataset. For math symbols, we use the most frequent ones.
To predict the description of a single symbol, we set the equation representation to be the one-hot vector representing that symbol, as a surrogate.
Topic-aware alignment model.
To make the alignment depend on the topic, we want the alignment matrix to depend on the topic proportion vector. Motivated by the tensor factorization method in [Song, Gan, and Carin2016], we factorize the topic-dependent alignment matrix into three parameters to estimate (Eq 7). The number of factors is set equal to the number of topics. To jointly perform topic modeling and alignment learning, we consider a variant of TopicEq, where we just replace Eq 3 by this topic-dependent alignment model. We train it on the ContextEq corpus.
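One common way to realize a topic-dependent matrix with three parameter matrices is a factored form in which the topic vector rescales a set of shared factors. The sketch below uses that form, A(theta) = U diag(V theta) W, followed by a softmax over descriptions. The factored form, the matrix names U, V, W, and the softmax output are all assumptions of this example; they illustrate the idea and are not a statement of the exact form of Eq 6 or Eq 7.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def topic_aware_alignment(x, theta, U, V, W):
    """Distribution over descriptions from x^T U diag(V theta) W.

    x: bag-of-tokens counts over symbols (length S)
    U: S x F, V: F x K, W: F x D  (F factors, K topics, D descriptions)
    """
    num_factors = len(V)
    factors = [sum(v * t for v, t in zip(row, theta)) for row in V]    # V theta
    proj = [sum(x[s] * U[s][f] for s in range(len(x)))
            for f in range(num_factors)]                               # x^T U
    scores = [sum(proj[f] * factors[f] * W[f][d] for f in range(num_factors))
              for d in range(len(W[0]))]
    return softmax(scores)

# Toy setup: one symbol, two topics (= two factors), two candidate descriptions.
x = [1.0]
U = [[1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
W = [[2.0, 0.0], [0.0, 2.0]]
p_topic0 = topic_aware_alignment(x, [1.0, 0.0], U, V, W)
p_topic1 = topic_aware_alignment(x, [0.0, 1.0], U, V, W)
```

In the toy setup the same symbol receives a different most probable description under each topic, which is the behavior the topic-aware model is meant to capture.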
Results and Discussion
Table 9 shows the perplexity of the baseline and topic-aware alignment models evaluated on the held-out test set. We observe that the topic information significantly improves the alignment between math symbols and word descriptions, reducing the perplexity by roughly a third (about 33% relative with 50 topics and 36% with 100 topics).


Table 9: Perplexity of the alignment models on the held-out test set.
Alignment Model (# Topics)  50  100
Baseline (no topic)  602  602
Topic-Aware  406  387



Table 10: Topic modeling quality with and without the joint alignment model.
Topic Model (# Topics)  50  100
Context Only  .085  .084
with joint Alignment Model  .088  .087

Qualitative results.
Table 8 shows the actual top phrases predicted by the alignment models for several math symbols that are used in a wide range of domains. The proposed TopicEq variant indeed learns the topic-dependent alignment between symbols and words. For instance, it associates the same symbol with “expectation” for the probability topic, “electric field” for quantum physics, and “edge” for graph theory, which makes intuitive sense. On the other hand, the baseline (no topic) model associates that symbol with “energy”, which is simply the description that appears most frequently across all articles. This is another example where the TopicEq framework can be used to capture the relation between topics and mathematics.
Utility.
We also note that our topic-aware alignment model can be conditioned on a mixture of topics by setting the topic proportion vector accordingly. Given a context text and equation, this model can infer the topic proportion with the topic model component, and then use the topic-aware alignment component to infer the most probable meaning of each variable in the given equation. This could help readers comprehend scientific documents containing mathematics unfamiliar to them.
Effect on topic modeling.
In Table 10, we compare our baseline topic model (top) and this TopicEq variant with the alignment component (bottom). The joint alignment model provides moderate improvements for topic modeling quality.
Conclusion
Motivated by the topical correspondence between text and mathematical equations observed in scientific documents, we proposed TopicEq, a joint topic-equation model that generates the text by a topic model and the equations by a topic-dependent RNN. This joint model outperforms existing topic models and equation models for scientific texts. We also qualitatively analyzed TopicEq, and showed its applications and extensions, such as equation topic inference and topic-aware alignment of mathematical symbols and words.
Acknowledgments
We thank Matt Bonakdarpour, Paul Ginsparg, Samuel Helms, and Kriste Krstovski for their assistance, and Jungo Kasai as well as the anonymous reviewers for their feedback. This work was supported in part by a grant from the Alfred P. Sloan Foundation.
References
 [Ahn et al.2016] Ahn, S.; Choi, H.; Pärnamaa, T.; and Bengio, Y. 2016. A neural knowledge language model. arXiv:1608.00318.
 [Blei and Jordan2003] Blei, D. M., and Jordan, M. I. 2003. Modeling annotated data. In SIGIR.
 [Blei and Lafferty2006] Blei, D. M., and Lafferty, J. D. 2006. Dynamic topic models. In ICML.
 [Blei and Lafferty2007] Blei, D. M., and Lafferty, J. D. 2007. A correlated topic model of science. The Annals of Applied Statistics 17–35.
 [Blei, Ng, and Jordan2003] Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent Dirichlet allocation. JMLR.
 [Cao et al.2015] Cao, Z.; Li, S.; Liu, Y.; Li, W.; and Ji, H. 2015. A novel neural topic model and its supervised extension. In AAAI.
 [Chang et al.2009] Chang, J.; Gerrish, S.; Wang, C.; Boyd-Graber, J. L.; and Blei, D. M. 2009. Reading tea leaves: How humans interpret topic models. In NIPS.
 [Deng et al.2017] Deng, Y.; Kanervisto, A.; Ling, J.; and Rush, A. M. 2017. Image-to-markup generation with coarse-to-fine attention. In ICML.
 [Dieng et al.2017] Dieng, A. B.; Wang, C.; Gao, J.; and Paisley, J. 2017. TopicRNN: A recurrent neural network with long-range semantic dependency. In ICLR.
 [Hall, Jurafsky, and Manning2008] Hall, D.; Jurafsky, D.; and Manning, C. D. 2008. Studying the history of ideas using topic models. In EMNLP.
 [Hochreiter and Schmidhuber1997] Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.
 [Jordan et al.1999] Jordan, M. I.; Ghahramani, Z.; Jaakkola, T. S.; and Saul, L. K. 1999. An introduction to variational methods for graphical models. Machine Learning 37(2):183–233.
 [Jozefowicz et al.2016] Jozefowicz, R.; Vinyals, O.; Schuster, M.; Shazeer, N.; and Wu, Y. 2016. Exploring the limits of language modeling. arXiv:1602.02410.
 [Karpathy2015] Karpathy, A. 2015. The unreasonable effectiveness of recurrent neural networks. Blog posting, May 21.
 [Kingma and Ba2015] Kingma, D., and Ba, J. 2015. Adam: A method for stochastic optimization. In ICLR.
 [Kingma and Welling2014] Kingma, D. P., and Welling, M. 2014. Auto-encoding variational bayes. In ICLR.
 [Krstovski and Blei2018] Krstovski, K., and Blei, D. M. 2018. Equation embeddings. arXiv:1803.09123.
 [Lan et al.2015] Lan, A. S.; Vats, D.; Waters, A. E.; and Baraniuk, R. G. 2015. Mathematical language processing: Automatic grading and feedback for open response mathematical questions. In ACM Conference on Learning @ Scale.
 [Larochelle and Lauly2012] Larochelle, H., and Lauly, S. 2012. A neural autoregressive topic model. In NIPS.
 [Lau, Baldwin, and Cohn2017] Lau, J. H.; Baldwin, T.; and Cohn, T. 2017. Topically driven neural language model. In ACL.
 [Lau, Newman, and Baldwin2014] Lau, J. H.; Newman, D.; and Baldwin, T. 2014. Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. In EACL.
 [Miao, Grefenstette, and Blunsom2017] Miao, Y.; Grefenstette, E.; and Blunsom, P. 2017. Discovering discrete latent topics with neural variational inference. In ICML.
 [Miao, Yu, and Blunsom2016] Miao, Y.; Yu, L.; and Blunsom, P. 2016. Neural variational inference for text processing. In ICML.
 [Mikolov and Zweig2012] Mikolov, T., and Zweig, G. 2012. Context dependent recurrent neural network language model. SLT 12:234–239.
 [Mikolov et al.2010] Mikolov, T.; Karafiát, M.; Burget, L.; Černockỳ, J.; and Khudanpur, S. 2010. Recurrent neural network based language model. In Interspeech.
 [Mimno et al.2011] Mimno, D.; Wallach, H. M.; Talley, E.; Leenders, M.; and McCallum, A. 2011. Optimizing semantic coherence in topic models. In EMNLP.
 [Newman et al.2010a] Newman, D.; Baldwin, T.; Cavedon, L.; Huang, E.; Karimi, S.; Martinez, D.; Scholer, F.; and Zobel, J. 2010a. Visualizing search results and document collections using topic maps. Web Semantics: Science, Services and Agents on the World Wide Web 8(2-3):169–175.
 [Newman et al.2010b] Newman, D.; Lau, J. H.; Grieser, K.; and Baldwin, T. 2010b. Automatic evaluation of topic coherence. In NAACL.
 [Pagael and Schubotz2014] Pagael, R., and Schubotz, M. 2014. Mathematical language processing project. In CICM.
 [Pascanu, Mikolov, and Bengio2012] Pascanu, R.; Mikolov, T.; and Bengio, Y. 2012. On the difficulty of training recurrent neural networks. arXiv:1211.5063.
 [Roy, Upadhyay, and Roth2016] Roy, S.; Upadhyay, S.; and Roth, D. 2016. Equation parsing: Mapping sentences to grounded equations. In EMNLP.
 [Schubotz et al.2016] Schubotz, M.; Grigorev, A.; Leich, M.; Cohl, H. S.; Meuschke, N.; Gipp, B.; Youssef, A. S.; and Markl, V. 2016. Semantification of identifiers in mathematics for better math information retrieval. In SIGIR.
 [Sojka and Líška2011] Sojka, P., and Líška, M. 2011. Indexing and searching mathematics in digital libraries. In CICM.
 [Song, Gan, and Carin2016] Song, J.; Gan, Z.; and Carin, L. 2016. Factored temporal sigmoid belief networks for sequence learning. In ICML.
 [Srivastava and Sutton2017] Srivastava, A., and Sutton, C. 2017. Autoencoding variational inference for topic models. In ICLR.
 [Srivastava et al.2014] Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: A simple way to prevent neural networks from overfitting. JMLR.
 [Stathopoulos et al.2018] Stathopoulos, Y.; Baker, S.; Rei, M.; and Teufel, S. 2018. Variable typing: Assigning meaning to variables in mathematical text. In NAACL.
 [Teh et al.2005] Teh, Y. W.; Jordan, M. I.; Beal, M. J.; and Blei, D. M. 2005. Sharing clusters among related groups: Hierarchical dirichlet processes. In NIPS.
 [Wang et al.2015] Wang, Y.; Gao, L.; Wang, S.; Tang, Z.; Liu, X.; and Yuan, K. 2015. Wikimirs 3.0: a hybrid mir system based on the context, structure and importance of formulae in a document. In JCDL.
 [Wang et al.2018] Wang, W.; Gan, Z.; Wang, W.; Shen, D.; Huang, J.; Ping, W.; Satheesh, S.; and Carin, L. 2018. Topic compositional neural language model. In AISTATS.

 [Xie, Deng, and Xing2015] Xie, P.; Deng, Y.; and Xing, E. 2015. Diversifying restricted boltzmann machine for document modeling. In KDD.
 [Zanibbi et al.2016] Zanibbi, R.; Davila, K.; Kane, A.; and Tompa, F. W. 2016. Multi-stage math formula search: Using appearance-based similarity metrics at scale. In SIGIR.