Solving math word problems (MWPs) poses unique challenges for understanding natural-language problems and performing arithmetic reasoning over quantities with commonsense knowledge. As shown in Figure 1, a typical MWP consists of a short narrative describing a situation in the world and asking a question about an unknown quantity. To solve the MWP in Figure 1, a machine needs to extract key quantities from the text, such as “100 kilometers” and “2 hours”, and understand the relationships between them. General mathematical knowledge like “distance = velocity × time” is then used to calculate the solution.
Researchers have recently focused on solving MWPs using neural-symbolic models Ling et al. (2017); Wang et al. (2017); Huang et al. (2018); Wang et al. (2018); Xie and Sun (2019). These models usually consist of a neural perception module (i.e., Seq2Seq or Seq2Tree) that maps the problem text into a solution expression or tree, and a symbolic module which executes the expression and generates the final answer. Training these models requires the full supervision of the solution expressions.
However, these fully-supervised approaches have three drawbacks. First, current MWP datasets provide only one solution for each problem, while there naturally exist multiple solutions that follow different paths to the same answer. For instance, the problem in Figure 1 can be solved by first calculating the speed and then multiplying it by the total time; alternatively, we can solve it by summing the distances of the first and second parts of the journey. Models trained with full supervision on current datasets are forced to fit the given solution and cannot generate diverse solutions. Second, annotating expressions for MWPs is time-consuming, whereas a large number of MWPs with their final answers can be mined effortlessly from the internet (e.g., online forums). How to efficiently utilize such partially-labeled data without the supervision of expressions remains an open problem. Third, current supervised learning approaches suffer from a train-test discrepancy: they optimize expression accuracy rather than answer accuracy, yet the model is evaluated by answer accuracy on the test set, causing a natural performance gap.
To address these issues, we propose to solve the MWPs with weak supervision, where only the problem texts and the final answers are required. By directly optimizing the answer accuracy rather than the expression accuracy, learning with weak supervision naturally addresses the train-test discrepancy. Our model consists of a tree-structured neural model similar to Xie and Sun (2019) to generate the solution tree and a symbolic execution module to calculate the answer. However, the symbolic execution module for arithmetic expressions is non-differentiable with respect to the answer accuracy, making it infeasible to use back-propagation to compute gradients. A straightforward approach is to employ policy gradient methods like REINFORCE Williams (1992) to train the neural model. The policy gradient methods explore the solution space and update the policy based on generated solutions that happen to hit the correct answer. Since the solution space is large and incorrect solutions are abandoned with zero reward, these methods usually converge slowly or fail to converge.
To improve the efficiency of weakly-supervised learning, we propose a novel fixing mechanism to learn from incorrect predictions, which is inspired by the human ability to learn from failures via abductive reasoning Magnani (2009); Zhou (2019a). The fixing mechanism propagates the error from the root node to the leaf nodes in the solution tree and finds the most probable fix that can generate the desired answer. The fixed solution tree is further used as a pseudo label to train the neural model. Figure 2 shows how the fixing mechanism corrects the wrong solution tree by tracing the error in a top-down manner.
Furthermore, we design two practical techniques to traverse the solution space and discover possible solutions efficiently. First, we observe a positive correlation between the number of quantities in the text and the size of the solution tree (the number of leaf nodes in the tree), and propose a tree regularization technique based on this observation to limit the range of possible tree sizes and shrink the solution space. Second, we adopt a memory buffer to track and save the discovered fixes for each problem with the fixing mechanism. All memory buffer solutions are used as pseudo labels to train the model, encouraging the model to generate more diverse solutions for a single problem.
In summary, by combining the fixing mechanism and the above two techniques, the proposed learning-by-fixing (LBF) method contains an exploring stage and a learning stage in each iteration, as shown in Figure 2. We utilize the fixing mechanism and tree regularization to correct wrong answers in the exploring stage and generate fixed expressions as pseudo labels. In the learning stage, we train the neural model using these pseudo labels.
We conduct comprehensive experiments on the Math23K dataset Wang et al. (2017). The proposed LBF method significantly outperforms the reinforcement learning baselines in weakly-supervised learning and achieves comparable performance with several fully-supervised methods. Furthermore, our proposed method achieves significantly better answer accuracies of all the top-3/5 answers than fully-supervised methods, illustrating its advantage in generating diverse solutions. The ablative experiments also demonstrate the efficacy of the designed algorithms, including the fixing mechanism, tree regularization, and memory buffer.
Math Word Problems
Recently, various question-answering tasks that require human-like reasoning abilities have emerged Qi et al. (2015); Tu et al. (2014); Zhang et al. (2019); Dua et al. (2019); Hong et al. (2019); Zhu et al. (2020); Zhang et al. (2020b); Li et al. (2020b); Yu et al. (2020). Among them, solving math word problems (MWPs) is a fundamental and challenging task.
Previous studies of MWPs range from traditional rule-based methods Fletcher (1985); Bakman (2007); Yu-hui et al. (2010), statistical learning methods Kushman et al. (2014); Zhou et al. (2015); Mitra and Baral (2016); Roy and Roth (2017); Huang et al. (2016), and semantic-parsing methods Shi et al. (2015); Koncel-Kedziorski et al. (2015); Huang et al. (2017) to recent deep learning methods Ling et al. (2017); Wang et al. (2017); Huang et al. (2018); Robaidek et al. (2018); Wang et al. (2018, 2019); Chiang and Chen (2019); Xie and Sun (2019); Zhang et al. (2020a).
In particular, Deep Neural Solver (DNS) Wang et al. (2017) is a pioneering work that designs a Seq2seq model to solve MWPs and achieves promising results. Xie and Sun (2019) propose a tree-structured neural solver to generate the solution tree in a goal-driven manner. All these neural solvers learn the model with full supervision, where the ground-truth intermediate representations (e.g., expressions, programs) are given during training. To learn the solver with less supervision, Koncel-Kedziorski et al. (2015) use a discriminative model to solve MWPs in a weakly-supervised way. They utilize separate modules to extract features, construct expression trees, and score the likelihood, which is different from the current end-to-end neural solvers. Upadhyay et al. (2016), Zhou et al. (2015), and Kushman et al. (2014) use mixed supervision, where one dataset has only annotated equations and the other has only final answers. However, for the set with final answers, they also depend on pre-defined equation templates. Chen et al. (2020) apply a neural-symbolic reader to MathQA Amini et al. (2019), a large-scale dataset with fully-specified operational programs; they have access to the ground-truth programs for a small fraction of training samples during the first iterations of training.
Unlike these methods, the proposed LBF method requires only the supervision of the final answer and generates diverse solutions by keeping a memory buffer. Notably, it addresses the sparse reward problem in policy gradient methods using a fixing mechanism that propagates error down a solution tree and finds the most probable fix.
Neural-Symbolic Learning for NLP
Neural-symbolic learning has been applied to solve NLP tasks with weak supervision, such as semantic parsing and program synthesis Liang et al. (2016a); Guu et al. (2017); Liang et al. (2018); Agarwal et al. (2019); Li et al. (2020b). Similar to MWP, they generate intermediate symbolic representations with a neural network and execute the intermediate representation with a symbolic reasoning module to get the final result. Typical approaches for such neural-symbolic models use policy gradient methods like REINFORCE since the symbolic execution module is non-differentiable. For example, Neural Symbolic Machines Liang et al. (2016b) combines REINFORCE with a maximum-likelihood training process to find good programs. Guu et al. (2017) augment reinforcement learning with the maximum marginal likelihood so that probability is distributed evenly across consistent programs. Memory Augmented Policy Optimization (MAPO) Liang et al. (2018) formulates its learning objective as an expectation over a memory buffer of high-reward samples and a separate expectation outside the buffer, which helps accelerate and stabilize policy gradient training. Meta Reward Learning Agarwal et al. (2019) uses an auxiliary reward function to provide feedback beyond a binary success or failure. Since these methods can only learn from sparse successful samples, they suffer from cold start and inefficient exploration of large search spaces. Recently, Dai and Zhou (2017), Dai et al. (2019), and Zhou (2019b) introduce abductive learning, which states that human misperceptions can be corrected via abductive reasoning. In this paper, we follow the abductive learning method Li et al. (2020a) and propose a novel fixing mechanism to learn from negative samples, significantly accelerating and stabilizing the weakly-supervised learning process. We further design the tree regularization and memory buffer techniques to efficiently shrink and explore the solution space.
In this section, we define the weakly-supervised math word problems and describe the goal-driven tree model originated from Xie and Sun (2019). Then we introduce the proposed learning-by-fixing method, as also shown in Figure 2.
A math word problem is represented by an input problem text $P$. A machine learning model with parameters $\theta$ is required to translate $P$ into an intermediate expression $T$, which is executed to compute the final answer $y$. In fully-supervised learning, we learn from both the ground-truth expression $T$ and the final answer $y$. The learning objective is to maximize the data likelihood $p(T, y \mid P; \theta)$, where computing $y$ given $T$ is a deterministic process. In contrast, in the weakly-supervised setting, only $P$ and $y$ are observed, while $T$ is hidden. In other words, the model is required to generate an unknown expression from the problem text, which is then executed to get the final answer.
Goal-driven Tree-Structured Model
A problem text $P$ consists of words and numeric values. The model takes in the problem text $P$ and generates a solution tree $T$. Let $n_P$ denote the ordered list of numeric values in $P$ according to their order in the problem text. Generally, $T$ may contain constants $V_{con}$, mathematical operators $V_{op}$, and numeric values $n_P$ from the problem text. Therefore, the target vocabulary of $P$ is denoted as $V = V_{op} \cup V_{con} \cup n_P$, and it varies between problems due to the different $n_P$.
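The per-problem vocabulary above can be sketched in a few lines. This is an illustrative assumption, not the paper's code: the operator set and the example constants (1 and 3.14) are placeholders for whatever the dataset actually requires.

```python
# Sketch of the per-problem target vocabulary V = V_op ∪ V_con ∪ n_P.
# The operator and constant sets below are illustrative assumptions.
V_OP = ["+", "-", "*", "/", "^"]
V_CON = ["1", "3.14"]  # hypothetical constants; the actual set is dataset-specific

def target_vocab(problem_numbers):
    """Build the target vocabulary for one problem.

    problem_numbers: ordered list n_P of quantities as they appear in the text.
    """
    return V_OP + V_CON + list(problem_numbers)

# The vocabulary differs between problems because n_P differs:
vocab = target_vocab(["100", "2"])
```

Because the numeric tokens come from the problem itself, two problems with different quantities decode over different vocabularies even though the model is shared.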
To generate the solution tree, we adopt the goal-driven tree-structured neural model (GTS) Xie and Sun (2019), which first encodes the problem text into its goal and then recursively decomposes it into sub-goals in a top-down manner.
Problem Encoding. Each word of the problem text is encoded into a contextual representation. Specifically, for a problem $P = (x_1, x_2, \dots, x_m)$, each word $x_i$ is first converted to a word embedding $e_i$. Then the sequence of embeddings is fed into a bi-directional GRU Cho et al. (2014) to produce a contextual word representation $h_i = \overrightarrow{h}_i + \overleftarrow{h}_i$, where $\overrightarrow{h}_i$ and $\overleftarrow{h}_i$ are the hidden states of the forward and backward GRUs at position $i$, respectively.
Solution Tree Generation.
The tree generation process is designed as a preorder tree traversal (root-left-right). The root node of the solution tree is initialized with a goal vector $q_0$ derived from the problem representation.
For a node with goal $q$, we first derive a context vector $c$ by an attention mechanism to summarize relevant information from the problem:
$$c = \sum_{i} a_i h_i, \qquad a_i = \frac{\exp\left(v_a^\top \tanh\left(W_a [q; h_i]\right)\right)}{\sum_{j} \exp\left(v_a^\top \tanh\left(W_a [q; h_j]\right)\right)},$$
where $v_a$ and $W_a$ are trainable parameters. Then the goal $q$ and the context $c$ are used to predict the token $t$ of this node from the target vocabulary $V$. The probability of token $t$ is defined as:
$$p(t \mid q, c) = \frac{\exp\left(s(t \mid q, c)\right)}{\sum_{t' \in V} \exp\left(s(t' \mid q, c)\right)}, \qquad s(t \mid q, c) = v_n^\top \tanh\left(W_n [q; c; e(t)]\right),$$
where $e(t)$ is the embedding of token $t$:
$$e(t) = \begin{cases} M_{op}(t) & \text{if } t \in V_{op}, \\ M_{con}(t) & \text{if } t \in V_{con}, \\ h_{loc(t)} & \text{if } t \in n_P, \end{cases}$$
where $M_{op}$ and $M_{con}$ are two trainable embeddings for operators and constants, respectively. For a number token, its embedding is the corresponding hidden state $h_{loc(t)}$ from the encoder, where $loc(t)$ is the index of $t$ in the problem $P$. The predicted token is $\hat{t} = \arg\max_{t \in V} p(t \mid q, c)$.
If the predicted token is a number token or constant, the node is terminated and its goal is realized by the predicted token; otherwise, the predicted token is an operator and the current goal is decomposed into left and right sub-goals combined by the operator. Please refer to the supplementary material for more details about the goal decomposition process.
Answer Calculation. The generated solution tree $T$ is transformed into a reasoning tree $T_e$ by creating auxiliary non-terminal nodes in place of the operator nodes to store the intermediate results; the original operator nodes are attached as child nodes of the corresponding auxiliary nodes. The final answer $y$ is then calculated by executing $T_e$ to obtain the value of its root node in a bottom-up manner.
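The execution step itself is a deterministic routine. Below is a minimal sketch of evaluating a solution expression in prefix notation (the preorder traversal of the solution tree) bottom-up; the function name and the example numbers are our own illustrative assumptions, not taken from Figure 1.

```python
# Evaluate a prefix expression (preorder of a binary solution tree) recursively;
# each operator consumes the values of its two subtrees, bottom-up.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}

def execute_prefix(tokens):
    """Evaluate a prefix token list; return (value, remaining tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head in OPS:
        left, rest = execute_prefix(rest)
        right, rest = execute_prefix(rest)
        return OPS[head](left, right), rest
    return float(head), rest

# Hypothetical expression "* / 100 2 5", i.e., (100 / 2) * 5:
answer, _ = execute_prefix("* / 100 2 5".split())
```

Since this execution is non-differentiable with respect to the answer, gradients cannot flow back through it, which is exactly why the fixing mechanism below operates symbolically on the tree instead.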
Drawing inspiration from humans’ ability to correct and learn from failures, we propose a fixing mechanism to correct the wrong solution trees via abductive reasoning following Li et al. (2020a) and use the fixed solution trees as pseudo labels for training. Specifically, we find the most probable fix for the wrong prediction by back-tracking the reasoning tree and propagating the error from the root node into the leaf nodes in a top-down manner.
The key ingredient of the fixing mechanism is the 1-step fix (1-FIX) algorithm, which assumes that only one symbol in the reasoning tree can be substituted. As shown by the 1-Fix function in Algorithm 1, the 1-step fix starts from the root node of the reasoning tree and gradually searches downward for a fix that makes the final output equal to the ground truth. The search process is implemented with a priority queue, where each element is a fix-tuple $(A, \alpha_A, p)$:
- $A$ is the current visiting node.
- $\alpha_A$ is the expected value of this node: if the value of $A$ is changed to $\alpha_A$, the reasoning tree will execute to the ground-truth answer $y$.
- $p$ is the visiting priority, which reflects the probability of changing the value of $A$.
In 1-FIX, error propagation through the solution tree is achieved by a $solve$ function, which computes the expected value of a child node from its parent's expected value. Suppose $B$ is a child of node $A$ and $\alpha_A$ is the expected value of $A$; the $solve$ function works as follows:
- If $B$ is $A$'s left or right operand child with sibling $B'$, we directly solve the equation $\alpha_B \; op \; val(B') = \alpha_A$ or $val(B') \; op \; \alpha_B = \alpha_A$ to get $B$'s expected value $\alpha_B$, where $op$ denotes the operator.
- If $B$ is an operator node, we try to replace it with every other operator and check whether the new expression generates the correct answer; that is, we solve $val(B_l) \; op' \; val(B_r) = \alpha_A$, where the unknown $op'$ is now an operator and $B_l$, $B_r$ are the operand children. If no $op'$ satisfies this equation, the solve function returns none.
Please refer to the supplementary material for the definition of the visiting priority as well as the illustrative example of the 1-FIX process.
To search the neighbors of the solution tree within a multi-step distance, we extend the 1-step fix to a multi-step fix by incorporating a RandomWalk function. As shown in Algorithm 1, if we find a fix by 1-FIX, we return it; otherwise, we randomly change one leaf node in the reasoning tree to another symbol within the same set (e.g., the operators $V_{op}$) based on the probability in Equation 4. This process repeats for a certain number of iterations until a fix is found.
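The solve step described above can be sketched in plain Python. The function names, the restriction to the four basic operators, and the `fix_left` flag are our illustrative assumptions; the actual algorithm additionally tracks visiting priorities in a queue.

```python
# Sketch of the 1-FIX `solve` step. `alpha` is the parent's expected value;
# `sibling` is the known value of the other operand.
def solve_child(alpha, op, sibling, fix_left):
    """Expected value for the operand child being fixed, or None."""
    try:
        if op == "+":                      # x + sibling = alpha (commutative)
            return alpha - sibling
        if op == "*":                      # x * sibling = alpha (commutative)
            return alpha / sibling
        if op == "-":                      # left - right = alpha
            return alpha + sibling if fix_left else sibling - alpha
        if op == "/":                      # left / right = alpha
            return alpha * sibling if fix_left else sibling / alpha
    except ZeroDivisionError:
        return None
    return None

def solve_operator(alpha, left, right):
    """Find a replacement operator so the node executes to alpha, or None."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    for op, fn in ops.items():
        try:
            if fn(left, right) == alpha:
                return op
        except ZeroDivisionError:
            continue
    return None
```

For example, if a node computing `100 / 2` should instead yield 100 with the right child fixed as 2, `solve_child` propagates the expected value 200 down to the left child; if no operand fix exists, `solve_operator` tries swapping the operator instead.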
Solution Space Exploration
Tree Regularization. While Li et al. (2020a) assume the length of the intermediate representation is given, the expression length is unknown in weakly-supervised learning. Thus, the original solution space is infinite, since the predicted token decides whether to continue or to stop the generation. It is therefore critical to shrink the solution space, i.e., to control the size of the generated solution trees: if the size of a generated solution tree deviates far from the target size, it is unlikely that the solution or its fix hits the correct answer. Although the target size is unknown, we observe a positive correlation between the target size and the number of quantities in the text. Taking this observation as a prior on the tree size, we design a tree regularization algorithm that generates a solution tree of a specified size and restricts the size to an empirical range. Denote the size of a solution tree $T$, $Size(T)$, as its number of leaf nodes, including quantities, constants, and operators. The prior range of $Size(T)$, given the length $l$ of the numeric value list $n_P$, is defined as an interval determined by two hyperparameters, whose effect is discussed in Table 2.
We further propose a tree regularization algorithm to decode a solution tree with a given size. To generate a tree of a given size $s$, we design two rules to produce a prefix-order expression during the preorder tree decoding:
- The number of operators cannot be greater than $(s-1)/2$.
- Except at the $s$-th (final) position, the number of numeric values (quantities and constants) cannot be greater than the number of operators.
These two rules are inspired by the syntax of prefix notation (a.k.a. Polish notation) for mathematical expressions. The rules shrink the target vocabulary in Equation 6 so that the tree generation stops exactly when it reaches the target size. Figure 3 shows illustrative examples of the tree regularization algorithm.
With tree regularization, we can search the possible fixes within a given range of tree size for each problem.
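The two rules can be sketched as a vocabulary mask applied at each decoding step. This is a simplified stand-in for the actual decoder-side masking: the bookkeeping variables (`need`, the count of operands still required by a valid prefix expression; `remaining`, the slots left before reaching the target size) and the function names are our own assumptions.

```python
def feasible(need, remaining):
    # A prefix expression still needing `need` operands can fill exactly
    # `remaining` slots iff the parity matches and we neither overshoot
    # nor terminate early.
    return (need <= remaining and (remaining - need) % 2 == 0
            and (need > 0) == (remaining > 0))

def allowed_tokens(n_ops_emitted, need, remaining, target_size):
    """Token classes permitted next under the two tree-regularization rules."""
    allow = []
    # Rule 1: a tree of size s contains at most (s - 1) // 2 operators.
    if n_ops_emitted < (target_size - 1) // 2 and feasible(need + 1, remaining - 1):
        allow.append("operator")     # an operator adds one pending operand
    if feasible(need - 1, remaining - 1):
        allow.append("operand")      # a quantity/constant resolves one
    return allow
```

For instance, with target size 5 the first token must be an operator (an operand would terminate the expression at size 1), while at the last slot only an operand is permitted, matching rule 2's exemption of the final position.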
Memory Buffer. We adopt a memory buffer to track and save the discovered fixes for each problem. The memory buffer enables us to seek multiple solutions for a single problem and to use all of them as pseudo labels for training, which encourages diverse solutions. Formally, given a problem $P$ and its buffer $B$, the learning objective is to minimize the negative log-likelihood of all the fixed expressions in the buffer:
$$\mathcal{L}(P; \theta) = -\sum_{T \in B} \log p(T \mid P; \theta).$$
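The objective amounts to summing the negative log-likelihood over every buffered expression. A toy sketch, where `expr_prob` is a hypothetical stand-in for the model's probability of a whole expression:

```python
import math

# Per-problem loss: summed negative log-likelihood of every fixed
# expression stored in the problem's memory buffer.
def buffer_loss(buffer, expr_prob):
    return -sum(math.log(expr_prob(expr)) for expr in buffer)

# Toy usage: two equivalent buffered expressions under a uniform stand-in
# model that assigns each expression probability 0.25.
loss = buffer_loss(["+ a b", "+ b a"], lambda e: 0.25)
```

Because every buffered fix contributes to the loss, probability mass is spread across all discovered solutions rather than concentrated on a single annotated one, which is what drives the diversity reported later.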
The complete learning-by-fixing method is described in Algorithm 2. In the exploring stage, we use the fixing mechanism and tree regularization to discover possible fixes for the wrong trees generated by the neural network and put them into the buffer. In the learning stage, we train the model with all the solutions in the memory buffer by minimizing the loss function in Equation 8.
Dataset. We evaluate our proposed method on the Math23K dataset Wang et al. (2017). It contains 23,161 math word problems annotated with solution expressions and answers. For the weakly-supervised setting, we only use the problems and final answers and discard the expressions. We do cross-validation following the setting of Xie and Sun (2019).
Evaluation Metric. We evaluate model performance by answer accuracy, where a generated solution is considered correct if it executes to the ground-truth answer. Specifically, we report the answer accuracies of all the top-$k$ predictions using beam search, which evaluates the model's ability to generate multiple possible solutions.
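The metric can be sketched as follows; the function name and the numeric tolerance are our assumptions, and in practice each beam candidate would be executed first to obtain its answer.

```python
# Acc@k: a problem counts as solved if any of its top-k beam-searched
# expressions executes to the ground-truth answer.
def acc_at_k(topk_answers, gold_answers, k):
    hits = sum(
        any(abs(a - gold) < 1e-6 for a in answers[:k])
        for answers, gold in zip(topk_answers, gold_answers)
    )
    return hits / len(gold_answers)

# Toy usage: two problems with beams of three executed answers each.
acc = acc_at_k([[3.0, 5.0, 7.0], [1.0, 2.0, 4.0]], [5.0, 9.0], k=3)
```

Note that Acc@k is monotonically non-decreasing in k, so comparing Acc@1 against Acc@3/5 isolates how much of a model's accuracy comes from alternative solutions beyond its top prediction.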
Models. We conduct experiments comparing our method with variants of weakly-supervised learning methods. Specifically, we experiment with two inference models: a Seq2Seq model with a bidirectional Long Short-Term Memory network (BiLSTM) Wu et al. (2016) and GTS Xie and Sun (2019), and train them with four learning strategies: REINFORCE, MAPO Liang et al. (2018), LBF, and LBF-w/o-M (LBF without the memory buffer). MAPO is a state-of-the-art method for semantic parsing that extends REINFORCE with augmented memory. Both models are also trained with the tree regularization algorithm. We further compare with fully-supervised learning methods to demonstrate our superiority in generating diverse solutions. In the ablative studies, we analyze the effects of the proposed tree regularization and of the number of search steps in the fixing mechanism.
Comparisons with State-of-the-art
Table 1 summarizes the answer accuracy of different weakly-supervised learning methods and the state-of-the-art fully-supervised approaches. The proposed learning-by-fixing framework significantly outperforms the policy gradient baselines like REINFORCE and MAPO, on both the Seq2seq and the GTS models. It demonstrates the strength of our proposed LBF method in weakly-supervised learning. The GTS-LBF-fully model is trained by initializing the memory buffer with all the ground-truth expressions. It demonstrates that by extending to the fully-supervised setting, our model maintains the top-1 accuracy while significantly improving solutions’ diversity. We believe that learning MWPs with weak supervision is a promising direction. It requires fewer annotations and allows us to build larger datasets with less cost.
| Model | Answer Acc. (%) |
| Retrieval Robaidek et al. (2018) | 47.2 |
| Classification Robaidek et al. (2018) | 57.9 |
| LSTM Robaidek et al. (2018) | 51.9 |
| CNN Robaidek et al. (2018) | 42.3 |
| DNS Wang et al. (2017) | 58.1 |
| Seq2seqET Wang et al. (2018) | 66.7 |
| Stack-Decoder Chiang and Chen (2019) | 65.8 |
| T-RNN Wang et al. (2019) | 66.9 |
| GTS Xie and Sun (2019) | 74.3 |
| Graph2Tree Zhang et al. (2020a) | 74.8* |

*We run the code using the same setting as GTS three times and compute the average accuracy.
Figure 4 shows the learning curves of different weakly-supervised learning methods for the GTS model. The proposed LBF method converges significantly faster and achieves higher accuracy than the other methods. Both REINFORCE and MAPO take a long time to start improving, indicating that the policy gradient methods suffer from cold starts and need time to accumulate rewarding samples.
Diverse Solutions with Memory Buffer
To evaluate the ability to generate diverse solutions, we report the answer accuracies of all the top-1/3/5 solutions on the test set using beam search, denoted as Acc@1/3/5, as shown in Table 2. In the weakly-supervised scenario, GTS-LBF achieves slightly better Acc@1 accuracy and much better Acc@3/5 accuracy than GTS-LBF-w/o-M. In the fully supervised scenario, GTS-LBF-fully achieves comparable Acc@1 accuracy and much better Acc@3/5 accuracy than the original GTS model. Particularly, GTS-LBF-fully outperforms GTS by 21% and 26% in terms of Acc@3/5 accuracy. It reveals the efficacy of the memory buffer in encouraging diverse solutions in both weakly-supervised learning and fully-supervised learning.
We visualize several examples of the top-5 predictions of GTS-LBF in Figure 5. In the first example, the first solution generated by our model is to sum up the prices of a table and a chair first, and then multiply it by the number of pairs of tables and chairs. Our model can also produce another reasonable solution (the fifth column) by deriving the prices of tables and chairs separately and then summing them up.
One caveat for the multiple solutions is that some solutions have different solution trees but are equivalent by switching the order of numeric values or subtrees, as shown in the first four solutions of the first problem in Figure 5. In particular, multiplication and addition are commutative, and our model learns and exploits this property to generate equivalent solutions with different tree structures.
The first solution to the fourth problem in Figure 5 is a typical error case of our model, caused by a wrong prediction of the problem goal. Another failure type is spurious solutions, which are correct but not meaningful, such as the second solution to the third problem in Figure 5. To test how frequently spurious solutions appear, we randomly select 500 examples from the test set and ask three human annotators to judge whether each generated expression is right, wrong, or spurious. Table 3 reports the human evaluation results, showing that spurious solutions are rare in our model.
We test different choices of the hyperparameters in the tree regularization defined by Equation 7. As shown in Table 2, the model without tree regularization, i.e., with unconstrained tree size, fails to converge and achieves nearly 0 accuracy. The best-performing range admits an intuitive interpretation: for a problem with $l$ quantities, (1) $l-1$ operators are needed to connect $l$ quantities, which gives a lower bound of $2l-1$ on the tree size; (2) in certain cases, the constants or quantities are used more than once, yielding a slightly larger rough upper bound. We therefore use this range as the default in our implementation. Empirically, it covers 88% of the lengths of the ground-truth expressions in the Math23K dataset, providing an efficient prior for the tree size.
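The lower-bound intuition is easy to verify: a binary expression tree that uses each of the $l$ quantities exactly once has $l$ operand leaves and $l-1$ operators. A minimal check, with a function name of our own choosing:

```python
# Lower bound on tree size: l quantities need l - 1 binary operators,
# so the size (quantities + operators) is l + (l - 1) = 2l - 1.
def min_tree_size(l):
    return l + (l - 1)

sizes = [min_tree_size(l) for l in (2, 3, 4)]
```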
Number of Search Steps
Table 4 shows the comparison of various step lengths in the m-FIX algorithm. In most cases, increasing the step length improves the chances of correcting wrong solutions, thus improving the performance.
In this work, we propose a weakly-supervised paradigm for learning MWPs and a novel learning-by-fixing framework to boost the learning. Our method endows the MWP learner with the capability of learning from wrong solutions, thus significantly improving the answer accuracy and learning efficiency. One future direction of the proposed model is to prevent generating equivalent or spurious solutions during training, possibly by making the generated solution trees more interpretable with semantic constraints.
The presented work should be categorized as research in the fields of weakly-supervised learning and abductive reasoning. It can help teachers in school obtain various solutions to a math word problem. This work may also inspire new algorithmic, theoretical, and experimental investigations in neural-symbolic methods and NLP tasks.
The work reported herein is supported by ARO W911NF1810296, DARPA XAI N66001-17-2-4029, and ONR MURI N00014-16-1-2007.
- Learning to generalize from sparse and underspecified rewards. In ICML.
- MathQA: towards interpretable math word problem solving with operation-based formalisms. In NAACL-HLT.
- Robust understanding of word problems with extraneous information.
- Neural symbolic reader: scalable integration of distributed and symbolic representations for reading comprehension. In ICLR.
- Semantically-aligned equation generation for solving and reasoning math word problems. ArXiv abs/1811.00720.
- Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP.
- Bridging machine learning and logical reasoning by abductive learning. In NeurIPS, pp. 2811–2822.
- Combining logical abduction and statistical induction: discovering written primitives with human knowledge. In AAAI.
- DROP: a reading comprehension benchmark requiring discrete reasoning over paragraphs. In NAACL-HLT.
- Understanding and solving arithmetic word problems: a computer simulation. Behavior Research Methods, Instruments, & Computers 17, pp. 565–571.
- From language to programs: bridging reinforcement learning and maximum marginal likelihood. In ACL.
- Academic reader: an interactive question answering system on academic literatures. In AAAI.
- Neural math word problem solver with reinforcement learning. In COLING.
- How well do computers solve math word problems? Large-scale dataset construction and evaluation. In ACL.
- Learning fine-grained expressions to solve math word problems. In EMNLP.
- Parsing algebraic word problems into equations. Transactions of the Association for Computational Linguistics 3, pp. 585–597.
- Learning to automatically solve algebra word problems. In ACL.
- Closed loop neural-symbolic learning via integrating neural perception, grammar parsing, and symbolic reasoning. In ICML.
- A competence-aware curriculum for visual concepts learning via question answering. In ECCV.
- Neural symbolic machines: learning semantic parsers on Freebase with weak supervision. arXiv preprint arXiv:1611.00020.
- Neural symbolic machines: learning semantic parsers on Freebase with weak supervision. In ACL.
- Memory augmented policy optimization for program synthesis and semantic parsing. In NeurIPS.
- Program induction by rationale generation: learning to solve and explain algebraic word problems. ArXiv abs/1705.04146.
- Abductive cognition: the epistemological and eco-cognitive dimensions of hypothetical reasoning. Vol. 3, Springer Science & Business Media.
- Learning to use formulas to solve simple arithmetic problems. In ACL.
- A restricted visual Turing test for deep scene and event understanding. ArXiv abs/1512.01715.
- Data-driven methods for solving algebra word problems. ArXiv abs/1804.10718.
- Unit dependency graph and its application to arithmetic word problem solving. In AAAI.
- Automatically solving number word problems by semantic parsing and reasoning. In EMNLP.
- Joint video and text parsing for understanding events and answering queries. IEEE MultiMedia 21, pp. 42–70.
- Learning from explicit and implicit supervision jointly for algebra word problems. In EMNLP.
- Translating math word problem to expression tree. In EMNLP.
- Template-based math word problem solvers with recursive neural networks. In AAAI, pp. 7144–7151.
- Deep neural solver for math word problems. In EMNLP, pp. 845–854.
- Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8 (3-4), pp. 229–256.
- Google's neural machine translation system: bridging the gap between human and machine translation. ArXiv abs/1609.08144.
- A goal-driven tree-structured neural model for math word problems. In IJCAI.
- ReClor: a reading comprehension dataset requiring logical reasoning. ArXiv abs/2002.04326.
- Frame-based calculus of solving arithmetic multi-step addition and subtraction word problems. In Second International Workshop on Education Technology and Computer Science, Vol. 2, pp. 476–479.
- RAVEN: a dataset for relational and analogical visual reasoning. In CVPR, pp. 5312–5322.
- Graph-to-tree learning for solving math word problems. In ACL.
- Machine number sense: a dataset of visual arithmetic problems for abstract and relational reasoning. ArXiv abs/2004.12193.
- Learn to solve algebra word problems using quadratic programming. In EMNLP.
- Abductive learning: towards bridging machine learning and logical reasoning. Science China Information Sciences 62, pp. 1–3.
- Dark, beyond deep: a paradigm shift to cognitive AI with humanlike common sense. Engineering.