When a daily event such as “Peter makes John’s coffee” occurs, people can reason about its causes (e.g., “Peter wanted to be helpful”) and its effects (e.g., “Peter gets thanked”). Inferential text generation aims to test a computational system’s ability to reason over such inferential knowledge: given a piece of text like “Peter makes John’s coffee” and one of the pre-defined relations like “cause”, the task is to generate the desired sequence of words.
We study the problem on two benchmark datasets, Event2Mind and ATOMIC, both of which call for commonsense inference on events. To date, sequence-to-sequence models trained on each relation separately achieve promising performance on both datasets [9, 3]. We contribute at both the knowledge-source level and the training-algorithm level. First, given a text and a particular relation as input, our approach uses automatically retrieved evidence from multiple external knowledge sources, including ConceptNet and search-engine results, to generate the inference for an event. Second, instead of training each relation separately, we regard examples of other relations as auxiliary tasks to improve the targeted relation. We use model-agnostic meta-learning (MAML) here, which has achieved promising performance on low-resource image classification and reinforcement learning. We regard learning from other relations as the meta-training process, and the evaluation on the targeted relation as the meta-test process.
Results on both datasets show that the integration of external knowledge sources improves the performance, and using multi-task learning with MAML brings further improvements.
2 Task and Base Model
Given an event phrase x and a commonsense relation r as input, the task is to generate a sequence y, which is the desired hypothesis for the input event x under the given relation r.
We evaluate on Event2Mind and ATOMIC, both of which contain about 25,000 event phrases. Event2Mind focuses on three relations related to mental states (i.e., intents and reactions of the actors), while ATOMIC has nine inferential dimensions, including mental states (the mental pre- and post-conditions of events), events (the pre- and post-conditions of events), and personas (stative relations about how the subject of an event is perceived).
2.1 Base Model: Encoder-Decoder
Our base model is an encoder-decoder approach conditioned on a particular relation.
A bi-directional RNN with gated recurrent units (GRU) is used to read an event phrase x conditioned on an inference type r. Specifically, at the t-th step, we concatenate the embedding of the t-th word and the embedding of the inference type r as the input. We then obtain the final representation of the source sentence by concatenating the last hidden states of the forward and backward RNNs.
We use a GRU with an attention mechanism as the decoder. At each time-step t, the context vector c_t is computed with multiplicative attention (Luong et al., 2015). Afterwards, the concatenation of the context vector, the embedding of the previously predicted word, the embedding of the inference type, and the last hidden state is fed to the next step. After obtaining the hidden state from the GRU, we predict a word from the target vocabulary by a linear layer followed by a softmax function.
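As an illustration of how such a context vector can be computed, here is a minimal numpy sketch of multiplicative (Luong-style) attention; the weight matrix `W` and the toy vectors are illustrative stand-ins for learned parameters, not the paper's actual model:

```python
import numpy as np

def multiplicative_attention(decoder_state, encoder_states, W):
    """Multiplicative attention: score(s_t, h_i) = s_t^T W h_i,
    softmax over source positions, then a weighted sum of encoder states."""
    scores = encoder_states @ (W @ decoder_state)   # (n,) one score per source word
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # attention distribution
    context = weights @ encoder_states              # (d,) context vector c_t
    return context, weights

# Toy check with an identity W: attention peaks on the encoder state
# most similar to the current decoder state.
h = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])  # encoder hidden states
s = np.array([1.0, 0.0])                            # current decoder state s_t
c, w = multiplicative_attention(s, h, np.eye(2))
```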
3 Approach
In this section, we first describe the extraction and use of the knowledge sources, and then describe the use of model-agnostic meta-learning.
3.1 Knowledge from ConceptNet
We use triplets from ConceptNet as knowledge. Following Wang et al. (2018), we retrieve triples from ConceptNet that contain any n-gram in the target sentence. Yet ConceptNet is dominated by taxonomic knowledge (e.g., “river is related to water”), while inferential knowledge (e.g., “a gift is used for celebrating a birthday”) tends to be rare. For instance, among the entire collection of knowledge triplets in ConceptNet, the most frequent relation type is “RelatedTo” (37.5%), followed by “IsA” (7.0%), “AtLocation” (5.2%), and “Synonym” (4.6%). We attempt to use search results to increase the coverage.
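The retrieval step above can be sketched as follows; the toy triples and the match-by-n-gram rule are illustrative, while the actual system queries the full ConceptNet:

```python
def ngrams(tokens, max_n=3):
    """All n-grams (joined with spaces) up to length max_n."""
    return {" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}

def retrieve_triples(event_tokens, triples, max_n=3):
    """Keep triples whose subject or object matches an n-gram of the event."""
    grams = ngrams([t.lower() for t in event_tokens], max_n)
    return [(s, r, o) for (s, r, o) in triples
            if s.lower() in grams or o.lower() in grams]

# Toy knowledge base; the real system retrieves from the full ConceptNet.
kb = [("coffee", "AtLocation", "kitchen"),
      ("gift", "UsedFor", "celebrating a birthday")]
hits = retrieve_triples("PersonX makes John 's coffee".split(), kb)
```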
3.2 Knowledge from Web Search Snippets
As suggested by Emami et al. (2018), texts from search engines provide valuable information for commonsense question answering tasks such as the Winograd Schema Challenge. To extract knowledge from web search, we first preprocess an input event phrase x, removing placeholders and stop words to keep the essential terms of the event. We then automatically generate search queries by concatenating the preprocessed input event phrase and a pre-defined key phrase. We use Google search in this work. A partial list of the pre-defined key phrases for ATOMIC and Event2Mind can be found in Table 2. As the retrieved results contain many irrelevant words, we only retain nouns, adjectives, and verbs.
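A minimal sketch of the query construction, assuming a tiny illustrative stop-word list and a hypothetical key phrase (the paper's actual key phrases are those listed in Table 2):

```python
# Tiny illustrative stop-word and placeholder lists; the paper's actual
# preprocessing rules and key phrases (Table 2) may differ.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "is", "was"}
PLACEHOLDERS = {"personx", "persony", "___"}

def build_query(event_phrase, key_phrase):
    """Drop placeholders and stop words, then append a pre-defined key phrase."""
    kept = [w for w in event_phrase.lower().split()
            if w not in STOP_WORDS and w not in PLACEHOLDERS]
    return " ".join(kept) + " " + key_phrase

q = build_query("PersonX makes John 's coffee", "because he wanted")
```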
We randomly sample 100 examples from the Event2Mind development set, and calculate the overlap of tokens in each gold answer with any of the retrieved knowledge triplets. As shown in Table 2, the coverage of ConceptNet is low, while the coverage of the triplets extracted from search snippets is higher.
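The token-overlap coverage described above can be computed along these lines (the exact normalization used in the paper is not specified, so this is one plausible reading):

```python
def token_coverage(gold_tokens, knowledge_tokens):
    """Fraction of gold-answer tokens that also appear in retrieved knowledge."""
    gold = {t.lower() for t in gold_tokens}
    know = {t.lower() for t in knowledge_tokens}
    return len(gold & know) / len(gold) if gold else 0.0

cov = token_coverage(["be", "helpful"], ["helpful", "kind", "coffee"])
```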
3.3 Key-Value Memory
For ConceptNet (web search), we treat subjects together with relations (search queries) as keys and objects (search results) as values, and the retrieved key-value pairs from the external knowledge sources are stored in a memory. Similar to Miller et al. (2016), we first use the source sentence representation to calculate the relevance between the event phrase and the keys through an attention mechanism, and then obtain the knowledge representation as the weighted average of the values according to that relevance. The knowledge representation is used as the initial hidden state of the decoder.
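A minimal numpy sketch of this key-value memory read, with toy 2-dimensional encodings standing in for the learned key, value, and event representations:

```python
import numpy as np

def kv_memory_read(query, keys, values):
    """Score each memory key against the event representation, softmax the
    scores into a relevance distribution, and return the weighted average
    of the values as the knowledge representation."""
    scores = keys @ query                 # (m,) relevance of each key
    w = np.exp(scores - scores.max())
    w /= w.sum()                          # softmax over memory slots
    return w @ values                     # (d,) knowledge representation

h = np.array([1.0, 0.0])                      # event phrase representation
K = np.array([[1.0, 0.0], [0.0, 1.0]])        # encoded keys
V = np.array([[2.0, 0.0], [0.0, 2.0]])        # encoded values
m = kv_memory_read(h, K, V)                   # leans toward the first value
```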
3.4 Multi-Task Learning with Meta-Learning
Human beings are versatile in that we can leverage experience learnt from other tasks to help complete the task at hand. In this work, a natural intuition for multi-task learning is to use examples from other relations to improve the targeted relation. This can be directly modeled by model-agnostic meta-learning, which has a meta-train step that quickly updates the parameters with several gradient descent steps, followed by a meta-test step that evaluates the new parameters. The final loss at the meta-test step is used to measure the goodness of the entire learning process and to update the model parameters. We summarize the learning algorithm in Algorithm 1. In this work, for each targeted relation, we regard learning from examples of other relations as the meta-train process (lines 4-8), and the evaluation on examples from the targeted relation as the meta-test process (line 8). Meanwhile, we retain the original supervised loss function for each relation (line 9).
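Algorithm 1 itself is not reproduced here, but the meta-train/meta-test loop can be sketched with a first-order MAML approximation (which drops the second-order gradient term) on a toy scalar regression problem; the scalar model and analytic gradients are illustrative stand-ins for the seq2seq model and backpropagation:

```python
def grad(w, xs, ys):
    """Gradient of mean squared error for the scalar model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def fomaml_step(w, aux_tasks, target_task, alpha=0.05, beta=0.1):
    """One meta-update: adapt on auxiliary relations with inner gradient
    steps (meta-train), then update w with the gradient of the targeted
    relation evaluated at the adapted parameters (meta-test)."""
    w_adapted = w
    for xs, ys in aux_tasks:                        # meta-train inner loop
        w_adapted = w_adapted - alpha * grad(w_adapted, xs, ys)
    xs_t, ys_t = target_task
    return w - beta * grad(w_adapted, xs_t, ys_t)   # meta-test update

# All tasks share the true slope 2, so w should converge to 2.
aux = [([1.0, 2.0], [2.0, 4.0]), ([1.0, 3.0], [2.0, 6.0])]
target = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
w = 0.0
for _ in range(50):
    w = fomaml_step(w, aux, target)
```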
4 Experiments
We use Recall and BLEU-2 over the top 10 generated texts as evaluation metrics (Recall@10 and BLEU@10), respectively. Training details are given in the appendix.
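The Recall@10 metric can be sketched as follows (the toy generations and gold answers are illustrative):

```python
def recall_at_k(generated, gold, k=10):
    """Fraction of gold answers that appear among the top-k generated texts."""
    topk = set(generated[:k])
    return sum(1 for g in gold if g in topk) / len(gold)

gens = ["to be helpful", "to be nice", "to drink coffee"] + ["filler"] * 7
gold = ["to be helpful", "to get thanked"]
r = recall_at_k(gens, gold, k=10)
```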
Single-task denotes training a different sequence-to-sequence model for each inference type separately, Multi-task denotes multi-task learning, and ConceptNet (Google) stands for the knowledge resources. We can see that incorporating knowledge resources achieves gains of 0.6% recall and 0.3% BLEU on the Event2Mind and ATOMIC datasets, respectively. Results also show that applying our MAML framework, shown in Algorithm 1, to multi-task learning performs better on the majority of inference types.
4.1 Effect of the number of search snippets
We study how the number of Google search snippets affects the performance of the model, as shown in Figure 1. We can see that using Google search snippets brings higher recall, which demonstrates the usefulness of the knowledge. Although the collected search snippets are valuable knowledge sources, they contain more noise as the number of snippets increases, which hurts the performance of the model.
4.2 Error analysis
We analyze 100 randomly selected wrongly predicted instances on the ATOMIC dataset, and summarize three main classes of errors. The first is that many examples generate correct texts that are simply not in the set of gold answers, which calls for more careful evaluation by humans. The second is that the model confuses inference types. Specifically, given the same input with different inference types, the model tends to generate similar outputs. This problem might be mitigated by incorporating more information about the inference type. Lastly, some examples fail to generate correct texts due to a lack of specific commonsense knowledge. For example, for “PersonX talks in class” the gold answer is “will be punished” but the model generates “listens to the teacher”, and for “PersonX drinks ___ everyday” the gold answer is “will gain weight” but the model generates “loses weight”. There are two potential directions for further improvement. The first is to leverage more knowledge resources covering different dimensions. The second is to utilize more powerful pre-trained models, such as BERT.
We present a generative model for inferential texts of if-then relations. We incorporate two types of knowledge, from ConceptNet and Google search results, and use model-agnostic meta-learning (MAML) to utilize examples from other relations. Experiments show that the integration of external knowledge and MAML both improve the accuracy.
- Kyunghyun Cho et al. (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP.
- Jacob Devlin et al. (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- Li Du et al. (2019) Modeling event background for if-then commonsense reasoning using context-aware variational autoencoder. In EMNLP-IJCNLP, pp. 2682–2691.
- Chelsea Finn et al. (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, pp. 1126–1135.
- Minh-Thang Luong et al. (2015) Effective approaches to attention-based neural machine translation. In EMNLP.
- Jeffrey Pennington et al. (2014) GloVe: global vectors for word representation. In EMNLP.
- Matthew Peters et al. (2018) Deep contextualized word representations. In NAACL.
- Hannah Rashkin et al. (2018) Event2Mind: commonsense inference on events, intents, and reactions. In ACL, pp. 463–473.
- Maarten Sap et al. (2019) ATOMIC: an atlas of machine commonsense for if-then reasoning. In AAAI.