Inferential Text Generation with Multiple Knowledge Sources and Meta-Learning

04/07/2020 ∙ by Daya Guo, et al. ∙ 9

We study the problem of generating inferential texts of events for a variety of commonsense relations, such as if-else relations. Existing approaches typically use limited evidence from training examples and learn each relation individually. In this work, we use multiple knowledge sources as fuel for the model. Existing commonsense knowledge bases like ConceptNet are dominated by taxonomic knowledge (e.g., isA and relatedTo relations) and contain only a limited amount of inferential knowledge. We therefore use not only structured commonsense knowledge bases, but also natural language snippets from search-engine results. These sources are incorporated into a generative base model via a key-value memory network. In addition, we introduce a meta-learning based multi-task learning algorithm. For each targeted commonsense relation, we regard the learning of examples from other relations as the meta-training process, and the evaluation on examples from the targeted relation as the meta-test process. We conduct experiments on the Event2Mind and ATOMIC datasets. Results show that both the integration of multiple knowledge sources and the use of the meta-learning algorithm improve performance.


1 Introduction

When a daily event such as “Peter makes John’s coffee” occurs, people can reason about its causes (e.g., “Peter wanted to be helpful”) and effects (e.g., “Peter gets thanked”). Inferential text generation tests a computational system’s ability to reason with such inferential knowledge. Given a piece of text like “Peter makes John’s coffee” and one of the pre-defined relations like “cause”, the task is to generate the desired sequence of words.

We study the problem on two benchmark datasets, Event2Mind [8] and ATOMIC [9], both of which call for commonsense inference on events. So far, sequence-to-sequence models trained on each relation separately have achieved promising performance on both datasets [9, 3]. We contribute at the level of knowledge sources and at the level of the training algorithm. First, given a text and a particular relation as input, our approach uses evidence automatically retrieved from multiple external knowledge sources, including ConceptNet and search-engine results, when generating text for an event. Second, instead of training each relation separately, we regard examples of other relations as auxiliary tasks to improve the targeted relation. We use model-agnostic meta-learning (MAML) [4], which has achieved promising performance on low-resource image classification and reinforcement learning. We regard learning from other relations as the meta-training process, and the evaluation on the targeted relation as the meta-test process.

Results on both datasets show that the integration of external knowledge sources improves the performance, and using multi-task learning with MAML brings further improvements.

2 Task and Base Model

Given an event phrase and a commonsense relation as input, the task is to generate a word sequence that serves as the desired hypothesis for the input event under the given relation.

We evaluate on Event2Mind [8] and ATOMIC [9], both of which contain about 25,000 event phrases. Event2Mind focuses on three relations related to mental states (i.e., intents and reactions of the actors), while ATOMIC has nine inferential dimensions, which cover mental states (the mental pre- and post-conditions of events), events (events that are pre- and post-conditions of events), and persona (a stative relation about how the subject of an event is perceived).

2.1 Base Model: Encoder-Decoder

Our base model is an encoder-decoder approach conditioned on a particular relation.

Encoder

A bi-directional RNN with gated recurrent units (GRU) [1] is used to read an event phrase conditioned on an inference type. Specifically, at each step, we concatenate the embedding of the current word and the embedding of the inference type as the input. We then obtain the final representation of the source sentence from the last hidden states of the forward and backward RNNs.

Decoder

We use a GRU with an attention mechanism as the decoder. At each time step, the context vector is computed in the same way as in multiplicative attention [5]. The concatenation of the context vector, the embedding of the previously predicted word, the embedding of the inference type, and the last hidden state is then fed to the next step. After obtaining the hidden states from the GRU, we predict a word from the target vocabulary with a linear layer followed by a softmax function.
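The context-vector computation can be illustrated with a minimal sketch of multiplicative attention in the spirit of [5], written in plain Python. The function names, the toy vectors, and the identity weight matrix are assumptions made for illustration and are not taken from the paper's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def multiplicative_attention(decoder_state, encoder_states, weight):
    """score_i = decoder_state . (W encoder_state_i); context = sum_i a_i * encoder_state_i."""
    scores = []
    for state in encoder_states:
        projected = matvec(weight, state)
        scores.append(sum(d * p for d, p in zip(decoder_state, projected)))
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(a * s[j] for a, s in zip(weights, encoder_states)) for j in range(dim)]
    return context, weights


# Toy inputs (hypothetical): three encoder states of size 2 and an identity weight matrix.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
context, weights = multiplicative_attention(decoder_state, encoder_states, identity)
```

In this toy case the first and third encoder states receive equal, larger weights because they align with the decoder state, and the context vector is their weighted combination with the second state.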

3 Approach

In this section, we first describe the extraction and the use of knowledge sources, and then describe the use of model-agnostic meta-learning.

3.1 Knowledge from ConceptNet

We use triplets from ConceptNet as knowledge. Following wang2018yuanfudao, we retrieve triples from ConceptNet that contain any n-gram in the target sentence. However, ConceptNet is dominated by taxonomic knowledge (e.g., “river is related to water”), and inferential knowledge (e.g., “a gift is used for celebrating a birthday”) tends to be rare. For instance, among all knowledge triplets in ConceptNet, the most frequent relation type is “RelatedTo” (37.5%), followed by “isA” (7.0%), “AtLocation” (5.2%), and “Synonym” (4.6%). We therefore use search results to increase the coverage.
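As a rough illustration of n-gram based triple retrieval, the sketch below matches the subject or object of each triple against the n-grams of a sentence. The matching rule (exact match on subject or object), the maximum n-gram length, and the toy triples are assumptions; the paper does not spell out these details.

```python
def ngrams(tokens, max_n=3):
    """Return all n-grams (as space-joined strings) up to length max_n."""
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def retrieve_triples(sentence, triples, max_n=3):
    """Keep triples whose subject or object equals some n-gram of the sentence."""
    grams = ngrams(sentence.lower().split(), max_n)
    return [t for t in triples if t[0].lower() in grams or t[2].lower() in grams]


# Hypothetical (subject, relation, object) triples in the style of ConceptNet.
toy_triples = [
    ("coffee", "RelatedTo", "drink"),
    ("gift", "UsedFor", "celebrating a birthday"),
    ("river", "RelatedTo", "water"),
]
matched = retrieve_triples("PersonX makes PersonY's coffee", toy_triples)
```

Only the triple about "coffee" is retrieved here, since none of the other subjects or objects occur as an n-gram of the sentence.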

3.2 Knowledge from Web Search Snippets

As suggested by emami2018generalized, texts from search engines provide valuable information for commonsense question answering tasks such as the Winograd Schema Challenge. To extract knowledge from web search, we first preprocess an input event phrase, removing placeholders and stop words to keep the essential terms of the event. We then automatically generate search queries by concatenating the preprocessed event phrase with a pre-defined key phrase. We use Google search in this work. Part of the list of pre-defined key phrases for ATOMIC and Event2Mind can be found in Table 1. As the retrieved results contain many irrelevant words, we only retain nouns, adjectives, and verbs.
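The query construction step might look like the following sketch. The stop-word list, the placeholder set, and the helper names are hypothetical; the key phrases are the xIntent examples from the key-phrase table. No search engine is actually called.

```python
# Hypothetical stop-word and placeholder lists; the paper does not specify them.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "on", "for", "at"}
PLACEHOLDERS = {"personx", "persony", "personz", "___"}


def essential_terms(event_phrase):
    """Lowercase the phrase and drop placeholders, possessive markers, and stop words."""
    terms = []
    for token in event_phrase.lower().split():
        token = token.strip(".,!?")
        if token.endswith("'s"):
            token = token[:-2]
        if not token or token in PLACEHOLDERS or token in STOP_WORDS:
            continue
        terms.append(token)
    return terms


def build_queries(event_phrase, key_phrases):
    """Concatenate the preprocessed event phrase with each pre-defined key phrase."""
    core = " ".join(essential_terms(event_phrase))
    return [f"{core} {key_phrase}" for key_phrase in key_phrases]


queries = build_queries(
    "PersonX makes PersonY's coffee",
    ["motivated by", "intentions", "why"],
)
```

For this event phrase the essential terms reduce to "makes coffee", so the generated queries are "makes coffee motivated by", "makes coffee intentions", and "makes coffee why".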

relation       key phrases
xIntent        motivated by, has subevent, intentions, why
{o, x}React    causes, has subevent, reactions
xAttr          has property, attribute, who
xNeed          needs, motivated by, has prerequisite, before
{o, x}Want     motivated by, causes desire, intentions
{o, x}Effect   causes, has subevents, effects, influences
Table 1: Examples of key phrases for knowledge hunting from web-search results.
                           ConceptNet   Web Search
# of knowledge per event   201          3,780
Hit @ xIntent              28.32%       86.1%
Hit @ xReact               6.76%        63.7%
Hit @ oReact               2.37%        57.7%
Table 2: The coverage of existing knowledge bases and of natural language snippets from web search.

We randomly sample 100 examples from the Event2Mind development set and calculate the overlap between the tokens of each sampled triplet and those of any retrieved knowledge triplet. As shown in Table 2, the coverage of ConceptNet is low, while the coverage of the triplets extracted from search snippets is considerably higher.

3.3 Key-Value Memory

We treat subjects together with relations as keys and objects as values for ConceptNet; for web search, search queries serve as keys and search results as values. The key-value pairs retrieved from the external knowledge sources are stored in a memory. Similar to miller2016key, we first use the source sentence representation to compute the relevance between the event phrase and the keys through an attention mechanism, and then obtain the knowledge representation as a weighted average of the values according to this relevance. The resulting representation is used as the initial hidden state of the decoder.
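A memory read of this kind can be sketched as follows. Dot-product relevance, the vector sizes, and the toy keys and values are assumptions for illustration; the paper only states that an attention mechanism is used.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, keys, values):
    """Attend over keys with the query, then return the weighted average of the values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    knowledge = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return knowledge, weights


# Toy memory (hypothetical): two key vectors and their associated value vectors.
query = [1.0, 0.0]          # stands in for the source sentence representation
keys = [[4.0, 0.0], [0.0, 4.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
knowledge, weights = read_memory(query, keys, values)
```

Because the query aligns with the first key, the returned knowledge vector is dominated by the first value.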

3.4 Multi-Task Learning with Meta-Learning

We human beings are very versatile in that we can leverage experience learnt from other tasks to help complete the task at hand. In this work, a natural intuition for multi-task learning is to use examples from other relations to improve the targeted relation. This can be directly modeled by model-agnostic meta-learning [4], which has a meta-train step that quickly updates the parameters with several gradient descent steps, followed by a meta-test step that evaluates the new parameters. The final loss at the meta-test step is used to measure the quality of the entire learning process and to update the model parameters. We summarize the learning algorithm in Algorithm 1. For each targeted relation, we regard the learning of examples from other relations as the meta-train process (lines 4-7), and the evaluation on examples from the targeted relation as the meta-test process (line 8). Meanwhile, we retain the original supervised loss function for each relation (line 9).

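Require:  Training datapoints
Require:  Step size hyperparameters
1:  Randomly initialize the model parameters
2:  while not done do
3:     Sample a batch of data for each relation
4:     for all other relations do
5:         Evaluate the gradient of the loss using training samples from that relation
6:         Compute adapted parameters with gradient descent
7:     end for
8:     Update the model parameters with the meta-test loss, using one batch of data from the targeted task
9:     Update the model parameters with the original supervised loss, using another batch of data from the targeted task
10:  end while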
Algorithm 1
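To make the training loop more concrete, here is a toy sketch of the same pattern on a one-parameter regression model y = w * x with a squared loss. Everything in it is an assumption for illustration: the scalar model, the toy data, the step sizes, and the choice to average the meta-gradient over auxiliary tasks. For this model the gradient through the inner update can be written in closed form, so no automatic differentiation is needed.

```python
def loss(w, data):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """Derivative of the loss with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)


def hessian(data):
    """Second derivative of the loss; constant in w for this model."""
    return sum(2 * x * x for x, _ in data) / len(data)


def maml_step(w, aux_tasks, target_a, target_b, alpha=0.05, beta=0.05):
    meta_grad = 0.0
    for aux in aux_tasks:
        # Meta-train: one inner gradient step on an auxiliary task.
        adapted = w - alpha * grad(w, aux)
        # Meta-test: loss of the adapted parameter on the targeted task,
        # differentiated through the inner step by the chain rule.
        meta_grad += grad(adapted, target_a) * (1.0 - alpha * hessian(aux))
    w = w - beta * meta_grad / len(aux_tasks)   # meta-test update (line 8)
    w = w - beta * grad(w, target_b)            # supervised update (line 9)
    return w


# Toy tasks (hypothetical): auxiliary slopes 1.8 and 2.2, target slope 2.0.
aux_tasks = [[(1.0, 1.8), (2.0, 3.6)], [(1.0, 2.2), (2.0, 4.4)]]
target_a = [(1.0, 2.0), (2.0, 4.0)]
target_b = [(3.0, 6.0), (1.0, 2.0)]

w = 0.0
for _ in range(200):
    w = maml_step(w, aux_tasks, target_a, target_b)
```

With these toy tasks the parameter settles at the target slope, since the auxiliary slopes are symmetric around it; a real model would replace the closed-form derivatives with backpropagation through the inner update.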

4 Experiment

Table 3 and Table 4 report the results of different approaches on the Event2Mind and ATOMIC datasets. Following [8] and [9], we use recall and BLEU-2 over the top 10 generated texts (Recall@10 and BLEU@10) as the evaluation metrics for Event2Mind and ATOMIC, respectively. Training details are given in the appendix.
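For intuition, one common reading of a recall-at-k metric is the fraction of gold texts that appear among the top k generated candidates. The sketch below implements that reading only; the exact definition used in [8] is not given in this text, so the function and its exact-match rule should be treated as assumptions.

```python
def recall_at_k(generated, gold, k=10):
    """Fraction of gold texts found (by exact match) among the top-k generated texts."""
    top_k = set(generated[:k])
    gold_set = set(gold)
    if not gold_set:
        return 0.0
    return len(gold_set & top_k) / len(gold_set)


# Hypothetical ranked candidates and gold annotations for one event.
generated = ["to be helpful", "to be nice", "to drink coffee", "to wake up"]
gold = ["to be helpful", "to be kind"]
score = recall_at_k(generated, gold)
```

Here one of the two gold texts appears among the candidates, giving a score of 0.5.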

Methods                        Dev                        Test
                               xIntent  xReact   oReact   xIntent  xReact   oReact
Single-task                    42.05%   43.97%   69.91%   42.29%   44.55%   69.94%
Single-task+ConceptNet         43.24%   44.02%   70.02%   43.43%   44.94%   69.87%
Single-task+Google             42.69%   44.18%   70.12%   42.94%   44.79%   70.01%
Single-task+ConceptNet+Google  43.31%   44.20%   70.24%   43.52%   45.04%   70.25%
Multi-task+ConceptNet+Google   43.21%   43.87%   70.38%   43.28%   45.03%   69.97%
MAML+ConceptNet+Google         43.24%   44.17%   70.54%   43.50%   45.32%   70.52%
Table 3: Recall@10 on three inference types of the Event2Mind dataset with different approaches.

Single-task denotes training a separate sequence-to-sequence model for each inference type, Multi-task denotes multi-task learning, and ConceptNet and Google denote the knowledge sources used. Incorporating knowledge resources yields a gain of 0.6% recall on Event2Mind and 0.3% BLEU on ATOMIC. The results also show that applying our MAML framework (Algorithm 1) to multi-task learning performs better on the majority of inference types.

Methods                        xIntent  xNeed  xAttr  xEffect  xWant  xReact  oEffect  oWant  oReact
9ENC9DEC [9]                   3.47     9.93   1.64   7.53     7.66   3.15    5.02     8.12   3.51
Single-task                    6.03     16.85  4.81   9.10     11.13  4.75    4.61     7.38   4.41
Single-task+ConceptNet         6.21     16.62  4.72   9.13     11.20  4.62    4.24     7.50   4.77
Single-task+Google             6.45     16.86  4.77   9.21     11.30  4.87    4.13     8.36   4.81
Single-task+ConceptNet+Google  6.46     17.06  4.77   9.44     11.35  4.68    4.64     6.54   4.70
Multi-task+ConceptNet+Google   7.24     16.66  4.81   9.57     11.55  4.82    5.89     8.99   4.71
MAML+ConceptNet+Google         6.70     17.48  4.88   9.98     11.80  4.64    6.26     9.02   4.36
Table 4: BLEU@10 on nine inference types of the ATOMIC test dataset with different approaches.

4.1 Effect of the number of search snippets

Figure 1: Overall Recall@10 with different number of Google search snippets on the Event2Mind dataset.

We study how the number of Google search snippets affects model performance, as shown in Figure 1. Using Google search snippets brings higher recall, which demonstrates the usefulness of this knowledge. However, although search snippets are a valuable knowledge source, they contain more noise as their number increases, which hurts the performance of the model.

4.2 Error analysis

We analyze 100 randomly selected wrongly predicted instances on the ATOMIC dataset and summarize three main classes of errors. First, in most examples the model generates a correct text that is not in the set of gold answers, which calls for more careful human evaluation. Second, the model confuses inference types: given the same input and different inference types, it tends to generate similar outputs. This problem might be mitigated by incorporating more information about the inference type. Lastly, some examples fail because the model lacks specific commonsense knowledge. For example, for “PersonX talks in class” the expected inference is “will be punished” but the model generates “listens to the teacher”, and for “PersonX drinks ___ everyday” the expected inference is “will gain weight” but the model generates “loses weight”. There are two potential directions for further improvement. The first is to leverage more knowledge resources from different dimensions. The second is to use a more powerful pre-trained model, such as BERT [2].

5 Conclusion

We present a generative model for generating inferential text of if-else relations. We incorporate two types of knowledge from ConceptNet and Google search results, and use model-agnostic meta-learning (MAML) to utilize examples from other relations. Experiments show that the integration of external knowledge and MAML both improve the accuracy.

References

  • [1] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. In EMNLP.
  • [2] J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • [3] L. Du, X. Ding, T. Liu, and Z. Li (2019) Modeling event background for if-then commonsense reasoning using context-aware variational autoencoder. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2682–2691.
  • [4] C. Finn, P. Abbeel, and S. Levine (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pp. 1126–1135.
  • [5] T. Luong, H. Pham, and C. D. Manning (2015) Effective approaches to attention-based neural machine translation. In EMNLP.
  • [6] J. Pennington, R. Socher, and C. D. Manning (2014) GloVe: global vectors for word representation. In EMNLP.
  • [7] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer (2018) Deep contextualized word representations.
  • [8] H. Rashkin, M. Sap, E. Allaway, N. A. Smith, and Y. Choi (2018) Event2Mind: commonsense inference on events, intents, and reactions. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 463–473.
  • [9] M. Sap, R. LeBras, E. Allaway, C. Bhagavatula, N. Lourie, H. Rashkin, B. Roof, N. A. Smith, and Y. Choi (2019) ATOMIC: an atlas of machine commonsense for if-then reasoning. In AAAI.