Unsupervised Question Decomposition for Question Answering

02/22/2020 · by Ethan Perez, et al.

We aim to improve question answering (QA) by decomposing hard questions into easier sub-questions that existing QA systems can answer. Since collecting labeled decompositions is cumbersome, we propose an unsupervised approach to produce sub-questions. Specifically, by leveraging >10M questions from Common Crawl, we learn to map from the distribution of multi-hop questions to the distribution of single-hop sub-questions. We answer sub-questions with an off-the-shelf QA model and incorporate the resulting answers in a downstream, multi-hop QA system. On a popular multi-hop QA dataset, HotpotQA, we show large improvements over a strong baseline, especially on adversarial and out-of-domain questions. Our method is generally applicable and automatically learns to decompose questions of different classes, while matching the performance of decomposition methods that rely heavily on hand-engineering and annotation.

1 Introduction

Figure 1: Overview: Using unsupervised learning, we decompose a multi-hop question into single-hop sub-questions, whose predicted answers are given to a downstream question answering model.

Question answering (QA) systems have become remarkably good at answering simple, single-hop questions but still struggle with compositional, multi-hop questions (Yang et al., 2018; Hudson and Manning, 2019). In this work, we examine if we can answer hard questions by leveraging our ability to answer simple questions. Specifically, we approach QA by breaking a hard question into a series of sub-questions that can be answered by a simple, single-hop QA system. The system’s answers can then be given as input to a downstream QA system to answer the hard question, as shown in Fig. 1. Our approach thus answers the hard question in multiple, smaller steps, which can be easier than answering the hard question all at once. For example, it may be easier to answer “What profession do H. L. Mencken and Albert Camus have in common?” when given the answers to the sub-questions “What profession does H. L. Mencken have?” and “Who was Albert Camus?”

Prior work in learning to decompose questions into sub-questions has relied on extractive heuristics, which generalize poorly to different domains and question types and require human annotation (Talmor and Berant, 2018; Min et al., 2019b). In order to scale to arbitrary questions, we would require sophisticated natural language generation capabilities, which often rely on large quantities of high-quality supervised data. Instead, we find that it is possible to learn to decompose questions without supervision.

Specifically, we learn to map from the distribution of hard questions to the distribution of simpler questions. First, we automatically construct a noisy, “pseudo-decomposition” for each hard question by retrieving relevant sub-question candidates based on their similarity to the given hard question. We retrieve candidates from a corpus of 10M simple questions that we extracted from Common Crawl. Second, we train neural text generation models on that data with (1) standard sequence-to-sequence learning and (2) unsupervised sequence-to-sequence learning. The latter has the advantage that it can go beyond the noisy pairing between questions and pseudo-decompositions. Fig. 2 overviews our decomposition approach.

Figure 2: Unsupervised Decomposition: Step 1: We create a corpus of pseudo-decompositions D by finding candidate sub-questions from a simple question corpus S which are similar to a multi-hop question in Q. Step 2: We learn to map multi-hop questions to decompositions using Q and D as training data, via either standard or unsupervised sequence-to-sequence learning.

We use decompositions to improve multi-hop QA. We first use an off-the-shelf single-hop QA model to answer decomposed sub-questions. We then give each sub-question and its answer as additional input to a multi-hop QA model. We test our method on HotpotQA (Yang et al., 2018), a popular multi-hop QA benchmark.

Our contributions are as follows. First, QA models relying on decompositions improve accuracy over a strong baseline by 3.1 F1 on the original dev set, 11 F1 on the multi-hop dev set from Jiang and Bansal (2019), and 10 F1 on the out-of-domain dev set from Min et al. (2019b). Our most effective decomposition model is a 12-block transformer encoder-decoder (Vaswani et al., 2017) trained using unsupervised sequence-to-sequence learning, involving masked language modeling, denoising, and back-translation objectives (Lample and Conneau, 2019). Second, our method is competitive with state-of-the-art methods SAE (Tu et al., 2020) and HGN (Fang et al., 2019) which leverage strong supervision. Third, we show that our approach automatically learns to generate useful decompositions for all 4 question types in HotpotQA, highlighting the general nature of our approach. In our analysis, we explore how sub-questions improve multi-hop QA, and we provide qualitative examples that highlight how question decomposition adds a form of interpretability to black-box QA models. Our ablations show that each component of our pipeline contributes to QA performance. Overall, we find that it is possible to successfully decompose questions without any supervision and that doing so improves QA.

2 Method

We now formulate the problem and overview our high-level approach, with details in the following section. We aim to leverage a QA model that is accurate on simple questions to answer hard questions, without using supervised question decompositions. Here, we consider simple questions to be “single-hop” questions that require reasoning over one paragraph or piece of evidence, and we consider hard questions to be “multi-hop.” Our aim is then to train a multi-hop QA model M to provide the correct answer a to a multi-hop question q about a given context c (e.g., several paragraphs). Normally, we would train M to maximize log p_M(a | c, q). To help M, we leverage a single-hop QA model that may be queried with sub-questions s_1, ..., s_N, whose “sub-answers” a_1, ..., a_N may be provided to the multi-hop QA model. M may then instead maximize the (potentially easier) objective log p_M(a | c, q, [s_1, a_1], ..., [s_N, a_N]).

Supervised decomposition models learn to map each question q ∈ Q to a decomposition d = (s_1, ..., s_N) of sub-questions using annotated (q, d) examples. In this work, we do not assume access to strong supervision. To leverage the single-hop QA model without supervision, we follow a three-stage approach: 1) map a question q into sub-questions s_1, ..., s_N via unsupervised techniques, 2) find sub-answers a_1, ..., a_N with the single-hop QA model, and 3) provide the sub-questions and sub-answers as additional input to help predict a, as sketched below.
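To make the three stages concrete, the following is a minimal Python sketch of the pipeline; `decompose`, `answer_single_hop`, and `answer_multi_hop` are hypothetical placeholders for the models described in the following sections, not our actual implementation.

```python
# Minimal sketch of the three-stage pipeline (illustrative only).
from typing import Callable, List, Tuple


def answer_with_decompositions(
    question: str,
    context: str,
    decompose: Callable[[str], List[str]],
    answer_single_hop: Callable[[str, str], str],
    answer_multi_hop: Callable[[str, str, List[Tuple[str, str]]], str],
) -> str:
    # Stage 1: map the hard question q to sub-questions s_1, ..., s_N.
    sub_questions = decompose(question)
    # Stage 2: answer each sub-question with the single-hop QA model.
    sub_answers = [answer_single_hop(sq, context) for sq in sub_questions]
    # Stage 3: give the (s_i, a_i) pairs to the multi-hop model as extra input.
    return answer_multi_hop(question, context,
                            list(zip(sub_questions, sub_answers)))
```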

2.1 Unsupervised Question Decomposition

To train a decomposition model, we need appropriate training data. We assume access to a hard question corpus Q and a simple question corpus S. Instead of using supervised training examples, we design an algorithm that constructs a corpus of pseudo-decompositions D to form (q, d) pairs from Q and S using an unsupervised approach (§2.1.1). We then train a model to map q to a decomposition. We explore learning to decompose with standard and unsupervised sequence-to-sequence learning (§2.1.2).

2.1.1 Creating Pseudo-Decompositions

For each q ∈ Q, we construct a pseudo-decomposition set d_q = {s_1, ..., s_N} by retrieving simple questions s from S. We concatenate all N simple questions in d_q to form the pseudo-decomposition used downstream. N may be chosen based on the task or may vary based on q. To retrieve useful simple questions for answering q, we face a joint optimization problem: we want sub-questions that are both (i) similar to q according to some similarity metric f and (ii) maximally diverse:

(1)    d*_q = argmax_{d ⊂ S}  [ Σ_{s_i ∈ d} f(q, s_i)  −  Σ_{s_i, s_j ∈ d, i ≠ j} f(s_i, s_j) ]
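To make the objective concrete, the sketch below scores a single candidate sub-question set under Eq. 1 with cosine similarity as f; the vectors are assumed to be the summed FastText embeddings described later in §3.2.2.

```python
import numpy as np


def pseudo_decomposition_score(v_q: np.ndarray, v_subs: list) -> float:
    """Eq. 1 for one candidate set: reward similarity of each sub-question to
    the hard question q and penalize similarity among the sub-questions
    themselves, which encourages diverse decompositions."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    relevance = sum(cos(v_q, v_s) for v_s in v_subs)
    redundancy = sum(cos(v_subs[i], v_subs[j])
                     for i in range(len(v_subs))
                     for j in range(len(v_subs)) if i != j)
    return relevance - redundancy
```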

2.1.2 Learning to Decompose

Having now retrieved relevant pseudo-decompositions, we examine different ways to learn to decompose (with implementation details in the following section):

No Learning

We use pseudo-decompositions directly, employing retrieved sub-questions in downstream QA.

Sequence-to-Sequence (Seq2Seq)

We train a Seq2Seq model with parameters θ to maximize log p_θ(d | q) on the pseudo-decomposition pairs.

Unsupervised Sequence-to-Sequence (USeq2Seq)

We start with the paired (q, d) examples but do not learn from the pairing, because the pairing is noisy. Instead of training directly on the noisy pairs, we use unsupervised sequence-to-sequence learning to learn a q → d mapping.

2.2 Answering Sub-Questions

To answer the generated sub-questions, we use an off-the-shelf QA model. The QA model may answer sub-questions using any free-form text (i.e., a word, phrase, sentence, etc.). Any QA model is suitable, so long as it can accurately answer simple questions in S. We thus leverage good accuracy on questions in S to help QA models answer questions in Q.

2.3 QA using Decompositions

Downstream QA systems may use sub-questions and sub-answers in various ways. We add sub-questions and sub-answers as auxiliary input for a downstream QA model to incorporate in its processing. We now describe the implementation details of our approach outlined above.

3 Experimental Setup

3.1 Question Answering Task

We test unsupervised decompositions on HotpotQA (Yang et al., 2018), a standard benchmark for multi-hop QA. We use HotpotQA’s “Distractor Setting,” which provides 10 context paragraphs from Wikipedia. Two (or more) paragraphs contain question-relevant sentences called “supporting facts,” and the remaining paragraphs are irrelevant, “distractor paragraphs.” Answers in HotpotQA are either yes, no, or a span of text in an input paragraph. Accuracy is measured with F1 and Exact Match (EM) scores between the predicted and gold spans.

3.2 Unsupervised Decomposition

3.2.1 Question Data

We use HotpotQA questions as our initial multi-hop, hard question corpus Q and SQuAD 2 questions as our initial single-hop, simple question corpus S. However, our pseudo-decomposition corpus should be large, as it will be used to train neural Seq2Seq models, which are data hungry. A larger S will also improve the relevance of retrieved simple questions to the hard question. Thus, we take inspiration from work in machine translation on parallel corpus mining (Xu and Koehn, 2017; Artetxe and Schwenk, 2019) and in unsupervised QA (Lewis et al., 2019), and we augment Q and S by mining more questions from Common Crawl. We select sentences which start with common “wh”-words and end with “?”. Next, we train a FastText classifier (Joulin et al., 2017) to distinguish between 60K questions sampled from Common Crawl, SQuAD 2, and HotpotQA. We then run the classifier over Common Crawl questions, adding questions classified as SQuAD 2-like to S and questions classified as HotpotQA-like to Q. Question mining greatly increases the number of single-hop questions (130K → 10.1M) and multi-hop questions (90K → 2.4M). Thus, our unsupervised approach allows us to make use of far more data than supervised counterparts.
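As an illustration of the mining step, the sketch below filters candidate questions and routes them with a FastText classifier; the file names and label scheme are hypothetical and simply mirror the three question sources.

```python
import fasttext  # pip install fasttext

WH_WORDS = ("who", "what", "when", "where", "which", "why", "how")


def is_candidate_question(sentence: str) -> bool:
    """Keep sentences that start with a common "wh"-word and end with '?'."""
    s = sentence.strip()
    return s.lower().startswith(WH_WORDS) and s.endswith("?")


# Hypothetical supervised training file with lines such as:
#   __label__hotpotqa Which magazine was started first, ... ?
#   __label__squad    When did Beyonce start becoming popular?
#   __label__crawl    How do I reset my password?
classifier = fasttext.train_supervised(input="question_sources.train")

mined_single_hop, mined_multi_hop = [], []
for line in open("common_crawl_sentences.txt"):
    sentence = line.strip()
    if not is_candidate_question(sentence):
        continue
    labels, _ = classifier.predict(sentence)
    if labels[0] == "__label__squad":
        mined_single_hop.append(sentence)   # added to the simple corpus S
    elif labels[0] == "__label__hotpotqa":
        mined_multi_hop.append(sentence)    # added to the hard corpus Q
```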

3.2.2 Creating Pseudo-Decompositions

To create pseudo-decompositions, we set the number of sub-questions per question to N = 2, as questions in HotpotQA usually involve two reasoning hops. In Appendix §A.1, we discuss how our method works when N varies per question.

Similarity-based Retrieval

To retrieve question-relevant sub-questions, we embed any text t into a vector v_t by summing the FastText vectors (Bojanowski et al., 2017)[1] for the words in t.[2]
[1] We use 300-dim. English Common Crawl vectors: https://fasttext.cc/docs/en/english-vectors.html
[2] We also tried TFIDF and BERT representations but did not see significant improvements over FastText (see Appendix §A.3).

We use cosine similarity as our similarity metric f. Let q be a multi-hop question used to retrieve a pseudo-decomposition (s*_1, s*_2), and let v̂_x denote the unit vector of v_x. Since we use cosine similarity and N = 2, Eq. 1 reduces to:

(2)    (s*_1, s*_2) = argmax_{{s_1, s_2} ⊂ S}  [ v̂_q · v̂_{s_1} + v̂_q · v̂_{s_2} − v̂_{s_1} · v̂_{s_2} ]

The last term requires O(|S|^2) comparisons, which is expensive as |S| is large (10M). Instead of solving Eq. (2) exactly, we find an approximate pseudo-decomposition (s*_1, s*_2) by computing Eq. (2) over S' = topK(S), the K questions in S most similar to q. We use FAISS (Johnson et al., 2017a) to efficiently build S'.
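The approximate retrieval can be sketched with FAISS and NumPy as below; the vectors are assumed to be unit-normalized float32 sum-of-FastText embeddings, and the candidate pool size K shown here is illustrative.

```python
import itertools

import faiss
import numpy as np


def retrieve_pseudo_decomposition(v_q: np.ndarray,
                                  sub_question_vecs: np.ndarray,
                                  k: int = 1000) -> tuple:
    """Approximate Eq. 2: restrict the search to the K sub-questions with the
    highest inner product with q (the pool S'), then pick the pair maximizing
    relevance minus redundancy. Vectors are assumed unit-normalized float32."""
    k = min(k, len(sub_question_vecs))
    index = faiss.IndexFlatIP(sub_question_vecs.shape[1])
    index.add(sub_question_vecs)
    _, top_ids = index.search(v_q[None, :], k)   # S' = topK(S)
    candidates = top_ids[0]

    def score(i, j):
        vi, vj = sub_question_vecs[i], sub_question_vecs[j]
        return float(v_q @ vi + v_q @ vj - vi @ vj)

    return max(itertools.combinations(candidates, 2),
               key=lambda pair: score(*pair))
```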

Random Retrieval

For comparison, we test random pseudo-decompositions, where we randomly retrieve s_1 and s_2 by sampling uniformly from S. USeq2Seq trained on such random pairings should, at minimum, learn to map q to multiple simple questions.

Editing Pseudo-Decompositions

Since the sub-questions are retrieval-based, they are often not about the same entities as q. As a post-processing step, we replace entities in the pseudo-decomposition with entities from q. We find all entities in the pseudo-decomposition that do not appear in q using spaCy (Honnibal and Montani, 2017). We replace these entities with a random entity from q of the same type (e.g., “Date” or “Location”), if and only if one exists. We use entity replacement on pseudo-decompositions from both random and similarity-based retrieval.
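A simplified sketch of this entity-replacement step with spaCy (assuming the `en_core_web_sm` pipeline); it swaps one occurrence of each mismatched entity.

```python
import random

import spacy

nlp = spacy.load("en_core_web_sm")


def replace_mismatched_entities(decomposition: str, question: str) -> str:
    """Replace entities in the pseudo-decomposition that never appear in the
    multi-hop question with a same-typed entity drawn from the question."""
    ents_by_type = {}
    for ent in nlp(question).ents:
        ents_by_type.setdefault(ent.label_, []).append(ent.text)

    out = decomposition
    for ent in nlp(decomposition).ents:
        if ent.text in question:
            continue  # entity is already grounded in the question
        candidates = ents_by_type.get(ent.label_, [])
        if candidates:  # only replace if a same-type entity exists in q
            out = out.replace(ent.text, random.choice(candidates), 1)
    return out
```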

3.2.3 Unsupervised Decomposition Models

Pre-training

Pre-training is a key ingredient for unsupervised Seq2Seq methods (Artetxe et al., 2018; Lample et al., 2018), so we initialize all decomposition models with the same pre-trained weights, regardless of training method (Seq2Seq or USeq2Seq). We warm-start our pre-training with the pre-trained English Masked Language Model (MLM) from Lample and Conneau (2019), a 12-block, decoder-only transformer model (Vaswani et al., 2017) trained to predict masked-out words on the Toronto Books Corpus (Zhu et al., 2015) and Wikipedia. We train the model with the MLM objective for one epoch on the augmented corpus Q (2.4M questions), while also training on the decompositions D formed via random retrieval from S. For our pre-trained encoder-decoder, we initialize a 6-block encoder with the first 6 MLM blocks, and we initialize a 6-block decoder with the last 6 MLM blocks, randomly initializing the remaining weights as in Lample and Conneau (2019).

Seq2Seq

We fine-tune the pre-trained encoder-decoder using maximum likelihood. We stop training based on validation BLEU (Papineni et al., 2002) between generated decompositions and pseudo-decompositions.

USeq2Seq

We follow the approach of Lample and Conneau (2019) to unsupervised translation.[3] Training proceeds in two stages: (1) MLM pre-training on the training corpora (described above), followed by (2) training simultaneously with denoising and back-translation objectives. For denoising, we produce a noisy input d̃ by randomly masking, dropping, and locally shuffling tokens in d ∼ D, and we train a model with parameters θ to maximize log p_θ(d | d̃); we likewise maximize log p_θ(q | q̃) for noised multi-hop questions q ∼ Q. For back-translation, we generate a multi-hop question q̂ for a decomposition d ∼ D and maximize log p_θ(d | q̂); similarly, we generate a decomposition d̂ for a question q ∼ Q and maximize log p_θ(q | d̂). To stop training without supervision, we use a modified version of round-trip BLEU (Lample et al., 2018) (see Appendix §B.1 for details). We train with denoising and back-translation on the smaller, un-augmented corpora of HotpotQA questions and their pseudo-decompositions.[4]
[3] https://github.com/facebookresearch/XLM
[4] Using the augmented corpora here did not improve QA.
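For concreteness, a rough sketch of the token-level noise used by the denoising objective; the noise probabilities and shuffle window are illustrative.

```python
import random

MASK = "<mask>"


def add_noise(tokens, p_mask=0.1, p_drop=0.1, shuffle_window=3):
    """Return a noisy copy of a token sequence by randomly masking, dropping,
    and locally shuffling tokens (illustrative noise parameters)."""
    noisy = []
    for tok in tokens:
        r = random.random()
        if r < p_drop:
            continue                                  # drop the token
        noisy.append(MASK if r < p_drop + p_mask else tok)
    # Local shuffle: each surviving token moves at most a few positions.
    keys = [i + random.uniform(0, shuffle_window) for i in range(len(noisy))]
    return [tok for _, tok in sorted(zip(keys, noisy))]


# The denoising objective trains the model to reconstruct the original
# decomposition d from add_noise(d), and likewise q from add_noise(q).
```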

3.3 Single-hop Question Answering Model

We train our single-hop QA model following prior work from Min et al. (2019b) on HotpotQA.[5]
[5] Our code is based on the transformers library (Wolf et al., 2019).

Model Architecture

We fine-tune a pre-trained model to take a question and several paragraphs and predict the answer, similar to the single-hop QA model from Min et al. (2019a). The model computes a separate forward pass on each paragraph (with the question). For each paragraph, the model learns to predict the answer span if the paragraph contains the answer and to predict “no answer” otherwise. We treat yes and no predictions as spans within the passage (prepended to each paragraph), as in Nie et al. (2019) on HotpotQA. During inference, for the final softmax, we consider all paragraphs as a single chunk. Similar to Clark and Gardner (2018), we subtract a paragraph’s “no answer” logit from the logits of all spans in that paragraph, to reduce or increase span probabilities accordingly. In other words, we compute the probability p(s_ij) of each span s_ij in a paragraph p using the predicted span logit l_ij and the paragraph’s “no answer” logit n_p as follows:

(3)    p(s_ij) = exp(l_ij − n_p) / Σ_{p′} Σ_{i′j′} exp(l_{i′j′} − n_{p′})
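A minimal NumPy sketch of Eq. 3, treating all paragraphs as a single chunk for the final softmax:

```python
import numpy as np


def span_probabilities(span_logits, no_answer_logits):
    """Eq. 3 sketch. span_logits[p] is a 1-D array of logits for the candidate
    spans in paragraph p, and no_answer_logits[p] is that paragraph's
    "no answer" logit. Each span logit is shifted down by its paragraph's
    "no answer" logit, then one softmax is taken over spans of all paragraphs."""
    shifted = np.concatenate(
        [logits - no_answer_logits[p] for p, logits in enumerate(span_logits)]
    )
    exp = np.exp(shifted - shifted.max())   # numerically stable softmax
    return exp / exp.sum()
```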

We use RoBERTa (Liu et al., 2019) as our pre-trained initialization. Later, we also experiment with using the BERT-based ensemble from Min et al. (2019b).

Training Data and Ensembling

Similar to Min et al. (2019b), we train an ensemble of 2 single-hop QA models using data from SQuAD 2 and HotpotQA questions labeled as “easy” (single-hop). To ensemble, we average the logits of the two models before predicting the answer. SQuAD is a single-paragraph QA task, so we adapt SQuAD to the multi-paragraph setting by retrieving distractor paragraphs from Wikipedia for each question. We use the TFIDF retriever from DrQA (Chen et al., 2017) to retrieve 2 distractor paragraphs, which we add to the input for one model in the ensemble. We drop words from the question with a 5% probability to help the model handle any ill-formed sub-questions. We use the single-hop QA ensemble as a black-box model once trained, never training the model on multi-hop questions.
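A small sketch of two details from this paragraph, averaging the two ensemble members' logits and dropping question words with 5% probability; the inputs are placeholders.

```python
import random

import numpy as np


def drop_question_words(question: str, p: float = 0.05) -> str:
    """Randomly drop words from a (sub-)question so the QA model becomes more
    robust to ill-formed generated sub-questions."""
    kept = [w for w in question.split() if random.random() > p]
    return " ".join(kept) if kept else question


def ensemble_span_logits(logits_a: np.ndarray,
                         logits_b: np.ndarray) -> np.ndarray:
    """Average the span logits of the two single-hop models before predicting."""
    return (logits_a + logits_b) / 2.0
```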

Returned Text

We have the single-hop QA model return the sentence containing the model’s predicted answer span, alongside the sub-questions. Later, we compare against alternatives, i.e., returning the predicted answer span without its context or not returning sub-questions.

3.4 Multi-hop Question Answering Model

Our multi-hop QA architecture is identical to the single-hop QA model, but the multi-hop QA model also uses sub-questions and sub-answers as input. We append each (sub-question, sub-answer) pair in order to the multi-hop question, along with separator tokens. We train one multi-hop QA model on all of HotpotQA, also including the SQuAD 2 examples used to train the single-hop QA model. Later, we experiment with using BERT-Base and BERT-Large instead of RoBERTa as the multi-hop QA model. All reported error margins show the mean and standard deviation across 5 multi-hop QA training runs using the same decompositions.
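The multi-hop model's input can be assembled roughly as below; the separator string stands in for the tokenizer's actual special tokens.

```python
SEP = " [SEP] "  # placeholder for the tokenizer's real separator token


def build_multihop_input(question, sub_questions, sub_answers, context):
    """Append each (sub-question, sub-answer) pair, in order, to the multi-hop
    question with separator tokens, followed by the context paragraphs."""
    pieces = [question]
    for sq, sa in zip(sub_questions, sub_answers):
        pieces.extend([sq, sa])
    return SEP.join(pieces) + SEP + context
```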

4 Results on Question Answering

Decomp.                    Pseudo-                    HotpotQA F1
Method                     Decomps.      Orig         MultiHop     OOD
✗ (1hop)                      —          66.7         63.7         66.5
✗ (Baseline)                  —          77.0 ±.2     65.2 ±.2     67.1 ±.5
No Learn                   Random        78.4 ±.2     70.9 ±.2     70.7 ±.4
                           FastText      78.9 ±.2     72.4 ±.1     72.0 ±.1
Seq2Seq                    Random        77.7 ±.2     69.4 ±.3     70.0 ±.7
                           FastText      78.9 ±.2     73.1 ±.2     73.0 ±.3
USeq2Seq                   Random        79.8 ±.1     76.0 ±.2     76.5 ±.2
                           FastText      80.1 ±.2     76.2 ±.1     77.1 ±.1
DecompRC*                     —          79.8 ±.2     76.3 ±.4     77.7 ±.2
SAE (Tu et al., 2020)†        —          80.2         61.1         62.6
HGN (Fang et al., 2019)†      —          82.2         78.9         76.1

Test (EM/F1)        Ours: 66.33/79.34    SAE: 66.92/79.62    HGN: 69.22/82.19

Table 1: Unsupervised decompositions significantly improve F1 on HotpotQA over the baseline. We achieve comparable F1 to methods which use supporting fact supervision (SAE and HGN). (*) We use supervised and heuristic decompositions from Min et al. (2019b). (†) Scores are approximate due to mismatched Wikipedia dumps.

We compare variants of our approach that use different learning methods and different pseudo-aligned training sets. As a baseline, we compare RoBERTa with decompositions to a RoBERTa model that does not use decompositions but is identical in all other respects. We train the baseline for 2 epochs, sweeping over batch size, learning rate, and weight decay, and we choose the hyperparameters that perform best on our dev set. We then use the best hyperparameters for the baseline to train our RoBERTa models with decompositions.

We report results on 3 versions of the dev set: (1) the original version,[6] (2) the multi-hop version from Jiang and Bansal (2019), which creates some distractor paragraphs adversarially to test multi-hop reasoning, and (3) the out-of-domain version from Min et al. (2019b), which retrieved distractor paragraphs using the same procedure as the original version, but excluded paragraphs in the original version.
[6] The test set is private, so we randomly halve the dev set to form validation and held-out dev sets. We will release our splits.

Main Results

Table 1 shows how unsupervised decompositions affect QA. Our RoBERTa baseline performs quite well on HotpotQA (77.0 F1), despite processing each paragraph separately, which prohibits inter-paragraph reasoning. The result is in line with prior work which found that a version of our baseline QA model using BERT (Devlin et al., 2019) does well on HotpotQA by exploiting single-hop reasoning shortcuts (Min et al., 2019a). We achieve significant gains over our strong baseline by leveraging decompositions from our best decomposition model, trained with USeq2Seq on FastText pseudo-decompositions; we find a 3.1 F1 gain on the original dev set, 11 F1 gain on the multi-hop dev set, and 10 F1 gain on the out-of-domain dev set. Unsupervised decompositions even match the performance of using (within our pipeline) supervised and heuristic decompositions from DecompRC (i.e., 80.1 vs. 79.8 F1 on the original dev set).

More generally, all decomposition methods improve QA over the baseline by leveraging the single-hop QA model (“1hop” in Table 1). Using FastText pseudo-decompositions as sub-questions directly improves QA over using random sub-questions on the multi-hop set (72.4 vs. 70.9 F1) and out-of-domain set (72.0 vs. 70.7 F1). USeq2Seq on random pseudo-decompositions also improves over the random sub-question baseline (e.g., 79.8 vs. 78.4 F1 on HotpotQA). However, we only find small improvements when training USeq2Seq on FastText vs. Random pseudo-decompositions (e.g., 77.1 vs. 76.5 F1 on the out-of-domain dev set).

The best decomposition methods learn with USeq2Seq. Using Seq2Seq to generate decompositions gives similar QA accuracy to the “No Learning” setup; for example, both approaches achieve 78.9 F1 on the original dev set with FastText pseudo-decompositions. The results are similar perhaps because supervised Seq2Seq learning is trained directly to place high probability on the noisy pseudo-decompositions.[7] USeq2Seq may improve over Seq2Seq by learning to align hard questions and pseudo-decompositions while ignoring the noisy pairing.
[7] We also tried using the Seq2Seq model to initialize USeq2Seq. Seq2Seq initialization resulted in comparable or worse downstream QA accuracy, suggesting that pre-training on noisy decompositions did not help bootstrap USeq2Seq (see Appendix §A.3 for details).

After our experimentation, we chose USeq2Seq trained on FastText pseudo-decompositions as the final model, and we submitted the model for hidden test evaluation. Our approach achieved a test F1 of 79.34 and Exact Match (EM) of 66.33. Our approach is competitive with concurrent, state-of-the-art systems SAE (Tu et al., 2020) and HGN (Fang et al., 2019), which both (unlike our approach) learn from additional, strong supervision about which sentences are necessary to answer the question.

4.1 Question Type Breakdown

Decomps.   Bridge      Comp.       Intersec.   Single-hop
✗          80.1 ±.2    73.8 ±.4    79.4 ±.6    73.9 ±.6
✓          81.7 ±.4    80.1 ±.3    82.3 ±.5    76.9 ±.6
Table 2: F1 scores on 4 types of questions in HotpotQA. Unsupervised decompositions improve QA for all types.
SubQs   SubAs            QA F1
✗       ✗                77.0 ±.2
✓       Sentence         80.1 ±.2
✓       Span             77.8 ±.3
✓       Random Entity    76.9 ±.2
✓       ✗                76.9 ±.2
✗       Sentence         80.2 ±.1
Table 3: Ablation Study: QA model F1 when trained with different sub-answers: the sentence containing the predicted sub-answer, the predicted sub-answer span, or a random entity from the context. We also train QA models with (✓) or without (✗) sub-questions and sub-answers.

To understand where decompositions help, we break down QA performance across 4 question types from Min et al. (2019b). “Bridge” questions ask about an entity not explicitly mentioned in the question (“When was Erik Watts’ father born?”). “Intersection” questions ask to find an entity that satisfies multiple separate conditions (“Who was on CNBC and Fox News?”). “Comparison” questions ask to compare a property of two entities (“Which is taller, Momhil Sar or K2?”). “Single-hop” questions are likely answerable using single-hop shortcuts or single-paragraph reasoning (“Where is Electric Six from?”). We split the original dev set into the 4 types using the supervised type classifier from Min et al. (2019b). Table 2 shows F1 scores for RoBERTa with and without decompositions across the 4 types.

Unsupervised decompositions improve QA across all question types. Our single decomposition model generates useful sub-questions for all question types without special case handling, unlike earlier work from Min et al. (2019b) which handled each question type separately. For single-hop questions, our QA approach does not require falling back to a single-hop QA model and instead learns to leverage decompositions to better answer questions with single-hop shortcuts (76.9 vs. 73.9 F1 without decompositions).

4.2 Answers to Sub-Questions are Crucial

To measure the usefulness of sub-questions and sub-answers, we train the multi-hop QA model with various, ablated inputs, as shown in Table 3. Sub-answers are crucial to improving QA, as sub-questions with no answers or random answers do not help (76.9 vs. 77.0 F1 for the baseline). Only when sub-answers are provided do we see improved QA, with or without sub-questions (80.1 and 80.2 F1, respectively). It is important to provide the sentence containing the predicted answer span instead of the answer span alone (80.1 vs. 77.8 F1, respectively), though the answer span alone still improves over the baseline (77.0 F1).

4.3 How Do Decompositions Help?

Figure 3: Multi-hop QA is better when the single-hop QA model answers with the ground truth “supporting fact” sentences. We plot mean and std. across 5 random QA training runs.

Decompositions help to answer questions by retrieving important supporting evidence. Fig. 3 shows that multi-hop QA accuracy increases when the sub-answer sentences are the “supporting facts,” i.e., the sentences needed to answer the question, as annotated by HotpotQA. We retrieve supporting facts without learning to predict them with strong supervision, unlike many state-of-the-art models (Tu et al., 2020; Fang et al., 2019; Nie et al., 2019).

4.4 Example Decompositions

Q1: Are both Coldplay and Pierre Bouvier
       from the same country?
   SQ: Where are Coldplay and Coldplay from?
    → Coldplay are a British rock band formed in 1996 by lead
       vocalist and keyboardist Chris Martin and lead guitarist
       Jonny Buckland at University College London (UCL).
   SQ: What country is Pierre Bouvier from?
    → Pierre Charles Bouvier (born 9 May 1979) is a Canadian
       singer, songwriter, musician, composer and actor who is
       best known as the lead singer and guitarist of the rock
       band Simple Plan.
A: No
Q2: How many copies of Roald Dahl’s variation on a popular anecdote sold?
   SQ: How many copies of Roald Dahl’s?
    → His books have sold more than 250 million
       copies worldwide.
   SQ: What is the name of the variation on a popular anecdote?
    → “Mrs. Bixby and the Colonel’s Coat” is a short story by
       Roald Dahl that first appeared in the 1959 issue of Nugget.
A: more than 250 million
Q3: Who is older, Annie Morton or Terry Richardson?
   SQ: Who is Annie Morton?
    → Annie Morton (born October 8, 1970) is an
       American model born in Pennsylvania.
   SQ: When was Terry Richardson born?
    → Kenton Terry Richardson (born 26 July 1999) is an English
       professional footballer who plays as a defender for
       League Two side Hartlepool United.
A: Annie Morton
Table 4: Example sub-questions generated by our model, along with predicted sub-answer sentences (answer span underlined) and final predicted answer.

To illustrate how decompositions help QA, Table 4 shows example sub-questions from our best decomposition model with predicted sub-answers. Sub-questions are single-hop questions relevant to the multi-hop question. The single-hop QA model returns relevant sub-answers, sometimes in spite of grammatical errors (Q1, SQ) or under-specified questions (Q2, SQ). The multi-hop QA model then returns an answer consistent with the predicted sub-answers. The decomposition model is largely extractive, copying from the multi-hop question rather than hallucinating new entities, which helps generate relevant sub-questions. To better understand our system, we analyze the model for each stage: decomposition, single-hop QA, and multi-hop QA.

5 Analysis

5.1 Unsupervised Decomposition Model

Intrinsic Evaluation of Decompositions
Decomp. GPT2 % Well- Edit Length
Method NLL Formed Dist. Ratio
USeq2Seq 5.56 60.9 5.96 1.08
DecompRC 6.04 32.6 7.08 1.22
Table 5: Analysis of sub-questions produced by our method vs. the supervised+heuristic method of Min et al. (2019b). From left-to-right: Negative Log-Likelihood (NLL) according to GPT2 (lower is better), % Well-Formed according to a classifier, Edit Distance between decomposition and multi-hop question, and token-wise Length Ratio between decomposition and multi-hop question.

We evaluate the quality of decompositions on other metrics aside from downstream QA. To measure the fluency of decompositions, we compute the likelihood of decompositions using the pre-trained GPT-2 language model (Radford et al., 2019). We train a classifier on the question-wellformedness dataset of Faruqui and Das (2018), and we use the classifier to estimate the proportion of sub-questions that are well-formed. We measure how abstractive decompositions are by computing (i) the token-level Levenshtein distance between the multi-hop question and its generated decomposition and (ii) the ratio between the length of the decomposition and the length of the multi-hop question. We compare our best decomposition model against the supervised+heuristic decompositions from DecompRC (Min et al., 2019b) in Table 5.

Unsupervised decompositions are both more natural and well-formed than decompositions from DecompRC. Unsupervised decompositions are also closer in edit distance and length to the multi-hop question, consistent with our observation that our decomposition model is largely extractive.

Quality of Decomposition Model
Figure 4: Left: We decode from the decomposition model with beam search and use the nth-ranked hypothesis as a question decomposition. We plot the F1 of a multi-hop QA model trained to use the nth-ranked decomposition. Right: Multi-hop QA is better when the single-hop QA model places high probability on its sub-answer.

Another way to test the quality of the decomposition model is to test whether the model places higher probability on decompositions that are more helpful for downstream QA. We generate hypotheses from our best decomposition model using beam search, and we train a multi-hop QA model to use the nth-ranked hypothesis as a question decomposition (Fig. 4, left). QA accuracy decreases as we use lower-probability decompositions, but accuracy remains relatively robust, at most decreasing from 80.1 to 79.3 F1. The limited drop suggests that decompositions are still useful if they are among the model’s top hypotheses, another indication that our model is trained well for decomposition.

5.2 Single-hop Question Answering Model

Sub-Answer Confidence

Figure 4 (right) shows that the model’s sub-answer confidence correlates with downstream multi-hop QA performance for all HotpotQA dev sets. A low confidence sub-answer may be indicative of (i) an unanswerable or ill-formed sub-question or (ii) a sub-answer that is more likely to be incorrect. In both cases, the single-hop QA model is less likely to retrieve the useful supporting evidence to answer the multi-hop question.

Changing the Single-hop QA Model

We find that our approach is robust to the choice of single-hop QA model that answers sub-questions. We use the BERT-based ensemble from Min et al. (2019b) as the single-hop QA model. This model performs much worse than our RoBERTa-based single-hop ensemble when used directly on HotpotQA (56.3 vs. 66.7 F1). However, it results in comparable downstream QA accuracy when used to answer single-hop sub-questions within our larger system (79.9 vs. 80.1 F1 for our RoBERTa-based ensemble).

5.3 Multi-hop Question Answering Model

Varying the Base Model

To understand how decompositions impact performance as the multi-hop QA model gets stronger, we vary the base pre-trained model. Table 6 shows the impact of adding decompositions to BERT-Base, BERT-Large, and finally RoBERTa (see Appendix §C.2 for hyperparameters). The gain from using decompositions grows with the strength of the multi-hop QA model: decompositions improve QA by 1.2 F1 for a BERT-Base model, by 2.6 F1 for the stronger BERT-Large model, and by 3.1 F1 for our best RoBERTa model.

6 Related Work

Answering complicated questions has been a long-standing challenge in natural language processing. To this end, prior work has explored decomposing questions with supervision or heuristic algorithms. IBM Watson (Ferrucci et al., 2010) decomposes questions into sub-questions in multiple ways or not at all. DecompRC (Min et al., 2019b) largely frames sub-questions as extractive spans of a multi-hop question, learning to predict span-based sub-questions via supervised learning on human annotations. In other cases, DecompRC decomposes a multi-hop question using a heuristic algorithm, or DecompRC does not decompose at all. Watson and DecompRC use special case handling to decompose different questions, while our algorithm is fully automated and requires minimal hand-engineering.

More traditional, semantic parsing methods map questions to compositional programs, whose sub-programs can be viewed as question decompositions in a formal language (Talmor and Berant, 2018; Wolfson et al., 2020). Examples include classical QA systems like SHRDLU (Winograd, 1972) and LUNAR (Woods et al., 1974), as well as neural Seq2Seq semantic parsers (Dong and Lapata, 2016) and neural module networks (Andreas et al., 2015, 2016). Such methods usually require strong, program-level supervision to generate programs, as in visual QA (Johnson et al., 2017b) and on HotpotQA (Jiang and Bansal, 2019). Some models use other forms of strong supervision, e.g. predicting the “supporting evidence” to answer a question annotated by HotpotQA. Such an approach is taken by SAE (Tu et al., 2020) and HGN (Fang et al., 2019), whose methods may be combined with our approach.

Unsupervised decomposition complements strongly and weakly supervised decomposition approaches. Our unsupervised approach enables methods to leverage millions of otherwise unusable questions, similar to work on unsupervised QA (Lewis et al., 2019). When decomposition examples exist, supervised and unsupervised learning can be used in tandem to learn from both labeled and unlabeled examples. Such semi-supervised methods outperform supervised learning for tasks like machine translation (Sennrich et al., 2016). Other work on weakly supervised question generation uses a downstream QA model’s accuracy as a signal for learning to generate useful questions. Weakly supervised question generation often uses reinforcement learning (Nogueira and Cho, 2017; Wang and Lake, 2019; Strub et al., 2017; Das et al., 2017; Liang et al., 2018), where an unsupervised initialization can greatly mitigate the issues of exploring from scratch (Jaderberg et al., 2017).

Multi-hop QA Model     QA F1 (w/o → w/ Decomps.)
BERT-Base              71.8 ±.4 → 73.0 ±.4
BERT-Large             76.4 ±.2 → 79.0 ±.1
RoBERTa                77.0 ±.3 → 80.1 ±.2
Table 6: Stronger QA models benefit more from decompositions.

7 Conclusion

We proposed an algorithm that decomposes questions without supervision, using 3 stages: (1) learning to decompose using pseudo-decompositions without supervision, (2) answering sub-questions with an off-the-shelf QA system, and (3) answering hard questions more accurately using sub-questions and their answers as additional input. When evaluated on HotpotQA, a standard benchmark for multi-hop QA, our approach significantly improved accuracy over an equivalent model that did not use decompositions. Our approach relies only on the final answer as supervision but works as effectively as state-of-the-art methods that rely on strong supervision, such as supporting fact labels or example decompositions. Qualitatively, we found that unsupervised decomposition results in fluent sub-questions whose answers often match the annotated supporting facts in HotpotQA. Our unsupervised decompositions are largely extractive, which is effective for compositional, multi-hop questions but not for all complex questions, showing room for future work. Overall, this work opens up exciting avenues for leveraging methods in unsupervised learning and natural language generation to improve the interpretability and generalization of machine learning systems.

Acknowledgements

EP is supported by the NSF Graduate Research Fellowship. KC is supported by Samsung Advanced Institute of Technology (Next Generation Deep Learning: from pattern recognition to AI) and Samsung Research (Improving Deep Learning using Latent Structure). KC also thanks eBay and NVIDIA for their support. We thank Paul Christiano, Sebastian Riedel, He He, Jonathan Berant, Alexis Conneau, Jiatao Gu, Sewon Min, Yixin Nie, Lajanugen Logeswaran, and Adam Fisch for helpful feedback, as well as Yichen Jiang and Peng Qi for help with evaluation.

References

  • J. Andreas, M. Rohrbach, T. Darrell, and D. Klein (2015) Neural module networks. CVPR, pp. 39–48. External Links: Link Cited by: §6.
  • J. Andreas, M. Rohrbach, T. Darrell, and D. Klein (2016) Learning to compose neural networks for question answering. In NAACL, San Diego, California, pp. 1545–1554. External Links: Link, Document Cited by: §6.
  • M. Artetxe, G. Labaka, E. Agirre, and K. Cho (2018) Unsupervised neural machine translation. In ICLR, External Links: Link Cited by: §3.2.3.
  • M. Artetxe and H. Schwenk (2019) Margin-based parallel corpus mining with multilingual sentence embeddings. In ACL, Florence, Italy, pp. 3197–3203. External Links: Link, Document Cited by: §3.2.1.
  • P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov (2017) Enriching word vectors with subword information. TACL 5, pp. 135–146. External Links: Link, Document Cited by: §3.2.2.
  • D. Chen, A. Fisch, J. Weston, and A. Bordes (2017) Reading Wikipedia to answer open-domain questions. In ACL, Vancouver, Canada, pp. 1870–1879. External Links: Link, Document Cited by: §3.3.
  • C. Clark and M. Gardner (2018) Simple and effective multi-paragraph reading comprehension. In ACL, Melbourne, Australia, pp. 845–855. External Links: Link, Document Cited by: §3.3.
  • A. Das, S. Kottur, J. M.F. Moura, S. Lee, and D. Batra (2017) Learning cooperative visual dialog agents with deep reinforcement learning. In ICCV, External Links: Link Cited by: §6.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In NAACL, pp. 4171–4186. External Links: Link, Document Cited by: §C.2, §4.
  • L. Dong and M. Lapata (2016) Language to logical form with neural attention. In ACL, Berlin, Germany, pp. 33–43. External Links: Link, Document Cited by: §6.
  • Y. Fang, S. Sun, Z. Gan, R. Pillai, S. Wang, and J. Liu (2019) Hierarchical graph network for multi-hop question answering. Vol. abs/1911.03631. External Links: 1911.03631, Link Cited by: Table 7, §1, §4, §4.3, Table 1, §6.
  • M. Faruqui and D. Das (2018) Identifying well-formed natural language questions. In EMNLP, Brussels, Belgium, pp. 798–803. External Links: Link, Document Cited by: §5.1.
  • D. Ferrucci, E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. Prager, N. Schlaefer, and C. Welty (2010) Building watson: an overview of the deepqa project. AI Magazine 31 (3), pp. 59–79. External Links: Link, Document Cited by: §6.
  • M. Honnibal and I. Montani (2017) spaCy 2: natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. Note: To appear External Links: Link Cited by: §3.2.2.
  • D. A. Hudson and C. D. Manning (2019) GQA: a new dataset for real-world visual reasoning and compositional question answering. In CVPR, External Links: Link Cited by: §1.
  • M. Jaderberg, V. Mnih, W. M. Czarnecki, T. Schaul, J. Z. Leibo, D. Silver, and K. Kavukcuoglu (2017) Reinforcement learning with unsupervised auxiliary tasks. In ICLR, External Links: Link Cited by: §6.
  • Y. Jiang and M. Bansal (2019) Avoiding reasoning shortcuts: adversarial evaluation, training, and model development for multi-hop QA. In ACL, Florence, Italy, pp. 2726–2736. External Links: Link, Document Cited by: §1, §4.
  • Y. Jiang and M. Bansal (2019) Self-assembling modular networks for interpretable multi-hop reasoning. In EMNLP, Hong Kong, China. External Links: Link Cited by: §6.
  • J. Johnson, M. Douze, and H. Jégou (2017a) Billion-scale similarity search with gpus. CoRR abs/1702.08734. External Links: Link Cited by: §3.2.2.
  • J. Johnson, B. Hariharan, L. van der Maaten, J. Hoffman, L. Fei-Fei, C. L. Zitnick, and R. Girshick (2017b) Inferring and executing programs for visual reasoning. In ICCV, External Links: Link Cited by: §6.
  • A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov (2017) Bag of tricks for efficient text classification. In EACL, Valencia, Spain, pp. 427–431. External Links: Link Cited by: §3.2.1.
  • G. Lample, A. Conneau, L. Denoyer, and M. Ranzato (2018) Unsupervised machine translation using monolingual corpora only. In ICLR, External Links: Link Cited by: §B.1, §3.2.3, §3.2.3.
  • G. Lample and A. Conneau (2019) Cross-lingual language model pretraining. In NeurIPS, External Links: Link Cited by: §B.2, §B.2, §1, §3.2.3, §3.2.3.
  • P. Lewis, L. Denoyer, and S. Riedel (2019) Unsupervised question answering by cloze translation. In ACL, Florence, Italy, pp. 4896–4910. External Links: Link, Document Cited by: §3.2.1, §6.
  • C. Liang, M. Norouzi, J. Berant, Q. V. Le, and N. Lao (2018) Memory augmented policy optimization for program synthesis and semantic parsing. In NeurIPS, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 9994–10006. External Links: Link Cited by: §6.
  • Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov (2019) RoBERTa: a robustly optimized bert pretraining approach. CoRR abs/1907.11692. External Links: Link Cited by: §C.2, §3.3.
  • P. Micikevicius, S. Narang, J. Alben, G. Diamos, E. Elsen, D. Garcia, B. Ginsburg, M. Houston, O. Kuchaiev, G. Venkatesh, and H. Wu (2018) Mixed precision training. In ICLR, External Links: Link Cited by: §C.2.
  • S. Min, E. Wallace, S. Singh, M. Gardner, H. Hajishirzi, and L. Zettlemoyer (2019a) Compositional questions do not necessitate multi-hop reasoning. In ACL, Florence, Italy, pp. 4249–4257. External Links: Link, Document Cited by: §3.3, §4.
  • S. Min, V. Zhong, L. Zettlemoyer, and H. Hajishirzi (2019b) Multi-hop reading comprehension through question decomposition and rescoring. In ACL, Florence, Italy, pp. 6097–6109. External Links: Link, Document Cited by: §A.1, §B.1, §1, §1, §3.3, §3.3, §3.3, §4.1, §4.1, Table 1, §4, §5.1, §5.2, Table 5, §6.
  • Y. Nie, S. Wang, and M. Bansal (2019) Revealing the importance of semantic retrieval for machine reading at scale. In EMNLP, External Links: Link Cited by: §3.3, §4.3.
  • R. Nogueira and K. Cho (2017) Task-oriented query reformulation with reinforcement learning. In EMNLP, Copenhagen, Denmark, pp. 574–583. External Links: Link, Document Cited by: §6.
  • K. Papineni, S. Roukos, T. Ward, and W. Zhu (2002) Bleu: a method for automatic evaluation of machine translation. In ACL, Philadelphia, Pennsylvania, USA, pp. 311–318. External Links: Link, Document Cited by: §3.2.3.
  • A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever (2019) Language models are unsupervised multitask learners. External Links: Link Cited by: §5.1.
  • R. Sennrich, B. Haddow, and A. Birch (2016) Improving neural machine translation models with monolingual data. In ACL, Berlin, Germany, pp. 86–96. External Links: Link, Document Cited by: §6.
  • F. Strub, H. de Vries, J. Mary, B. Piot, A. Courville, and O. Pietquin (2017) End-to-end optimization of goal-driven and visually grounded dialogue systems. In IJCAI, pp. 2765–2771. External Links: Document, Link Cited by: §6.
  • A. Talmor and J. Berant (2018) The web as a knowledge-base for answering complex questions. In NAACL, New Orleans, Louisiana, pp. 641–651. External Links: Link, Document Cited by: §1, §6.
  • M. Tu, K. Huang, G. Wang, J. Huang, X. He, and B. Zhou (2020) Select, answer and explain: interpretable multi-hop reading comprehension over multiple documents. In AAAI, External Links: Link Cited by: Table 7, §1, §4, §4.3, Table 1, §6.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In NeurIPS, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), pp. 5998–6008. External Links: Link Cited by: §1, §3.2.3.
  • Z. Wang and B. M. Lake (2019) Modeling question asking using neural program generation. CoRR abs/1907.09899. External Links: Link, 1907.09899 Cited by: §6.
  • T. Winograd (1972) Understanding natural language. Academic Press, Inc., USA. External Links: ISBN 0127597506, Link Cited by: §6.
  • T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, and J. Brew (2019) HuggingFace’s transformers: state-of-the-art natural language processing. CoRR abs/1910.03771. External Links: Link Cited by: footnote 5.
  • T. Wolfson, M. Geva, A. Gupta, M. Gardner, Y. Goldberg, D. Deutch, and J. Berant (2020) Break it down: a question understanding benchmark. TACL. External Links: Link Cited by: §6.
  • W. Woods, R. Kaplan, and B. Nash-Webber (1974) The lunar sciences natural language information system. Final Report Technical Report 2378, Bolt, Beranek and Newman, Inc., Cambridge, MA. External Links: Link Cited by: §6.
  • H. Xu and P. Koehn (2017) Zipporah: a fast and scalable data cleaning system for noisy web-crawled parallel corpora. In EMNLP, Copenhagen, Denmark, pp. 2945–2950. External Links: Link, Document Cited by: §3.2.1.
  • Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. Cohen, R. Salakhutdinov, and C. D. Manning (2018) HotpotQA: a dataset for diverse, explainable multi-hop question answering. In EMNLP, Brussels, Belgium, pp. 2369–2380. External Links: Link, Document Cited by: §1, §1, §3.1.
  • Y. Zhu, R. Kiros, R. Zemel, R. Salakhutdinov, R. Urtasun, A. Torralba, and S. Fidler (2015) Aligning books and movies: towards story-like visual explanations by watching movies and reading books. In ICCV, ICCV ’15, USA, pp. 19–27. External Links: ISBN 9781467383912, Link, Document Cited by: §3.2.3.

Appendix A Pseudo-Decompositions

Tables 8-13 show examples of pseudo-decompositions and learned decompositions from various models.

A.1 Variable Length Pseudo-Decompositions

In §3.2.2, we leveraged domain knowledge about the task to fix the pseudo-decomposition length to N = 2. A general algorithm for creating pseudo-decompositions should find a suitable N for each question. We find that Eq. 1 in §2.1.1 always results in decompositions of length N = 2, as the regularization term grows quickly with N. Thus, we test another formulation based on Euclidean distance:

(4)    d*_q = argmin_{d ⊂ S}  ‖ v_q − Σ_{s ∈ d} v_s ‖_2

We create pseudo-decompositions in a similar way as before, first finding a set of candidate sub-questions with high cosine similarity to q, then performing beam search up to a maximum number of sub-questions. We test the two pseudo-decomposition formulations by creating synthetic compositional questions, combining 2–3 single-hop questions with “and,” and measuring the rank assigned to the correct decomposition (the concatenation of the single-hop questions). For two-hop synthetic questions, both methods perform well, but Eq. 1 does not work for decompositions with more than two sub-questions, whereas Eq. 4 does, achieving a mean reciprocal rank of 30%. However, Eq. 1 outperforms Eq. 4 on HotpotQA, e.g., achieving 79.9 vs. 79.4 F1 when using the BERT-based ensemble from Min et al. (2019b) to answer sub-questions. Eq. 1 is also faster to compute and easier to scale. Moreover, Eq. 4 requires an embedding space where summing sub-question representations is meaningful, whereas Eq. 1 only requires embeddings that encode semantic similarity. Thus, we adopt Eq. 1 for our main experiments. Table 8 contains an example where this variable-length decomposition method produces a three-sub-question decomposition, whereas the other methods are fixed to two sub-questions.
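For reference, a sketch of the Eq. 4 criterion; candidate generation (beam search over a top-K candidate pool) is omitted.

```python
import numpy as np


def euclidean_decomposition_score(v_q: np.ndarray, v_subs: list) -> float:
    """Eq. 4: smaller is better -- the distance between the question vector and
    the sum of the sub-question vectors. Unlike Eq. 1, the number of
    sub-questions can vary, since adding another relevant sub-question can
    move the summed vector closer to v_q."""
    return float(np.linalg.norm(v_q - np.sum(v_subs, axis=0)))
```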

A.2 Impact of Question Corpus Size

In addition to our previous results on FastText vs. Random pseudo-decompositions, we found it important to use a large question corpus to create pseudo-decompositions. QA F1 increased from 79.2 to 80.1 when we trained decomposition models on pseudo-decompositions composed of questions retrieved from Common Crawl (10M questions) rather than only SQuAD 2 (130K questions), using an appropriately larger beam size (100 → 1000).

A.3 Pseudo-Decomposition Retrieval Method

Decomp.                   Pseudo-                    HotpotQA F1
Method                    Decomps.      Dev          Advers.      OOD
✗ (1hop)                     —          66.7         63.7         66.5
✗ (Baseline)                 —          77.0 ±.2     65.2 ±.2     67.1 ±.5
No Learn                  Random        78.4 ±.2     70.9 ±.2     70.7 ±.4
                          BERT          78.9 ±.4     71.5 ±.3     71.5 ±.2
                          TFIDF         79.2 ±.3     72.2 ±.3     72.0 ±.5
                          FastText      78.9 ±.2     72.4 ±.1     72.0 ±.1
Seq2Seq                   Random        77.7 ±.2     69.4 ±.3     70.0 ±.7
                          BERT          79.1 ±.3     72.6 ±.3     73.1 ±.3
                          TFIDF         79.2 ±.1     73.0 ±.3     72.9 ±.3
                          FastText      78.9 ±.2     73.1 ±.2     73.0 ±.3
CSeq2Seq                  Random        79.4 ±.2     75.1 ±.2     75.2 ±.4
                          BERT          78.9 ±.2     74.9 ±.1     75.2 ±.2
                          TFIDF         78.6 ±.3     72.4 ±.4     72.8 ±.2
                          FastText      79.9 ±.2     76.0 ±.1     76.9 ±.1
USeq2Seq                  Random        79.8 ±.1     76.0 ±.2     76.5 ±.2
                          BERT          79.8 ±.3     76.2 ±.3     76.7 ±.3
                          TFIDF         79.6 ±.2     75.5 ±.2     76.0 ±.2
                          FastText      80.1 ±.2     76.2 ±.1     77.1 ±.1
DecompRC                     —          79.8 ±.2     76.3 ±.4     77.7 ±.2
SAE (Tu et al., 2020)        —          80.2         61.1         62.6
HGN (Fang et al., 2019)      —          82.2         78.9         76.1
Table 7: QA F1 scores for all combinations of learning methods and pseudo-decomposition retrieval methods that we tried.

Table 7 shows QA results with pseudo-decompositions retrieved using bag-of-words (sum) representations from FastText, TFIDF, or BERT first-layer hidden states. We also vary the learning method and include results for Curriculum Seq2Seq (CSeq2Seq), where we initialize the USeq2Seq approach with the Seq2Seq model trained on the same data.

Appendix B Unsupervised Decomposition Model

B.1 Unsupervised Stopping Criterion

To stop USeq2Seq training, we use an unsupervised stopping criterion to avoid relying on a supervised validation set of decompositions. We generate a decomposition d̂ for a multi-hop question q, and we measure BLEU between q and the model-generated question for d̂, similar to round-trip BLEU in unsupervised translation (Lample et al., 2018). We scale the round-trip BLEU score by the fraction of “good” decompositions, where a good decomposition has (1) two sub-questions (i.e., two question marks), (2) no sub-question which contains all words in the multi-hop question, and (3) no sub-question longer than the multi-hop question. Without scaling, decomposition models achieve perfect round-trip BLEU by simply copying the multi-hop question as the decomposition. We measure scaled BLEU across multi-hop questions in HotpotQA dev, and we stop training when the metric does not increase for 3 consecutive epochs.
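The “good decomposition” filter used to scale round-trip BLEU can be sketched as follows; the whitespace tokenization is a simplification.

```python
def is_good_decomposition(decomposition: str, question: str) -> bool:
    """A decomposition counts as "good" if it has (1) exactly two sub-questions,
    (2) no sub-question containing every word of the multi-hop question, and
    (3) no sub-question longer than the multi-hop question."""
    sub_questions = [s.strip() + "?" for s in decomposition.split("?") if s.strip()]
    if len(sub_questions) != 2:
        return False
    q_words, q_len = set(question.lower().split()), len(question.split())
    for sq in sub_questions:
        if q_words <= set(sq.lower().split()):   # copies the whole question
            return False
        if len(sq.split()) > q_len:              # longer than the question
            return False
    return True


# Stopping metric: round-trip BLEU multiplied by the fraction of generated
# decompositions that pass this check, measured on HotpotQA dev questions.
```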

It is possible to stop training the decomposition model based on downstream QA accuracy. However, training a QA model on each decomposition model checkpoint (1) is computationally expensive and (2) ties decompositions to a specific, downstream QA model. In Figure 5, we show downstream QA results across various USeq2Seq checkpoints when using the BERT-based single-hop QA ensemble from Min et al. (2019b). The unsupervised stopping criterion does not significantly hurt downstream QA compared to using a weakly supervised stopping criterion.

Figure 5: How multi-hop QA accuracy varies over the course of decomposition model training, for one training run of USeq2Seq on FastText pseudo-decompositions. Our unsupervised stopping criterion selects the epoch 3 checkpoint, which performs roughly as well as the best checkpoint (epoch 5).

B.2 Training Hyperparameters

MLM Pre-training

We pre-train our encoder-decoder with training distributed across 8 DGX-1 machines, each with eight 32GB NVIDIA V100 GPUs interconnected by InfiniBand. We pre-train using the largest batch size that fits in memory, and we choose the best learning rate based on training loss after a small number of iterations. We also cap the maximum sequence length. We keep other hyperparameters identical to those from Lample and Conneau (2019) used in unsupervised translation.

USeq2Seq

We train each decomposition model with distributed training across eight 32GB NVIDIA V100 GPUs. We choose the largest batch size that fits in memory and then the largest learning rate which results in stable training. Other hyperparameters are the same as in Lample and Conneau (2019).

Seq2Seq

We use a large batch size and choose the largest learning rate which results in stable training across the various pseudo-decomposition training corpora. We keep other training settings and hyperparameters the same as for USeq2Seq.

Appendix C Multi-hop QA Model

C.1 Varying the Number of Training Examples

Figure 6: Performance of the downstream, multi-hop QA model, with and without decompositions, when varying the amount of training data. We also assess the impact of removing single-hop training data (SQuAD 2 and HotpotQA “easy” questions).

To understand how decompositions impact performance given different amounts of training data, we vary the number of multi-hop training examples. We use the “medium” and “hard” level labels in HotpotQA to determine which examples are multi-hop. We consider training setups where the multi-hop QA model does or does not use data augmentation via training on HotpotQA “easy”/single-hop questions and SQuAD 2 questions. Fig. 6 shows the results. Decompositions improve QA so long as the multi-hop QA model has enough training data, either via single-hop QA examples or enough multi-hop QA examples.

C.2 Training Hyperparameters

To train RoBERTa, we fix the number of training epochs to 2, as training longer did not help. We sweep over batch size, learning rate, and weight decay, similar to the ranges used in the original paper (Liu et al., 2019), and we choose the hyperparameters that did best for the baseline QA model (without decompositions) on our validation set. Similarly, for the experiments with BERT, we fix the number of epochs to 2 and choose the learning rate and batch size by sweeping over the recommended ranges from Devlin et al. (2019); for BERT-Large, we use the whole-word masking model. We train all QA models with mixed precision floating point arithmetic (Micikevicius et al., 2018), distributing training across eight 32GB NVIDIA V100 GPUs.

C.3 Improvements across Detailed Question Types

To better understand where decompositions improve QA, we show the improvement across various fine-grained splits of the evaluation sets in Figures 7-11.

Figure 7: Performance difference for various answer entity types when the QA model does vs. does not use decompositions. We see the largest, consistent gains for entity-centric answers.
Figure 8: Performance difference for bridge and comparison questions when the QA model does vs. does not use decompositions. Here, we use the original bridge/comparison splits from HotpotQA, which do not have a one-hop category and categorize intersection questions as bridge. For the original dev set, the improvement with decompositions is greater for comparison questions than bridge questions. The multi-hop set does not alter comparison questions from the original version, so these scores do not change much.
Figure 9: Performance difference for yes/no and span answer types for comparison questions when the QA model does vs. does not use decompositions. Decompositions are roughly as helpful for yes/no questions as for span-based questions.
Figure 10: Performance difference for various multi-hop “wh”-words when the QA model does vs. does not use decompositions. Improvements by question word vary across dev sets.
Figure 11: Performance difference when the QA model does vs. does not use decompositions, stratified by whether the gold final answer is contained in a sub-answer sentence. We find a larger improvement over the baseline when the gold answer is contained in a sub-answer sentence.
Q: What is the name of the singer who’s song was released as the lead single from the album ”Confessions”, and that had popular song stuck behind for eight consecutive weeks?
USeq2Seq + Random Sub-Q1 what is the name of the singer who ’s song was released as the lead single from the album ” confessions?
Sub-A1 is a song by American singer Usher. He co-wrote the song with Sean Garrett, Patrick J. Que Smith, Robert McDowell, LRoc, Ludacris, Da’Ron, and Lil Jon.
Sub-Q2 why did popular song stuck behind for eight consecutive weeks?
Sub-A2 It reached number 1 in the United Kingdom and in Netherlands, and number 2 in the United States, stuck for eight consecutive weeks behind the Usher singles ”Yeah” and ”Burn”, respectively.
USeq2Seq + FastText Sub-Q1 what is the name of the singer?
Sub-A1 ”No One” is a song by American singer and songwriter Alicia Keys from her third studio album, ”As I Am” (2007).
Sub-Q2 what song was released as the lead single from the album ” confessions , ” and that had popular song stuck behind for eight consecutive weeks?
Sub-A2 ”Yeah!” is a song by American singer Usher.
Seq2Seq + Random Sub-Q1 what is the maximum number of students allowed to take part in the program?
Sub-A1 ”No One” is a song by American singer and songwriter Alicia Keys from her third studio album, ”As I Am” (2007).
Sub-Q2 what is the maximum number of students allowed per year?
Sub-A2 ”No One” is a song by American singer and songwriter Alicia Keys from her third studio album, ”As I Am” (2007).
Seq2Seq + FastText Sub-Q1 what is the name of the lead singer of the group?
Sub-A1 Billy Ray Cyrus is an American country music singer, songwriter, actor and philanthropist, who helped make country music a worldwide phenomenon.
Sub-Q2 when was the last time you heard a song called ” on and on ” by a female vocalist?
Sub-A2 ”No One” is a song by American singer and songwriter Alicia Keys from her third studio album, ”As I Am” (2007). It was written and produced by Keys, Kerry Brothers Jr., and George M. Harry, and released as the album’s lead single on September 11, 2007.
No Learn + Random Sub-Q1 what was sold to foreign ptts?
Sub-A1 ”No One” is a song by American singer and songwriter Alicia Keys from her third studio album, ”As I Am” (2007).
Sub-Q2 what tends to lead to less money?
Sub-A2 ”No One” is a song by American singer and songwriter Alicia Keys from her third studio album, ”As I Am” (2007).
No Learn + FastText Sub-Q1 which single was released as the album ’s lead single?
Sub-A1 ”No One” is a song by American singer and songwriter Alicia Keys from her third studio album, ”As I Am” (2007).
Sub-Q2 who did the song writing , what are some of the stories behind the songs?
Sub-A2 It was written and produced by Keys, Kerry Brothers Jr., and George M. Harry, and released as the album’s lead single on September 11, 2007.
DecompRC Sub-Q1 which singer?
Sub-A1 ”No One” is a song by American singer and songwriter Alicia Keys from her third studio album, ”As I Am” (2007).
Sub-Q2 what is the name of alicia keys ’s song was released as the lead single from the album ”confessions”, and that had popular song stuck behind for eight consecutive weeks?
Sub-A2 It reached number 1 in the United Kingdom and in Netherlands, and number 2 in the United States, stuck for eight consecutive weeks behind the Usher singles ”Yeah” and ”Burn”, respectively.
Variable USeq2Seq + FastText Sub-Q1 what is the name of the singer?
Sub-A1 is a song by american singer usher. he co-wrote the song with sean garrett, patrick j. que smith, robert mcdowell, lroc, ludacris, da’ron, and lil jon.
Sub-Q2 who ’ s the song was released as the lead single from the album?
Sub-A2 ”no one” is a song by american singer and songwriter alicia keys from her third studio album, ”as i am” (2007).
Sub-Q3 what popular song was stuck behind for eight consecutive weeks?
Sub-A3 it reached number 1 in the united kingdom and in netherlands, and number 2 in the united states, stuck for eight consecutive weeks behind the usher singles ”yeah” and ”burn”, respectively.
Table 8: Various decomposition methods for the question “What is the name of the singer who’s song was released as the lead single from the album ‘Confessions,’ and that had popular song stuck behind for eight consecutive weeks?” Here, the Variable USeq2Seq model decomposes the question into three sub-questions rather than two.
Q: Are both Coldplay and Pierre Bouvier from the same country?
USeq2Seq + Random Sub-Q1 why are both coldplay and pierre bouvier from the same country?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 what is the purpose of a speech?
Sub-A2 Pierre Charles Bouvier (born 9 May 1979) is a Canadian singer, songwriter, musician, composer and actor who is best known as the lead singer and guitarist of the rock band Simple Plan.
USeq2Seq + FastText Sub-Q1 where are coldplay and coldplay from?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 what country is pierre bouvier from?
Sub-A2 Pierre Charles Bouvier (born 9 May 1979) is a Canadian singer, songwriter, musician, composer and actor who is best known as the lead singer and guitarist of the rock band Simple Plan.
Seq2Seq + Random Sub-Q1 what is the maximum number of students allowed to take part in the program?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 what is the maximum number of students allowed to take part in the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of?
Sub-A2 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Seq2Seq + FastText Sub-Q1 who are similar musical artists to pierre bouvier?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 in the same year , pierre bouvier , pierre bouvier , pierre bouvier and pierre bouvier?
Sub-A2 Pierre Charles Bouvier (born 9 May 1979) is a Canadian singer, songwriter, musician, composer and actor who is best known as the lead singer and guitarist of the rock band Simple Plan.
No Learn + Random Sub-Q1 in what year did fermat declare fermat ’s little theorem?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 when did the united states withdraw from the bretton woods accord?
Sub-A2 It first aired on MTV on March 6, 2005. The last episode was broadcast on April 24, 2005.
No Learn + FastText Sub-Q1 what do a handful of couples have in common with coldplay , coldplay , and pierre bouvier?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 where are the french alps?
Sub-A2 St Pierre is a former parish and hamlet in Monmouthshire, south east Wales, 3 mi south west of Chepstow and adjacent to the Severn estuary.
DecompRC Sub-Q1 is coldplay from which country?
Sub-A1 Coldplay are a British rock band formed in 1996 by lead vocalist and keyboardist Chris Martin and lead guitarist Jonny Buckland at University College London (UCL).
Sub-Q2 is pierre bouvier from which country?
Sub-A2 Pierre Charles Bouvier (born 9 May 1979) is a Canadian singer, songwriter, musician, composer and actor who is best known as the lead singer and guitarist of the rock band Simple Plan.
Variable USeq2Seq + FastText Sub-Q1 who are similar musical artists to coldplay?
Sub-A1 pierre charles bouvier (born 9 may 1979) is a canadian singer, songwriter, musician, composer and actor who is best known as the lead singer and guitarist of the rock band simple plan.
Sub-Q2 where is pierre bouvier from?
Sub-A2 pierre charles bouvier (born 9 may 1979) is a canadian singer, songwriter, musician, composer and actor who is best known as the lead singer and guitarist of the rock band simple plan.
Table 9: Various decomposition methods for the question “Are both Coldplay and Pierre Bouvier from the same country?”
Q: Who is older, Annie Morton or Terry Richardson?
USeq2Seq + Random Sub-Q1 who is older , annie morton?
Sub-A1 Annie Morton (born October 8, 1970) is an American model born in Pennsylvania.
Sub-Q2 who is terry richardson?
Sub-A2 Terrence ”Uncle Terry” Richardson (born August 14, 1965) is an American fashion and portrait photographer who has shot advertising campaigns for Marc Jacobs, Aldo, Supreme, Sisley, Tom Ford, and Yves Saint Laurent among others.
USeq2Seq + FastText Sub-Q1 who is annie morton?
Sub-A1 Annie Morton (born October 8, 1970) is an American model born in Pennsylvania.
Sub-Q2 when was terry richardson born?
Sub-A2 Kenton Terry Richardson (born 26 July 1999) is an English professional footballer who plays as a defender for League Two side Hartlepool United.
Seq2Seq + Random Sub-Q1 what is the maximum number of students allowed to take part in the program?
Sub-A1 Kenton Terry Richardson (born 26 July 1999) is an English professional footballer who plays as a defender for League Two side Hartlepool United.
Sub-Q2 what is the maximum number of students allowed to take part in the program?
Sub-A2 Kenton Terry Richardson (born 26 July 1999) is an English professional footballer who plays as a defender for League Two side Hartlepool United.
Seq2Seq + FastText Sub-Q1 who is terry morton?
Sub-A1 Terrence ”Uncle Terry” Richardson (born August 14, 1965) is an American fashion and portrait photographer who has shot advertising campaigns for Marc Jacobs, Aldo, Supreme, Sisley, Tom Ford, and Yves Saint Laurent among others.
Sub-Q2 who is terry morton?
Sub-A2 Terrence ”Uncle Terry” Richardson (born August 14, 1965) is an American fashion and portrait photographer who has shot advertising campaigns for Marc Jacobs, Aldo, Supreme, Sisley, Tom Ford, and Yves Saint Laurent among others.
No Learn + Random Sub-Q1 what did decnet phase i become?
Sub-A1 Snoecks is a Belgian magazine. The huge, 550-plus-page magazine appears once a year in October and focuses on the most interesting new international developments in the arts, photography and literature.
Sub-Q2 what group can amend the victorian constitution?
Sub-A2 Kenton Terry Richardson (born 26 July 1999) is an English professional footballer who plays as a defender for League Two side Hartlepool United.
No Learn + FastText Sub-Q1 who was terry richardson?
Sub-A1 Terrence ”Uncle Terry” Richardson (born August 14, 1965) is an American fashion and portrait photographer who has shot advertising campaigns for Marc Jacobs, Aldo, Supreme, Sisley, Tom Ford, and Yves Saint Laurent among others.
Sub-Q2 who changes positions more , annie morton , cotton hill , or terry richardson?
Sub-A2 Annie Morton (born October 8, 1970) is an American model born in Pennsylvania.
DecompRC Sub-Q1 annie morton is born when?
Sub-A1 Annie Morton (born October 8, 1970) is an American model born in Pennsylvania.
Sub-Q2 terry richardson is born when?
Sub-A2 Terrence ”Uncle Terry” Richardson (born August 14, 1965) is an American fashion and portrait photographer who has shot advertising campaigns for Marc Jacobs, Aldo, Supreme, Sisley, Tom Ford, and Yves Saint Laurent among others.
Variable USeq2Seq + FastText Sub-Q1 who is annie morton?
Sub-A1 annie morton (born october 8, 1970) is an american model born in pennsylvania.
Sub-Q2 who was terry morton?
Sub-A2 madonna is a biography by english author andrew morton, chronicling the life of american recording artist madonna.
Table 10: Various decomposition methods for the question “Who is older, Annie Morton or Terry Richardson?”
Q: In which year was the King who made the 1925 Birthday Honours born?
USeq2Seq + Random Sub-Q1 in which year was the king made?
Sub-A1 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Sub-Q2 who made the 1925 birthday honours?
Sub-A2 The 1925 Birthday Honours were appointments by King George V to various orders and honours to reward and highlight good works by citizens of the British Empire.
USeq2Seq + FastText Sub-Q1 in which year was the king born?
Sub-A1 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Sub-Q2 who made the 1925 birthday honours?
Sub-A2 The 1925 Birthday Honours were appointments by King George V to various orders and honours to reward and highlight good works by citizens of the British Empire.
Seq2Seq + Random Sub-Q1 what is the maximum number of students allowed to take part in the program?
Sub-A1 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Sub-Q2 what is the maximum number of students allowed to take part in the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course?
Sub-A2 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Seq2Seq + FastText Sub-Q1 who was born in 1925?
Sub-A1 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Sub-Q2 in which year was the king born?
Sub-A2 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
No Learn + Random Sub-Q1 what did telecom australia start?
Sub-A1 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Sub-Q2 what cells are not eliminated by the immune system?
Sub-A2 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
No Learn + FastText Sub-Q1 in the new year honours list , who was awarded the mbe for services to hockey?
Sub-A1 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Sub-Q2 in 1925 when she was born?
Sub-A2 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
DecompRC Sub-Q1 which king who made the 1925 birthday honours?
Sub-A1 The 1925 Birthday Honours were appointments by King George V to various orders and honours to reward and highlight good works by citizens of the British Empire.
Sub-Q2 in which year was george v born?
Sub-A2 George V (George Frederick Ernest Albert; 3 June 1865 – 20 January 1936) was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.
Variable USeq2Seq + FastText Sub-Q1 in which year was the king made?
Sub-A1 george v (george frederick ernest albert; 3 june 1865 – 20 january 1936) was king of the united kingdom and the british dominions, and emperor of india, from 6 may 1910 until his death in 1936.
Sub-Q2 who made the 1925 birthday honours?
Sub-A2 george v (george frederick ernest albert; 3 june 1865 – 20 january 1936) was king of the united kingdom and the british dominions, and emperor of india, from 6 may 1910 until his death in 1936.
Table 11: Various decomposition methods for the question “In which year was the King who made the 1925 Birthday Honours born?”
Q: Where are Teide National Park and Garajonay National Park located?
USeq2Seq + Random Sub-Q1 where are teide national park?
Sub-A1 Teide National Park (Spanish: ”Parque nacional del Teide”) is a national park located in Tenerife (Canary Islands, Spain).
Sub-Q2 what is garajonay national park?
Sub-A2 Garajonay National Park (Spanish: ”Parque nacional de Garajonay”) is located in the center and north of the island of La Gomera, one of the Canary Islands (Spain). It was declared a national park in 1981 and a World Heritage Site by UNESCO in 1986.
USeq2Seq + FastText Sub-Q1 where are teide national park?
Sub-A1 Teide National Park (Spanish: ”Parque nacional del Teide”) is a national park located in Tenerife (Canary Islands, Spain).
Sub-Q2 where is garajonay national park?
Sub-A2 Garajonay National Park (Spanish: ”Parque nacional de Garajonay”) is located in the center and north of the island of La Gomera, one of the Canary Islands (Spain). It was declared a national park in 1981 and a World Heritage Site by UNESCO in 1986.
Seq2Seq + Random Sub-Q1 what is the maximum number of students allowed to take part in the program?
Sub-A1 Teide National Park (Spanish: ”Parque nacional del Teide”) is a national park located in Tenerife (Canary Islands, Spain).
Sub-Q2 what is the maximum number of students allowed to take part in the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course of the course?
Sub-A2 It occupies 40 km (15 sq mi) and it extends into each of the six municipalities on the island.
Seq2Seq + FastText Sub-Q1 where is garajonay national park located?
Sub-A1 Garajonay National Park (Spanish: ”Parque nacional de Garajonay”) is located in the center and north of the island of La Gomera, one of the Canary Islands (Spain). It was declared a national park in 1981 and a World Heritage Site by UNESCO in 1986.
Sub-Q2 the national park of galicia national park?
Sub-A2 Teide National Park (Spanish: ”Parque nacional del Teide”) is a national park located in Tenerife (Canary Islands, Spain).
No Learn + Random Sub-Q1 what was the australian public x.75 network operated by telstra?
Sub-A1 Teide National Park (Spanish: ”Parque nacional del Teide”) is a national park located in Tenerife (Canary Islands, Spain).
Sub-Q2 when were theories developed suggesting inequality may have some positive effect on economic development?
Sub-A2 Teide National Park (Spanish: ”Parque nacional del Teide”) is a national park located in Tenerife (Canary Islands, Spain).
No Learn + FastText Sub-Q1 where is garajonay national park?
Sub-A1 Garajonay National Park (Spanish: ”Parque nacional de Garajonay”) is located in the center and north of the island of La Gomera, one of the Canary Islands (Spain). It was declared a national park in 1981 and a World Heritage Site by UNESCO in 1986.
Sub-Q2 how to visit the teide observatory in tenerife in what languages are guided tours of the teide observatory held?
Sub-A2 Teide National Park (Spanish: ”Parque nacional del Teide”) is a national park located in Tenerife (Canary Islands, Spain).
DecompRC Sub-Q1 where is teide national park located?
Sub-A1 Teide National Park (Spanish: ”Parque nacional del Teide”) is a national park located in Tenerife (Canary Islands, Spain).
Sub-Q2 where is garajonay national park located?
Sub-A2 Garajonay National Park (Spanish: ”Parque nacional de Garajonay”) is located in the center and north of the island of La Gomera, one of the Canary Islands (Spain). It was declared a national park in 1981 and a World Heritage Site by UNESCO in 1986.
Variable USeq2Seq + FastText Sub-Q1 where are teide national park?
Sub-A1 teide national park (spanish: ”parque nacional del teide”) is a national park located in tenerife (canary islands, spain).
Sub-Q2 where is garajonay national park?
Sub-A2 garajonay national park (spanish: ”parque nacional de garajonay”) is located in the center and north of the island of la gomera, one of the canary islands (spain).
Table 12: Various decomposition methods for the question “Where are Teide National Park and Garajonay National Park located?”
Q: Since 2 June 2017, The Leader of Fine Gael had been held by which Irish Fine Gael politician who has served as Taoiseach and Minister for Defence?
USeq2Seq + Random Sub-Q1 how long since 2 june 2017 , the leader of fine gael had been in office?
Sub-A1 Enda Patrick Kenny (born 24 April 1951) is an Irish Fine Gael politician who served as Taoiseach from 2011 to 2017, Leader of Fine Gael from 2002 to 2017, Minister for Defence from May 2014 to July 2014 and 2016 to 2017, Leader of the Opposition from 2002 to 2011, Minister for Tourism and Trade from 1994 to 1997 and Minister of State for Youth Affairs from 1986 to 1987.
Sub-Q2 which irish fine gael politician has served as taoiseach and minister for defence?
Sub-A2 Leo Eric Varadkar ( ; Irish: ”Leo de Varad” ;born 18 January 1979) is an Irish Fine Gael politician who has served as Taoiseach, Minister for Defence and Leader of Fine Gael since June 2017.
USeq2Seq + FastText Sub-Q1 since 2 june 2017 , the leader of fine gael had been?
Sub-A1 Since 2 June 2017, the office had been held by Leo Varadkar following the resignation of Enda Kenny.
Sub-Q2 which irish fine gael politician has served as taoiseach and minister for defence?
Sub-A2 Leo Eric Varadkar ( ; Irish: ”Leo de Varad” ;born 18 January 1979) is an Irish Fine Gael politician who has served as Taoiseach, Minister for Defence and Leader of Fine Gael since June 2017.
Seq2Seq + Random Sub-Q1 what is the maximum number of students allowed to take part in the program?
Sub-A1 Leo Eric Varadkar ( ; Irish: ”Leo de Varad” ;born 18 January 1979) is an Irish Fine Gael politician who has served as Taoiseach, Minister for Defence and Leader of Fine Gael since June 2017.
Sub-Q2 what is the maximum number of students allowed per year?
Sub-A2 The Leader of Fine Gael is the most senior politician within the Fine Gael political party in Ireland. Since 2 June 2017, the office had been held by Leo Varadkar following the resignation of Enda Kenny.
Seq2Seq + FastText Sub-Q1 who has been appointed as the new deputy leader of fine gael since 2 june 2017?
Sub-A1 Simon Anthony Coveney (born 16 June 1972) is an Irish Fine Gael politician who has served as Minister for Foreign Affairs and Trade and Deputy Leader of Fine Gael since June 2017.
Sub-Q2 the fine gael fine gael , the fine gael of fine gael?
Sub-A2 Leo Eric Varadkar ( ; Irish: ”Leo de Varad” ;born 18 January 1979) is an Irish Fine Gael politician who has served as Taoiseach, Minister for Defence and Leader of Fine Gael since June 2017.
No Learn + Random Sub-Q1 what was considered to be a major milestone?
Sub-A1 The 2017 Fine Gael leadership election was triggered in May 2017, when Enda Kenny resigned as party leader.
Sub-Q2 what was the air force not interested in for their message system?
Sub-A2 The 2017 Fine Gael leadership election was triggered in May 2017, when Enda Kenny resigned as party leader.
No Learn + FastText Sub-Q1 what if fine gael did support fine gael after the next election?
Sub-A1 With Fine Gael being the governing party at the time, this election effectively appointed a new Taoiseach for Ireland.
Sub-Q2 who has been appointed as defence minister of india?
Sub-A2 Leo Eric Varadkar ( ; Irish: ”Leo de Varad” ;born 18 January 1979) is an Irish Fine Gael politician who has served as Taoiseach, Minister for Defence and Leader of Fine Gael since June 2017.
DecompRC Sub-Q1 which leader of fine gael?
Sub-A1 Since 2 June 2017, the office had been held by Leo Varadkar following the resignation of Enda Kenny.
Sub-Q2 since 2 june 2017 enda patrick kenny had been held by which irish fine gael politician who has served as taoiseach and minister for defence?
Sub-A2 Leo Eric Varadkar ( ; Irish: ”Leo de Varad” ;born 18 January 1979) is an Irish Fine Gael politician who has served as Taoiseach, Minister for Defence and Leader of Fine Gael since June 2017.
Variable USeq2Seq + FastText Sub-Q1 since 2 june 2017 , the leader of fine gael had been held by?
Sub-A1 since 2 june 2017, the office had been held by leo varadkar following the resignation of enda kenny.
Sub-Q2 which irish fine gael politician has served as taoiseach and minister for defence?
Sub-A2 enda patrick kenny (born 24 april 1951) is an irish fine gael politician who served as taoiseach from 2011 to 2017, leader of fine gael from 2002 to 2017, minister for defence from may 2014 to july 2014 and 2016 to 2017, leader of the opposition from 2002 to 2011, minister for tourism and trade from 1994 to 1997 and minister of state for youth affairs from 1986 to 1987. he has been a teachta dála (td) since 1975, currently for the mayo constituency.
Table 13: Various decomposition methods for the question “Since 2 June 2017, The Leader of Fine Gael had been held by which Irish Fine Gael politician who has served as Taoiseach and Minister for Defence?”