A Split-and-Recombine Approach for Follow-up Query Analysis

09/19/2019 · Qian Liu et al. · Beihang University, Microsoft

Context-dependent semantic parsing has proven to be an important yet challenging task. To leverage the advances in context-independent semantic parsing, we propose to perform follow-up query analysis, aiming to restate context-dependent natural language queries with contextual information. To accomplish the task, we propose StAR, a novel approach with a well-designed two-phase process. It is parser-independent and able to handle multifarious follow-up scenarios in different domains. Experiments on the FollowUp dataset show that StAR outperforms the state-of-the-art baseline by a large margin of nearly 8 points on follow-up query analysis. We also explore the extensibility of StAR on the SQA dataset, with very promising results.


1 Introduction

Recently, Natural Language Interfaces to Databases (NLIDB) have received considerable attention, as they allow users to query databases directly in natural language. Current studies mainly focus on context-independent semantic parsing, which translates a single natural language sentence into its corresponding executable form (e.g., Structured Query Language) and retrieves the answer from the database regardless of context. However, context does matter in real-world applications. Users tend to issue queries in a coherent way when communicating with an NLIDB. For example, after the query "How much money has Smith earned?" (Precedent Query), users may pose another query by simply asking "How about Bill Collins?" (Follow-up Query) instead of the complete "How much money has Bill Collins earned?" (Restated Query). Therefore, contextual information is essential for more accurate and robust semantic parsing, namely context-dependent semantic parsing.

Compared with context-independent semantic parsing, context-dependent semantic parsing has received less attention. Several attempts include a statistical model with parser trees Miller et al. (1996), a linear model with context-dependent logical forms Zettlemoyer and Collins (2009) and a sequence-to-sequence model Suhr et al. (2018). However, none of these methods applies across domains, since the ATIS dataset Dahl et al. (1994) they rely on is domain-specific. A search-based neural method, DynSP, arises along with the SequentialQA (SQA) dataset Iyyer et al. (2017), taking the first step towards cross-domain context-dependent semantic parsing. Nevertheless, DynSP focuses on relatively simple scenarios. All the aforementioned methods design context-dependent semantic parsers from scratch. Instead, inspired by Liu et al. (2019), we propose to directly leverage the technical advances in context-independent semantic parsing. We define follow-up query analysis as restating follow-up queries in natural language using contextual information; the restated queries can then be translated to their corresponding executable forms by existing context-independent parsers. In this way, we boost the performance of context-dependent semantic parsing.

In this paper, we focus on follow-up query analysis and present a novel approach. The main idea is to decompose the task into two phases by introducing a learnable intermediate structure, the span: the two queries are first split into several spans, which then undergo a recombination process. As no intermediate annotation is involved, we design rewards to jointly train the two phases via reinforcement learning (RL) Sutton and Barto (1998). Our major contributions are as follows:


  • We propose a novel approach, named SpliT-And-Recombine (StAR), to restate follow-up queries via two phases (code available at http://github.com/microsoft/EMNLP2019-Split-And-Recombine). It is parser-independent and can be seamlessly integrated with existing context-independent semantic parsers.

  • We conduct experiments on the FollowUp dataset Liu et al. (2019), which covers multifarious cross-domain follow-up scenarios. The results demonstrate that our approach significantly outperforms the state-of-the-art baseline.

  • We redesign the recombination process and extend StAR to the SQA dataset, where the annotations are answers. Experiments show promising results, demonstrating the extensibility of our approach.

2 Methodology

In this section, we first give an overview of our proposed method with the idea of two-phase process, then introduce the two phases in turn.

2.1 Overview of Split-And-Recombine

Let $x$, $y$ and $z$ denote the precedent query, follow-up query and restated query respectively, each of which is a natural language sentence. Our goal is to interpret the follow-up query $y$ with its precedent query $x$ as context, and generate the corresponding restated query $z$. The restated query has the same meaning as the follow-up query, but it is complete and unambiguous, which facilitates better downstream parsing. Formally, given the pair $(x, y)$, we aim to learn a model and maximize the objective:

$$\max \sum_{(x, y, z) \in \mathcal{D}} \log P(z \mid x, y) \qquad (1)$$

where $\mathcal{D}$ represents the set of training data. Since $z$ always overlaps a great deal with $x$ and $y$, it is intuitively more straightforward to find a way to merge $x$ and $y$. To this end, we design a two-phase process and present a novel approach, StAR, that performs follow-up query analysis with reinforcement learning.

Figure 1: The two-phase process of an example from the FollowUp dataset (More real cases of diverse follow-up scenarios can be found in Table 3).
Figure 2: The overview of StAR with two phases.

A concrete example of the two-phase process is shown in Figure 1. Phase I is to Split the input queries into several spans. For example, the precedent query is split into three spans: "How much money has", "Smith" and "earned". Let $s$ denote a way to split $(x, y)$; then Phase I can be formulated as $P(s \mid x, y)$. Phase II is to Recombine the spans by finding the most probable conflicting way and generating the final output by restatement, denoted as $P(z \mid x, y, s)$. Two spans being conflicting means they are semantically similar: for example, "Smith" conflicts with "Bill Collins". A conflicting way contains all conflicts between the precedent and follow-up spans. Backed by the two-phase idea of splitting and recombination, the overall likelihood of generating $z$ given $(x, y)$ is:

$$P(z \mid x, y) = \sum_{s \in \mathcal{S}} P(s \mid x, y)\, P(z \mid x, y, s) \qquad (2)$$

where $\mathcal{S}$ represents the set of all possible ways to split $(x, y)$. Due to the lack of annotations for splitting and recombination, it is hard to directly perform supervised learning. Inspired by Liang et al. (2017), we employ RL to optimize the model. Denoting a predicted restated query by $z^{*}$, the goal of the RL training is to maximize the following objective:

$$J = \sum_{z^{*} \in \mathcal{Z}} P(z^{*} \mid x, y)\, R(z^{*}, z) \qquad (3)$$

where $\mathcal{Z}$ is the space of all restated query candidates and $R(z^{*}, z)$ represents the reward defined by comparing $z^{*}$ with the annotation $z$. However, the overall candidate space $\mathcal{Z}$ is vast, making it impossible to maximize $J$ exactly. The most straightforward usage of the REINFORCE algorithm Williams (1992), sampling both the split $s$ and the conflicting way, also poses challenges for learning. To alleviate the problem, we propose to sample $s$ and enumerate all conflicting way candidates after $s$ is determined. This shrinks the sampling space at an acceptable computational cost, which will be discussed in Section 3.2.2. The problem thus turns into designing a reward function $R(s)$ to evaluate $s$ and guide the learning. To achieve this, we reformulate Equation 3 as:

$$J = \sum_{s \in \mathcal{S}} P(s \mid x, y)\, R(s) \qquad (4)$$

and set the reward $R(s)$ as:

$$R(s) = \sum_{c \in \mathcal{C}(s)} P(c \mid x, y, s)\, R(z_c, z) \qquad (5)$$

where $\mathcal{C}(s)$ denotes the set of conflicting way candidates under $s$, and $z_c$ is the restated query deterministically generated from $c$.

The overview of StAR is summarized in Figure 2. Given $(x, y, z)$, during training of Phase I (in blue), we fix Phase II to provide the reward $R(s)$; Phase I can then be learnt by the REINFORCE algorithm. During training of Phase II (in red), we fix Phase I and utilize it to generate $s$; Phase II is trained to maximize Equation 5. In this way, the two phases can be jointly trained. The details are introduced below.

2.2 Phase I: Split

As mentioned above, with Phase II fixed, Phase I updates the Split Neural Network (SplitNet). Taking the precedent query $x$ and follow-up query $y$ as input, as shown in Figure 2, splitting spans can be viewed as a sequence labeling problem over the input. For each word, SplitNet outputs a label Split or Retain, indicating whether a split operation is performed after that word. A label sequence uniquely identifies a way of splitting $(x, y)$, denoted $s$ in Section 2.1. Figure 3 gives an example at the bottom: in the precedent query, two split operations are performed after "has" and "Smith", since their labels are Split.

2.2.1 Split Neural Network

Intuitively, SplitNet can determine a reasonable way to split spans only after obtaining information from both the precedent query and the follow-up query. Inspired by BiDAF Seo et al. (2017), we apply a bidirectional attention mechanism to capture the interrelations between the two queries.

Embedding Layer

We consider embeddings at three levels: character, word and sentence. Character-level embedding maps each word to a vector in a high-dimensional space using Convolutional Neural Networks Kim (2014). Word-level embedding is initialized using GloVe Pennington et al. (2014) and then updated along with the other parameters. Sentence-level embedding is a one-hot vector designed to distinguish between precedent and follow-up queries. The overall embedding $e(\cdot)$ of a word combines the three levels.

Context Layer

On top of the embedding layer, a Bidirectional Long Short-Term Memory network (BiLSTM) Hochreiter and Schmidhuber (1997); Schuster and Paliwal (1997) is applied to capture contextual information within each query. For word $x_i$ in the precedent query, the hidden state $h^{x}_{i} = [\overrightarrow{h}^{x}_{i}; \overleftarrow{h}^{x}_{i}]$ is computed, where the forward hidden state is:

$$\overrightarrow{h}^{x}_{i} = \overrightarrow{\mathrm{LSTM}}\big(e(x_i),\, \overrightarrow{h}^{x}_{i-1}\big) \qquad (6)$$

Similarly, a hidden state $h^{y}_{j}$ is computed for word $y_j$. The BiLSTMs for $x$ and $y$ share the same parameters.

Figure 3: Illustration of reward computation in Phase II.
Attention Layer

The interrelations between the precedent and follow-up queries are captured via an attention layer. Let $h^{x}_{i}$ and $h^{y}_{j}$ denote the hidden states of the two queries respectively; the similarity matrix $M$ is computed as:

$$M_{ij} = w^{\top}\big[h^{x}_{i};\, h^{y}_{j};\, h^{x}_{i} \odot h^{y}_{j}\big] \qquad (7)$$

where $w$ is a trainable weight vector and the entry $M_{ij}$ represents the similarity between words $x_i$ and $y_j$. Then the softmax function is used to obtain the precedent-to-follow (P2F) attention and the follow-to-precedent (F2P) attention. P2F attention represents $y_j$ using its similarities to every word in $x$: let $a_{:j} = \mathrm{softmax}(M_{:j})$ denote the attention weights on $x$ according to $y_j$; then $y_j$ can be represented by the precedent-aware vector $\tilde{h}^{y}_{j} = \sum_{i} a_{ij}\, h^{x}_{i}$. Similarly, F2P attention computes the attention weights on $y$ according to $x_i$, and represents $x_i$ as a follow-aware vector $\tilde{h}^{x}_{i}$.
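As a rough sketch of this bidirectional attention, the following uses a plain dot product in place of the learned similarity of Equation 7 (an assumption for illustration only; all names are ours):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def bidirectional_attention(H_p, H_f):
    """Compute P2F and F2P attended vectors from hidden-state lists.

    H_p, H_f: hidden states (lists of vectors) for precedent/follow-up
    words. Dot-product similarity stands in for the learned similarity.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # similarity matrix M[i][j] between precedent word i and follow word j
    M = [[dot(hp, hf) for hf in H_f] for hp in H_p]
    # F2P: each precedent word attends over follow-up words (rows of M)
    f2p = [[sum(w * hf[d] for w, hf in zip(softmax(row), H_f))
            for d in range(len(H_f[0]))] for row in M]
    # P2F: each follow-up word attends over precedent words (columns of M)
    cols = [list(col) for col in zip(*M)]
    p2f = [[sum(w * hp[d] for w, hp in zip(softmax(col), H_p))
            for d in range(len(H_p[0]))] for col in cols]
    return p2f, f2p
```

Each attended vector is a convex combination of the other query's hidden states, so every output lies in the span of those states, which is the property the layer relies on.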

Output Layer

Combining the outputs of the context layer and the attention layer, we design the final hidden state as follows:

$$g^{x}_{i} = \big[h^{x}_{i};\, \tilde{h}^{x}_{i};\, h^{x}_{i} \odot \tilde{h}^{x}_{i}\big] \qquad (8)$$
$$g^{y}_{j} = \big[h^{y}_{j};\, \tilde{h}^{y}_{j};\, h^{y}_{j} \odot \tilde{h}^{y}_{j}\big] \qquad (9)$$

where $\odot$ denotes element-wise multiplication Lee et al. (2017). Let $g_1, \dots, g_{n+m}$ denote the final hidden state sequence over the two queries. At each position $t$, the probability of Split is $P_t = \sigma(w_{o}^{\top} g_t)$, where $\sigma$ denotes the sigmoid function and $w_{o}$ denotes the output parameters.

2.2.2 Training

It is difficult to train an RL model from scratch. Therefore, we propose to initialize SplitNet via pre-training, and then use the reward to optimize it.

Pre-training

We obtain the pre-training annotation $s^{*}$ by finding the common substrings between $(x, y)$ and $z$. Each $s^{*}$ is a label sequence whose elements are Split or Retain. Given the pre-training dataset $\mathcal{D}_{p}$ whose training instances take the form $(x, y, s^{*})$, the objective function of pre-training is:

$$\max_{\theta} \sum_{(x, y, s^{*}) \in \mathcal{D}_{p}} \log P(s^{*} \mid x, y;\, \theta) \qquad (10)$$

where $\theta$ is the parameter of SplitNet.
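One plausible way to derive such labels, aligning a query with the restated query and placing boundaries at the edges of maximal common blocks, can be sketched with difflib (the paper only says the labels come from common substrings, so the details here are our assumption):

```python
from difflib import SequenceMatcher

def pretrain_labels(query_tokens, restated_tokens):
    """Derive Split/Retain labels for one query by aligning it with the
    restated query: a boundary is placed wherever a maximal common block
    of tokens starts or ends (simplified sketch)."""
    boundaries = set()
    sm = SequenceMatcher(a=query_tokens, b=restated_tokens, autojunk=False)
    for block in sm.get_matching_blocks():
        if block.size == 0:
            continue
        if block.a > 0:
            boundaries.add(block.a - 1)          # split before the block
        end = block.a + block.size - 1
        if end < len(query_tokens) - 1:
            boundaries.add(end)                  # split after the block
    return ["Split" if i in boundaries else "Retain"
            for i in range(len(query_tokens))]

pretrain_labels("How much money has Smith earned".split(),
                "How much money has Bill Collins earned".split())
# ["Retain", "Retain", "Retain", "Split", "Split", "Retain"]
```

On the running example this recovers the splits after "has" and "Smith", matching the annotation shown in Figure 1.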

Policy Gradient

After pre-training, we treat the label sequence as a random variable $s$. The reward $R(s)$ (detailed in Section 2.3) is used to optimize the parameter $\theta$ with policy gradient methods Sutton et al. (1999). SplitNet is trained to maximize the following objective function:

$$J(\theta) = \mathbb{E}_{s \sim P(\cdot \mid x, y;\, \theta)}\big[R(s)\big] \qquad (11)$$

In practice, the REINFORCE algorithm Williams (1992) is applied to approximate the gradient of Equation 11 by sampling $s$ from $P(\cdot \mid x, y; \theta)$ for $N$ times, where $N$ is a hyper-parameter representing the sample size. Furthermore, subtracting a baseline Weaver and Tao (2001) from $R(s)$ is also applied to reduce variance. The final gradient estimate is:

$$\nabla_{\theta} J(\theta) \approx \frac{1}{N} \sum_{k=1}^{N} \big(R(s_k) - b\big)\, \nabla_{\theta} \log P(s_k \mid x, y;\, \theta), \quad \text{where } b = \frac{1}{N} \sum_{k=1}^{N} R(s_k) \qquad (12)$$
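A minimal sketch of one REINFORCE update with a mean-reward baseline, using an independent Bernoulli per position as a stand-in for SplitNet's Split probabilities (an illustrative simplification, not the paper's implementation):

```python
import random

def reinforce_step(probs, reward_fn, n_samples=32, lr=0.1, rng=random):
    """One REINFORCE update with a mean-reward baseline.

    probs: per-position probability of 'Split' (independent-Bernoulli policy).
    reward_fn: maps a sampled 0/1 label sequence to a scalar reward.
    """
    samples = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(n_samples)]
    rewards = [reward_fn(s) for s in samples]
    baseline = sum(rewards) / n_samples          # variance-reducing baseline
    grads = [0.0] * len(probs)
    for s, r in zip(samples, rewards):
        adv = r - baseline
        for i, (a, p) in enumerate(zip(s, probs)):
            # d/dp log Bernoulli(a; p) = a/p - (1 - a)/(1 - p)
            grads[i] += adv * (a / p - (1 - a) / (1 - p))
    # gradient ascent, with clipping to keep probabilities valid
    return [min(0.99, max(0.01, p + lr * g / n_samples))
            for p, g in zip(probs, grads)]
```

Repeatedly applying this step to a toy reward that prefers one specific label sequence drives the policy towards that sequence, which is the behavior Equation 12 relies on.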

2.3 Phase II: Recombine

Here we present Phase II around two questions: (1) receiving a sampled label sequence $s$, how to compute its reward $R(s)$; and (2) how to perform training and inference for the recombination model.

2.3.1 Reward Computation

Receiving the label sequence $s$, we first enumerate all conflicting way candidates. Following the example in Figure 3, once we obtain a deterministic $s$, the split of $(x, y)$ is uniquely determined; there $x$ and $y$ are each split into several spans. Treating spans as units, we enumerate all conflicting way candidates methodically. We follow the one-to-one conflicting principle, which means a span either has no conflict (denoted as EMPTY) or has exactly one conflict with a span in the other query. Let $\mathcal{C}(s)$ denote the set of all conflicting way candidates; Figure 3 illustrates its size for the example.

For each conflicting way, we deterministically generate a restated query via a process named Restatement. In general, we simply replace spans in the precedent query with their conflicting spans to generate the restated query. For example, in Figure 3, the first candidate is restated as "How about Bill Collins earned". Spans in the follow-up query that contain column names or cell values but have no conflict are appended to the tail of the precedent query. This is designed to remedy the sub-query situation where there is no conflict (e.g., "Which opponent received over 537 attendance" and "And which got the result won 5-4"). In the special case where a span in the follow-up query contains a pronoun, we instead replace the pronoun with its conflicting precedent span to obtain the restated query.
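The core span-replacement rule of Restatement can be sketched as follows (only the replacement rule; the append and pronoun rules are omitted, and all names are ours):

```python
def restate(precedent_spans, followup_spans, conflicts):
    """Generate a restated query from a conflicting way.

    conflicts: dict mapping follow-up span index -> precedent span index,
    i.e. the one-to-one conflicts of a single conflicting way candidate.
    Each conflicting precedent span is replaced by its follow-up span.
    """
    replacement = {p: f for f, p in conflicts.items()}
    out = []
    for i, span in enumerate(precedent_spans):
        out.append(followup_spans[replacement[i]] if i in replacement else span)
    return " ".join(out)

spans_p = ["How much money has", "Smith", "earned"]
spans_f = ["How about", "Bill Collins"]
# "Bill Collins" (follow-up span 1) conflicts with "Smith" (precedent span 1)
restate(spans_p, spans_f, {1: 1})
# "How much money has Bill Collins earned"
```

The one-to-one conflicting principle is what makes the dict representation sufficient: each follow-up span maps to at most one precedent span.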

Finally, the reward can be computed. Here we use BLEU and SymAcc (their definitions, along with the motivations for using them, are given in Section 3.2) to build the reward function, expanding $R(z_c, z)$ in Equation 5 as:

$$R(z_c, z) = \lambda_1\, \mathrm{SymAcc}(z_c, z) + \lambda_2\, \mathrm{BLEU}(z_c, z) \qquad (13)$$

where $\lambda_1$ and $\lambda_2$ are weighting coefficients. The reward $R(s)$ for $s$ can then be obtained using Equation 5.

2.3.2 Training and Inference

Besides the reward computation, the recombination model needs to be trained to maximize Equation 5. To achieve this, we define a conflicting probability matrix $A$, whose dimensions are the numbers of spans in $x$ and $y$ respectively. The entry $A_{ij}$, the conflicting probability between the $i$-th span in $x$ and the $j$-th span in $y$, is obtained by normalizing the cosine similarity between their representations. Here the span representation is the subtraction representation Wang and Chang (2016); Cross and Huang (2016): a span is represented by the difference of the hidden states at its boundaries, taken from the same BiLSTM as in the context layer of Section 2.2.1. Given a conflicting way $c$, the probability of generating its corresponding restated query can be written as a multiplication over the entries of $A$:

$$P(c \mid x, y, s) = \prod_{i, j} p_{ij} \qquad (14)$$

where $p_{ij} = A_{ij}$ if the $i$-th span in $x$ conflicts with the $j$-th span in $y$ under $c$; otherwise, $p_{ij} = 1 - A_{ij}$. With the above formulation, we can maximize Equation 5 through automatic differentiation. To reduce the computation, we only maximize $P(\hat{c} \mid x, y, s)\, R(z_{\hat{c}}, z)$, a near-optimal surrogate of Equation 5, where $\hat{c}$ denotes the conflicting way of the best predicted restated query so far.

Guided by the golden restated query $z$, in training we find $\hat{c}$ by computing the reward of each candidate. In inference, however, where there is no golden restated query, we can only obtain the conflicting way from $A$. Specifically, for the $j$-th span in the follow-up query, we find $i^{*} = \arg\max_{i} A_{ij}$; that is, compared to the other spans in the precedent query, the $i^{*}$-th span has the highest probability of conflicting with the $j$-th span in the follow-up query. Moreover, similar to Lee et al. (2017), if $A_{i^{*} j} < \tau$, then the $j$-th span in the follow-up query is taken to have no conflict. The hyper-parameter $\tau$ denotes the threshold.
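The inference rule, an argmax over precedent spans with a no-conflict threshold, can be sketched as (a hypothetical helper; `tau` stands for the threshold hyper-parameter):

```python
def infer_conflicts(A, tau=0.5):
    """Pick the most probable conflicting way from matrix A at inference.

    A[i][j]: probability that precedent span i conflicts with follow-up
    span j. For each follow-up span, take the argmax precedent span and
    treat the span as having no conflict (EMPTY) when that best
    probability falls below tau.
    """
    conflicts = {}
    n_follow = len(A[0]) if A else 0
    for j in range(n_follow):
        col = [A[i][j] for i in range(len(A))]
        best = max(range(len(col)), key=col.__getitem__)
        if col[best] >= tau:
            conflicts[j] = best   # follow-up span j conflicts with span best
    return conflicts

infer_conflicts([[0.9, 0.1],
                 [0.2, 0.3]])
# {0: 0}  (follow-up span 1 stays EMPTY: its best score 0.3 < tau)
```

The returned dict is in the same follow-up-to-precedent form consumed by the Restatement step.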

Model | Dev SymAcc (%) | Dev BLEU (%) | Test SymAcc (%) | Test BLEU (%) | Test AnsAcc (%)
Seq2Seq Bahdanau et al. (2015) | 0.63 ± 0.00 | 21.34 ± 1.14 | 0.50 ± 0.22 | 20.72 ± 1.31 | –
CopyNet Gu et al. (2016) | 17.50 ± 0.87 | 43.36 ± 0.54 | 19.30 ± 0.93 | 43.34 ± 0.45 | –
Copy+BERT Devlin et al. (2019) | 18.63 ± 0.61 | 45.14 ± 0.68 | 22.00 ± 0.45 | 44.87 ± 0.52 | –
Concat | – | – | 22.00 | 52.02 | 25.24
E2ECR Lee et al. (2017) | – | – | 27.00 | 52.47 | 27.18
FanDa Liu et al. (2019) | 49.00 ± 1.28 | 60.14 ± 0.98 | 47.80 ± 1.14 | 59.02 ± 0.54 | 60.19
StAR | 55.38 ± 1.21 | 67.62 ± 0.65 | 54.00 ± 1.09 | 67.05 ± 1.05 | 65.05
Table 1: SymAcc, BLEU and AnsAcc on the FollowUp dataset. Results marked are from Liu et al. (2019).

2.4 Extension

So far, we have introduced the whole process of StAR. Next we explore its extensibility. As observed, when the annotations are restated queries, StAR is parser-independent and can be incorporated into any context-independent semantic parser. But what if the annotations are answers to follow-up queries? Assuming we have an ideal semantic parser, a predicted restated query $z^{*}$ can be converted into its corresponding answer $a^{*}$. For example, given $z^{*}$ as "where are the players from", $a^{*}$ could be "Las Vegas". Therefore, revisiting Equation 3, in theory StAR can be extended by redesigning the reward as $R(a^{*}, a)$, where $a$ denotes the answer annotation. We conduct an extension experiment to verify this, as discussed in Section 3.3.

3 Experiments

In this section, we demonstrate the effectiveness of StAR on the FollowUp dataset (http://github.com/SivilTaram/FollowUp) with restated query annotations, and its promising extensibility on the SQA dataset (http://aka.ms/sqa) with answer annotations.

3.1 Implementation details

We utilize PyTorch Paszke et al. (2017) and AllenNLP Gardner et al. (2018) for implementation, and adopt Adam Kingma and Ba (2015) as the optimizer. The word embedding and the hidden state share the same dimension. Variational dropout Blum et al. (2015) is employed at the embedding layer for better generalization. The learning rate is set separately for pre-training, for RL training on FollowUp, and for SQA, and $N$ denotes the sample size in the implementation of the REINFORCE algorithm. All the results are averaged over several runs with random initialization.

3.2 Results on FollowUp dataset

The FollowUp dataset contains natural language query triples $(x, y, z)$. Each triple belongs to a single database table, and the tables cover several different domains. Following the previous work, we split the triples into train/dev/test sets. We evaluate the methods using both answer-level and query-level metrics. AnsAcc checks the answer accuracy of predicted queries manually. Concretely, golden restated queries can be successfully parsed by Coarse2Fine Dong and Lapata (2018); we parse the corresponding predicted queries into SQL using Coarse2Fine and manually check the answers. Although AnsAcc is the most convincing metric, it cannot cover the entire test set. Therefore, we apply two query-level metrics. SymAcc detects whether all the SQL-related words (e.g., column names and cell values) are correctly involved in the predicted queries; it reflects the approximate upper bound of AnsAcc, as the correctness of SQL-related words is a prerequisite for correct execution in most cases. BLEU, the cumulative 4-gram BLEU score, evaluates how similar the predicted queries are to the golden ones Papineni et al. (2002). Since SymAcc focuses on a limited set of keywords, we introduce BLEU to evaluate the quality of the entire predicted query.

3.2.1 Model Comparison

Our baselines fall into two categories. Generation-based methods conform to the sequence-to-sequence architecture Sutskever et al. (2014) and generate restated queries by decoding each word from scratch: Seq2Seq Bahdanau et al. (2015) is the sequence-to-sequence model with attention; CopyNet Gu et al. (2016) further incorporates a copy mechanism; Copy+BERT incorporates the pre-trained BERT model Devlin et al. (2019) as the encoder of CopyNet. Rewriting-based methods obtain restated queries by rewriting the precedent and follow-up queries: Concat directly concatenates the two queries; E2ECR Lee et al. (2017) obtains restated queries by performing coreference resolution in follow-up queries; FanDa Liu et al. (2019) utilizes a structure-aware model to merge the two queries. Our method StAR also belongs to this category.

Answer Level

Table 1 shows AnsAcc results of competitive baselines on the test set. Compared with them, StAR achieves the highest AnsAcc, 65.05%, which demonstrates its superiority. Meanwhile, it verifies the feasibility of follow-up query analysis in cooperation with context-independent semantic parsing. Compared with Concat, our approach boosts AnsAcc with Coarse2Fine by nearly 40 points, showing its capability for context-dependent semantic parsing.

Query Level

Table 1 also shows SymAcc and BLEU of different methods on the dev and test sets. As observed, StAR significantly outperforms all baselines, demonstrating its effectiveness. For example, StAR achieves an absolute improvement of about 8 points in BLEU over the state-of-the-art baseline FanDa on the test set. Moreover, the rewriting-based baselines, even the simplest Concat, perform better than the generation-based ones. This suggests that the idea of rewriting, which fully utilizes the precedent and follow-up queries, is more suitable for the task.

Variant | SymAcc (%) | BLEU (%)
StAR | 55.38 | 67.62
– Phase I | 40.63 | 61.82
– Phase II | 23.12 | 48.65
– RL | 41.25 | 60.19
+ Basic Reward | 43.13 | 58.48
+ Oracle Reward | 45.20 | 63.04
+ Uniform Reward | 53.40 | 66.93
Table 2: Variant results on FollowUp dev set.

3.2.2 Variant Analysis

Figure 4: Learning curve on FollowUp train set.

Besides the baselines, we also conduct experiments with several variants of StAR to further validate the design of our model. As shown in Table 2, there are three ablation variants: "– Phase I" takes out SplitNet and performs Phase II at the word level; "– Phase II" performs random guessing in the recombination process at test time; and "– RL" only contains pre-training. SymAcc drops from 55.38% to 40.63% by ablating Phase I, and to 23.12% by ablating Phase II. Their poor performance indicates that both phases are indispensable. "– RL" also performs worse, which again demonstrates the rationality of applying RL.

Table 3: Case analysis of StAR on FollowUp dataset. Square brackets denote different spans.

Three more variants are presented with different reward designs to prove the efficiency and effectiveness of Equation 5 as a reward. "+ Basic Reward" represents the most straightforward REINFORCE algorithm, which samples both the split $s$ and the conflicting way $c$, then takes $R(z_c, z)$ as the reward. "+ Oracle Reward" assumes the predicted conflicts are always correct and rewards $s$ with the best candidate's score. "+ Uniform Reward" assigns the same probability to all conflicting way candidates and rewards $s$ with their average score. As shown in Table 2 and Figure 4, StAR learns better and faster than these variants owing to the reasonable reward design. In fact, as mentioned in Section 2.1, the vast action space of the most straightforward REINFORCE algorithm leads to poor learning. StAR shrinks the sampling space from the joint space of splits and conflicting ways down to the space of splits alone by enumerating the conflicting ways. Meanwhile, statistics show that StAR obtains a clear speedup over "+ Basic Reward" in convergence time.

Figure 5: An example of similarity matrix in SplitNet.

3.2.3 Case Study

Figure 5 shows a concrete example of the similarity matrix in the attention layer of SplitNet. The span "before week 10" is evidently more similar to "After the week 6" than to others, which meets our expectations. Moreover, the results of three real cases are shown in Table 3. The spans in blue are those that have conflicts, and the histograms represent the conflict probabilities over all spans in the precedent queries. In Case 1, "glebe park", "hampden park" and "balmoor" are all cell values with similar meanings in the database table. StAR correctly finds the conflict between "compared to glebe park" and "compared to balmoor" with the highest probability. Case 2 shows that StAR can discover the interrelation of words: "the writer Nancy miller" is learnt as a whole span to replace "Nancy miller" in the precedent query. As for Case 3, StAR successfully performs coreference resolution and interprets "those two films" as "greatest love and promised land". Benefiting from the two phases, StAR is able to deal with diverse follow-up scenarios in different domains.

3.2.4 Error Analysis

Our approach works well in most cases; the few failures occur when SplitNet fails. For example, given the precedent query "what's the biggest zone?" and the follow-up query "the smallest one", StAR prefers to recognize "the biggest zone" and "the smallest one" as two whole spans, rather than performing split operations inside them. SplitNet fails probably because the conflicting spans, "the biggest" / "the smallest" and "zone" / "one", are adjacent, which makes it difficult to identify span boundaries well.

3.3 Extension on SQA dataset

Model | Precedent | Follow-up
DynSP Iyyer et al. (2017) | 70.9 | 35.8
NP Neelakantan et al. (2016) | 58.9 | 35.9
NP + StAR | 58.9 | 38.1
DynSP + StAR | 70.9 | 39.5
DynSP* Iyyer et al. (2017) | 70.4 | 41.1
Table 4: Answer accuracy on SQA test set. DynSP* denotes the variant of DynSP with the special Subsequent action.

Finally, we demonstrate StAR's extensibility in working with different annotations. As mentioned in Section 2.4, by redesigning the reward, StAR can cooperate with answer annotations. We conduct experiments on the SQA dataset, which consists of query sequences split into train and test sets. Each sequence contains multiple natural language queries and their answers; we are only interested in the first query and its immediate follow-up. As discussed in Iyyer et al. (2017), every answer can be represented as a set of cells in the tables, each of which may be a multi-word value, and the intentions of follow-up queries mainly fall into three categories. Column selection means the follow-up answer is an entire column; subset selection means the follow-up answer is a subset of the precedent answer; and row selection means the follow-up answer shares the same rows with the precedent answer.

We employ two context-independent parsers, DynSP Iyyer et al. (2017) and NP Neelakantan et al. (2016), trained on the SQA dataset to provide relatively reliable answers for reward computation. Unfortunately, both perform poorly on restated queries, as restated queries are quite different from the original queries in SQA. To address the problem, we redesign the recombination process. Instead of generating the restated query, we recombine the predicted precedent answer and the predicted follow-up answer to produce the restated answer. Accordingly, the objective of Phase II becomes assigning an appropriate intention to each follow-up span via an additional classifier, and the goal of Phase I turns to splitting out spans with obvious intentions, such as "of those". The way of recombining answers is determined by voting over the intentions of all spans. If the intention column selection wins, the restated answer is the predicted follow-up answer; for subset selection, we obtain the subset by taking the rows of the predicted follow-up answer as the constraint and applying it to the predicted precedent answer; and for row selection, we take the rows of the predicted precedent answer and the columns of the predicted follow-up answer as the constraints, then apply them to the whole database table to obtain the restated answer. The reward is computed based on the Jaccard similarity between the gold answer and the restated answer, as in Iyyer et al. (2017), and the overall training process remains unchanged.
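The three recombination rules can be sketched over answers represented as sets of (row, column) cells (a simplification we assume for illustration; in particular, which answer supplies the row constraint for subset selection is our reading of the description above):

```python
def recombine_answers(prec_answer, foll_answer, intention):
    """Recombine predicted answers on SQA by the voted intention.

    Answers are sets of (row, column) cells. 'row' selection is
    approximated here by crossing precedent rows with follow-up columns
    rather than consulting the full table.
    """
    if intention == "column":
        return foll_answer                       # follow-up answer as-is
    if intention == "subset":
        rows = {r for r, _ in prec_answer}       # constrain by precedent rows
        return {(r, c) for r, c in foll_answer if r in rows}
    if intention == "row":
        rows = {r for r, _ in prec_answer}       # precedent rows x follow cols
        cols = {c for _, c in foll_answer}
        return {(r, c) for r in rows for c in cols}
    raise ValueError(f"unknown intention: {intention}")
```

The restated answer can then be scored against the gold answer with Jaccard similarity to produce the reward.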

Table 4 shows the answer accuracy of precedent and follow-up queries on the test set. DynSP* Iyyer et al. (2017) is designed for SQA by introducing a special action Subsequent on top of DynSP to handle follow-up queries. However, DynSP* cannot be extended to work with restated-query annotations: applying DynSP* (trained on SQA) directly to the FollowUp test set results in an extremely low AnsAcc. On the contrary, StAR is extensible. "+ StAR" means that StAR is incorporated into the context-independent parser and empowers it to perform follow-up query analysis. As observed, integrating StAR consistently improves performance on follow-up queries, which demonstrates the effectiveness of StAR in collaborating with different semantic parsers. The comparable results of DynSP + StAR to DynSP* further verify the promising extensibility of StAR.

4 Related Work

Our work is closely related to two lines of work: context-dependent sentence analysis and reinforcement learning. From the perspective of context-dependent sentence analysis, our work is related to research on reading comprehension in dialogue Reddy et al. (2019); Choi et al. (2018), dialogue state tracking Williams et al. (2013), conversational question answering over knowledge bases Saha et al. (2018); Guo et al. (2018), context-dependent logical forms Long et al. (2016), and non-sentential utterance resolution in open-domain question answering Raghu et al. (2015); Kumar and Joshi (2017). The main difference is that we focus on context-dependent queries in NLIDB, which involve complex scenarios. As for the most related work on context-dependent semantic parsing, Zettlemoyer and Collins (2009) propose a context-independent CCG parser and then conduct context-dependent substitution, Iyyer et al. (2017) present a search-based method for sequential questions, and Suhr et al. (2018) present a sequence-to-sequence model to solve the problem. Compared to these methods, our work achieves context-dependent semantic parsing via learnable restated queries and existing context-independent semantic parsers.

Moreover, the technique of reinforcement learning has also been successfully applied to natural language tasks in dialogue, such as hyper-parameters tuning for coreference resolution Clark and Manning (2016), sequential question answering Iyyer et al. (2017) and coherent dialogue responses generation Li et al. (2016). In this paper, we employ reinforcement learning to capture the structures of queries, which is similar to Zhang et al. (2018) for text classification.

5 Conclusion and Future Work

We present a novel method, named Split-And-Recombine (StAR), to perform follow-up query analysis. A two-phase process has been designed: one for splitting precedent and follow-up queries into spans, and the other for recombining them. Experiments on two different datasets demonstrate the effectiveness and extensibility of our method. For future work, we may extend our method to other natural language tasks.

Acknowledgments

We thank all the anonymous reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (Grant Nos. U1736217 and 61932003).

References

  • D. Bahdanau, K. Cho, and Y. Bengio (2015) Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, External Links: Link Cited by: Table 1, §3.2.1.
  • A. Blum, N. Haghtalab, and A. D. Procaccia (2015) Variational dropout and the local reparameterization trick. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems, NIPS 2015, Montreal, Quebec, Canada, December 7-12, 2015, pp. 2575–2583. External Links: Link Cited by: §3.1.
  • E. Choi, H. He, M. Iyyer, M. Yatskar, W. Yih, Y. Choi, P. Liang, and L. Zettlemoyer (2018) QuAC: question answering in context. In

    Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018, Brussels, Belgium, October 31 - November 4, 2018

    ,
    pp. 2174–2184. External Links: Link Cited by: §4.
  • K. Clark and C. D. Manning (2016) Deep reinforcement learning for mention-ranking coreference models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pp. 2256–2262. External Links: Link Cited by: §4.
  • J. Cross and L. Huang (2016) Span-based constituency parsing with a structure-label system and provably optimal dynamic oracles. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pp. 1–11. External Links: Link Cited by: §2.3.2.
  • D. A. Dahl, M. Bates, M. Brown, W. M. Fisher, K. Hunicke-Smith, D. S. Pallett, C. Pao, A. I. Rudnicky, and E. Shriberg (1994) Expanding the scope of the ATIS task: the ATIS-3 corpus. In Proceedings of the 1994 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 1994, ,Plainsboro, New Jerey, USA, March 8-11, 1994, External Links: Link Cited by: §1.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1: Long and Short Papers, pp. 4171–4186. External Links: Link Cited by: Table 1, §3.2.1.
  • L. Dong and M. Lapata (2018) Coarse-to-Fine decoding for neural semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pp. 731–742. External Links: Link, Document Cited by: §3.2.
  • M. Gardner, J. Grus, M. Neumann, O. Tafjord, P. Dasigi, N. F. Liu, M. Peters, M. Schmitz, and L. Zettlemoyer (2018) AllenNLP: a deep semantic natural language processing platform. In Proceedings of Workshop for NLP Open Source Software, NLP-OSS, of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, pp. 1–6. External Links: Link, Document Cited by: §3.1.
  • J. Gu, Z. Lu, H. Li, and V. O. K. Li (2016) Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, Berlin, Germany, August 7-12, 2016, Volume 1: Long Papers, External Links: Link Cited by: Table 1.
  • D. Guo, D. Tang, N. Duan, M. Zhou, and J. Yin (2018) Dialog-to-action: conversational question answering over a large-scale knowledge base. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems, NeurIPS 2018, Montréal, Canada, December 3-8, 2018, pp. 2946–2955. External Links: Link Cited by: §4.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Computation, Volume 9, pp. 1735–1780. External Links: Link, Document Cited by: §2.2.1.
  • M. Iyyer, W. Yih, and M. Chang (2017) Search-based neural structured learning for sequential question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, 2017, Volume 1: Long Papers, pp. 1821–1831. External Links: Link, Document Cited by: §1, §3.3, §3.3, §3.3, Table 4, §4, §4.
  • Y. Kim (2014) Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, Doha, Qatar, October 25-29, 2014, pp. 1746–1751. External Links: Link Cited by: §2.2.1.
  • D. P. Kingma and J. Ba (2015) Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, External Links: Link Cited by: §3.1.
  • V. Kumar and S. Joshi (2017) Incomplete follow-up question resolution using retrieval based sequence to sequence learning. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017, Shinjuku, Tokyo, Japan, August 7-11, 2017, pp. 705–714. External Links: Link, Document Cited by: §4.
  • K. Lee, L. He, M. Lewis, and L. Zettlemoyer (2017) End-to-end neural coreference resolution. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, pp. 188–197. External Links: Link Cited by: §2.2.1, §2.3.2, Table 1, §3.2.1.
  • J. Li, W. Monroe, A. Ritter, D. Jurafsky, M. Galley, and J. Gao (2016) Deep reinforcement learning for dialogue generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pp. 1192–1202. External Links: Link Cited by: §4.
  • C. Liang, J. Berant, Q. V. Le, K. D. Forbus, and N. Lao (2017) Neural symbolic machines: learning semantic parsers on freebase with weak supervision. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, 2017, Volume 1: Long Papers, pp. 23–33. External Links: Link, Document Cited by: §2.1.
  • Q. Liu, B. Chen, J. Lou, G. Jin, and D. Zhang (2019) FANDA: A novel approach to perform follow-up query analysis. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pp. 6770–6777. External Links: Link Cited by: 2nd item, §1, Table 1, §3.2.1.
  • R. Long, P. Pasupat, and P. Liang (2016) Simpler context-dependent logical forms via model projections. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, Berlin, Germany, August 7-12, 2016, Volume 1: Long Papers, External Links: Link Cited by: §4.
  • S. Miller, D. Stallard, R. J. Bobrow, and R. M. Schwartz (1996) A fully statistical approach to natural language interfaces. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, ACL 1996, University of California, Santa Cruz, California, June 24-27, 1996, pp. 55–61. External Links: Link Cited by: §1.
  • A. Neelakantan, Q. V. Le, and I. Sutskever (2016) Neural programmer: inducing latent programs with gradient descent. In Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, External Links: Link Cited by: §3.3, Table 4.
  • K. Papineni, S. Roukos, T. Ward, and W. Zhu (2002) BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, ACL 2002, Philadelphia, PA, USA, July 6-12, 2002, pp. 311–318. External Links: Link Cited by: §3.2.
  • A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in PyTorch. In NIPS 2017 Workshop on Automatic Differentiation, Long Beach, CA, USA, December 4-9, 2017, External Links: Link Cited by: §3.1.
  • J. Pennington, R. Socher, and C. D. Manning (2014) GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, Doha, Qatar, October 25-29, 2014, pp. 1532–1543. External Links: Link Cited by: §2.2.1.
  • D. Raghu, S. Indurthi, J. Ajmera, and S. Joshi (2015) A statistical approach for non-sentential utterance resolution for interactive QA system. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2015, Prague, Czech Republic, September 2-4, 2015, pp. 335–343. External Links: Link Cited by: §4.
  • S. Reddy, D. Chen, and C. D. Manning (2019) CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics, Volume 7, pp. 249–266. External Links: Link Cited by: §4.
  • A. Saha, V. Pahuja, M. M. Khapra, K. Sankaranarayanan, and S. Chandar (2018) Complex sequential question answering: towards learning to converse over linked question answer pairs with a knowledge graph. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, New Orleans, Louisiana, USA, February 2-7, 2018, pp. 705–713. External Links: Link Cited by: §4.
  • M. Schuster and K. K. Paliwal (1997) Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, Volume 45, pp. 2673–2681. External Links: Link, Document Cited by: §2.2.1.
  • M. J. Seo, A. Kembhavi, A. Farhadi, and H. Hajishirzi (2017) Bidirectional attention flow for machine comprehension. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, External Links: Link Cited by: §2.2.1.
  • A. Suhr, S. Iyer, and Y. Artzi (2018) Learning to map context-dependent sentences to executable formal queries. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1: Long Papers, pp. 2238–2249. External Links: Link Cited by: §1, §4.
  • I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, NIPS 2014, Montreal, Quebec, Canada, December 8-13, 2014, pp. 3104–3112. External Links: Link Cited by: §3.2.1.
  • R. S. Sutton and A. G. Barto (1998) Reinforcement learning: an introduction. MIT Press, Cambridge, MA. External Links: Link, Document Cited by: §1.
  • R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour (1999) Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12: Annual Conference on Neural Information Processing Systems, NIPS 1999, Denver, Colorado, USA, November 29 - December 4, 1999, pp. 1057–1063. External Links: Link Cited by: §2.2.2.
  • W. Wang and B. Chang (2016) Graph-based dependency parsing with bidirectional LSTM. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, Berlin, Germany, August 7-12, 2016, Volume 1: Long Papers, External Links: Link Cited by: §2.3.2.
  • L. Weaver and N. Tao (2001) The optimal reward baseline for gradient-based reinforcement learning. In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, UAI 2001, Seattle, Washington, USA, August 2-5, 2001, pp. 538–545. External Links: Link Cited by: §2.2.2.
  • J. D. Williams, A. Raux, D. Ramachandran, and A. W. Black (2013) The dialog state tracking challenge. In Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2013, Metz, France, August 22-24, 2013, pp. 404–413. External Links: Link Cited by: §4.
  • R. J. Williams (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, Volume 8, pp. 229–256. External Links: Link, Document Cited by: §2.1, §2.2.2.
  • L. S. Zettlemoyer and M. Collins (2009) Learning context-dependent mappings from sentences to logical form. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics, ACL 2009, and the 4th International Joint Conference on Natural Language Processing, IJCNLP 2009, Singapore, August 2-7, 2009, pp. 976–984. External Links: Link Cited by: §1, §4.
  • T. Zhang, M. Huang, and L. Zhao (2018) Learning structured representation for text classification via reinforcement learning. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, New Orleans, Louisiana, USA, February 2-7, 2018, pp. 6053–6060. External Links: Link Cited by: §4.