Semantic parsing is the task of mapping natural language (NL) utterances to meaning representations. While there has been much progress in this area, earlier work has primarily focused on evaluating parsers in-domain (e.g., tables or databases) and often with the same programs as those provided in training Finegan-Dollak et al. (2018). A much more challenging goal is achieving domain generalization, i.e., building parsers which can be successfully applied to new domains and are able to produce complex unseen programs. Achieving this generalization goal would, in principle, let users query arbitrary (semi-)structured data on the Web and reduce the annotation effort required to build multi-domain NL interfaces (e.g., Apple Siri or Amazon Alexa). Current parsers struggle in this setting; for example, we show in Section 5 that a modern parser trained on the challenging Spider dataset Yu et al. (2018b) has a gap of more than 25% in accuracy between in-domain and out-of-domain performance. While the importance of domain generalization has been previously acknowledged Cai and Yates (2013); Chang et al. (2019), and datasets targeting zero-shot (or out-of-domain) performance are becoming increasingly available Pasupat and Liang (2015); Wang et al. (2015); Zhong et al. (2017); Yu et al. (2018b), little or no attention has been devoted to studying learning algorithms or objectives which promote domain generalization.
Conventional supervised learning simply assumes that source-domain and target-domain data originate from the same distribution, and as a result struggles to capture this notion of domain generalization for zero-shot semantic parsing. Previous approaches Guo et al. (2019b); Wang et al. (2019); Herzig and Berant (2018) facilitate domain generalization by incorporating inductive biases in the model, e.g., designing linking features or functions which should be invariant under domain shifts. In this work, we take a different direction and improve domain generalization of a semantic parser by modifying the learning algorithm and the objective. We draw inspiration from meta-learning Finn et al. (2017); Li et al. (2018a) and use an objective that optimizes for domain generalization. That is, we consider a set of tasks, where each task is a zero-shot semantic parsing task with its own source and target domains. By optimizing towards better target-domain performance on each task, we encourage a parser to extrapolate from source-domain data and achieve better domain generalization.
Specifically, we focus on text-to-SQL parsing, where we aim at translating NL questions to SQL queries, and conduct evaluations on unseen databases. Consider the example in Figure 1: at test time, a parser needs to process questions about a new database. To simulate this scenario during training, we synthesize a set of virtual zero-shot parsing tasks by sampling disjoint source and target domains for each task from the training domains (we use the terms domain and database interchangeably). The objective we require is that gradient steps computed towards better source-domain performance are also beneficial to target-domain performance. One can think of the objective as consisting of both the loss on the source domain (as in standard supervised learning) and a regularizer, equal to the dot product between gradients computed on source- and target-domain data. Maximizing this regularizer favours finding model parameters that work not only on the source domain but also generalize to target-domain data. The objective is borrowed from Li et al. (2018a), who adapt a Model-Agnostic Meta-Learning (MAML; Finn et al. 2017) technique for domain generalization in computer vision. In this work, we study the effectiveness of this objective in the context of semantic parsing. The objective is model-agnostic, simple to incorporate, and requires no changes in the parsing model itself. Moreover, it does not introduce new parameters for meta-learning.
Our contributions can be summarized as follows.
To handle zero-shot semantic parsing, we apply a meta-learning objective that directly optimizes for domain generalization.
We propose an approximation of the meta-learning objective that is more efficient and allows more scalable training.
We perform experiments on two text-to-SQL benchmarks: Spider and Chinese Spider. Our new training objectives obtain significant improvements in accuracy over a baseline parser trained with conventional supervised learning.
We show that even when parsers are augmented with pre-trained models, e.g., BERT, our method can still effectively improve domain generalization in terms of accuracy.
Our code will be available at https://github.com/berlino/dgmaml-semparse.
2 Related Work
Zero-Shot Semantic Parsing
Developing a parser that can generalize to unseen domains has been drawing increasing attention in recent years. Previous work has mainly focused on the sub-task of schema linking as a means of promoting domain generalization. In schema linking, we need to recognize which columns or tables are mentioned in a question. For example, a parser would decide to select the column Status because of the word statuses in Figure 1. However, in the setting of zero-shot parsing, columns or tables might be mentioned in a question without being observed during training. One line of work tries to incorporate inductive biases, e.g., domain-invariant n-gram matching features Guo et al. (2019b); Wang et al. (2019), cross-domain alignment functions Herzig and Berant (2018), or auxiliary linking tasks Chang et al. (2019) to improve schema linking. However, in the cross-lingual setting of Chinese Spider Min et al. (2019), where questions and schemas are not in the same language, it is not obvious how to design such inductive biases, e.g., n-gram matching features. Another line of work relies on large-scale unsupervised pre-training on massive tables Herzig et al. (2020); Yin et al. (2020) to obtain better representations for both questions and database schemas. Our work is orthogonal to these approaches and can be easily coupled with them. As an example, we show in Section 5 that our training procedure can improve the performance of a parser already enhanced with n-gram matching features Guo et al. (2019b); Wang et al. (2019).
Our work is similar in spirit to Givoli and Reichart (2019), who also attempt to simulate source and target domains during learning. However, their optimization updates on virtual source and target domains are only loosely connected by a two-step training procedure, where a parser is first pre-trained on virtual source domains and then fine-tuned on virtual target domains. As we will show in Section 3, our training procedure does not fine-tune on virtual target domains but rather, for every batch, uses them to evaluate a gradient step made on the source domains. This is better aligned with test time: since there is no fine-tuning on the real target domains, there should be no fine-tuning on the simulated ones either. Moreover, they treat the construction of virtual train and test domains as a hyper-parameter, which is only possible when there is a small number of domains, making their approach inapplicable to text-to-SQL parsing, which typically involves hundreds of domains.
Meta-Learning for NLP
Meta-learning has been receiving soaring interest in the machine learning community. Unlike conventional supervised learning, meta-learning operates on tasks, instead of data points. Most previous work Vinyals et al. (2016); Ravi and Larochelle (2016); Finn et al. (2017) has focused on few-shot learning, where meta-learning helps address the problem of learning to learn fast for adaptation to a new task or domain. The concept of fast adaptation has been reflected in many low-resource NLP tasks, e.g., low-resource machine translation Gu et al. (2018) and relation classification with limited supervision Obamuyide and Vlachos (2019). The basic motivation is that meta-learning, specifically MAML Finn et al. (2017), can learn a good initialization of parameters such that it can be easily tuned for a new task where only limited training data is available.
Very recently, there have been some adaptations of MAML to semantic parsing tasks Huang et al. (2018); Guo et al. (2019a); Sun et al. (2019). These approaches simulate few-shot learning scenarios by constructing a pseudo-task for each example, where pseudo training examples are those that are relevant to the example and are retrieved from original training examples. The notion of relevance is then encoded by MAML, and can be intuitively understood as: train a parser such that it can be easily fine-tuned for an example on its relevant examples at test time. Lee et al. (2019) use matching networks Vinyals et al. (2016) to enable one-shot text-to-SQL parsing where tasks for meta-learning are defined by SQL templates, i.e., a parser is expected to generalize to a new SQL template with one example. In contrast, the tasks we construct for meta-learning aim to encourage generalization across domains, instead of adaptation to a new (pseudo-)task with one or few examples. A clear difference lies in how meta-train and meta-test sets are constructed. In previous work (e.g., Huang et al. 2018), these come from the same domain whereas we simulate domain shift and sample different sets of domains for meta-train and meta-test.
Although the notion of domain generalization has been less explored in semantic parsing, it has been extensively studied in other areas like computer vision Ghifary et al. (2015); Zaheer et al. (2017); Li et al. (2018b). Recent work Li et al. (2018a); Balaji et al. (2018) employed optimization-based meta-learning to handle domain shift issues in domain generalization. We employ the meta-learning objective originally proposed in Li et al. (2018a), where they adapt MAML to encourage generalization in unseen domains (of images). Based on this objective, we propose a cheap alternative that only requires first-order gradients, thus alleviating the overhead of computing second-order derivatives required by MAML.
3 Meta-Learning for Domain Generalization
We first formally define the problem of domain generalization in the context of zero-shot text-to-SQL parsing. Then, we introduce DG-MAML, a training algorithm that helps a parser achieve better domain generalization. Finally, we propose a computationally cheap approximation of DG-MAML.
3.1 Problem Definition
Given a natural language question $x$ posed in the context of a relational database $\mathcal{D}$, we aim at generating the corresponding SQL query $y$. In the setting of zero-shot parsing, we have a set of source domains where labeled question-SQL pairs are available. We aim at developing a parser that can perform well on a set of unseen target domains. We refer to this problem as domain generalization.
We assume a parsing model, parameterized by $\theta$, that specifies a predictive distribution $p_\theta(y \mid x, \mathcal{D})$ over all possible SQL queries $y$ given a question $x$ and database $\mathcal{D}$. For domain generalization, a parsing model needs to properly condition on its input of questions and databases such that it can generalize well to unseen domains.
Conventional Supervised Learning
Assuming that question-SQL pairs from source domains and target domains are sampled i.i.d. from the same distribution, the typical training objective of supervised learning is to minimize the negative log-likelihood of the gold SQL query:

$\mathcal{L}_{\mathcal{B}}(\theta) = -\frac{1}{N} \sum_{(x, y, \mathcal{D}) \in \mathcal{B}} \log p_\theta(y \mid x, \mathcal{D})$   (1)

where $N$ is the size of the mini-batch $\mathcal{B}$. Since a mini-batch is randomly sampled from all training source domains, it usually contains question-SQL pairs from a mixture of different domains.
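For concreteness, this conventional objective can be sketched as follows; `ToyParser` and its `log_prob` interface are illustrative stand-ins for a real parser that scores a SQL query given a question and a database:

```python
import torch

class ToyParser:
    """Stand-in for a real parser exposing log p(SQL | question, database)."""
    def log_prob(self, question, database, sql):
        return torch.tensor(-1.0, requires_grad=True)

def supervised_nll(parser, batch):
    # Mean negative log-likelihood of the gold SQL over a mini-batch that
    # mixes question-SQL pairs from different training domains.
    logps = torch.stack([parser.log_prob(q, db, y) for q, db, y in batch])
    return -logps.mean()

batch = [("How many cities?", "db_city", "SELECT count(*) FROM City"),
         ("List pet names", "db_pets", "SELECT Name FROM Pets")]
loss = supervised_nll(ToyParser(), batch)  # loss.item() == 1.0
```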
Distribution of Tasks
Instead of treating semantic parsing as a conventional supervised learning problem, we take an alternative view based on the meta-learning perspective. Basically, we are interested in a learning algorithm that can benefit from a distribution $p(\tau)$ over choices of source and target domains, where each $\tau$ is an instance of a zero-shot semantic parsing task with its own source and target domains.
In practice, we usually have a fixed set of training source domains. We construct a set of virtual tasks $\tau$ by randomly sampling disjoint source and target domains from the training domains. Intuitively, we assume that the divergences between the virtual source and target domains during the learning phase are representative of the differences between the training domains and the actual test domains. This is still an assumption, but a considerably weaker one than the i.i.d. assumption used in conventional supervised learning. Next, we introduce our training algorithm, DG-MAML, based on this assumption.
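Constructing a virtual task then amounts to partitioning the available training databases for each batch. A minimal sketch (the helper name and the per-side domain counts are illustrative choices, not fixed by our method):

```python
import random

def sample_virtual_task(train_domains, n_source=3, n_target=1):
    """Sample disjoint source and target domains for one virtual task."""
    assert n_source + n_target <= len(train_domains)
    chosen = random.sample(train_domains, n_source + n_target)
    return chosen[:n_source], chosen[n_source:]

domains = ["concert_singer", "pets_1", "flight_2", "world_1", "scholar"]
source, target = sample_virtual_task(domains)
assert set(source).isdisjoint(target)  # the virtual domains never overlap
```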
3.2 Learning to Generalize with DG-MAML
Having simulated source and target domains for each virtual task, we now need a training algorithm that encourages generalization to unseen target domains in each task. For this, we turn to optimization-based meta-learning algorithms Finn et al. (2017); Nichol et al. (2018); Li et al. (2018a) and apply DG-MAML (Domain Generalization with Model-Agnostic Meta-Learning), a variant of MAML Finn et al. (2017) for such purpose. Intuitively, DG-MAML encourages the optimization in the source domain to have a positive effect on the target domain as well.
During each learning episode of DG-MAML, we randomly sample a task $\tau$ which has its own source domain $\mathcal{S}_\tau$ and target domain $\mathcal{T}_\tau$. For the sake of efficiency, we randomly sample mini-batches of question-SQL pairs $\mathcal{B}_s$ and $\mathcal{B}_t$ from $\mathcal{S}_\tau$ and $\mathcal{T}_\tau$, respectively, for learning in each task.
DG-MAML conducts optimization in two steps, namely meta-train and meta-test. We explain both steps as follows.
Meta-Train DG-MAML first optimizes parameters towards better performance in the virtual source domain by taking one step of stochastic gradient descent (SGD) on the loss computed on the source-domain mini-batch $\mathcal{B}_s$:

$\theta' = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{B}_s}(\theta)$   (2)

where $\alpha$ is a scalar denoting the learning rate of meta-train. This step resembles conventional supervised learning, where we use stochastic gradient descent to optimize the parameters.
Meta-Test We then evaluate the resulting parameters $\theta'$ in the virtual target domain by computing the loss on the target-domain mini-batch $\mathcal{B}_t$, which is denoted as $\mathcal{L}_{\mathcal{B}_t}(\theta')$.
Our final objective for a task $\tau$ is to minimize the joint loss on $\mathcal{B}_s$ and $\mathcal{B}_t$:

$\mathcal{L}_\tau(\theta) = \mathcal{L}_{\mathcal{B}_s}(\theta) + \mathcal{L}_{\mathcal{B}_t}(\theta')$   (3)

where we optimize towards better source- and target-domain performance simultaneously. Intuitively, the objective requires that the gradient step conducted in the source domain in Equation (2) should also be beneficial to the performance in the target domain. In comparison, conventional supervised learning, whose objective would be equivalent to $\mathcal{L}_{\mathcal{B}_s}(\theta) + \mathcal{L}_{\mathcal{B}_t}(\theta)$, does not pose any constraint on the gradient updates. As we will elaborate shortly, DG-MAML can be viewed as adding a regularizer on gradient updates to the objective of conventional supervised learning.
We summarize our DG-MAML training process in Algorithm 1. Basically, it requires two steps of gradient update (Step 5 and Step 7). Note that $\theta'$ is a function of $\theta$ after the meta-train update. Hence, optimizing $\mathcal{L}_{\mathcal{B}_t}(\theta')$ with respect to $\theta$ involves optimizing through the gradient update in Equation (2) as well. That is, when we update the parameters in the final update of Step 7, the gradients need to back-propagate through the meta-train update in Step 5. We will elaborate on this shortly.
The update function in Step 7 could be based on any gradient descent algorithm. In this work, we use the update rule of Adam Kingma and Ba (2014). In principle, gradient updates during meta-train in Step 5 could also be replaced by other gradient descent algorithms. However, we leave this to future work.
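Putting the two steps together, one DG-MAML update can be sketched in PyTorch as below. The sketch operates on a single parameter tensor and uses a plain SGD final update for readability (Step 7 in our setup uses Adam); `loss_s_fn` and `loss_t_fn` are placeholders for the parser's loss on the sampled source and target batches:

```python
import torch

def dg_maml_step(theta, loss_s_fn, loss_t_fn, alpha=0.1, lr=0.01):
    # Meta-train: one SGD step on the source batch (Equation 2).
    # create_graph=True keeps the graph so we can differentiate through it.
    loss_s = loss_s_fn(theta)
    grad_s = torch.autograd.grad(loss_s, theta, create_graph=True)[0]
    theta_prime = theta - alpha * grad_s  # theta' is a function of theta

    # Meta-test: evaluate the updated parameters on the target batch.
    loss_t = loss_t_fn(theta_prime)

    # Joint objective (Equation 3); its gradient back-propagates through
    # the meta-train step, which involves second-order derivatives.
    total = loss_s + loss_t
    grad = torch.autograd.grad(total, theta)[0]
    return (theta - lr * grad).detach().requires_grad_(True)

# Toy quadratic losses standing in for the source/target parser losses.
theta = torch.tensor([1.0, -2.0], requires_grad=True)
theta = dg_maml_step(theta,
                     lambda t: ((t - 1.0) ** 2).sum(),
                     lambda t: ((t + 1.0) ** 2).sum())
```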
3.3 Analysis of DG-MAML
Following Li et al. (2018a), we can gain intuition about the objective in Equation (3) via its first-order Taylor expansion:

$\mathcal{L}_\tau(\theta) = \mathcal{L}_{\mathcal{B}_s}(\theta) + \mathcal{L}_{\mathcal{B}_t}\big(\theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{B}_s}(\theta)\big) \approx \mathcal{L}_{\mathcal{B}_s}(\theta) + \mathcal{L}_{\mathcal{B}_t}(\theta) - \alpha \, \nabla_\theta \mathcal{L}_{\mathcal{B}_s}(\theta) \cdot \nabla_\theta \mathcal{L}_{\mathcal{B}_t}(\theta)$

where in the last step we expand the function $\mathcal{L}_{\mathcal{B}_t}$ at $\theta$. The approximated objective sheds light on what DG-MAML optimizes. In addition to minimizing the losses from both source and target domains, i.e., $\mathcal{L}_{\mathcal{B}_s}(\theta) + \mathcal{L}_{\mathcal{B}_t}(\theta)$, DG-MAML further tries to maximize $\nabla_\theta \mathcal{L}_{\mathcal{B}_s}(\theta) \cdot \nabla_\theta \mathcal{L}_{\mathcal{B}_t}(\theta)$, the dot product between the gradients of the source and target domains. That is, it encourages gradients to generalize between source and target domains within each task $\tau$.
3.4 First-Order Approximation
The final update in Step 7 of Algorithm 1 requires second-order derivatives, which may be problematic, inefficient, or unstable for certain classes of models Mensch and Blondel (2018). Hence, we propose an approximation that only requires computing first-order derivatives.
First, the gradient of the objective in Equation (3) can be computed as:

$\nabla_\theta \mathcal{L}_\tau(\theta) = \nabla_\theta \mathcal{L}_{\mathcal{B}_s}(\theta) + \big(I - \alpha H_\theta(\mathcal{L}_{\mathcal{B}_s})\big) \nabla_{\theta'} \mathcal{L}_{\mathcal{B}_t}(\theta')$

where $I$ is an identity matrix and $H_\theta(\mathcal{L}_{\mathcal{B}_s})$ is the Hessian of $\mathcal{L}_{\mathcal{B}_s}$ at $\theta$. We consider the alternative of ignoring this second-order term and simply assume that $\nabla_\theta \mathcal{L}_\tau(\theta) \approx \nabla_\theta \mathcal{L}_{\mathcal{B}_s}(\theta) + \nabla_{\theta'} \mathcal{L}_{\mathcal{B}_t}(\theta')$. In this variant, we simply combine the gradients from the source and target domains. We show in the Appendix that this objective can still be viewed as maximizing the dot product of the gradients from the source and target domains.
The resulting first-order training objective, which we refer to as DG-FMAML, is inspired by Reptile, a first-order meta-learning algorithm Nichol et al. (2018) for few-shot learning. A two-step Reptile would compute SGD on the same batch twice, while DG-FMAML computes SGD on two different batches, $\mathcal{B}_s$ and $\mathcal{B}_t$, once each. To put it differently, DG-FMAML tries to encourage cross-domain generalization, while Reptile encourages in-domain generalization.
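Under this approximation, an update needs only two ordinary backward passes, one per batch. A sketch under the same toy setup as the full algorithm (a single parameter tensor plus placeholder loss functions):

```python
import torch

def dg_fmaml_step(theta, loss_s_fn, loss_t_fn, alpha=0.1, lr=0.01):
    # First-order gradient on the source batch at theta.
    grad_s = torch.autograd.grad(loss_s_fn(theta), theta)[0]
    # Inner SGD step, detached: no graph (and hence no Hessian) flows
    # through this update.
    theta_prime = (theta - alpha * grad_s).detach().requires_grad_(True)
    # Gradient on a *different* batch (the target one) at theta'.
    grad_t = torch.autograd.grad(loss_t_fn(theta_prime), theta_prime)[0]
    # DG-FMAML: simply combine the source- and target-domain gradients.
    return (theta - lr * (grad_s + grad_t)).detach().requires_grad_(True)

# Toy quadratic losses standing in for the source/target parser losses.
theta = torch.tensor([1.0, -2.0], requires_grad=True)
theta = dg_fmaml_step(theta,
                      lambda t: ((t - 1.0) ** 2).sum(),
                      lambda t: ((t + 1.0) ** 2).sum())
```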
4 Semantic Parser
In general, DG-MAML is model-agnostic and can be coupled with any semantic parser to improve its domain generalization. In this work, our base parser builds on RAT-SQL Wang et al. (2019), which currently achieves state-of-the-art performance on Spider.
Formally, RAT-SQL takes as input a question and the schema of its corresponding database. It then produces a program represented as an abstract syntax tree in the context-free grammar of SQL Yin and Neubig (2018). RAT-SQL adopts the encoder-decoder framework for text-to-SQL parsing. It basically has three components: an initial encoder, a transformer-based encoder, and an LSTM-based decoder. The initial encoder provides initial representations for the question and the schema, respectively. A relation-aware transformer (RAT) module then takes the initial representations and computes context-aware representations for the question and the schema. Finally, a decoder generates a sequence of production rules that constitute the abstract syntax tree, based on these representations. To obtain the initial representations, the encoder can either use 1) LSTMs Hochreiter and Schmidhuber (1997) on top of pre-trained word embeddings, like GloVe Pennington et al. (2014), or 2) pre-trained contextual embeddings like BERT Devlin et al. (2018). In our work, we test the effectiveness of our method for both variants.
As shown in Wang et al. (2019), the final question and schema encodings, which are the output of the RAT module, rely heavily on schema-linking features. These features are extracted by a heuristic function that links question words to columns and tables based on n-gram matching, and they are readily available in the conventional mono-lingual setting of the Spider dataset. However, we hypothesize that the parser's over-reliance on these features is specific to the Spider dataset, where annotators were shown the database schema and asked to formulate queries. As a result, they were prone to re-using terms from the schema verbatim in their questions. This would not be the case in a real-world application, where users are unfamiliar with the structure of the underlying database and free to use arbitrary terms which would not necessarily match column or table names Suhr et al. (2020). Hence, we will also evaluate our parser in the cross-lingual setting, where questions and schemas are not in the same language and such features are not available.
5 Experiments
To show the effectiveness of DG-MAML, we integrate it with a base parser and test it on zero-shot text-to-SQL tasks. We then present further analysis of DG-MAML to show how it affects the domain generalization of the parser. By designing an in-domain benchmark, we also show that the out-of-domain improvement does not come at the cost of in-domain performance.
5.1 Datasets and Metrics
We evaluate DG-MAML on two zero-shot text-to-SQL benchmarks, namely (English) Spider Yu et al. (2018b) and Chinese Spider Min et al. (2019). Spider consists of 10,181 examples (question-SQL pairs) from 206 databases, including 1,659 examples taken from the Restaurants (Popescu et al., 2003; Tang and Mooney, 2000), GeoQuery (Zelle and Mooney, 1996), Scholar (Iyer et al., 2017), Academic (Li and Jagadish, 2014), Yelp and IMDB (Yaghmazadeh et al., 2017) datasets. We follow the standard split and use 8,659 examples (from 146 databases) for training, and 1,034 examples (from 20 databases) as our development set. The remaining 2,147 examples from 40 test databases are held out and kept by the authors for evaluation.
Chinese Spider is a Chinese version of Spider which translates all NL questions from English to Chinese and keeps the original English databases. It simulates the real-life scenario where schemas for most relational databases in industry are written in English while NL questions from users could be in any other language. It poses the additional challenge of encoding cross-lingual correspondences between Chinese and English. Following Min et al. (2019), we use the same train/development/test split as the Spider dataset.
In both datasets, we report results using the metric of exact set match accuracy, following Yu et al. (2018b). As the test sets for both datasets are not publicly available, we mostly use the development sets for further analyses.
5.2 Implementation and Hyperparameters
Our base parser is based on RAT-SQL Wang et al. (2019), which is implemented in PyTorch Paszke et al. (2019). During preprocessing, input questions, column names, and table names in schemas are tokenized and lemmatized by Stanza Qi et al. (2020), which can handle both English and Chinese. For English questions and schemas, we use GloVe Pennington et al. (2014) and BERT-large Devlin et al. (2018) as the pre-trained embeddings for encoding. For Chinese questions, we use Tencent embeddings Song et al. (2018) and Multilingual BERT Devlin et al. (2018).
In all experiments, we train for up to 20,000 steps; see the Appendix for the batch size and configurations of other hyperparameters.
5.3 Main Results
We present our main results in Table 2. On Spider, DG-MAML boosts the performance of the non-BERT base parser by 2.1%, showing its effectiveness in promoting domain generalization. Moreover, the improvement is not cancelled out when the base parsers are augmented with BERT representations. On Chinese Spider, DG-MAML helps the non-BERT base parser achieve a substantial improvement (+4.5%). For parsers augmented with multilingual BERT, DG-MAML is also beneficial. Overall, DG-MAML consistently helps the base parser achieve better accuracy, and it is empirically complementary to using pre-trained representations (e.g., BERT).
Compared with the mono-lingual setting of Spider, the margin obtained by DG-MAML is larger in the cross-lingual setting of Chinese Spider. This is presumably because heuristic schema-linking features, which help promote domain generalization for Spider, are not feasible in Chinese Spider. We will elaborate on this in Section 5.4.
To confirm that the base parser struggles when applied out-of-domain, we construct an in-domain setting and measure the gap in performance. This setting also helps us address a natural question: does using DG-MAML hurt in-domain performance? This would not have been surprising, as the parser is explicitly optimized towards better performance on unseen target domains. To answer these questions, we create a new split of Spider. Specifically, for each database from the training and development set of Spider, we include 80% of its question-SQL pairs in the new training set and assign the remaining 20% to the new test set. As a result, the new split consists of 7,702 training examples and 1,991 test examples. With this split, the parser is tested on databases that have all been seen during training. We evaluate the non-BERT parsers with the same exact set match metric.
As the in-domain and out-of-domain settings have different splits, and thus do not use the same test set, a direct comparison between them only serves as a proxy to illustrate the effect of domain shift. We show that, despite the original out-of-domain split containing a larger number of training examples (8,659 vs. 7,702), the base parser tested in-domain achieves a much better performance (78.2%) than its counterpart tested out-of-domain (56.4%). This suggests that domain shift genuinely hurts the base parser.
We further study DG-MAML in this in-domain setting to see if it causes a drop in in-domain performance. Somewhat surprisingly, we instead observe a modest improvement (+1.1%) over the base parser trained with conventional supervised learning. This suggests that DG-MAML, despite optimizing the model towards domain generalization, captures, to a certain degree, a more general notion of generalization or robustness, which appears beneficial even in the in-domain setting.
5.4 Additional Experiments and Analysis
We first discuss additional experiments on linking features and DG-FMAML, and then present further analysis probing how DG-MAML works.
Effect of Linking Features
As mentioned in Section 2, previous work addressed domain generalization by focusing on the sub-task of schema linking. For Spider, where questions and schemas are both in English, Wang et al. (2019) leverage n-gram matching features which improve schema linking and significantly boost parsing performance. However, in Chinese Spider, it is not obvious how to design such linking heuristics. Moreover, as pointed out by Suhr et al. (2020), the assumption that columns/tables are explicitly mentioned is not general enough, implying that exploiting matching features would not be a good general solution to domain generalization. Hence, we would like to see whether DG-MAML can be beneficial when those features are not present.
Specifically, we consider a variant of the base parser that does not use this feature, and train it with conventional supervised learning and with DG-MAML on Spider. As shown in Table 3, we confirm that those features have a big impact on the base parser. (Note that some results in Table 3 differ from Table 2: the former reports dev set performance over three runs, while the latter shows the best model, selected based on dev set performance.) More importantly, in the absence of those features, DG-MAML boosts the performance of the base parser by a larger margin. This is consistent with the observation that DG-MAML is more beneficial for Chinese Spider than for Spider: the parser needs to rely more on DG-MAML when these heuristics are not integrated or not available for domain generalization.
Effect of DG-FMAML
We investigate the effect of the first-order approximation in DG-FMAML to see whether it provides reasonable performance compared with DG-MAML. We evaluate it on the development sets of the two datasets; see Table 3. DG-FMAML consistently boosts the performance of the base parser, although it lags behind DG-MAML. For a fair comparison, we use the same batch size for DG-MAML and DG-FMAML. However, since DG-FMAML uses less memory, it could potentially benefit from a larger batch size. In practice, DG-FMAML is twice as fast to train as DG-MAML; see the Appendix for details.
Probing Domain Generalization
Schema linking has been the focus of previous work on zero-shot semantic parsing. We take the opposite direction and use this task to probe the parser, asking whether it achieves domain generalization, at least to a certain degree, by improving schema linking. Our hypothesis is that improved linking is the mechanism that prevents the parser from overfitting the source domains.
We propose to use ‘relevant column recognition’ as a probing task. Specifically, relevant columns are the columns that are mentioned in SQL queries. For example, the SQL query “Select Status, avg(Population) From City Groupby Status” in Figure 1 contains two relevant columns: ‘Status’ and ‘Population’. We formalize this task as a binary classification problem: given an NL question and a column from the corresponding database schema, a binary classifier should predict whether the column is mentioned in the gold SQL query. We hypothesize that representations from the DG-MAML parser will be more predictive of relevance than those of the baseline parser, and that the probing classifier will be able to detect this difference in the quality of the representations.
We first obtain the representations for NL questions and schemas from the parsers and keep them fixed. The binary classifier is then trained based only on these representations. For classifier training, we use the same split as the Spider dataset, i.e., the classifier is evaluated on unseen databases. Details of the classifier are provided in the Appendix. The results are shown in Table 4. The classifier relying on the parser trained with DG-MAML achieves better performance. This confirms our hypothesis that DG-MAML training yields better encodings of NL questions and database schemas, and that this is one of the mechanisms through which the parsing model ensures generalization.
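The probe itself can be as simple as a small feed-forward network over the frozen encodings. A sketch, where the encoding size and the concatenate-then-MLP design are illustrative assumptions rather than the exact classifier described in the Appendix:

```python
import torch
import torch.nn as nn

class ColumnRelevanceProbe(nn.Module):
    """Predicts whether a column is mentioned in the gold SQL query."""
    def __init__(self, enc_dim=256, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * enc_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, question_enc, column_enc):
        # Both inputs are frozen parser encodings of shape [batch, enc_dim];
        # only the probe's own parameters are trained.
        pair = torch.cat([question_enc, column_enc], dim=-1)
        return self.mlp(pair).squeeze(-1)  # logit that the column is relevant

probe = ColumnRelevanceProbe()
question_enc = torch.randn(4, 256)  # stand-ins for frozen parser encodings
column_enc = torch.randn(4, 256)
logits = probe(question_enc, column_enc)
```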
|Model||Precision||Recall||F1|
|Base Parser + DG-MAML (Spider)||73.8||70.6||72.1|
|Base Parser + DG-MAML (Chinese Spider)||66.8||61.2||63.9|
6 Conclusions
The task of zero-shot semantic parsing has been gaining momentum in recent years. However, previous work has not proposed algorithms or objectives that explicitly promote domain generalization. We rely on the meta-learning framework to encourage domain generalization. Instead of learning from individual data points, DG-MAML learns from a set of virtual zero-shot parsing tasks. By optimizing towards better target-domain performance in each simulated task, DG-MAML encourages the parser to generalize better to unseen domains.
We conduct experiments on two zero-shot text-to-SQL parsing datasets. In both cases, using DG-MAML leads to a substantial boost in performance. Furthermore, we show that the faster first-order approximation DG-FMAML can also help a parser achieve better domain generalization.
Acknowledgments
We would like to thank the anonymous reviewers for their valuable comments. We gratefully acknowledge the support of the European Research Council (Titov: ERC StG BroadSem 678254; Lapata: ERC CoG TransModal 681760) and the Dutch National Science Foundation (NWO VIDI 639.022.518).
References
- Balaji et al. (2018) Yogesh Balaji, Swami Sankaranarayanan, and Rama Chellappa. 2018. Metareg: Towards domain generalization using meta-regularization. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 998–1008. Curran Associates, Inc.
- Bogin et al. (2019) Ben Bogin, Matt Gardner, and Jonathan Berant. 2019. Global reasoning over database structures for text-to-SQL parsing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3659–3664, Hong Kong, China. Association for Computational Linguistics.
- Cai and Yates (2013) Qingqing Cai and Alexander Yates. 2013. Large-scale semantic parsing via schema matching and lexicon extension. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 423–433, Sofia, Bulgaria. Association for Computational Linguistics.
- Chang et al. (2019) Shuaichen Chang, Pengfei Liu, Yun Tang, Jing Huang, Xiaodong He, and Bowen Zhou. 2019. Zero-shot text-to-sql learning with auxiliary task.
- Choi et al. (2020) DongHyun Choi, Myeong Cheol Shin, EungGyun Kim, and Dong Ryeol Shin. 2020. Ryansql: Recursively applying sketch-based slot fillings for complex text-to-sql in cross-domain databases. arXiv preprint arXiv:2004.03125.
- Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- Finegan-Dollak et al. (2018) Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang, and Dragomir Radev. 2018. Improving text-to-SQL evaluation methodology. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 351–360, Melbourne, Australia. Association for Computational Linguistics.
- Finn et al. (2017) Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 1126–1135. JMLR.org.
- Ghifary et al. (2015) Muhammad Ghifary, W Bastiaan Kleijn, Mengjie Zhang, and David Balduzzi. 2015. Domain generalization for object recognition with multi-task autoencoders. In Proceedings of the IEEE International Conference on Computer Vision, pages 2551–2559.
- Givoli and Reichart (2019) Ofer Givoli and Roi Reichart. 2019. Zero-shot semantic parsing for instructions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4454–4464, Florence, Italy. Association for Computational Linguistics.
- Gu et al. (2018) Jiatao Gu, Yong Wang, Yun Chen, Victor O. K. Li, and Kyunghyun Cho. 2018. Meta-learning for low-resource neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3622–3631, Brussels, Belgium. Association for Computational Linguistics.
- Guo et al. (2019a) Daya Guo, Duyu Tang, Nan Duan, Ming Zhou, and Jian Yin. 2019a. Coupling retrieval and meta-learning for context-dependent semantic parsing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 855–866, Florence, Italy. Association for Computational Linguistics.
- Guo et al. (2019b) Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2019b. Towards complex text-to-SQL in cross-domain database with intermediate representation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4524–4535, Florence, Italy. Association for Computational Linguistics.
- Herzig and Berant (2018) Jonathan Herzig and Jonathan Berant. 2018. Decoupling structure and lexicon for zero-shot semantic parsing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1619–1629, Brussels, Belgium. Association for Computational Linguistics.
- Herzig et al. (2020) Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno, and Julian Martin Eisenschlos. 2020. TaPas: Weakly supervised table parsing via pre-training.
- Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9(8):1735–1780.
- Huang et al. (2018) Po-Sen Huang, Chenglong Wang, Rishabh Singh, Wen-tau Yih, and Xiaodong He. 2018. Natural language to structured query generation via meta-learning. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 732–738, New Orleans, Louisiana. Association for Computational Linguistics.
- Iyer et al. (2017) Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer. 2017. Learning a neural semantic parser from user feedback. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 963–973.
- Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Lee et al. (2019) Dongjun Lee, Jaesik Yoon, Jongyun Song, Sanggil Lee, and Sungroh Yoon. 2019. One-shot learning for text-to-sql generation.
- Li et al. (2018a) Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy M Hospedales. 2018a. Learning to generalize: Meta-learning for domain generalization. In Thirty-Second AAAI Conference on Artificial Intelligence.
- Li and Jagadish (2014) Fei Li and H. V. Jagadish. 2014. Constructing an interactive natural language interface for relational databases. Proceedings of the VLDB Endowment, 8(1):73–84.
- Li et al. (2018b) Haoliang Li, Sinno Jialin Pan, Shiqi Wang, and Alex C Kot. 2018b. Domain generalization with adversarial feature learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5400–5409.
- Mensch and Blondel (2018) Arthur Mensch and Mathieu Blondel. 2018. Differentiable dynamic programming for structured prediction and attention.
- Min et al. (2019) Qingkai Min, Yuefeng Shi, and Yue Zhang. 2019. A pilot study for Chinese SQL semantic parsing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3652–3658, Hong Kong, China. Association for Computational Linguistics.
- Nichol et al. (2018) Alex Nichol, Joshua Achiam, and John Schulman. 2018. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999.
- Obamuyide and Vlachos (2019) Abiola Obamuyide and Andreas Vlachos. 2019. Model-agnostic meta-learning for relation classification with limited supervision. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5873–5879, Florence, Italy. Association for Computational Linguistics.
- Pasupat and Liang (2015) Panupong Pasupat and Percy Liang. 2015. Compositional semantic parsing on semi-structured tables. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1470–1480, Beijing, China. Association for Computational Linguistics.
- Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035.
- Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.
- Popescu et al. (2003) Ana-Maria Popescu, Oren Etzioni, and Henry Kautz. 2003. Towards a theory of natural language interfaces to databases. In Proceedings of the 8th International Conference on Intelligent User Interfaces, pages 149–157.
- Qi et al. (2020) Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D. Manning. 2020. Stanza: A Python natural language processing toolkit for many human languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations.
- Ravi and Larochelle (2016) Sachin Ravi and Hugo Larochelle. 2016. Optimization as a model for few-shot learning.
- Song et al. (2018) Yan Song, Shuming Shi, Jing Li, and Haisong Zhang. 2018. Directional skip-gram: Explicitly distinguishing left and right context for word embeddings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 175–180, New Orleans, Louisiana. Association for Computational Linguistics.
- Suhr et al. (2020) Alane Suhr, Ming-Wei Chang, Peter Shaw, and Kenton Lee. 2020. Exploring unexplored generalization challenges for cross-database semantic parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8372–8388, Online. Association for Computational Linguistics.
- Sun et al. (2019) Yibo Sun, Duyu Tang, Nan Duan, Yeyun Gong, Xiaocheng Feng, Bing Qin, and Daxin Jiang. 2019. Neural semantic parsing in low-resource settings with back-translation and meta-learning.
- Tang and Mooney (2000) Lappoon R. Tang and Raymond J. Mooney. 2000. Automated construction of database interfaces: Integrating statistical and relational learning for semantic parsing. In 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 133–141.
- Vinyals et al. (2016) Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. 2016. Matching networks for one shot learning. In Advances in neural information processing systems, pages 3630–3638.
- Wang et al. (2019) Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, and Matthew Richardson. 2019. RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers. arXiv preprint arXiv:1911.04942.
- Wang et al. (2015) Yushi Wang, Jonathan Berant, and Percy Liang. 2015. Building a semantic parser overnight. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1332–1342, Beijing, China. Association for Computational Linguistics.
- Yaghmazadeh et al. (2017) Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig. 2017. Sqlizer: Query synthesis from natural language. In International Conference on Object-Oriented Programming, Systems, Languages, and Applications, ACM, pages 63:1–63:26.
- Yin and Neubig (2018) Pengcheng Yin and Graham Neubig. 2018. TRANX: A transition-based neural abstract syntax parser for semantic parsing and code generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 7–12, Brussels, Belgium. Association for Computational Linguistics.
- Yin et al. (2020) Pengcheng Yin, Graham Neubig, Wen tau Yih, and Sebastian Riedel. 2020. TaBERT: Pretraining for joint understanding of textual and tabular data. In Annual Conference of the Association for Computational Linguistics (ACL).
- Yu et al. (2018a) Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, and Dragomir Radev. 2018a. SyntaxSQLNet: Syntax tree networks for complex and cross-domain text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1653–1663, Brussels, Belgium. Association for Computational Linguistics.
- Yu et al. (2018b) Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018b. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921, Brussels, Belgium. Association for Computational Linguistics.
- Zaheer et al. (2017) Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Russ R Salakhutdinov, and Alexander J Smola. 2017. Deep sets. In Advances in neural information processing systems, pages 3391–3401.
- Zelle and Mooney (1996) John M. Zelle and Raymond J. Mooney. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2, pages 1050–1055.
- Zhong et al. (2017) Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. CoRR, abs/1709.00103.