
Transferable Natural Language Interface to Structured Queries aided by Adversarial Generation

A natural language interface (NLI) to structured queries is appealing due to its wide industrial applicability and high economic value. In this work, we tackle the problem of domain adaptation for NLI with limited data on the target domain. Two important approaches are considered: (a) effective general-knowledge learning via semantic parsing on the source domain, and (b) data augmentation on the target domain. We present a Structured Query Inference Network (SQIN) to enhance learning for domain adaptation, by separating schema information from the NL input and decoding SQL in a more structure-aware manner; we also propose a GAN-based augmentation technique (AugmentGAN) to mitigate the scarcity of target-domain data. We report solid results on GeoQuery, Overnight, and WikiSQL, demonstrating state-of-the-art performance on both in-domain and domain-transfer tasks.






Great effort has been invested in deep-learning-based semantic parsing to convert natural language (NL) texts to structured representations or logical forms [Wang2015BuildingAS, PasupatL15, Jia2016DataRF]. In particular, a special case of semantic parsing, the natural language interface (NLI) to structured queries such as SQL [androu1995natural, popescu2003towards, li2005nalix, li2014nalir], has attracted significant interest. The motivation is two-fold: 1) the majority of the world's data is stored in relational tables (databases), and an NLI to a database engine has the potential to support a great many dialogue-based applications; 2) it is extremely difficult for machines to understand the meaning of arbitrary NL texts, especially across multiple domains, but the complexity is more likely to be reduced by converting texts to formal languages.

Previous works have used seq2seq [sutskever2014sequence] models to generate structured queries for a given database from NL queries [Dong2016, Jia2016DataRF]. Provided with abundant data and an end-to-end training process, a seq2seq model achieves decent performance on a single relational table; however, applying a trained model to a new table is not straightforward. For example, suppose we have two queries against a geography table and an employee table, respectively:

a model trained on the geography table is able to parse the first query, but when it comes to the employee table, the model fails to parse the second query directly.

This is because a seq2seq model with end-to-end training mixes up three types of knowledge: (a) the ontology of NL (grammar), (b) domain-specific language usage, and (c) the schema information of the relational table (columns and values). Returning to the example, the end-to-end model has been trained on “age” but not “size”, and on “john smith” but not “south america”; so even though these are all schema information, it is difficult to use a model trained on one table (the source domain) to answer queries against another (the target domain).

Therefore, a reliable domain-adaptation solution should consider two approaches: (1) improve the learning of NL ontology knowledge on the source domain, which can then extend to the target domain; and (2) augment the data on the target domain, so that general NL knowledge and domain-specific usage are better learned. In this paper, we address the two problems accordingly:

  1. We design a Structured-Query Inference Network (SQIN) for better cross-domain semantic parsing, by separating schema-related information from the NL query and decoding SQL in a more structure-aware way;

  2. We design a generative adversarial network, AugmentGAN, to augment the limited training data on the target domain.

General Approaches

To make domain adaptation more effective and less resource-intensive, we: (1) explicitly separate the relational-table-related information in the NL query and generate SQL structurally, and (2) given a limited amount of target-domain data, effectively augment it to a larger size. We first introduce the scope of our method and explain its validity.


1. Single relational table (self-join operations supported as subqueries): This assumption implies that the SQL we support is a subset of standard SQL. A recent detailed analysis [JohnsonNS18] reveals that a large fraction of 8.1 million industrial SQL queries are issued against single relational tables with self-joins; in real life, people are more likely to ask about simple, structured data such as weather or stock prices, so it is fair to say the percentage of such queries is even higher in most practical NL-based applications. Our method is also capable of extending to more complex cases.

2. Only column names (and corresponding types) of a table are provided: In most circumstances, for privacy reasons, the values stored in the table are not accessible to NLI providers, unlike recent work STAMP [Sun2018SemanticPW], where values/cells are accessible. During domain transfer, in addition to the schema, a limited number of (NL, SQL) pairs on the target domain are given for training.

3. Column/value information is explicitly mentioned: This assumption ensures that we can identify and match columns and values in an NL query. We do not require columns to match their appearance in the table exactly; different forms (such as plurals or past tense), synonyms, and common usages are allowed, and such variations are also dealt with by our method. NL queries that are too implicit are not our focus in this paper.

Domain-adaptive Semantic Parsing

We present a Structured-Query Inference Network (SQIN), by dividing the semantic parsing task into two stages:

(1) Tag the column names and value information in the NL input. Some existing works detect schema information and copy it directly to the output via an attention-copying mechanism [Jia2016DataRF, vinyals2015pointer]; however, intensive learning is still needed when moving to another domain. Here, we use a convolutional tagging network (CTN) to determine, for each token in the NL query, whether it is a column, a value of a column, or nan. For example, suppose we have schema [‘country’, ‘size’, ‘population’] for the geography table and [‘name’, ‘salary’, ‘age’] for the employee table; then the two NL queries will be tagged in the following forms:

(2) Convert the tagged NL query to a SQL query; e.g., the tagged queries above are converted to the SQL forms:

where the FROM statement is omitted. In the end, the column tags are substituted by the column names from the schema, and the value tags by the corresponding substrings of the input.
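As an illustration of this final substitution step, the following sketch fills column and value tags back into a decoded SQL template. The helper name and tag format are hypothetical, chosen only to mirror the description above, and are not the paper's actual implementation.

```python
def fill_tags(sql_template, schema, value_spans):
    """Replace cN tags with schema column names and vN tags with NL substrings."""
    tokens = []
    for tok in sql_template.split():
        if tok.startswith("c") and tok[1:].isdigit():
            tokens.append(schema[int(tok[1:]) - 1])       # column tag -> column name
        elif tok.startswith("v") and tok[1:].isdigit():
            tokens.append(value_spans[int(tok[1:]) - 1])  # value tag -> input substring
        else:
            tokens.append(tok)                            # SQL keyword or operator
    return " ".join(tokens)

schema = ["name", "salary", "age"]
# a tagged SQL template from stage (2); v1 was aligned to "john smith" in the NL query
template = "SELECT c2 WHERE c1 = v1"
print(fill_tags(template, schema, ["'john smith'"]))
# -> SELECT salary WHERE name = 'john smith'
```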

For more complex SQL, to decode in a more structure-aware manner, both the (a) hierarchical and (b) compositional properties of SQL queries should be addressed. SQLNet [xu2017sqlnet] uses a Seq2Set model with a sketch to deal with the compositional nature of SQL, but it only works for simple types of SQL and requires significant human effort to define and retrain a new sketch, making it hard to adapt to different types of SQL. Seq2Tree [Dong2016] tackles the hierarchical structure of SQL with a hierarchical tree decoder, but it still requires the model to memorize different possible compositions of keywords. ASN [rabinovich2017abstract] incorporates both a tree-like structure and recursive decoding, but its multi-module design could be significantly simplified. Therefore, we use a simple structured sequence-based parts-of-SQL (seq2PSQL) generation to capture both natures of SQL with the help of the tagged information from the NL query.

More details of the model design will be introduced in the later sections.

Augment Data on Target Domain

Jia and Liang (2016) [Jia2016DataRF] previously developed a recombination method for data augmentation; for example, their AbsEntity method replaces a value in a query with different values of the same column, and their AbsWholePhrases method replaces a value with its column under certain conditions. However, the model may be heavily biased by the small set of seed queries used to generate the query variations, which can make the augmented set of NL queries simpler than the full scope of NL expression.

We propose an augmentation algorithm that goes further. Given two different NL queries, it is very difficult to hybridize parts of them into a new, fluent NL text; however, it is simple to recombine two SQL queries, because SQL follows strict grammar rules. We therefore train another sequence-based model to generate an NL query from a given SQL query, and the augmentation process generates the corresponding NL queries for recombined SQL queries. We adopt a generative adversarial network (GAN), using a discriminator to classify whether a generated NL query resembles human usage; the result is used as a reward for the generator.
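The following sketch shows why recombining SQL is easy compared to recombining NL: because SQL is grammatical, crossing over the WHERE clauses of two queries yields new, well-formed SQL, which a trained SQL-to-NL generator could then verbalize. This is an illustrative crossover under assumed query shapes, not the paper's recombination code.

```python
def split_where(sql):
    """Split a query of the form '<head> WHERE <condition>'."""
    head, _, cond = sql.partition(" WHERE ")
    return head, cond

def recombine(sql_a, sql_b):
    """Swap WHERE clauses between two SQL queries to obtain two new queries."""
    head_a, cond_a = split_where(sql_a)
    head_b, cond_b = split_where(sql_b)
    return f"{head_a} WHERE {cond_b}", f"{head_b} WHERE {cond_a}"

q1 = "SELECT name WHERE age > 30"
q2 = "SELECT salary WHERE name = 'john smith'"
new_q1, new_q2 = recombine(q1, q2)
print(new_q2)
# -> SELECT salary WHERE age > 30
```

Each recombined SQL query would then be fed to the SQL-to-NL generator to produce a new (NL, SQL) training pair.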

More details of the model design will be introduced in the later sections.

Related Works

Seq2seq-based [sutskever2014sequence] models enable semantic-parser training in an end-to-end manner without manual feature engineering. Besides the common seq2seq framework [Jia2016DataRF, xiao2016sequence, zhong2017seq2sql], there are other sequence-based models with structure-aware decoders such as Seq2Tree [Dong2016], SQLNet [xu2017sqlnet], EG [wang2018robust], and Abstract Syntax Networks [rabinovich2017abstract]. Due to the black-box nature of seq2seq, both Cheng et al. [cheng2017learning] and Coarse2fine [Dong2018coarse] proposed two-stage semantic parsers, with the first stage mapping utterances to intermediate states and the second converting those states to logical forms. STAMP [Sun2018SemanticPW] recognizes the importance of linking the question with the table columns, adopting a switching gate in the decoder and including value/cell information in SQL generation. A more recent work, MQAN [mccann2018natural], designs a multi-pointer-generator decoder for generation. In another line of deep-learning-based semantic parsing for relational tables, Neural Enquirer [YinLLK15] proposes a fully distributed end-to-end model where all components (query, table, answer) are stored and differentiable, and Neural Programmer [Neelakantan2016] defines a set of symbolic operators; these approaches lack explicit interpretability and adaptability to different tables, and the input is executed to produce an answer rather than a structured query. Other progress, such as Neural Symbolic Machines [liang2017neural], adds memory to the seq2seq model, but this is not our focus in this work. Two cross-domain seq2seq approaches [su2017cross, herzig2017neural] are relevant, but both require a large amount of target-domain data to achieve good domain adaptation. One recent work [xiong2018transfer] makes a good attempt to separate schema information from the natural language query through annotation.
As a future direction, DialSQL [Gur2018DialSQL] incorporates user feedbacks to enhance generation.

The idea of GANs [RadfordMC15, chen2016infogan, salimans2016improved] has recently enjoyed success in NLP [lamb2016professor, yu2017seqgan]. For example, one successful application of GANs is neural dialogue generation [li2017adversarial], where the generator is an RL-based seq2seq model and the outputs of the discriminator are used as rewards for the generator, pushing the system to generate dialogues that closely resemble human usage.

Structural Query Inference Network (SQIN)

Figure 1: The workflow of SQIN. (a) CTN: for each token in the NL query, its GloVe (green) and char-n-gram (purple) embeddings are taken as input; at each layer, the conv op is followed by a nonlinearity (yellow dot); the output of the last layer is multiplied with the embeddings of the schema through a bilinear matrix, followed by a softmax (red dot). The outputs are the column/value tags. (b) Seq2PSQL: the embeddings of each token and its tag are concatenated as the input of the bi-directional encoder; with different starting tokens, the encoded hidden state generates different parts of the SQL query, and <sub> hierarchically launches a subquery generation.

In this section, we tackle domain-adaptive semantic parsing. Given an NL query and the schema of the relational table the query is issued against, our goal is to convert the query to the corresponding SQL.

Convolutional Tagging Network (CTN)

As discussed in General Approaches, we first identify and tag the column and value information in the NL input with a sequence of tags: for each token, we predict a tag denoting it as a column (cj), a value of a column (vk), or nan.

One challenge for tagging is that a column name or a value can consist of multiple tokens, so the model should capture features of neighboring tokens as well; we therefore use a convolutional model. Another challenge is choosing suitable embeddings for the tokens: to capture a token with both semantic and character-level accuracy, we use both the GloVe embedding [pennington2014glove] and the char-n-gram embedding [kim2016character], and regard them as two separate ‘channels’ of the token. For the embedding of a column name (which may span multiple tokens), we use a bi-directional GRU to encode the two-channel word vector of each token.


We use multi-layer convolutional operations to process the NL query and assign a tag to each token. For each conv layer, the input is the concatenation of consecutive token embeddings across input channels, and the output is a set of embeddings over output channels, followed by a nonlinearity; the convolution filter spans a fixed window of tokens.

For the last layer, each output is multiplied with the embeddings of the schema (plus nan) through a bilinear matrix; the result passes through a softmax to return a probability vector, and the index with the highest probability corresponds to one of the columns or nan. We call this model the convolutional tagging network (CTN). In practice, we first use one CTN to tag column names in the NL input, and then add extra layers for value tagging.
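To make the CTN's input/output interface concrete, the sketch below is a deliberately simplified, non-neural stand-in: for each NL token it emits a tag marking it as a column (cJ), a value of column J (vJ), or nan. The real CTN predicts these tags with convolutions over GloVe/char-n-gram channels; here we merely string-match against the schema and a toy value lexicon, purely for illustration.

```python
def tag_query(tokens, schema, known_values):
    """Tag each token as cJ (J-th column), vJ (value of column J), or nan.

    known_values: dict mapping a value token to its 1-based column index.
    """
    tags = []
    for tok in tokens:
        if tok in schema:
            tags.append(f"c{schema.index(tok) + 1}")
        elif tok in known_values:
            tags.append(f"v{known_values[tok]}")
        else:
            tags.append("nan")
    return tags

schema = ["country", "size", "population"]
tokens = "what is the size of south_america".split()
print(list(zip(tokens, tag_query(tokens, schema, {"south_america": 1}))))
```

A learned tagger replaces the exact-match tests with softmax scores over the schema embeddings, which is what lets it handle plurals, synonyms, and multi-token names.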

Sequence-based Parts of SQL (seq2PSQL) generation

To encode the tagged NL query from the previous section, we use a sequence of embeddings, where each embedding is a concatenation of the original token’s GloVe vector and its tag’s embedding. The tag embedding itself is a concatenation of three parts: (1) the embedding of the tag type (column or value), (2) the embedding of the index, which indicates that the tag is either the i-th column or a value of the i-th column, and (3) the embedding of the value type (integer, string, etc.). Tags that share attributes (such as the same tag type or the same index) also share the corresponding parts of their embeddings.

The sequence is fed into a bidirectional multi-layer GRU encoder and encoded into a hidden representation. Since the encoder and decoder share the same vocabulary for the tags, we use the same tag embeddings on both sides and synchronize their updates during back-propagation.

The decoder adopts a uni-directional multi-layer GRU and generates SQL queries in a top-down manner:

(1) To address the compositional nature of SQL, we use different starting tokens (such as <select>, <where>, …) to generate different clauses of SQL. For each clause, at each step, the output can be a column tag (c1), a value tag (v2), or a SQL functional word (such as a logical or aggregation operator); generation terminates with an ending token <eos>. The decoders for different clauses share the same set of parameters, so all possible SQL clauses are handled in one universal setting.

(2) To address the hierarchical nature, we define a nonterminal token <sub> that indicates the onset of a subquery. If <sub> is predicted, a new set of clauses is decoded, conditioned on the nonterminal’s hidden vector. This process terminates when no more nonterminals are emitted.
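The top-down control flow of this decoding can be sketched as follows. The neural GRU decoder is replaced here by a scripted token stream, so the sketch shows only the recursion triggered by <sub>, not the learned model: each clause decodes until <eos>, and a <sub> token recursively launches a nested clause set, capturing SQL's hierarchical nature.

```python
def decode_clause(stream):
    """Consume tokens until <eos>; recurse on <sub> to build a parenthesized subquery."""
    tokens = []
    for tok in stream:
        if tok == "<eos>":
            break                      # current clause set is finished
        if tok == "<sub>":
            # a nonterminal launches decoding of a nested subquery
            tokens.append("(" + " ".join(decode_clause(stream)) + ")")
        else:
            tokens.append(tok)
    return tokens

# scripted stand-in for the decoder's output on a self-join query
stream = iter(["SELECT", "c1", "WHERE", "c2", "=",
               "<sub>", "SELECT", "max(c2)", "<eos>", "<eos>"])
print(" ".join(decode_clause(stream)))
# -> SELECT c1 WHERE c2 = (SELECT max(c2))
```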

In Fig. 1, we use an example from Overnight [Wang2015BuildingAS] to demonstrate adaptive semantic parsing: an NL input is converted to a self-join SQL query (supported as a subquery).

Data Augmentation based on GAN

To augment the seed data, it is much easier to recombine SQL queries and generate NL queries accordingly. The problem is framed as follows: given a SQL query, the model needs to generate a corresponding NL query. We view query generation as a sequence of actions taken according to a policy defined by a seq2seq model.


In this section, we describe the proposed AugmentGAN model in detail.

The adversarial paradigm is composed of a generator and a discriminator. The key idea is to encourage the generator to produce NL queries that are indistinguishable from human-composed ones, using the discriminator to provide a reward for the generation at each step. In detail, the generator is an attention-based seq2seq model [bahdanau2014neural] that produces an NL query step by step given the SQL; at each step, the partially generated query is evaluated by the discriminator, and the generator is updated through reinforcement learning.

The discriminator is a binary classifier that takes a pair of SQL and NL queries as input and encodes them into vector representations using two bi-directional GRU encoders, respectively; the two hidden vectors are then combined through a bilinear matrix into a single vector, which is fed to a 2-class feed-forward network returning the scores of being machine-generated or human-generated.
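The bilinear combination at the core of this discriminator can be sketched in a few lines. The shapes and the final sigmoid squashing are assumptions made for illustration (the paper describes a 2-class feed-forward head); the GRU encoders are replaced by fixed vectors.

```python
import math

def bilinear(h_sql, h_nl, W):
    """Bilinear form per output dim: out[o] = h_sql^T W[o] h_nl.

    W has shape (out_dim, len(h_sql), len(h_nl)).
    """
    return [sum(W[o][i][j] * h_sql[i] * h_nl[j]
                for i in range(len(h_sql))
                for j in range(len(h_nl)))
            for o in range(len(W))]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# toy encoded representations of the SQL and NL queries
h_sql, h_nl = [1.0, 0.5], [0.2, -0.4]
W = [[[0.1, 0.3], [0.2, 0.0]],   # output dim 0
     [[0.0, 0.1], [0.4, 0.2]]]   # output dim 1
scores = bilinear(h_sql, h_nl, W)
p_human = sigmoid(sum(scores))   # stand-in for the 2-class head's "human" score
print(scores, round(p_human, 3))
```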

Figure 2: The paradigm of AugmentGAN. The partially decoded sequence is passed to the discriminator, whose score serves as the reward for the generator at intermediate steps.

To calculate the score for the partial query at each step, we use Monte Carlo search [li2017adversarial, yu2017seqgan]: the model keeps sampling tokens from the output distribution until decoding finishes, and repeats this N times; the mean score of the N sampled completions being human-generated is used as the reward to update the policy of the generator for the next step (Fig. 2). The training objective is to maximize the expected reward of generated sequences under the generator's policy via the policy gradient method [williams1992simple], with a baseline function used to reduce the variance.

During training, we also feed the human-generated query to the generator with a positive reward for model updates; this serves as teacher intervention, giving the generator more direct access to the gold-standard targets [lamb2016professor, li2017adversarial].
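The Monte Carlo rollout that produces the per-step reward can be sketched as below. The generator's sampling and the discriminator are replaced by toy stand-in functions, so this shows only the reward computation, not the paper's trained models.

```python
import random

def rollout_reward(partial, sample_completion, score_human, n=5, seed=0):
    """Average the discriminator's 'human' score over n sampled completions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        finished = partial + sample_completion(partial, rng)
        total += score_human(finished)   # D's probability of "human-generated"
    return total / n                     # mean reward for the current step

# stand-ins: completions append 1-3 filler tokens; D prefers shorter queries
def sample_completion(partial, rng):
    return ["tok"] * rng.randint(1, 3)

def score_human(query):
    return 1.0 / len(query)

reward = rollout_reward(["thai", "restaurants"], sample_completion, score_human)
print(reward)
```

In the full AugmentGAN loop, this reward (minus the baseline) weights the log-probability of the token just emitted, giving the REINFORCE-style policy update.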

Experiments & Analyses

We present experimental results on both in-domain and domain-transfer tasks, analyze our models, and compare with previous works.

Datasets and Implementation

We train and evaluate our models on GeoQuery [zettlemoyer2005learning], WikiSQL [zhong2017seq2sql], and Overnight (with the sub-domain Blocks excluded, as it is not a relational table) [Wang2015BuildingAS]. For GeoQuery and Overnight, we manually convert each original logical form to a SQL query.

For GeoQuery and Overnight, we use the standard train-test splits as released, and randomly divide the training sets into splits for cross-validation; accuracies are calculated as the percentage of correct SQL queries. For WikiSQL, we use the standard train-dev-test splits, and accuracies are the percentage of correct logical forms (SQL queries).

We implement SQIN and AugmentGAN in TensorFlow and train models on an NVIDIA GTX-1080-Ti GPU. Each training iteration takes a batch of 128 examples, and evaluation on the development set happens every 50 iterations. For the Overnight dataset, the models usually require many iterations to reach the performance reported in this work.

In-domain Semantic Parsing

For both CTN and seq2PSQL, we use pre-trained GloVe vectors [pennington2014glove]; for out-of-vocabulary (OOV) tokens not covered by GloVe, we randomly generate a vector from a Gaussian distribution (with inferred element-wise mean and variance). The char-n-gram embeddings used in CTN are pre-generated as in [kim2016character]. The tag embedding is a concatenation of three parts, as discussed in the previous section: (a) tag type, (b) id, and (c) value type; each part is randomly initialized with a uniform scaling initializer.

Seq-based Models | Geo. | Overn. | WkSQL
seq2seq w/o RCB (Jia, 2016) | | |
Seq2Tree (Dong, 2016) | | |
Seq2SQL (Zhong, 2017) | | |
ASN (Rabinovich, 2017) | | |
SQLNet (Xu, 2017) | - | - |
STAMP w/o cell (Sun, 2018) | - | - |
Coarse2Fine (Dong, 2018) | | - |
Execution-Guided (Wang, 2018) | - | - |
MQAN unordered (McCann, 2018) | - | - |
seq2PSQL | | |
CTN + seq2seq | | |
SQIN | | |

Table 1: Test accuracies of SQLs on different datasets. Overnight subdomain Blocks is excluded. The seq2seq model [Jia2016DataRF] is without augmentation, and the STAMP model [Sun2018SemanticPW] is without cell information.

SQL: select article having count (venue) 2
  Ground Truth: article that has maximum two venues
  AugmentGAN: article with at most 2 venues
  Recomb. [Jia2016DataRF]: article whose venues are at most two

SQL: select author where article = (select article where (publish date) = 2004)
  Ground Truth: authors of articles published in 2004
  AugmentGAN: authors published articles in 2004
  Recomb. [Jia2016DataRF]: authors whose articles published date is 2004

SQL: select restaurant where cuisine = thai and takeout = True
  Ground Truth: thai restaurants that have takeout
  AugmentGAN: restaurant has thai cuisine and takeout
  Recomb. [Jia2016DataRF]: restaurant that cuisine is thai and has takeout

SQL: select meal where restaurant = (select restaurant where star = 3)
  Ground Truth: what is a meal served at a three star rated restaurant
  AugmentGAN: meal served at 3 star restaurant
  Recomb. [Jia2016DataRF]: meal that restaurant whose star rating is 3

Table 2: Examples of NL queries generated by AugmentGAN and recombination [Jia2016DataRF], compared with ground truth composed by crowdsourcing [Wang2015BuildingAS].

We first train a two-layer CTN for column tagging; value tagging is then based on the pretrained two-layer column CTN with one extra layer on top. This improves value alignment during value tagging, since the pretrained layers provide important column-related information. For both the encoder and decoder in seq2PSQL, we use 2-hidden-layer GRU cells [chung2014empirical]; dropout is applied to both encoder and decoder during training. The decoder uses beam search.

We conduct an ablation analysis (CTN and seq2PSQL) to demonstrate the performance of SQIN and compare with previous works on in-domain tasks. From Table 1, our model exhibits better performance on all three datasets. seq2PSQL alone, without a CTN, already demonstrates a better structure-aware decoder. To isolate the contribution of CTN, we evaluate both SQIN and a combined model, CTN + seq2seq, which feeds the tagged input into a seq2seq model [Jia2016DataRF]: CTN significantly enhances the performance of seq2seq by separating schema-related information from the NL inputs, and combined with the better structure-aware decoder (seq2PSQL), SQIN shows state-of-the-art performance for in-domain semantic parsing.

Data Augmentation and Evaluation

The generator is first pre-trained to predict the NL queries given the SQL queries with a maximum-likelihood-estimation (MLE) loss. The discriminator is also pre-trained: half of the negative examples are partial NL queries with incomplete information relative to the corresponding SQL; a quarter are complete NL queries whose token order is randomly permuted; the remaining quarter are generated by sampling.

In Table 2 we show several examples generated on Overnight by AugmentGAN and by the recombination method [Jia2016DataRF], compared with the ground truths originally composed through crowdsourcing [Wang2015BuildingAS]. The examples generated by recombination follow stricter rules, whereas the examples from AugmentGAN are more flexible in both sentence structure and word choice, and thereby resemble human usage more closely.

Subdomain | GAN | Tie | Ground truth
Restaurants | 8 | 28 | 64
Publication | 7 | 24 | 69

Table 3: Crowd-sourced evaluation of randomly selected (GAN-generated, ground truth) NL pairs on each of two Overnight subdomains. The values represent the number of examples picked to be more human-like usage.

To qualitatively evaluate the NL queries generated by AugmentGAN, we employ crowd-sourced judges to evaluate 100 randomly sampled pairs of human-composed and GAN-generated queries. For each pair, we ask 3 judges to decide which query is better, with ties allowed. A small set of pairs is used to validate the quality of the crowd-sourced annotations: an annotator's judgments are accepted only if the annotator passes the validation set with sufficient correctness.

In Table 3 we show the crowd-sourced evaluation of the GAN-generated NL queries versus the ground truth in two Overnight subdomains; around one third of the generations resemble (or surpass) human usage. For industrial purposes, even when AugmentGAN cannot generate human-like queries, an extra selection step can be added; in most circumstances, selecting is significantly less time-consuming than directly composing or paraphrasing an NL query. Therefore, AugmentGAN has both academic and practical impact.

Domain-Transfer Approaches | seq2seq [su2017cross] | seq2PSQL | CTN + seq2seq | SQIN
In-domain | | | |
Plain transfer | | | |
Limited target-domain data | | | |
Limited target-domain data + ReComb. [Jia2016DataRF] | | | |
Limited target-domain data + GAN | | | |
Massive target-domain data | | | |

Table 4: Domain Adaptation Evaluation on Overnight (Blocks excluded). We compare (vanilla seq2seq, seq2PSQL, CTN + seq2seq, SQIN) on five domain-transfer approaches.

Domain Adaptation and Evaluation

We evaluate how well our models (SQIN, seq2PSQL without tagging, and CTN + seq2seq) leverage learning from source-domain data to generate SQL queries against a target domain in the Overnight dataset, compared with a vanilla seq2seq model (Table 4).

(1) The in-domain setting trains and tests the model on the target domain; (2) plain transfer directly applies the trained model to the target domain, where the source tables are all Overnight subdomains except the target one; (3) massive target-domain data uses a sufficient amount of target-domain data to fine-tune the model; (4) limited target-domain data uses a randomly selected fraction of the target-domain data to fine-tune the model; (5) limited target-domain data + GAN transfers with limited target-domain data augmented by AugmentGAN; (6) limited target-domain data + ReComb. transfers with limited target-domain data augmented by recombination [Jia2016DataRF]. For the approaches with augmentation, the size of the augmented data is a fixed multiple of the seed data.

From Table 4, the massive setting performs better than in-domain, which illustrates that out-of-domain information can enhance learning [su2017cross]. Both models with schema tagging (SQIN and CTN + seq2seq) achieve better results than their non-tagging counterparts and demonstrate better plain-transfer performance, showing that separating the schema information selectively enhances the learning of general NL knowledge and provides better domain adaptability.

For the approaches using augmentation, AugmentGAN is more effective than recombination [Jia2016DataRF], showing better performance for all models. Using AugmentGAN, the accuracies for all models are higher than in the in-domain setting and even close to the massive setting, showing that good domain adaptation is achieved even with limited target-domain data. One interesting observation: for the cases using AugmentGAN, even though roughly two thirds of the training queries are not fully human-like, the models are still able to generate high-accuracy SQL with the help of schema tagging, which implies that satisfactory transfer learning does not require the augmented data to fully resemble human usage.

Figure 3: SQL accuracies on Publication when transferring from different numbers of source domains, with the plain transfer (pink) and limited + GAN (purple) approaches.

Finally, we evaluate how many source domains are needed for the model to generate correct SQL on the target domain: we use different numbers of Overnight subdomains as the source tables and the subdomain Publication as the target table, and calculate the SQL generation accuracies for both the plain transfer and limited target-domain data + GAN approaches.

From Fig. 3, there are two observations: (1) if the source tables do not fully cover all possible query types on the target table, fine-tuning on target-domain data is necessary to achieve better saturation performance; e.g., self-join queries appear in the subdomain Publication but not in the other subdomains; (2) for a model previously trained on a sufficient number of source tables, feeding a small amount of target-domain data (+GAN) is enough to achieve good domain adaptation, a promising technique that can save resources and man-power when adapting to a new table.

Conclusion & Perspective

As one of our main insights, we developed SQIN to separate schema-related information from the NL inputs, which enhances the learning of general NL knowledge by sequence-based models on source domains, thereby improving both in-domain performance and cross-domain adaptability. Based on recombining a formal language (SQL) and generating the corresponding NL texts, we developed an effective GAN-based data augmentation algorithm that can significantly reduce the human effort of composing data. Our extensive experiments demonstrate the effectiveness of our approaches on standard datasets. Future work could extend to other types of structured data by combining syntax-directed generation [dai2018syntaxdir].