A table is a widely used type of data source on the web: it has a formal structure and contains valuable information. Understanding the meaning of a table and describing its content is an important problem in artificial intelligence, with potential applications such as question answering, building conversational agents, and supporting search engines [Pasupat and Liang2015, Sun et al.2016, Yin et al.2015, Jauhar, Turney, and Hovy2016, Konstas and Lapata2013, Li et al.2016, Yan et al.2016].
In this paper, we focus on the task of table-to-text generation. The goal is to automatically describe a table region (e.g., a row) with natural language.
The task of table-to-text generation could support many applications, such as search engines and conversational agents. On the one hand, it could be used to generate descriptive sentences for structured tables on the web: search engines could then serve structured tables as answers by regarding the generated sentences as keys and the tables as values. On the other hand, tables could be used as responses by conversational agents, for intents such as ticket booking and product comparison. However, it is impractical for a conversational agent to read out a table of multiple rows and columns on a smartphone. Table-to-text technology can transform such data into natural language sentences, which can be sent back to users as text or as voice via text-to-speech.
The task of table-to-text generation has three challenges. The first is how to learn a good representation of a table. A table has underlying structures such as attributes, cells, and a caption, and understanding its meaning is the foundation for table-to-text generation. The second challenge is how to automatically generate a natural language sentence that is not only fluent but also closely relevant to the meaning of the table. The third challenge is how to effectively use the informative words from a table, which are typically of low frequency, such as named entities and numbers, to generate a sentence.
To address the aforementioned challenges, we introduce a neural network model that takes a row from a table and generates a natural language sentence describing that row. The backbone of our approach is the encoder-decoder framework, which has been successfully applied in many tasks including machine translation [Kalchbrenner and Blunsom2013, Sutskever, Vinyals, and Le2014, Bahdanau, Cho, and Bengio2014] and dialogue generation [Sordoni et al.2015]. In the encoder, the model leverages table structures to represent a row as a continuous vector. In the decoder, we develop a powerful copying mechanism that is capable of generating rare words from table cells, attributes, and the caption. The entire model can be trained in an end-to-end fashion with back-propagation. Furthermore, we introduce an open-domain dataset, WIKITABLETEXT, including 13,318 explanatory sentences for 4,962 tables.
We conduct experiments on three datasets to verify the effectiveness of the proposed approach. On WIKITABLETEXT, our approach achieves a BLEU-4 score of 38.23, outperforming template-based and neural language model based approaches by a wide margin. Thorough model analysis shows that our copying mechanism not only dramatically boosts performance, but also selectively replicates appropriate contents from a table into the output sentence. On two existing datasets, WIKIBIO and SIMPLEQUESTIONS, our approach improves the state-of-the-art BLEU-4 score from 34.7 to 40.26 and from 33.32 to 39.12, respectively.
This work makes the following contributions. We present a neural network approach for table-to-text generation which effectively uses the structure of a table. We introduce a powerful copying mechanism that is capable of generating rare words from a table. We release an open-domain dataset, WIKITABLETEXT, and hope that it offers opportunities for further research in this area.
Task Formalization and Dataset
We first formulate the task of table-to-text generation. Afterwards, we present the construction of an open-domain dataset WIKITABLETEXT, which includes sentences that describe table regions.
A table is defined as a tuple $T = (A, C, s)$, where $A = \{a_1, \ldots, a_n\}$ includes the attributes (column headers) of the table, and $C = \{c_{ij}\}$ includes the cells of the table, where $n$ is the number of columns, $m$ is the number of rows, and $c_{ij}$ is the cell where the $i$-th column and the $j$-th row intersect. The caption $s$ is typically a natural language description of the entire table.
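The tuple definition above can be mirrored in a small data structure; the field names and the regularity check (used later to filter out tables with merged cells) are illustrative sketches, not the paper's actual code.

```python
from dataclasses import dataclass

@dataclass
class Table:
    """A table T = (A, C, s): attributes A, cells C (m rows x n columns),
    and a caption s."""
    attributes: list  # a_1 .. a_n (column headers)
    cells: list       # m rows; cells[j][i] is c_ij
    caption: str

def is_regular(t):
    """A regular table has no merged attributes or cells: every row
    contains exactly as many cells as there are attributes."""
    return all(len(row) == len(t.attributes) for row in t.cells)

# Toy instance modeled on the Figure 1 example
t = Table(attributes=["Year", "Champion"],
          cells=[["1997", "Singapore Armed Forces"]],
          caption="Singapore Cup")
```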
We formulate the task of table-to-text generation as follows: given a region of a table as input, output a natural language sentence that describes the selected region. In this work, we restrict the table region to a single row, and leave larger regions such as multiple rows or the entire table to future work. Besides, the information (cells) in the row can be used selectively to generate a sentence. Figure 1 gives an example that illustrates the task. Given a selected row, highlighted in orange, the goal is to output the descriptive sentence "Singapore Armed forces was the champion of Singapore Cup in 1997.". In this case, only the information from the first two columns is used.
It is worth noting that, we deal with regular tables in this work and leave the irregular tables to future work. We regard a table as a regular one if it does not contain merged attributes or merged cells, and the number of cells in each row is equal to the number of attributes.
We describe the construction of WIKITABLETEXT. We crawl tables from Wikipedia and randomly select 5,000 regular tables, each of which has at least 3 rows and 2 columns. For each table, we randomly select three rows, resulting in 15,000 rows that are further used for manual annotation. Each annotator is given a selected row, the corresponding attributes, and the caption. If a table does not contain a caption, we use its page title instead. Each annotator is asked to write a sentence describing at least two cells of the row, but is not required to cover every cell. For example, the sentence in Figure 1 does not use the "Runner-up" column. In addition, we ask annotators not to search the web for the meaning of a table, as we would like to ensure that external knowledge is not used; this makes the dataset closer to a real-world scenario. To increase the diversity of the generated language, we assign different rows from the same table to different annotators. If a row is hard to describe, we ask the annotator to write "It's-hard-to-annotate". Finally, we obtain 13,318 row-text pairs. Statistics are given in Table 1. We randomly split the dataset into training (10,000), development (1,318), and test (2,000) sets.
Table 1: Statistics of WIKITABLETEXT.

| Number of tables | 4,962 |
| Number of sentences | 13,318 |
| Avg #sentences per table | 2.68 |
| Avg #words per sentence | 13.91 |
| Avg / Min / Max #words per caption | 3.55 / 1 / 14 |
| Avg / Min / Max #cells per sentence | 3.13 / 2 / 9 |
| Avg / Min / Max #columns per table | 4.02 / 2 / 10 |
| Avg / Min / Max #rows per table | 7.95 / 3 / 19 |
To the best of our knowledge, WIKITABLETEXT is the first open-domain dataset for table-to-text generation. It differs from WEATHERGOV [Liang, Jordan, and Klein2009] and ROBOCUP [Chen and Mooney2008] in that its schemes are not restricted to a specific domain, such as weather forecasting or RoboCup sportscasting. We believe that WIKITABLETEXT brings more challenges and might be more useful in real-world applications. We are aware that WIKITABLEQUESTIONS [Pasupat and Liang2015] is a widely used dataset for table-based question answering, which takes a question and a table as input and outputs an answer. However, we do not use it in this work: table-to-text generation takes a row as input and outputs a sentence describing that row, while a portion of the questions in WIKITABLEQUESTIONS involves reasoning over multiple rows, which does not satisfy this constraint. We plan to handle multiple rows as input in future work. Our task is also closely related to infobox-to-biography generation on WIKIBIO [Lebret, Grangier, and Auli2016] and fact-to-question generation on SIMPLEQUESTIONS [Serban et al.2016]. Both infoboxes and knowledge base facts can be viewed as special cases of tables. Our task differs from them in that our input comes from a table with multiple rows and multiple columns. Moreover, our dataset differs from that of [Lebret, Grangier, and Auli2016], which is restricted to the biography domain, whereas ours is open-domain. We differ from [Serban et al.2016] in that their task generates questions, while our focus is to generate descriptive sentences.
Our approach is inspired by sequence-to-sequence (seq2seq) learning, which has been successfully applied in many language, speech and computer vision applications. The main idea of seq2seq learning is that it first encodes the meaning of a source sequence into a continuous vector by an encoder, and then decodes the vector to a target sequence with a decoder. In this section, we briefly introduce the neural network for seq2seq learning.
The goal of the encoder component is to represent a variable-length source sequence $x = (x_1, \ldots, x_N)$ as a fixed-length continuous vector. The encoder can be implemented with various neural architectures such as convolutional neural networks (CNN) [Meng et al.2015, Gehring et al.2016] and recurrent neural networks (RNN) [Cho et al.2014, Sutskever, Vinyals, and Le2014]. Taking the RNN as an example, it processes a sequence by recursively transforming the current word with the output vector of the previous step. It is formulated as $h_t = f(x_t, h_{t-1})$, where $f$ is a nonlinear function and $h_t$ is the hidden vector at time step $t$. The last hidden vector $h_N$ is usually used as the representation of the input sequence $x$.
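The recurrence $h_t = f(x_t, h_{t-1})$ can be sketched with a minimal, untrained tanh RNN; the weight shapes and random initialization below are illustrative assumptions, not the paper's architecture.

```python
import math
import random

def rnn_encode(seq, dim, Wx, Wh, b):
    """Recursively transform each input vector with the previous hidden
    state: h_t = tanh(Wx.x_t + Wh.h_{t-1} + b). Returns all hidden states."""
    h = [0.0] * dim
    states = []
    for x in seq:
        h = [math.tanh(sum(Wx[i][j] * x[j] for j in range(len(x)))
                       + sum(Wh[i][j] * h[j] for j in range(dim))
                       + b[i])
             for i in range(dim)]
        states.append(h)
    return states

random.seed(0)
dim, in_dim = 4, 3
Wx = [[random.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(dim)]
Wh = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
b = [0.0] * dim
seq = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two one-hot "word" vectors
states = rnn_encode(seq, dim, Wx, Wh, b)
# The last hidden vector serves as the fixed-length sequence representation.
sentence_vec = states[-1]
```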
The decoder takes the output of the encoder and generates a target sequence $y = (y_1, \ldots, y_M)$. Typically, the decoder is implemented with an RNN, which generates a word $y_t$ at each time step $t$ based on the representation of $x$ and the previously predicted words $y_{<t}$. The process is formulated as $p(y_t \mid y_{<t}, x) = g(y_{t-1}, s_t)$, where $g$ is a nonlinear function that outputs the probability of $y_t$, and $s_t$ is the hidden state of the RNN at time step $t$. Since the standard RNN suffers from the problem of vanishing gradients [Bengio, Simard, and Frasconi1994], existing studies usually use gated RNN units such as Long Short-Term Memory (LSTM) [Hochreiter and Schmidhuber1997] or the Gated Recurrent Unit (GRU) [Cho et al.2014] to implement the recurrence.
We present a neural network approach for table-to-text generation in this section. The approach is inspired by advances of seq2seq learning in neural machine translation, and we extend it by taking the table structure into account. The intuition of our approach is to encode the meaning of a row from a table with an encoder, and then generate a natural language sentence that describes it with a decoder. Figure 2 shows an illustration of the model.
Table Aware Encoder
We implement a simple yet effective neural network to represent a row, using the cells in the row, the list of attributes, and the caption. Specifically, we represent each cell $c_i$ as an embedding vector $e^c_i$, and each attribute $a_i$ as a continuous vector $e^a_i$. As attributes and cells carry table information at different granularities, we use different embedding matrices to model them. Afterwards, we obtain the representation $h_i$ of each column by concatenating $e^c_i$ and $e^a_i$, followed by a linear layer and an element-wise nonlinear function. As the majority of captions have a small number of words, we regard the caption as a special cell and use a virtual attribute for it. These column vectors $\{h_1, \ldots, h_n\}$ are used to represent the input row, and will be further used in the decoder for calculating the attention probabilities. To ensure that randomly exchanging two columns yields the same result, we simply take the element-wise average of the column vectors and use it as the initial hidden state of the decoder.
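A minimal sketch of this encoder, with toy random vectors standing in for the learned cell and attribute embedding matrices; the dimensions are illustrative.

```python
import math
import random

random.seed(1)
DIM = 4

def emb(table, word):
    """Hypothetical embedding lookup: one cached random vector per word."""
    if word not in table:
        table[word] = [random.uniform(-1, 1) for _ in range(DIM)]
    return table[word]

cell_emb, attr_emb = {}, {}  # separate matrices for cells and attributes

def encode_row(attrs, cells, W, b):
    """One vector per column: tanh(W.[cell; attr] + b). The caption is
    treated as an extra cell paired with a virtual 'caption' attribute."""
    cols = []
    for a, c in zip(attrs, cells):
        x = emb(cell_emb, c) + emb(attr_emb, a)  # concatenation
        cols.append([math.tanh(sum(W[i][j] * x[j] for j in range(2 * DIM)) + b[i])
                     for i in range(DIM)])
    # Order-invariant summary: element-wise average initializes the decoder.
    init = [sum(col[i] for col in cols) / len(cols) for i in range(DIM)]
    return cols, init

W = [[random.uniform(-0.1, 0.1) for _ in range(2 * DIM)] for _ in range(DIM)]
b = [0.0] * DIM
attrs = ["Year", "Champion", "caption"]
cells = ["1997", "Singapore Armed Forces", "Singapore Cup"]
cols, init = encode_row(attrs, cells, W, b)

# Swapping two columns leaves the initial decoder state (nearly) unchanged.
cols2, init2 = encode_row([attrs[1], attrs[0], attrs[2]],
                          [cells[1], cells[0], cells[2]], W, b)
```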
Table Aware Decoder
We develop a table aware decoder that generates a sentence to describe a row. The backbone of the decoder is a GRU based RNN, which works in a sequential way and generates one word at each time step. In order to take into account table structures and generate table-related descriptive sentences, we extend standard GRU-RNN in several ways, which will be detailed separately.
When generating a word at each time step, the decoder should be able to selectively use important information from the table. Furthermore, a descriptive sentence rarely uses the content of a cell more than once. An example is given in Figure 2, where the decoder pays more attention to the cell "2008" at the second time step, and uses more information from the cell "China" at the eighth time step.
Based on the considerations mentioned above, we adopt an attention mechanism that assigns a probability/weight to each column at each time step. Specifically, the attention weight $\alpha_{t,i}$ of the $i$-th column at time step $t$ is calculated as $\alpha_{t,i} = \operatorname{softmax}_i\big(z(h_i, s_{t-1}, \sum_{k<t} \alpha_{k,i})\big)$, where $h_i$ is the vector representation of the $i$-th column, $s_{t-1}$ is the hidden state of the standard GRU decoder, and $z(\cdot)$ is a nonlinear function that computes the importance of $h_i$, which is further normalized with a softmax function. The difference between this attention mechanism and the well-known ones [Bahdanau, Cho, and Bengio2014, Luong, Pham, and Manning2015] is the incorporation of the attention results from previous time steps. Ideally, the model should remember which columns have been used before, and avoid attending to previously used columns in the following time steps. Similar techniques have been successfully applied in many tasks [Chorowski et al.2015, Tu et al.2016, Meng et al.2016, Feng et al.2016].
Afterwards, the attention weights are used to calculate a context vector $c_t = \sum_i \alpha_{t,i} h_i$, which further influences the hidden state $s_t$ at each time step $t$.
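The coverage-style attention and context vector can be sketched as follows; subtracting the accumulated past attention from each score is a simplified stand-in for the learned scoring function, chosen only to show the penalizing effect.

```python
import math

def attend(cols, s, coverage):
    """Score each column against the decoder state, penalized by the
    attention it has already received, then normalize with softmax."""
    scores = [sum(si * hi for si, hi in zip(s, h)) - cov
              for h, cov in zip(cols, coverage)]
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    alpha = [e / z for e in exps]
    # Context vector: attention-weighted sum of the column vectors.
    context = [sum(a * h[i] for a, h in zip(alpha, cols))
               for i in range(len(cols[0]))]
    return alpha, context

cols = [[1.0, 0.0], [0.0, 1.0]]   # two column vectors
s = [1.0, 0.0]                    # decoder state, similar to column 0
coverage = [0.0, 0.0]
alpha1, ctx1 = attend(cols, s, coverage)
coverage = [c + a for c, a in zip(coverage, alpha1)]
alpha2, ctx2 = attend(cols, s, coverage)  # column 0 is now penalized
```

With the same decoder state, the second call attends less to the column that was already used, which is exactly the behavior the recurrent attention is meant to encourage.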
Global and Local Information.
In order to increase the relevance between the generated sentence and the table, we further incorporate two types of table information into the decoder. The first is global information about the entire table. The intuition is that the decoder should use different implicit patterns to deal with tables about different topics, such as "Olympic game" and "music album". As the output of the encoder captures information about the entire table, we use it as the global information and feed it as a part of the decoder input. Similarly, we use local information to capture the relation between the table and the word generated at the previous time step. Specifically, we regard the attribute of $y_{t-1}$ as the local information, and feed its embedding as another part of the decoder input. We use a special symbol unk_a to represent the attribute of $y_{t-1}$ if $y_{t-1}$ does not come from the table.
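Assembling the decoder input from these pieces can be sketched as below; the toy embeddings and two-dimensional vectors are illustrative assumptions.

```python
def decoder_input(prev_word, word_emb, attr_emb, attr_of, context, global_vec):
    """Build the decoder input at one time step: embedding of the previous
    word, attention context, global table vector, and the embedding of the
    previous word's attribute (unk_a when the word was not copied)."""
    attr = attr_of.get(prev_word, "unk_a")
    return word_emb[prev_word] + context + global_vec + attr_emb[attr]

word_emb = {"champion": [0.1, 0.2], "1997": [0.3, 0.4]}
attr_emb = {"Year": [0.5, 0.5], "unk_a": [0.0, 0.0]}
attr_of = {"1997": "Year"}        # "1997" was copied from the Year column
ctx, glob = [0.7, 0.7], [0.9, 0.9]
x_copied = decoder_input("1997", word_emb, attr_emb, attr_of, ctx, glob)
x_vocab = decoder_input("champion", word_emb, attr_emb, attr_of, ctx, glob)
```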
We observe that cells in a table typically include informative but low-frequency words, such as named entities and numbers. These words are important for representing the meaning of a table, and should be appropriately used to generate a meaningful sentence. In the standard RNN decoder, a word is generated from a softmax function computed over a word vocabulary of fixed size, such as the 20K most frequent words in the training data. In this setting, rare words from the table cannot be well covered by the vocabulary, so they cannot be generated in the predicted sequence. Enlarging the vocabulary is not a principled solution because it cannot handle rare words not seen in the training data.
Inspired by the recent success of copying mechanisms in seq2seq learning [Gulcehre et al.2016, Gu et al.2016, Yin et al.2015, Luong et al.2015, Nallapati et al.2016], we develop a flexible copying mechanism that is capable of copying contents from table cells. The intuition is that a word is generated either from the target vocabulary via softmax or from a table cell via the copying mechanism. Specifically, we use a neural gate $g_t$ to trade off between generating from the target vocabulary and copying from the table cells.
The gate is computed as $g_t = \sigma(W_g s_t + b_g)$, where $\sigma$ is the sigmoid function. At time step $t$, the probability of copying a word $w$ from the table cells is $g_t \cdot \alpha_{t,i}$, where $i$ is the column index of $w$ and $\alpha_{t,i}$ comes from the attention module. The probability of generating $w$ from the target vocabulary is $(1 - g_t) \cdot \beta_{t,w}$, where $\beta_{t,w}$ is calculated with the target softmax. If a word is covered by both the target vocabulary and the table cells, its probability is the weighted sum of both sides.
The advantage of this copying mechanism is that during training it does not have a built-in preference for copying from table cells or generating from the target vocabulary [Gulcehre et al.2016]. This property makes the model more flexible: it can be trained and tested in the same way as a standard seq2seq model.
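The mixing of the two distributions can be sketched as follows; the cells, attention weights, and vocabulary probabilities are made-up toy values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def word_probs(gate_logit, alpha, cells, vocab_probs):
    """Mix the two distributions: p(w) = g * alpha_i for a copy from
    column i, plus (1 - g) * beta_w for generation from the vocabulary;
    a word covered by both gets the weighted sum."""
    g = sigmoid(gate_logit)
    p = {w: (1.0 - g) * pw for w, pw in vocab_probs.items()}
    for a, cell in zip(alpha, cells):
        p[cell] = p.get(cell, 0.0) + g * a
    return p

cells = ["1997", "Singapore Cup"]
alpha = [0.8, 0.2]                        # attention over the two cells
vocab = {"the": 0.5, "1997": 0.3, "was": 0.2}
p = word_probs(0.0, alpha, cells, vocab)  # gate = 0.5: even trade-off
```

Note that "1997" appears both in the vocabulary and in a cell, so its final probability combines both sources, while "Singapore Cup" can only be produced by copying.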
We name our model that copies contents from cells Table2Seq. We go one step further and extend the copying mechanism so that it can also copy words from attributes. In the decoder, we double the number of encoder hidden states by appending the attribute embeddings, and regard the attention value of each attribute embedding as the evidence for copying from that attribute. This attribute-enhanced model is abbreviated as Table2Seq++.
Training and Inference
The model is trained in an end-to-end fashion using back-propagation under the negative log-likelihood objective $\mathcal{L} = -\sum_{(x, y) \in \mathcal{D}} \log p(y \mid x)$, where $\mathcal{D}$ is the training set. In the inference process, we use beam search to generate the top-$k$ most confident results, where $k$ is the beam size.
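Beam search over the decoder's step-wise distributions can be sketched generically; `step_fn` below is a stand-in for one decoder step and the toy bigram probabilities are illustrative.

```python
import heapq
import math

def beam_search(step_fn, start, eos, beam_size=5, max_len=10):
    """Keep the top-k partial sequences by log-probability at each step;
    step_fn(seq) returns {word: prob} for the next position."""
    beams = [(0.0, [start])]
    done = []
    for _ in range(max_len):
        cand = []
        for lp, seq in beams:
            if seq[-1] == eos:           # finished hypotheses stop expanding
                done.append((lp, seq))
                continue
            for w, p in step_fn(seq).items():
                cand.append((lp + math.log(p), seq + [w]))
        if not cand:
            break
        beams = heapq.nlargest(beam_size, cand)
    done.extend(b for b in beams if b[1][-1] == eos)
    return max(done or beams)[1]         # most confident full sequence

# Toy next-word distributions conditioned on the last token only.
probs = {"<s>": {"a": 0.6, "b": 0.4},
         "a": {"</s>": 1.0},
         "b": {"c": 1.0},
         "c": {"</s>": 1.0}}
best = beam_search(lambda seq: probs[seq[-1]], "<s>", "</s>", beam_size=5)
```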
In this section, we describe experiment settings, and give the results and analysis on three datasets including WIKITABLETEXT, WIKIBIO and SIMPLEQUESTIONS.
We use the same experiment settings for all three datasets. We randomly initialize the parameters of our model from a Gaussian distribution, set the dimension of the word/attribute embeddings to 300, and set the dimension of the decoder hidden state to 500. We adopt AdaDelta [Zeiler2012] to adapt the learning rate, halving it when the performance on the dev set does not improve for six consecutive epochs. We update parameters in an end-to-end fashion using back-propagation. In the inference process, we use beam search with a beam size of 5. We use the BLEU-4 score [Papineni et al.2002] as our evaluation metric. BLEU-4 is widely used for natural language generation tasks such as machine translation, question generation, and dialogue response generation. Each instance in these datasets has only one reference sentence.
Table-to-Text Generation with WIKITABLETEXT
We conduct experiments on WIKITABLETEXT. We treat a caption as a special cell and use a virtual attribute "caption" for it. On average, a caption has 3.55 words.
We implement a table-conditional neural language model (TC-NLM) baseline, which is based on a recurrent neural network language model [Mikolov et al.2010]. We feed the model with the local and global factors so that it also considers table information. We implement a random-copying baseline, which does not use the copying mechanism but replaces each unk token with a randomly selected cell from the table. We also implement a template-based baseline. During training, we construct a list of templates for each table scheme, ranked by frequency. The templates are derived by replacing the cells appearing in the text with the corresponding attributes (slots). At inference time, we select the most frequent template for the given table scheme, and generate text by replacing the attributes (slots) with the corresponding cell values.
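The template baseline can be sketched as below; the rows and sentences are toy examples, and matching cells by exact string replacement is a simplifying assumption.

```python
from collections import Counter

def make_template(sentence, row):
    """Derive a template by replacing cell values found in the sentence
    with attribute slots."""
    out = sentence
    for attr, cell in row.items():
        out = out.replace(cell, "<" + attr + ">")
    return out

def train_templates(pairs):
    """Count templates per table scheme (the sorted tuple of attributes)."""
    counts = {}
    for row, sent in pairs:
        scheme = tuple(sorted(row))
        counts.setdefault(scheme, Counter())[make_template(sent, row)] += 1
    return counts

def generate(templates, row):
    """Pick the most frequent template for the scheme and fill the slots."""
    scheme = tuple(sorted(row))
    tpl = templates[scheme].most_common(1)[0][0]
    for attr, cell in row.items():
        tpl = tpl.replace("<" + attr + ">", cell)
    return tpl

pairs = [({"year": "1997", "champion": "saf"},
          "saf was the champion in 1997"),
         ({"year": "1998", "champion": "tigers"},
          "tigers was the champion in 1998")]
templates = train_templates(pairs)
```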
| Model | BLEU-4 (Dev) | BLEU-4 (Test) |
| Table2Seq w/o Caption | 26.21 | 27.06 |
| Table2Seq w/o Copying | 4.78 | 5.41 |
| Table2Seq w/o Global | 34.82 | 36.68 |
| Table2Seq w/o Local | 34.08 | 36.50 |
Table 2 shows the experiment results of our model in different settings. The results indicate that our method outperforms the baselines by an order of magnitude. We remove each component of our model one at a time to show the influence of each part, and find that the copying mechanism is the most effective factor: removing it drops performance from 37.90 to 5.41. Randomly copying values also performs poorly. This is reasonable because many words of the description come from table cells, which are rare words that need to be copied.
|Ref: abe forsythe acted as ned kelly in movie ned in 2003 .|
|T2S: abe forsythe acted as ned kelly in the ned in 2003 .|
|T2S w/o Copying: unk played in unk in 2003 .|
Table 3 shows sentences generated with and without the copying mechanism. The caption is the second most important factor; it is usually an entity containing important information for generating the description. On this dataset, the global and local factors do not have a big influence on the results. In addition, the contents of table attributes are also useful for describing a table. We make a simple but effective adaptation and obtain an enhanced model, Table2Seq++, which can copy from both the cells and the attributes. The results show that copying from both cells and attributes brings further improvements. Table 4 gives examples of copying attributes or not.
|Ref: the capacity of cyclone triple was triple .|
|T2S++: the capacity of cyclone triple was triple .|
|T2S: the unk of cyclone triple was triple .|
|(1)||(Reference) herman candries was the chairman of kv mechelen in 1977-82 .|
|(Generated) herman candries was the president of kv mechelen during 1977-82 .|
|(2)||(Reference) stoycho mladenov got 21 goals in the game in 1978 .|
|(Generated) stoycho mladenov got 21 goals in the pfc beroe stara zagora in 1978 .|
|(3)||(Reference) c. narayanareddy won the lifetime contribution award in cinemaa awards in 2010 .|
|(Generated) c. narayanareddy was the champion of cinemaa awards in 2010 .|
|(4)||(Reference) the winner of freedom cup in 2011 was new zealand .|
|(Generated) new zealand was the winner of 1-1 in 2011 .|
|(5)||(Reference) nicholas rowe acted in movie girl on a cycle in 2003 .|
|(Generated) nicholas rowe ( actor ) released girl on a cycle in 2003 .|
|(6)||(Reference) earl w. bascom won the grand marshal in 1984 .|
|(Generated) earl w. bascom won the grand marshal in the grand marshal in 1984 .|
Since the experiment results indicate that the copying mechanism is the most effective component, we further analyze why it has such a strong impact on the results. Specifically, as the attention probabilities are the main part of the copying mechanism, we visualize the attention distributions of an instance in Figure 3, which shows the probability of each table cell being copied at each time step. In this example, the reference sentence, shown in Table 3, contains many table cells that are usually rare words. Our model generates a meaningful and fluent sentence which is almost the same as the reference. The underlined words in the generated sentence are copied from table cells. From the visualization we can see that, for each underlined word, the table cell it comes from has the highest probability of being copied at that time step. This indicates that our model has the ability to properly copy rare words from the table contents. In addition, we find that the learned gate automatically decides whether to copy from the table contents or generate from the target vocabulary. For example, the word "acted" is generated even though the table cell "Ned Kelly" has a high probability of being copied.
At test time, an attribute that is not seen in the training corpus is represented by the special symbol unk_a. Here, we conduct experiments to explore the effect of unk_a on the performance of our model, which reveals its ability to deal with unseen attributes. We split the dev and test sets into four portions based on the number (0, 1, 2, or 3) of unseen attributes in a table. The portions contain 97.34%, 2.20%, 0.38%, and 0.08% of the dev instances (97.4%, 2.00%, 0.50%, and 0.10% of the test instances), respectively. From Figure 5, we find that the BLEU-4 score decreases as the number of unseen attributes in a table increases. How to deal effectively with unk_a is left for future work.
Case Study and Discussion.
We sample 100 examples from the validation set and perform a case study to analyze the advantages of our approach and present challenges of the task. Different groups of examples are shown in Figure 4.
From the results in groups 1, 2, and 3, we find that our model can generate fluent, table-related sentences. In group 1, the model generates a sentence with the same meaning as the reference, but uses different expressions. In groups 2 and 3, our model generates sentences with more or less information than the references, as we do not restrict which columns are used. For example, the table cell "pfc beroe stara zagora" in group 2 is generated as additional information, while the table cell "lifetime contribution award" in group 3 is skipped. Although the generated sentences from groups 1 to 3 are meaningful and fluent, they may lower the BLEU-4 score.
From groups 4 to 6, we find that challenges remain in the table-to-text generation task. Sometimes our model copies a cell from a wrong column because the meaning of the columns is not well understood. For example, in group 4 the model copies "1-1" from the column "Matched_Won" instead of "freedom cup" from the caption. Similarly, some generated sentences use a wrong word to describe the relation between two table cells. For example, in group 5 the word "released" does not correctly describe the relation between the cells "nicholas rowe ( actor )" and "girl on a cycle". Incorporating more background knowledge, such as a knowledge base or the text surrounding a table, is a potential solution to this problem. Even though we use recurrent attention to avoid generating duplicate words, a few instances still suffer from duplication, e.g., group 6.
Infobox-to-Biography Generation with WIKIBIO
WIKIBIO was introduced by [Lebret, Grangier, and Auli2016] for generating a biography to describe an infobox, which can be viewed as a table with one row. They extract the first sentence of each biography article as the reference; on average, each reference sentence has 26.1 words. The corpus contains 728,321 instances, divided into 582,659 for training, 72,831 for validation, and 72,831 for testing. In our experiments, we use only 2,000 instances for validation. A characteristic of this dataset is that each cell of an infobox often contains multiple words. Therefore, as suggested by [Lebret, Grangier, and Auli2016], we split each such cell into multiple cells containing one word each.
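The cell-splitting preprocessing can be sketched as follows; indexing the repeated attribute by word position is our illustrative convention, not necessarily the dataset's exact format.

```python
def split_cells(attrs, cells):
    """Split each multi-word cell into one-word cells; a repeated attribute
    is position-indexed so the word order inside the original cell is kept
    recoverable (the naming scheme is an assumption for illustration)."""
    new_attrs, new_cells = [], []
    for a, c in zip(attrs, cells):
        words = c.split()
        for i, w in enumerate(words, 1):
            new_attrs.append(a if len(words) == 1 else "%s_%d" % (a, i))
            new_cells.append(w)
    return new_attrs, new_cells

# A two-column infobox row becomes three one-word cells.
attrs, cells = split_cells(["name", "born"], ["john smith", "1950"])
```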
| Model | BLEU-4 (Dev) | BLEU-4 (Test) |
| Table2Seq w/o Copying | 37.28 | 36.88 |
| Table2Seq w/o Global | 40.69 | 40.11 |
| Table2Seq w/o Local | 40.43 | 40.03 |
Results and Analysis.
Table 5 shows the experiment results on WIKIBIO. We choose the methods introduced by [Lebret, Grangier, and Auli2016] as our baselines, where "Table NLM" is considered the state of the art. The results show that our Table2Seq method outperforms the baselines. As we observe that attributes are usually not expressed in a biography, we do not use Table2Seq++. We remove each factor one at a time and find that the results follow the same trend as on WIKITABLETEXT. On this dataset, removing the copying mechanism does not dramatically decrease performance. We think this is because the average sentence length on WIKIBIO is much longer than on WIKITABLETEXT (26.1 vs. 13.91 words), so rare words have relatively little influence on the results. In addition, our model without the copying mechanism still performs better than "Table NLM"; we believe the gain comes from the GRU-based decoder with recurrent attention.
Fact-to-Question Generation with SIMPLEQUESTIONS
We follow [Serban et al.2016] to generate questions from knowledge base (KB) facts on SIMPLEQUESTIONS [Bordes et al.2015]. A fact in KB is a triple containing a subject, a predicate and an object. We regard a fact as a table with two attributes and a row with two cells. SIMPLEQUESTIONS contains 108,442 fact-question-answer tuples. The dataset is split into three parts: 75,910 for training, 10,845 for validation, and 20,687 for test.
| Model | BLEU-4 (Dev) | BLEU-4 (Test) |
| SP Triples TransE++ | - | 33.32 |
| MP Triples TransE++ | - | 33.28 |
| Table2Seq w/o Copying | 14.69 | 14.70 |
| Table2Seq w/o Global | 40.34 | 39.12 |
| Table2Seq w/o Local | 39.11 | 38.32 |
Results and Analysis.
We use the methods for fact-to-question generation reported by [Serban et al.2016] on SIMPLEQUESTIONS as our baselines. The baselines first generate a question template with a placeholder, then replace the placeholder with the subject as post-processing. Different from their methods, our model automatically learns where to copy from. Table 6 shows the results of the baselines and of our model, which is exactly the same as the one used in the previous experiments. The results indicate that our method outperforms the baselines by a wide margin. As predicates do not appear in questions, we do not evaluate Table2Seq++. Furthermore, the results verify that our model can generate natural language questions from KB facts given rich training data.
Our work relates to existing studies that generate natural language sentences from structured data [Barzilay and Lapata2005, Reiter et al.2005, Angeli, Liang, and Klein2010, Konstas and Lapata2012, Androutsopoulos, Lampouras, and Galanis2013]. Recently, [Wiseman, Shieber, and Rush2017] conducted experiments on generating long text from tables and showed the challenges of the data-to-text generation task. This work differs from existing studies in that our structured data is a web table. In [Angeli, Liang, and Klein2010] and [Mei, Bansal, and Walter2015], the structured data is a scenario consisting of a set of database records. In [Lebret, Grangier, and Auli2016], the structured data is an infobox consisting of fields and field values. In [Serban et al.2016], the structured data is a fact consisting of a subject, predicate, and object from Freebase. This work differs from existing neural network based methods in that our approach is an encoder-decoder approach with attention and copying mechanisms. [Mei, Bansal, and Walter2015] and [Serban et al.2016] also adopt neural encoder-decoder approaches, but they do not leverage a copying mechanism to replicate source contents in the output. The approach of [Lebret, Grangier, and Auli2016] is based on a conditional language model that generates biographies.
We present an encoder-decoder approach for generating a sentence to describe a table row. The model extends the encoder-decoder framework by integrating the semantics of a table. A flexible copying mechanism is developed to replicate contents from cells, attributes, and captions in the output sentence. We release an open-domain dataset, WIKITABLETEXT, for table-to-text generation, and conduct extensive experiments on WIKITABLETEXT and two existing datasets. Experiment results demonstrate the accuracy of our approach and the power of our copying mechanism.
The research work is supported by the National Key Research and Development Program of China under Grant No.2017YFB1002102. We thank Zhirui Zhang for the valuable discussion on building the model. We also thank all the reviewers for their valuable comments.
- [Androutsopoulos, Lampouras, and Galanis2013] Androutsopoulos, I.; Lampouras, G.; and Galanis, D. 2013. Generating natural language descriptions from owl ontologies: the naturalowl system. Journal of Artificial Intelligence Research 48:671–715.
- [Angeli, Liang, and Klein2010] Angeli, G.; Liang, P.; and Klein, D. 2010. A simple domain-independent probabilistic approach to generation. In EMNLP, 502–512.
- [Bahdanau, Cho, and Bengio2014] Bahdanau, D.; Cho, K.; and Bengio, Y. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- [Barzilay and Lapata2005] Barzilay, R., and Lapata, M. 2005. Collective content selection for concept-to-text generation. In EMNLP, 331–338.
- [Bengio, Simard, and Frasconi1994] Bengio, Y.; Simard, P.; and Frasconi, P. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5(2):157–166.
- [Bordes et al.2015] Bordes, A.; Usunier, N.; Chopra, S.; and Weston, J. 2015. Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075.
- [Chen and Mooney2008] Chen, D. L., and Mooney, R. J. 2008. Learning to sportscast: a test of grounded language acquisition. In ICML, 128–135.
- [Cho et al.2014] Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In EMNLP, 1724–1734.
- [Chorowski et al.2015] Chorowski, J. K.; Bahdanau, D.; Serdyuk, D.; Cho, K.; and Bengio, Y. 2015. Attention-based models for speech recognition. In NIPS, 577–585.
- [Feng et al.2016] Feng, S.; Liu, S.; Yang, N.; Li, M.; Zhou, M.; and Zhu, K. Q. 2016. Improving attention modeling with implicit distortion and fertility for machine translation. In COLING, 3082–3092.
- [Gehring et al.2016] Gehring, J.; Auli, M.; Grangier, D.; and Dauphin, Y. N. 2016. A convolutional encoder model for neural machine translation. arXiv preprint arXiv:1611.02344.
- [Gu et al.2016] Gu, J.; Lu, Z.; Li, H.; and Li, V. O. 2016. Incorporating copying mechanism in sequence-to-sequence learning. arXiv preprint arXiv:1603.06393.
- [Gulcehre et al.2016] Gulcehre, C.; Ahn, S.; Nallapati, R.; Zhou, B.; and Bengio, Y. 2016. Pointing the unknown words. arXiv preprint arXiv:1603.08148.
- [Hochreiter and Schmidhuber1997] Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural computation 9(8):1735–1780.
- [Jauhar, Turney, and Hovy2016] Jauhar, S. K.; Turney, P. D.; and Hovy, E. 2016. Tables as semi-structured knowledge for question answering. In ACL, volume 1, 474–483.
- [Kalchbrenner and Blunsom2013] Kalchbrenner, N., and Blunsom, P. 2013. Recurrent continuous translation models. In EMNLP, volume 3, 413.
- [Konstas and Lapata2012] Konstas, I., and Lapata, M. 2012. Concept-to-text generation via discriminative reranking. In ACL, 369–378. Association for Computational Linguistics.
- [Konstas and Lapata2013] Konstas, I., and Lapata, M. 2013. A global model for concept-to-text generation. J. Artif. Intell. Res.(JAIR) 48:305–346.
- [Lebret, Grangier, and Auli2016] Lebret, R.; Grangier, D.; and Auli, M. 2016. Neural text generation from structured data with application to the biography domain. In EMNLP, 1203–1213.
- [Li et al.2016] Li, J.; Monroe, W.; Ritter, A.; Jurafsky, D.; Galley, M.; and Gao, J. 2016. Deep reinforcement learning for dialogue generation. In EMNLP, 1192–1202.
- [Liang, Jordan, and Klein2009] Liang, P.; Jordan, M. I.; and Klein, D. 2009. Learning semantic correspondences with less supervision. In ACL, 91–99.
- [Luong et al.2015] Luong, T.; Sutskever, I.; Le, Q.; Vinyals, O.; and Zaremba, W. 2015. Addressing the rare word problem in neural machine translation. In ACL, 11–19.
- [Luong, Pham, and Manning2015] Luong, M.-T.; Pham, H.; and Manning, C. D. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
- [Mei, Bansal, and Walter2015] Mei, H.; Bansal, M.; and Walter, M. R. 2015. What to talk about and how? Selective generation using LSTMs with coarse-to-fine alignment. arXiv preprint arXiv:1509.00838.
- [Meng et al.2015] Meng, F.; Lu, Z.; Wang, M.; Li, H.; Jiang, W.; and Liu, Q. 2015. Encoding source language with convolutional neural network for machine translation. arXiv preprint arXiv:1503.01838.
- [Meng et al.2016] Meng, F.; Lu, Z.; Li, H.; and Liu, Q. 2016. Interactive attention for neural machine translation. arXiv preprint arXiv:1610.05011.
- [Mikolov et al.2010] Mikolov, T.; Karafiát, M.; Burget, L.; Černocký, J.; and Khudanpur, S. 2010. Recurrent neural network based language model. In Interspeech, volume 2, 3.
- [Nallapati et al.2016] Nallapati, R.; Zhou, B.; Gulcehre, C.; Xiang, B.; et al. 2016. Abstractive text summarization using sequence-to-sequence rnns and beyond. arXiv preprint arXiv:1602.06023.
- [Papineni et al.2002] Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. Bleu: a method for automatic evaluation of machine translation. In ACL, 311–318.
- [Pasupat and Liang2015] Pasupat, P., and Liang, P. 2015. Compositional semantic parsing on semi-structured tables. arXiv preprint arXiv:1508.00305.
- [Reiter et al.2005] Reiter, E.; Sripada, S.; Hunter, J.; Yu, J.; and Davy, I. 2005. Choosing words in computer-generated weather forecasts. Artificial Intelligence 167(1-2):137–169.
- [Serban et al.2016] Serban, I. V.; García-Durán, A.; Gulcehre, C.; Ahn, S.; Chandar, S.; Courville, A.; and Bengio, Y. 2016. Generating factoid questions with recurrent neural networks: The 30m factoid question-answer corpus. In ACL, 588–598.
- [Sordoni et al.2015] Sordoni, A.; Galley, M.; Auli, M.; Brockett, C.; Ji, Y.; Mitchell, M.; Nie, J.-Y.; Gao, J.; and Dolan, B. 2015. A neural network approach to context-sensitive generation of conversational responses. In NAACL: Human Language Technologies, 196–205.
- [Sun et al.2016] Sun, H.; Ma, H.; He, X.; Yih, W.-t.; Su, Y.; and Yan, X. 2016. Table cell search for question answering. In WWW, 771–782.
- [Sutskever, Vinyals, and Le2014] Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to sequence learning with neural networks. In NIPS, 3104–3112.
- [Tu et al.2016] Tu, Z.; Lu, Z.; Liu, Y.; Liu, X.; and Li, H. 2016. Modeling coverage for neural machine translation. arXiv preprint arXiv:1601.04811.
- [Wiseman, Shieber, and Rush2017] Wiseman, S.; Shieber, S. M.; and Rush, A. M. 2017. Challenges in data-to-document generation. arXiv preprint arXiv:1707.08052.
- [Yan et al.2016] Yan, Z.; Duan, N.; Bao, J.; Chen, P.; Zhou, M.; Li, Z.; and Zhou, J. 2016. DocChat: An information retrieval approach for chatbot engines using unstructured documents. In ACL, 516–525.
- [Yin et al.2015] Yin, J.; Jiang, X.; Lu, Z.; Shang, L.; Li, H.; and Li, X. 2015. Neural generative question answering. arXiv preprint arXiv:1512.01337.
- [Zeiler2012] Zeiler, M. D. 2012. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701.