Neural Data-to-Text Generation with Dynamic Content Planning

04/16/2020 ∙ by Kai Chen, et al. ∙ Harbin Institute of Technology

Neural data-to-text generation models have achieved significant advances in recent years. However, these models have two shortcomings: the generated texts tend to miss some vital information, and they often contain descriptions that are not consistent with the structured input data. To alleviate these problems, we propose a Neural data-to-text generation model with Dynamic content Planning, abbreviated as NDP. NDP can utilize the previously generated text to dynamically select the appropriate entry from the given structured data. We further design a reconstruction mechanism with a novel objective function that reconstructs the whole entry of the used data sequentially from the hidden states of the decoder, which improves the accuracy of the generated text. Empirical results show that NDP achieves superior performance over the state-of-the-art on the ROTOWIRE dataset in terms of relation generation (RG), content selection (CS), content ordering (CO) and BLEU metrics. The human evaluation shows that the texts generated by the proposed NDP are better than the corresponding ones generated by NCP most of the time. With the proposed reconstruction mechanism, the fidelity of the generated text can be further improved significantly.


1 Introduction

Language generation has been applied in many NLP applications, such as machine translation (Bahdanau et al., 2015), text summarization (See et al., 2017) and dialog systems (Shang et al., 2015). Unlike the above fields, which take text as input, data-to-text generation aims to produce informative, fluent and coherent multi-sentence descriptive text from given structured data such as a table of sports game statistics (Robin, 1994), weather forecast data (Belz, 2008), and so on.

Generally, data-to-text generation needs to tackle two major problems (Kukich, 1983; McKeown, 1985): what to say, i.e., what data should be covered in the output text, and how to say it, i.e., how to convey the information using grammatically and logically correct text. Most traditional work addresses these two issues with separate, isolated modules built on domain expert knowledge. On one hand, constructing a data-to-text generation system in this way is time-consuming and laborious. On the other hand, such systems are difficult to extend to other domains.

Due to recent fundamental advances in neural language generation and representation (Bengio et al., 2003), neural network based approaches have drawn increasing attention, e.g., Nie et al. (2018), Puduppully et al. (2019), and Wiseman et al. (2017). Our work also falls into this direction. Unlike the traditional approach, neural network based models can be constructed almost from scratch in an end-to-end fashion. These models are usually based on the encoder-decoder framework, which is mainly borrowed from neural sequence-to-sequence models (See et al., 2017; Sutskever et al., 2014; Bahdanau et al., 2015). Although some recent work (Puduppully et al., 2019; Wiseman et al., 2017) demonstrates that deep models perform much better than traditional approaches at maintaining inter-sentential coherence and a more reasonable ordering of the selected facts in the output text, neural data-to-text generation models perform much worse at avoiding redundancy and staying faithful to the input without explicit data selection and ordering.

To augment the neural models, Puduppully et al. (2019) propose an explicit content selection and planning model, which can select the data and their order before text generation. Their model is divided into two stages: the explicit content selection and planning are independent of the text generation, and the semantic information embedded in the text does not participate in the content selection and planning. Ideally, the data should be dynamically selected during text generation, which can make full use of the semantic information of the generation history. In turn, appropriately selected data can benefit the text generation. In a word, dynamic content selection and planning can benefit both "what to say" and "how to say". However, to the best of our knowledge, there is no work incorporating dynamic content selection and planning while generating text. Our work aims to equip the neural data-to-text generation model with the ability to dynamically select appropriate content from the given structured data through a novel dynamic planning mechanism.

Some recent work focuses on improving the ability of content selection in the encoder part, such as Puduppully et al. (2019) and Nie et al. (2018), or in the intermediate parts, such as the copy and coverage mechanisms; the importance of the end part, i.e., the text decoder, has not been deeply investigated. Tu et al. (2017) show that a reconstruction mechanism on top of the decoder can significantly improve the performance of machine translation. Intuitively, a well-designed reconstruction mechanism can encourage the decoder to take more important information from the encoder side. Unlike Wiseman et al. (2017), which reconstructs only specific fields (i.e., value and entity) of the data entry using a convolutional classifier, we use another recurrent neural network to reconstruct the whole data entry sequentially with a novel objective function.

The contribution of our work can be summarized as follows:

  • We propose a novel differentiable dynamic content planning mechanism that makes full use of the previously generated history and the importance of the data itself to decide which data should be used in the next step. The proposed dynamic content planning mechanism can be easily integrated into the encoder-decoder framework and has its own objective function.

  • To ensure the decoder generates text that is as accurate as possible, we further design a novel record reconstruction mechanism with a well-designed objective function, which encourages the decoder to take more accurate information from the encoder side.

  • Finally, we construct a novel Neural data-to-text generation model with the proposed Dynamic content Planning mechanism, abbreviated as NDP. We experimentally evaluate the resulting model on the challenging benchmark dataset ROTOWIRE (Wiseman et al., 2017). The results show that NDP significantly improves the adequacy of the generated text and achieves superior performance over state-of-the-art neural data-to-text systems in terms of relation generation (RG), content selection (CS), content ordering (CO) and BLEU metrics. The human evaluation, using the Best-Worst Scaling (BWS) technique (Louviere et al., 2015), shows that the texts generated by the proposed NDP are much better than the corresponding ones generated by NCP most of the time. Using the proposed reconstruction mechanism during training, the fidelity (in terms of precision) of the generated text can be further improved by a large margin.

2 Background: Static Content Planning

In this section, we briefly introduce the explicit content planning mechanism proposed by Puduppully et al. (2019), which is the basis of our work. We call it static content planning because once the records and their order are acquired, they do not change during text generation.

Data-to-text generation can be defined as generating a document $y = y_1 \dots y_T$ from given structured data $s$. For different tasks, the form of $s$ may differ. In our scenario, we take the NBA basketball game report generation challenge ROTOWIRE (Wiseman et al., 2017) as our task. Concretely, $s$ is an extensive statistical table consisting of a number of records $r_j$, i.e., $s = \{r_j\}_{j=1}^{|s|}$. Each record $r_j$ has four features: its type $r_{j,1}$ (e.g., POINTS), entity $r_{j,2}$ (e.g., Kevin_Love), value $r_{j,3}$ (e.g., 20), and whether the player is on the home team (H) or visiting team (V), $r_{j,4}$ (e.g., H).
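
To make the record format concrete, here is a minimal sketch (the field names are illustrative, not taken from the released data loaders) of how a ROTOWIRE-style record bank might be represented:

```python
from collections import namedtuple

# A record has four features: type, entity, value, and home/visiting flag.
Record = namedtuple("Record", ["type", "entity", "value", "ha"])

# A tiny record bank s = {r_1, ..., r_|s|} for one game (values from the running example).
record_bank = [
    Record("PTS", "Kevin_Love", "20", "V"),
    Record("REB", "Kevin_Love", "11", "V"),
    Record("PTS", "LeBron_James", "25", "V"),
    Record("AST", "LeBron_James", "14", "V"),
]

# Each feature of a record is embedded separately and the embeddings are concatenated (Eq. 1).
print(record_bank[0])  # Record(type='PTS', entity='Kevin_Love', value='20', ha='V')
```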

For each record $r_j$, Puduppully et al. (2019) first look up an embedding matrix for each of the record's features and concatenate the resulting embeddings. A nonlinear layer is then used to obtain the record representation $\mathbf{r}_j$ as in Eq. 1:

$$\mathbf{r}_j = \mathrm{ReLU}(\mathbf{W}_r[\mathbf{r}_{j,1}; \mathbf{r}_{j,2}; \mathbf{r}_{j,3}; \mathbf{r}_{j,4}] + \mathbf{b}_r) \qquad (1)$$

where

  • $\mathbf{W}_r \in \mathbb{R}^{n \times 4n}$ and $\mathbf{b}_r \in \mathbb{R}^{n}$ are parameters, and $n$ is the dimension of the embedding;

  • $\mathrm{ReLU}$ is the rectifier activation function.

The representation $\mathbf{r}_j$ is used to attend to the other records, and the context vector $\mathbf{c}_j$ is obtained with the following equations:

$$\alpha_{j,k} \propto \exp(\mathbf{r}_j^{\top}\mathbf{W}_a\mathbf{r}_k), \qquad \mathbf{c}_j = \sum_{k \neq j}\alpha_{j,k}\,\mathbf{r}_k$$

where $\mathbf{W}_a \in \mathbb{R}^{n \times n}$ is a parameter and $\sum_{k \neq j}\alpha_{j,k} = 1$.

The context vector $\mathbf{c}_j$ is further used to select the information from $\mathbf{r}_j$ with Eq. 2, and the content-selected representation $\mathbf{r}_j^{cs}$ is obtained:

$$\mathbf{r}_j^{cs} = \mathrm{sigmoid}(\mathbf{W}_g[\mathbf{r}_j;\mathbf{c}_j] + \mathbf{b}_g) \odot \mathbf{r}_j \qquad (2)$$

where $\odot$ denotes the element-wise product and $\mathbf{W}_g$, $\mathbf{b}_g$ are parameters.
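
The content selection gating above can be sketched as follows, with toy dimensions and random matrices standing in for learned parameters; this is an illustration of Eqs. 1-2, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n, num_records = 8, 5                             # toy embedding size and record count
feat = rng.normal(size=(num_records, 4 * n))      # concatenated feature embeddings per record

# Eq. 1: nonlinear projection of the concatenated feature embeddings.
W_r, b_r = 0.1 * rng.normal(size=(n, 4 * n)), np.zeros(n)
r = np.maximum(0.0, feat @ W_r.T + b_r)           # ReLU, shape (num_records, n)

# Attention of each record over the other records, giving a context vector c_j.
W_a = 0.1 * rng.normal(size=(n, n))
scores = r @ W_a @ r.T
np.fill_diagonal(scores, -1e9)                    # a record does not attend to itself
alpha = softmax(scores, axis=1)
c = alpha @ r

# Eq. 2: sigmoid gate over [r_j; c_j], applied element-wise to r_j.
W_g = 0.1 * rng.normal(size=(n, 2 * n))
r_cs = sigmoid(np.concatenate([r, c], axis=1) @ W_g.T) * r
print(r_cs.shape)                                 # (5, 8)
```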

The static content plan is a sequence of pointers $z = z_1, \dots, z_K$, where each $z_k$ refers to a record index in the given structured data input. A long short-term memory network (LSTM) is used to output the static record plan sequentially. Its first hidden state is initialized with the average of the content-selected representations, i.e., $\mathbf{h}_0 = \frac{1}{|s|}\sum_{j=1}^{|s|}\mathbf{r}_j^{cs}$. On step $k$, the input of the LSTM is the representation $\mathbf{r}^{cs}_{z_{k-1}}$ of the previously selected record. The output hidden state $\mathbf{h}_k$ is then used to attend to all the records, giving the selection distribution in Eq. 3:

$$p(z_k = j \mid z_{<k}, s) \propto \exp(\mathbf{h}_k^{\top}\mathbf{W}_c\,\mathbf{r}_j^{cs}) \qquad (3)$$

where $\mathbf{W}_c \in \mathbb{R}^{n \times n}$ is a parameter and $\sum_{j=1}^{|s|} p(z_k = j \mid z_{<k}, s) = 1$.

For training, suppose the gold static content plan has been obtained, denoted as $z^* = z^*_1, \dots, z^*_K$ (the gold static content plan extraction will be explained in the Experiments section). The static content planning module can be trained by minimizing the loss function in Eq. 4:

$$\mathcal{L}_{sp} = -\sum_{k=1}^{K}\log p(z_k = z^*_k \mid z^*_{<k}, s) \qquad (4)$$

For inference, a Pointer Network (Vinyals et al., 2015) is used to output the index of the selected record. The pointer $z_k$ for step $k$ is predicted by Eq. 5:

$$z_k = \arg\max_{j}\, p(z_k = j \mid z_{<k}, s) \qquad (5)$$
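
As an illustration of the pointer-style plan decoding (Eqs. 3 and 5), the following sketch greedily selects a few records; a single tanh recurrent cell stands in for the planner LSTM and random matrices stand in for learned parameters, so it is a sketch of the mechanism rather than the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
n, num_records, plan_len = 8, 5, 3
r_cs = rng.normal(size=(num_records, n))       # content-selected record vectors from Eq. 2
W_c = 0.1 * rng.normal(size=(n, n))            # pointer scoring matrix (random stand-in)

# A single tanh recurrent cell stands in for the planner LSTM.
W_h, W_x = 0.1 * rng.normal(size=(n, n)), 0.1 * rng.normal(size=(n, n))
h = r_cs.mean(axis=0)                          # h_0: average of the record representations

plan = []
for _ in range(plan_len):
    scores = h @ W_c @ r_cs.T                  # Eq. 3: score every record against h_k
    p = np.exp(scores - scores.max())
    p /= p.sum()
    z_k = int(p.argmax())                      # Eq. 5: greedy pointer decision at inference
    plan.append(z_k)                           # (repeated picks are not prevented in this toy)
    h = np.tanh(W_h @ h + W_x @ r_cs[z_k])     # feed the chosen record back in
print(plan)
```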

3 Neural Data-to-Text Generation with Dynamic Content Planning

The overall architecture of the proposed NDP with the reconstruction mechanism is shown in Figure 1. It consists of four components:

  • Static Content Planning, which acquires the selected records and their order; these are fed into the following component.

  • Dynamic Content Planning, a novel part proposed in this paper, which decides which record should play an important role in generating the next word according to the current decoding state.

  • Text Decoder, which generates words sequentially with attention (Bahdanau et al., 2015) and a copy mechanism (Gulcehre et al., 2016) over the dynamic content plan representation.

  • Record Reconstruction, another novel part proposed in this paper, which encourages the decoder to take more accurate information from the dynamic content plan representation. It is only used during training.

Figure 1: The overall architecture of the proposed NDP. The reconstruction mechanism is optional and is only used during training.

3.1 Dynamic Content Planning

From the description in the Static Content Planning section, we can see that it selects some records from the given structured data input according to the importance of each record. However, it neglects the semantic information in the text when arranging their order (i.e., planning). In this part, we propose a dynamic content planning mechanism that makes full use of the previous generation history to decide which record should play an important role in generating the next word.

Once the static record plan $z = z_1, \dots, z_K$ is acquired, its corresponding representations are fed into a bi-directional LSTM sequentially, and the bi-directional context representations are obtained as in Eq. 6:

$$\mathbf{e}_1, \dots, \mathbf{e}_K = \mathrm{BiLSTM}(\mathbf{r}^{cs}_{z_1}, \dots, \mathbf{r}^{cs}_{z_K}) \qquad (6)$$

where $\mathrm{BiLSTM}$ denotes the bi-directional LSTM.

To use the information of the previously generated words, the memory cell state of the text decoder is used to guide which record should be selected on step $t$. Let $\mathbf{m}_{t-1}$ denote the memory cell state of the second layer of the text decoder at step $t-1$ (we use a two-layer LSTM for the text decoder; the details are described in the Text Decoder section). Then, $\mathbf{m}_{t-1}$ is concatenated with each state in $\{\mathbf{e}_1, \dots, \mathbf{e}_K\}$. Using a nonlinear layer with a sigmoid activation function, the probability of selecting the record $z_k$ on step $t$ is calculated as in Eq. 7:

$$\gamma_{t,k} = \mathrm{sigmoid}(\mathbf{W}_p[\mathbf{m}_{t-1};\mathbf{e}_k] + b_p) \qquad (7)$$

where $\gamma_{t,k}$ is the dynamic pointer indicating which record should be used on step $t$, $\mathbf{W}_p$ and $b_p$ are parameters, and $[\cdot;\cdot]$ is the concatenation operation.

Then, $\gamma_{t,k}$ is normalized to $\hat{\gamma}_{t,k}$ with respect to the other records, as in Eq. 8:

$$\hat{\gamma}_{t,k} = \frac{\gamma_{t,k}}{\sum_{k'=1}^{K}\gamma_{t,k'}} \qquad (8)$$

Finally, the dynamic content planning representation on step $t$ is obtained by Eq. 9:

$$\mathbf{e}^{dy}_t = \sum_{k=1}^{K}\hat{\gamma}_{t,k}\,\mathbf{e}_k \qquad (9)$$

Compared with the static content planning representations $\mathbf{e}_k$, the dynamic content planning representation $\mathbf{e}^{dy}_t$ obviously changes at each decoding step.

Formally, on each step one record is selected. Suppose record $z_{k^*_t}$ should be selected on step $t$ (how the record to be selected at each step is acquired during training will be explained in the Experiments section). To ensure the dynamic content planning mechanism selects the appropriate entry from the plan, we use the objective function in Eq. 10:

$$\ell_t = -\log \hat{\gamma}_{t,k^*_t} \qquad (10)$$

Hence, the loss of the dynamic content planning for generating the text is the accumulated loss over all steps, which can be formulated as Eq. 11:

$$\mathcal{L}_{dp} = \sum_{t=1}^{T}\ell_t = -\sum_{t=1}^{T}\log \hat{\gamma}_{t,k^*_t} \qquad (11)$$

It should be noted that the dynamic content planning is trained with the loss $\mathcal{L}_{dp}$ in a supervised way; the mechanism and its loss should be regarded as a whole.
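
The dynamic selection step can be summarized in a few lines; the numpy sketch below follows Eqs. 6-10 for a single decoding step, with random vectors standing in for the BiLSTM outputs, the decoder memory cell, and the learned parameters (an illustration, not the released implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
n, K = 8, 4                               # hidden size and static plan length (toy values)
e = rng.normal(size=(K, n))               # BiLSTM outputs over the static plan (Eq. 6)
m_prev = rng.normal(size=n)               # decoder memory cell state from step t-1
W_p, b_p = 0.1 * rng.normal(size=2 * n), 0.0

# Eq. 7: a sigmoid score for every planned record, conditioned on the decoder memory cell.
gamma = sigmoid(np.array([np.concatenate([m_prev, e_k]) @ W_p + b_p for e_k in e]))

# Eq. 8: normalize the scores across the plan.
gamma_hat = gamma / gamma.sum()

# Eq. 9: the dynamic content plan representation for this decoding step.
e_dy = gamma_hat @ e

# Eq. 10: negative log-likelihood of the record that should be selected at this step
# (index 2 is just an example gold index).
gold_k = 2
step_loss = -np.log(gamma_hat[gold_k])
print(e_dy.shape, float(step_loss))
```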

3.2 Text Decoder

The decoder is a two-layer LSTM denoted as $\mathrm{LSTM}_{dec}$. Its initial states are initialized with the last state of the $\mathrm{BiLSTM}$. Generation starts from the given symbol /begin and terminates when the symbol /end is emitted or the maximum length is reached. On step $t$, $\mathrm{LSTM}_{dec}$ takes the previous word $y_{t-1}$ and hidden state $\mathbf{d}_{t-1}$ as input and outputs the hidden state $\mathbf{d}_t$ and memory cell state $\mathbf{m}_t$ as in Eq. 12:

$$\mathbf{d}_t, \mathbf{m}_t = \mathrm{LSTM}_{dec}(\mathbf{y}_{t-1}, \mathbf{d}_{t-1}) \qquad (12)$$

The memory cell state $\mathbf{m}_t$ is used for dynamic content planning, and $\mathbf{d}_t$ is used for the attention mechanism (Bahdanau et al., 2015) as in the following equations:

$$\alpha_{t,k} \propto \exp(\mathbf{d}_t^{\top}\mathbf{W}_b\,\mathbf{e}_k), \qquad \mathbf{q}_t = \sum_{k=1}^{K}\alpha_{t,k}\,\mathbf{e}_k, \qquad \mathbf{d}^{att}_t = \tanh(\mathbf{W}_o[\mathbf{d}_t; \mathbf{q}_t; \mathbf{e}^{dy}_t])$$

where $\mathbf{W}_b$ and $\mathbf{W}_o$ are parameters.

To generate words, we exploit the conditional copy mechanism (Gulcehre et al., 2016). The probability of generating word $y_t$ from the vocabulary is computed as Eq. 13:

$$p_{gen}(y_t \mid y_{<t}, z, s) = \mathrm{softmax}_{y_t}(\mathbf{W}_y\,\mathbf{d}^{att}_t + \mathbf{b}_y) \qquad (13)$$

The gate deciding whether to copy or generate a word is computed as Eq. 14:

$$p_{copy}(t) = \mathrm{sigmoid}(\mathbf{w}_c^{\top}\mathbf{d}^{att}_t + b_c) \qquad (14)$$

The final probability of emitting word $y_t$ at step $t$ is computed as Eq. 15:

$$p(y_t \mid y_{<t}, z, s) = p_{copy}(t)\sum_{k:\, y_t = r_{z_k,3}}\alpha_{t,k} + (1 - p_{copy}(t))\, p_{gen}(y_t \mid y_{<t}, z, s) \qquad (15)$$

For training, suppose the reference text is $y = y_1 \dots y_T$. The loss of the text decoder is calculated as Eq. 16:

$$\mathcal{L}_{dec} = -\sum_{t=1}^{T}\log p(y_t \mid y_{<t}, z, s) + \lambda_1\,\bar{p}_y \qquad (16)$$

where $\bar{p}_y$ is the average probability of each word in $y$, i.e., $\bar{p}_y = \frac{1}{T}\sum_{t=1}^{T} p(y_t \mid y_{<t}, z, s)$. The item $\bar{p}_y$ can be viewed as a regularization term designed to alleviate the repetition problem. The hyper-parameter $\lambda_1$ can be chosen empirically.
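
A minimal sketch of the conditional copy computation (Eqs. 13-15) for one decoding step is given below; the vocabulary, record values, and attention weights are toy stand-ins rather than the model's learned quantities:

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = ["<unk>", "scored", "points", "20", "11", "Kevin"]
record_values = ["20", "11", "25"]        # values of the planned records (copy candidates)

d_att = rng.normal(size=8)                # attentional decoder state for step t (toy vector)
W_y = 0.1 * rng.normal(size=(len(vocab), 8))
w_c = 0.1 * rng.normal(size=8)

# Eq. 13: generation distribution over the vocabulary.
logits = W_y @ d_att
p_gen = np.exp(logits - logits.max())
p_gen /= p_gen.sum()

# Eq. 14: scalar gate deciding between copying and generating.
p_copy = 1.0 / (1.0 + np.exp(-(w_c @ d_att)))

# Attention weights over the planned records (in the model these come from the decoder attention).
alpha = np.array([0.7, 0.2, 0.1])

# Eq. 15: mix the copy mass (attention on records whose value matches the word)
# with the generation probability.
def p_word(w):
    copy_mass = sum(a for a, v in zip(alpha, record_values) if v == w)
    gen_mass = p_gen[vocab.index(w)] if w in vocab else 0.0
    return p_copy * copy_mass + (1.0 - p_copy) * gen_mass

print(round(float(p_word("20")), 4))
```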

3.3 Record Reconstruction

Unlike Wiseman et al. (2017), which divides the decoder hidden state into two parts and uses a convolutional neural network (Collobert et al., 2011) to predict the entities and values of records, we use another LSTM with attention to reconstruct all the fields of the selected records sequentially, as shown in Fig. 1. The reconstruction loss can be computed as Eq. 17:

$$\mathcal{L}_{rec1} = -\sum_{k=1}^{K}\sum_{i=1}^{4}\log p(r_{z_k,i}) \qquad (17)$$

where $p(r_{z_k,i})$ is the generating probability of the $i$-th element of the record $r_{z_k}$. It is computed with a softmax function; the input of the proposed record reconstruction mechanism is the hidden states of the text decoder, and the reconstruction is performed by sequentially outputting the fields of the gold records.

Most previous work uses only an objective of the form of Eq. 17. In our work, to ensure the decoder generates information that is as accurate as possible, we design another loss item that encourages the decoder hidden states to incorporate the information needed to reconstruct all the fields of the selected records. The extra loss item is formulated as Eq. 18:

$$\mathcal{L}_{rec2} = -\sum_{k=1}^{K}\log \bar{p}(r_{z_k}) \qquad (18)$$

where $\bar{p}(r_{z_k})$ is the average probability of all elements in record $r_{z_k}$, which can be computed as Eq. 19:

$$\bar{p}(r_{z_k}) = \frac{1}{4}\sum_{i=1}^{4} p(r_{z_k,i}) \qquad (19)$$

$\mathcal{L}_{rec2}$ can be viewed as a regularization item that prevents the record reconstruction LSTM from over-fitting on some high-frequency elements while neglecting other fields. Finally, the overall loss of the record reconstruction can be summarized as Eq. 20:

$$\mathcal{L}_{rec} = \mathcal{L}_{rec1} + \lambda_2\,\mathcal{L}_{rec2} \qquad (20)$$

where the hyper-parameter $\lambda_2$ can be chosen empirically.
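
The two reconstruction terms and their combination (Eqs. 17-20) can be sketched as follows; the per-field probabilities and the weight are illustrative values, not the trained model's outputs:

```python
import numpy as np

# Toy per-field probabilities assigned by the reconstruction LSTM to the four gold fields
# (type, entity, value, H/V) of two planned records.
p_fields = np.array([
    [0.6, 0.5, 0.7, 0.9],
    [0.4, 0.3, 0.8, 0.9],
])

# Eq. 17: negative log-likelihood summed over every field of every record.
loss_rec1 = -np.log(p_fields).sum()

# Eqs. 18-19: negative log of each record's average field probability, which penalizes
# fitting only the easy, high-frequency fields while neglecting the others.
loss_rec2 = -np.log(p_fields.mean(axis=1)).sum()

# Eq. 20: weighted combination (0.05 is an illustrative weight, not necessarily the paper's).
lam = 0.05
loss_rec = loss_rec1 + lam * loss_rec2
print(round(float(loss_rec), 3))
```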

3.4 Training

To summarize, each component of the proposed NDP has its own objective function. When the reconstruction mechanism is used, the NDP can be trained in an end-to-end fashion by minimizing the overall loss in Eq. 21:

$$\mathcal{L} = \mathcal{L}_{sp} + \mu_1\,\mathcal{L}_{dp} + \mathcal{L}_{dec} + \mu_2\,\mathcal{L}_{rec} \qquad (21)$$

where $\mu_1$ and $\mu_2$ are hyper-parameters that can be chosen empirically.

4 Experiments

4.1 Data

We evaluated the proposed NDP on ROTOWIRE (Wiseman et al., 2017), a large-scale dataset of NBA basketball game summaries paired with the corresponding box- and line-score tables. Compared with other datasets, it is more challenging and much larger, and the summaries are professionally written, relatively well structured and long (337 words on average). The number of record types is 39, the average number of records is 628, the vocabulary size is 11.3K and the token count is 1.6M. Following previous work, we trained on 3,398 summaries, tested on 728, and used 727 for validation.

ID Value Entity Type H/V
1 LeBron LeBron_James F_NAME V
2 James LeBron_James S_NAME V
3 Kevin Kevin_Love F_NAME V
4 Love Kevin_Love S_NAME V
5 25 LeBron_James PTS V
6 14 LeBron_James AST V
7 Kevin Kevin_Love F_NAME V
8 Love Kevin_Love S_NAME V
9 20 Kevin_Love PTS V
10 11 Kevin_Love REB V
11 Kyrie Kyrie_Irving F_NAME V
12 Irving Kyrie_Irving S_NAME V
13 3 Kyrie_Irving FGM V
14 17 Kyrie_Irving FGA V
15 8 Kyrie_Irving PTS V
Table 1: An example of gold static content plan extracted from the text in Table 2.

4.2 The Gold Static and Dynamic Content Plan Extraction

Static Content Plan Extraction. During training, the gold static content plan is needed. We adopted the tool developed by Puduppully et al. (2019) to extract the gold record plan. For the training set, the information extraction (IE) system developed by Puduppully et al. (2019) was first used to identify entity and value pairs in the text, and then the type of each entity-value pair was predicted. If an entity-value pair appeared in the same sentence and there was a record in the record bank with a matching entity and value, the pair was assigned the corresponding type. By processing the sentences in the text sequentially, a sequence of records was acquired. Player names were divided into first name and surname; team records were also preprocessed to indicate the name of the team's city and the team itself. Table 1 shows an example of the partial gold static content plan extracted with the above method for its corresponding text in Table 2.

The dynamic duo of LeBron James and Kevin continued their outstanding early - season play Saturday . James posted a 25 - point , 14 - assist double - double , while Kevin Love also accomplished the feat with 20 and 11 boards . The stellar production helped overcome a down night for Kyrie Irving , who drained just 3 of his 17 shot attempts , producing a season - low 8 - point tally…..
Table 2: An example of text with gold dynamic Content Plan. The token with red color is the one with the matching value in Table 1. The superscript number of each token is the corresponding record ID.

Dynamic Content Plan Extraction. To train the proposed NDP, we need the gold dynamic content plan to calculate the loss in Eq. 11. We first matched the tokens of the text against the values of the records in the gold static content plan (Table 1) sequentially. If there was a matching value and the entity of the record was present in the same sentence, the corresponding record was assigned to the current token. Otherwise, the next token's matching record was assigned to the current token. As shown in Table 2, to match the entity we did some processing, such as splitting the entity into tokens and matching each one of them.
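
A simplified sketch of this alignment heuristic is shown below (hypothetical helper names; the authors' preprocessing scripts may differ in details such as sentence segmentation):

```python
# Walk over the tokens of one sentence and assign each token the ID of a static-plan record
# whose value matches the token and whose entity appears in the same sentence; tokens without
# a match inherit the next matched token's record ID.
static_plan = [
    {"id": 9, "value": "20", "entity": "Kevin_Love"},
    {"id": 10, "value": "11", "entity": "Kevin_Love"},
]

def align(tokens, plan):
    raw = []
    for tok in tokens:
        match = next((rec["id"] for rec in plan
                      if rec["value"] == tok
                      and any(part in tokens for part in rec["entity"].split("_"))),
                     None)
        raw.append(match)
    # Back-fill: a token without a match inherits the next matched record's ID.
    for i in range(len(raw) - 2, -1, -1):
        if raw[i] is None:
            raw[i] = raw[i + 1]
    return raw

sentence = ["Kevin", "Love", "scored", "20", "points", "and", "11", "boards"]
print(align(sentence, static_plan))
```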

4.3 Evaluation Metrics

We used the extractive evaluation tools developed by Puduppully et al. (2019) to compare our model with the competitor models. They employ an accurate IE system on the gold and automatically generated summaries. Let $y$ be the gold text and $\hat{y}$ the generated text. The evaluation consists of three metrics: Relation generation (RG) computes the precision and number of unique relations extracted from $\hat{y}$ that also appear in the given structured input data. Content selection (CS) computes the precision and recall of unique relations extracted from $\hat{y}$ matching those found in $y$. Content ordering (CO) computes the normalized Damerau-Levenshtein distance (Brill and Moore, 2000) between the sequence of records extracted from $\hat{y}$ and that extracted from $y$. Besides, we report BLEU scores and human evaluation results.
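
As a small illustration, CS precision and recall reduce to set operations over the extracted relation tuples; the sketch below is not the official evaluation script:

```python
# Content selection (CS): precision and recall of the unique relations extracted from the
# generated text against those extracted from the gold text.
def content_selection(gen_relations, gold_relations):
    gen, gold = set(gen_relations), set(gold_relations)
    overlap = gen & gold
    precision = len(overlap) / len(gen) if gen else 0.0
    recall = len(overlap) / len(gold) if gold else 0.0
    return precision, recall

gen = [("Kevin_Love", "PTS", "20"), ("Kevin_Love", "REB", "11"), ("LeBron_James", "PTS", "30")]
gold = [("Kevin_Love", "PTS", "20"), ("LeBron_James", "PTS", "25"), ("LeBron_James", "AST", "14")]
print(content_selection(gen, gold))  # both 1/3 in this toy example
```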

4.4 Experiment Setup

We compared the proposed models with the following competitor models. TEMPL: a template-based generator constructed by Wiseman et al. (2017), which creates a document consisting of eight template sentences: an introductory sentence (who won/lost), six player-specific sentences (based on the six highest-scoring players in the game), and a conclusion sentence. Wise: the best reported system in Wiseman et al. (2017). ED+JC: a vanilla encoder-decoder model with attention and the joint copy mechanism (Gu et al., 2016). ED+CC: a vanilla encoder-decoder model with attention and the conditional copy mechanism. NCP+JC: proposed by Puduppully et al. (2019), with static content planning and the joint copy mechanism. NCP+CC: proposed by Puduppully et al. (2019), with static content planning and the conditional copy mechanism. OpAtt: the operation-guided attention-based network proposed by Nie et al. (2018).

To ensure a fair comparison with the competitor models, especially the strong baselines NCP+CC and NCP+JC, we used the same embedding and LSTM dimension (600). The settings of the static content planning and text decoder LSTMs were the same as in NCP. To speed up training, we used the pre-trained parameters of NCP to initialize the corresponding parts of NDP. For NDP+rec, we continued training the pre-trained NDP model after adding the record reconstruction mechanism on top of the text decoder. Input feeding (Luong et al., 2015) was used for the text decoder. A standard fully batched RNN (Bahdanau et al., 2015) was used for record reconstruction. We applied dropout with a rate of 0.3. The Adagrad optimizer was used with an initial learning rate of 0.15 and a learning rate decay of 0.97. The hyper-parameters were set to 1, 0.05, 1, 1, 0.05 and 0.05, respectively. Truncated BPTT (Mikolov et al., 2010) was used in the text decoder with a truncation size of 100. We set the batch size to 5 for training and the beam size to 5 for inference. The NDP model is implemented based on PaddlePaddle (http://www.paddlepaddle.org/).
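
For reference, the reported training settings can be collected into a single configuration sketch (key names are illustrative, not taken from the authors' code):

```python
# Training settings reported in this section, gathered in one place.
config = {
    "embedding_dim": 600,
    "lstm_dim": 600,
    "dropout": 0.3,
    "optimizer": "adagrad",
    "learning_rate": 0.15,
    "learning_rate_decay": 0.97,
    "bptt_truncation": 100,
    "train_batch_size": 5,
    "beam_size": 5,
}
```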

4.5 Result

                   Validation Set                                Test Set
Model        RG #   RG P%   CS P%   CS R%   CO DL%   BU     RG #   RG P%   CS P%   CS R%   CO DL%   BU
TEMPL 54.29 99.92 26.61 59.16 14.42 8.51 54.23 99.94 26.99 58.16 14.92 8.46
Wise 23.95 75.10 28.11 35.86 15.33 14.57 23.72 74.80 29.49 36.18 15.42 14.19
ED+JC 22.98 76.07 27.70 33.29 14.36 13.22 - - - - - -
ED+CC 21.94 75.08 27.96 32.71 15.03 13.31 - - - - - -
OpAtt - - - - - 14.96 - - - - - 14.74
NCP+JC 33.37 87.40 32.20 48.56 17.98 14.92 34.09 87.19 32.02 47.29 17.15 14.89
NCP+CC 33.88 87.51 33.52 51.21 18.57 16.19 34.28 87.47 34.18 51.22 18.58 16.50
NDP 33.83 89.22 35.44 53.50 19.91 16.81 34.65 89.3 35.46 52.98 19.47 16.70
NDP+rec 31.63 89.87 37.26 52.16 20.67 16.67 32.43 89.3 36.85 51.65 20.29 16.38
Table 3: Automatic evaluation on ROTOWIRE validation and test sets using relation generation (RG) count (#) and precision (P%), content selection (CS) precision (P%) and recall (R%), content ordering (CO) in normalized Damerau-Levenshtein distance (DL%), and BLEU(BU)

The left part of Table 3 summarizes the results on the validation set. NDP outperforms all the neural competitor models (Wise, ED+JC, ED+CC, NCP+JC, NCP+CC, and OpAtt). In particular, NDP outperforms the NCP models, which only use the static content planning mechanism, irrespective of the copy mechanism employed. The difference between NDP and NCP+CC is that NDP uses both the proposed dynamic content planning and static content planning, while NCP+CC only uses static content planning, which indicates that it is the dynamic content planning that brings the performance improvement. As for the template-based system (TEMPL), Table 3 shows that it achieves much higher performance on RG and on the recall of CS compared with the machine learning models. This is not surprising, as TEMPL depends on domain expert knowledge and can be viewed as an upper bound on content selection and relation generation for machine learning based models. Neural models, especially our proposed NDP with or without the reconstruction mechanism, perform much better on the precision of content selection, content ordering, and BLEU.

NDP outperforms NCP+CC significantly on the precision (P%) of RG (89.22 vs 87.51), while both achieve comparable performance (33.83 vs 33.88) on the RG count (#). Recall from the previous section that RG (#) is the number of unique relations extracted from the generated text that also appear in the given structured data. Our proposed dynamic content planning mechanism takes the output of the static content planning as input, which determines how many records the proposed NDP can use from the given structured data. That is why there is no significant difference in the RG count (#). NDP+rec is the NDP model trained with the proposed record reconstruction mechanism. Compared with NDP, NDP+rec improves the precision of relation generation (RG) (89.87 vs 89.22), content selection (CS) (37.26 vs 35.44) and content ordering (CO) (20.67 vs 19.91), while it sacrifices some performance on the RG count (31.63 vs 33.83), CS recall (52.16 vs 53.50) and BLEU (16.67 vs 16.81). In other words, the reconstruction mechanism improves the fidelity of the generated text but sacrifices some adequacy. We believe that the proposed reconstruction mechanism is useful for tasks where fidelity is much more important than adequacy, such as weather report and stock market summary generation.

The results on the test set shown in Table 3 follow a pattern similar to the validation set. NDP achieves higher performance on all metrics, including relation generation, content selection, content ordering, and BLEU, compared with NCP+CC and the other neural competitor models. The proposed reconstruction mechanism can further improve the fidelity of the generated text.

We also conducted a human evaluation of NCP+CC, NDP and NDP+rec on 50 randomly selected samples. For each sample, we arranged the generated summaries into three pairs, i.e., (NCP+CC, NDP), (NCP+CC, NDP+rec) and (NDP, NDP+rec). Each pair was shown to three raters, who were asked to choose which summary was best and which was worst according to four criteria: Fluency (is the summary fluent?), Conciseness (does the summary avoid redundant information and repetitions?), Fidelity (does the summary avoid erroneous information?), and Overall (overall quality of the summary). We adopted the Best-Worst Scaling (BWS) technique (Louviere et al., 2015) to obtain the final result. BWS has been shown to be less labor-intensive and to provide more reliable results than rating scales (Kiritchenko and Mohammad, 2017). The score ranges from -1.0 (absolutely worst) to +1.0 (absolutely best).
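
For clarity, one common way to compute BWS counting scores from pairwise best/worst judgments is sketched below; the paper's exact aggregation may differ slightly:

```python
from collections import Counter

# A system's score is (#times chosen best - #times chosen worst) / #times it was shown,
# which ranges from -1.0 to +1.0.
def bws_scores(judgments):
    """judgments: list of (winner, loser) pairs from individual pairwise comparisons."""
    best, worst, shown = Counter(), Counter(), Counter()
    for winner, loser in judgments:
        best[winner] += 1
        worst[loser] += 1
        shown[winner] += 1
        shown[loser] += 1
    return {s: (best[s] - worst[s]) / shown[s] for s in shown}

picks = [("NDP", "NCP+CC"), ("NDP+rec", "NCP+CC"), ("NDP+rec", "NDP")]
print(bws_scores(picks))  # {'NDP': 0.0, 'NCP+CC': -1.0, 'NDP+rec': 1.0}
```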

Model Fluency Conciseness Fidelity Overall
NCP+CC -0.77 -0.77 -0.4 -0.7
NDP 0.23 0.37 0.03 0.27
NDP+rec 0.53 0.4 0.37 0.4
Table 4: The human evaluation result on 50 randomly selected samples by using the Best-Worst Scaling (BWS).

The human evaluation result is presented in Table 4. It shows that NDP achieves much higher scores on all four criteria than NCP+CC, which is in accordance with the automatic-metric results shown in Table 3. It indicates that the summaries generated by NDP are better than the corresponding ones generated by NCP most of the time. Table 4 also shows that the raters gave higher scores to NDP+rec than to NDP, although NDP+rec achieves lower recall on relation generation and content selection. This indicates that fidelity is much more important to human raters than the informational adequacy of the text, which is easy to understand: mistakes in the generated text are much easier to detect than missing facts. Our further qualitative analysis indicates that NDP tends to generate longer text (its length is closer to the human-written text) with fewer repetitions than NCP. However, both make mistakes related to background knowledge, which may be attributed to the lack of this knowledge in the input data. Due to space limitations, a generated example is listed in Table 6 of the Qualitative Example section. The text generated by NDP covers more records (blue color) than NCP+CC, which indicates that dynamic content planning can improve the adequacy of the generated text. NCP+CC does not perform very well at avoiding word repetitions (orange color). Both NDP and NCP+CC make some mistakes (red color). The text generated by NDP includes 6 mistakes, while that of NCP+CC includes 7. In particular, the last two mistakes (the two teams' next games) concern background knowledge that is not included in the input data. These findings highlight the importance of introducing background knowledge for generating rich descriptive text for sports game reports.

4.6 Qualitative Example

ID Value Entity Type H/V ID Value Entity Type H/V
1 Golden_State Warriors T-CITY V 33 5 Klay_Thompson FGM V
2 Warriors Warriors T-NAME V 34 Derrick Derrick_Favors F_NAME H
3 30 Warriors T-WINS V 35 Favors Derrick_Favors S_NAME H
4 5 Warriors T-LOSSES V 36 10 Derrick_Favors FGM H
5 116 Warriors T-PTS V 37 16 Derrick_Favors FGA H
6 Utah Jazz T-CITY H 38 22 Derrick_Favors PTS H
7 Jazz Jazz T-NAME H 39 11 Derrick_Favors REB H
8 13 Jazz T-WINS H 40 Enes Enes_Kanter F_NAME H
9 26 Jazz T-LOSSES H 41 Kanter Enes_Kanter S_NAME H
10 105 Jazz T-PTS H 42 13 Enes_Kanter PTS H
11 51 Warriors T-FG_PCT V 43 6 Enes_Kanter FGM H
12 52 Warriors T-FG3_PCT V 44 13 Enes_Kanter FGA H
13 48 Jazz T-FG_PCT H 45 1 Enes_Kanter FG3M H
14 33 Jazz T-FG3_PCT H 46 1 Enes_Kanter FG3A H
15 44 Jazz T-REB H 47 10 Enes_Kanter REB H
16 31 Warriors T-REB V 48 Rudy Rudy_Gobert F_NAME H
17 Stephen Stephen_Curry F_NAME V 49 Gobert Rudy_Gobert S_NAME H
18 Curry Stephen_Curry S_NAME V 50 16 Rudy_Gobert PTS H
19 10 Stephen_Curry FGM V 51 4 Rudy_Gobert FGM H
20 16 Stephen_Curry FGA V 52 9 Rudy_Gobert FGA H
21 4 Stephen_Curry FG3M V 53 8 Rudy_Gobert FTM H
22 9 Stephen_Curry FG3A V 54 11 Rudy_Gobert FTA H
23 27 Stephen_Curry PTS V 55 11 Rudy_Gobert REB H
24 Draymond Draymond_Green F_NAME V 56 Gordon Gordon_Hayward F_NAME H
25 Green Draymond_Green S_NAME V 57 Hayward Gordon_Hayward S_NAME H
26 15 Draymond_Green PTS V 58 17 Gordon_Hayward PTS H
27 6 Draymond_Green FGM V 59 5 Gordon_Hayward FGM H
28 9 Draymond_Green FGA V 60 11 Gordon_Hayward FGA H
29 3 Draymond_Green FG3M V 61 1 Gordon_Hayward FG3M H
30 4 Draymond_Green FG3A V 62 4 Gordon_Hayward FG3A H
31 Klay Klay_Thompson F_NAME V 63 6 Gordon_Hayward FTM H
32 Thompson Klay_Thompson S_NAME V 64 6 Gordon_Hayward FTA H
Table 5: An example of gold static content plan for its corresponding text in Table 6.

Table 5 shows the gold static content plan for the human-written text (Gold) in Table 6. Table 6 lists the texts generated by the NDP and NCP+CC models. We highlight the text in blue if it agrees with the records in Table 5 and in red if it contradicts them. We also use orange to highlight repetitions.

Table 6 shows that the text generated by NDP is much longer than that of NCP+CC. The text generated by NDP covers more records (blue color) than NCP+CC, which indicates that dynamic content planning can improve the adequacy of the generated text. NCP+CC does not perform very well at avoiding word repetitions (orange color), compared with the proposed NDP. Both NDP and NCP+CC make some mistakes (red color) on some facts. The text generated by NDP includes 6 mistakes, while that of NCP+CC includes 7. In particular, the last two mistakes (the two teams' next games) concern background knowledge that is not included in the structured input data. These findings highlight the importance of introducing background knowledge for generating rich descriptive text for sports game reports, which will be our future work.

The Golden State Warriors ( 30 - 5 ) defeated the Utah Jazz ( 13 - 26 ) 116 - 105 on Wednesday at Energy Solutions Arena in Salt Lake City . The Warriors were the superior shooters in this game , going 51 percent from the field and 52 percent from the three - point line , while the Jazz went 48 percent from the floor and just 33 percent from beyond the arc . While the Jazz out - rebounded the Warriors 44 - 31 , the Warriors made up for it by forcing the Jazz into 17 turnovers , while committing only 10 of their own . Stephen Curry was very tough to stop in this game , as he went 10 - for - 16 from the field and 4 - for - 9 from the three - point line to finish with a game - high of 27 points . He also handed out 11 assists , notching his fourth double - double in his last five games . Over those last five outings , Curry is averaging 24 points and 11 assists per game , as he continues to have the hot hand . Draymond Green also had a strong showing in this one , scoring an efficient 15 points ( 6 - 9 FG , 3 - 4 3Pt ) . Surprisingly , he only recorded one rebound though , which was a season - low for him . Klay Thompson only shot the ball eight times , converting on five of those eight attempts . It was the least amount of shots he ’s gotten off all year , but he was still able to add 12 points . The Warriors really spread around the minutes in this game with nine players recording more than 22 minutes . Despite the loss , the Jazz saw three players record double - doubles . Derrick Favors led the charge , going 10 - for - 16 from the field to score 22 points , while adding 11 rebounds as well . He ’s had 20 or more points in four out of his last five games , as he ’s averaged 21 points per game over that span . Enes Kanter was another double - double contributor , with 13 points ( 6 - 13 FG , 1 - 1 3Pt ) and 10 rebounds of his own . The Jazz received an impressive bench performance from Rudy Gobert , who recorded 16 points ( 4 - 9 FG , 8 - 11 FT ) and 11 rebounds . It was arguably one of his best games of the season , as the 16 points were a season - high , and he also had a significant defensive presence with two steals and three blocks . Gordon Hayward finished with 17 points ( 5 - 11 FG , 1 - 4 3Pt , 6 - 6 FT ) as well . The Warriors ’ next game will be at home against the Miami Heat on Wednesday , while the Jazz will be at home against the Los Angeles Lakers on Friday . Gold
The Golden State Warriors ( 30 - 5 ) defeated the host Utah Jazz ( 13 - 26 ) 116 - 105 on Wednesday at Energy Solutions Arena in Salt Lake City . The Warriors were the superior shooters in this game , going 51 percent from the field and 52 percent from the three - point line , while the Jazz went 48 percent from the floor and just 33 percent from deep . The Jazz also shot 48 percent from the field and 52 percent from the three - point line , while the Jazz finished at 48 percent from the floor and a meager 33 percent from deep . The Jazz also forced the Jazz into 20 turnovers , while committing only 12 of their own . The Warriors ’ backcourt of Stephen Curry and Klay Thompson did the bulk of the work offensively . Curry finished with 27 points ( 10 - 16 FG , 4 - 9 3Pt , 3 - 3 FT ) and 11 assists , while Green posted 15 points ( 6 - 9 FG , 3 - 4 3Pt , 0 - 0 FT ) and three assists . He ’s now pushed his season averages to 13 points and 6 rebounds on the year . Also coming off the bench , David Green chipped in with 10 points of his own . The Warriors ’ next game will be on the road against the Portland Trail Blazers on Friday , while the Jazz will travel to Minnesota to play the Timberwolves on Wednesday . NCP+CC
The Golden State Warriors ( 30 - 5 ) defeated the Utah Jazz ( 13 - 26 ) 116 - 105 on Wednesday at Energy Solutions Arena in Salt Lake City . The Warriors were the superior shooters in this game , going 51 percent from the field and 52 percent from the three - point line , while the Jazz finished at 48 percent from the floor and just 33 percent from deep . The Jazz also forced the Jazz into 21 turnovers , while committing only nine of their own . The Jazz were led by the duo of Stephen Curry and Klay Thompson . Curry finished with a game - high of 27 points ( 10 - 16 FG , 4 - 9 3Pt , 3 - 3 FT ) , while also adding 11 assists . It was his second double - double over his last three games , a stretch where he ’s averaging 24 points and 12 assists . Draymond Green also had a strong showing , finishing with 15 points ( 6 - 9 FG , 3 - 4 3Pt , 0 - 0 FT ) and three assists . He ’s now pushed his season averages to 12 points and 6 rebounds on the year . The only other Warrior to reach double figures in points was David Lee , who came off the bench for 10 points . The Warriors ’ next game will be on the road against the Golden State Warriors on Wednesday , while the Jazz will travel to Minnesota to play the Timberwolves on Wednesday . For the Jazz , it was a very tough loss for the Jazz . The Jazz were led by Derrick Favors , who posted a double - double of his own with 22 points ( 10 - 16 FG , 2 - 3 FT ) and 11 rebounds . It was his second double - double in a row , as he ’s combined for 54 points and 19 rebounds over his last two games . Rudy Gobert had a double - double of his own with 16 points ( 4 - 9 FG , 8 - 11 FT ) , 11 rebounds and three blocked shots . It marked his third double - double in a row , a stretch where he ’s averaging 17 points and 12 rebounds . Gordon Hayward chipped in with 17 points ( 5 - 11 FG , 1 - 4 3Pt , 6 - 6 FT ) of his own , while Enes Kanter chipped in with 13 points ( 7 - 14 FG , 1 - 1 3Pt ) and 10 rebounds as well . The Warriors ’ next game will be on the road against the New Orleans Pelicans on Friday , while the Jazz will be at home against the New Orleans Pelicans on Wednesday . NDP
Table 6: Example texts generated by NDP and NCP+CC models. The Gold denotes the human-written text.

5 Related Work

The key challenges in data-to-text generation mainly involve two aspects: content (or data) selection and surface realization (Kukich, 1983; McKeown, 1985; Goldberg et al., 1994). Conventional approaches treat them as separate, individual parts.

Content selection is addressed with manual rules or obtained by deeply analyzing the alignment between the text and the input data (Barzilay and Lapata, 2005; Liang et al., 2009; Angeli et al., 2010). For surface realization, research has shown that template-based approaches generally result in texts of high quality (Goldberg et al., 1994; van der Lee et al., 2017). However, it is time-consuming and difficult to craft rules that cover all situations. Some researchers adopt statistical machine translation based text generators, such as Wong and Mooney (2007), Belz and Kow (2009) and Pereira et al. (2015). However, these models are generally lower in performance (Reiter, 1995).

With the availability of large data-to-text datasets such as E2E NLG (Novikova et al., 2017) and the ROTOWIRE dataset (Wiseman et al., 2017), there has been growing interest in building neural network based systems. The main characteristic of these models is that there is no clear distinction between content selection and surface realization. Without explicit content selection, encoder-decoder frameworks (Sutskever et al., 2014; Novikova et al., 2017) perform much worse on content selection recall and factual output generation (Wiseman et al., 2017; Nie et al., 2018). Gehrmann et al. (2018) apply the copy mechanism from neural summarization (See et al., 2017) to improve content selection, Puduppully et al. (2019) integrate explicit content planning into neural models, and Nie et al. (2018) propose operation-guided attention to improve the fidelity of the generated text.

6 Conclusion

We propose a novel neural data-to-text generation model with dynamic content planning (NDP). To improve the fidelity of the generated text, we further propose a novel record reconstruction mechanism that encourages the decoder to use more accurate information from the encoder. Experimental results on the ROTOWIRE dataset show that NDP achieves state-of-the-art performance over strong baseline models in terms of relation generation, content selection, content ordering, and BLEU metrics. The human evaluation and qualitative analysis also demonstrate that the texts generated by the proposed NDP are much better than the corresponding ones generated by NCP most of the time. However, the proposed NDP, like previous models, cannot avoid making mistakes on some facts, because the necessary background knowledge is missing from the input data. In the future, we will explore using an external knowledge graph to guide text generation. In this way, we expect the generated text to include more background knowledge, such as a team's previous performance or information about future games, and fewer mistakes.

7 Acknowledgment

This work is supported by Natural Science Foundation of China (Grant No. 61872113, 61573118, U1813215, 61876052), Special Foundation for Technology Research Program of Guangdong Province (Grant No. 2015B010131010), Strategic Emerging Industry Development Special Funds of Shenzhen (Grant No. JCYJ20170307150528934, JCYJ2017 0811153836555, JCYJ20180306172232154), Innovation Fund of Harbin Institute of Technology (Grant No. HIT. NSRIF.2017052).

References

  • G. Angeli, P. Liang, and D. Klein (2010) A simple domain-independent probabilistic approach to generation. In Proceedings of the EMNLP2010, Cambridge, MA, pp. 502–512. Cited by: §5.
  • D. Bahdanau, K. Cho, and Y. Bengio (2015) Neural machine translation by jointly learning to align and translate. In ICLR, Cited by: §1, §1, item 3, §3.2, §4.4.
  • R. Barzilay and M. Lapata (2005) Collective content selection for concept-to-text generation. In Proceedings of the HLT, pp. 331–338. Cited by: §5.
  • A. Belz and E. Kow (2009) System building cost vs. output quality in data-to-text generation. In Proceedings of the 12th ENLG, pp. 16–24. Cited by: §5.
  • A. Belz (2008) Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models. Natural Language Engineering 14 (4), pp. 431–455. Cited by: §1.
  • Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin (2003) A neural probabilistic language model. JMLR 3, pp. 1137–1155. Cited by: §1.
  • E. Brill and R. C. Moore (2000) An improved error model for noisy channel spelling correction. In Proceedings of the ACL2000, pp. 286–293. Cited by: §4.3.
  • R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa (2011) Natural language processing (almost) from scratch. JMLR 12, pp. 2493–2537. External Links: ISSN 1532-4435 Cited by: §3.3.
  • S. Gehrmann, F. Dai, H. Elder, and A. Rush (2018) End-to-end content and plan selection for data-to-text generation. In Proceedings of the 11th International Conference on Natural Language Generation, pp. 46–56. Cited by: §5.
  • E. Goldberg, N. Driedger, and R. I. Kittredge (1994) Using natural-language processing to produce weather forecasts. IEEE Expert: Intelligent Systems and Their Applications 9 (2), pp. 45–53. Cited by: §5, §5.
  • J. Gu, Z. Lu, H. Li, and V. O.K. Li (2016) Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the ACL2016, pp. 1631–1640. Cited by: §4.4.
  • C. Gulcehre, S. Ahn, R. Nallapati, B. Zhou, and Y. Bengio (2016) Pointing the unknown words. In Proceedings of the ACL2016, pp. 140–149. Cited by: item 3, §3.2.
  • S. Kiritchenko and S. M. Mohammad (2017) Best-worst scaling more reliable than rating scales: a case study on sentiment intensity annotation. arXiv preprint arXiv:1712.01765. Cited by: §4.5.
  • K. Kukich (1983) Design of a knowledge-based report generator. In Proceedings of the ACL1983, pp. 145–150. Cited by: §1, §5.
  • P. Liang, M. Jordan, and D. Klein (2009) Learning semantic correspondences with less supervision. In Proceedings of the ACL2009, Suntec, Singapore, pp. 91–99. Cited by: §5.
  • J.J. Louviere, T.N. Flynn, and A.A.J. Marley (2015) Best-worst scaling: theory, methods and applications. Cambridge books online, Cambridge University Press. External Links: ISBN 9781107043152, LCCN 2014044866 Cited by: item 3, §4.5.
  • T. Luong, H. Pham, and C. D. Manning (2015) Effective approaches to attention-based neural machine translation. In Proceedings of the EMNLP2015, pp. 1412–1421. Cited by: §4.4.
  • K. R. McKeown (1985) Text generation: using discourse strategies and focus constraints to generate natural language text. Cambridge University Press, New York, NY, USA. External Links: ISBN 0-521-30116-5 Cited by: §1, §5.
  • T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur (2010) Recurrent neural network based language model. In Proceedings of the INTERSPEECH2010, pp. 1045–1048. Cited by: §4.4.
  • F. Nie, J. Wang, J. Yao, R. Pan, and C. Lin (2018) Operations guided neural networks for high fidelity data-to-text generation. arXiv preprint arXiv:1809.02735. Cited by: §1, §1, §4.4, §5.
  • J. Novikova, O. Dušek, and V. Rieser (2017) The E2E dataset: new challenges for end-to-end generation. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pp. 201–206. Cited by: §5.
  • J. C. Pereira, A. L. J. Teixeira, and J. S. Pinto (2015) Towards a hybrid nlg system for data2text in portuguese. In Proceedings of the 10th CISTI, pp. 1–6. Cited by: §5.
  • R. Puduppully, L. Dong, and M. Lapata (2019) Data-to-text generation with content selection and planning. In Proceedings of the AAAI2019, Cited by: §1, §1, §1, §2, §2, §4.2, §4.3, §4.4, §5.
  • E. Reiter (1995) NLG vs. templates. CoRR cmp-lg/9504013. Cited by: §5.
  • J. Robin (1994) Revision-based generation of natural language summaries providing historical background-corpus-based analysis, design, implementation and evaluation. Cited by: §1.
  • A. See, P. J. Liu, and C. D. Manning (2017) Get to the point: summarization with pointer-generator networks. In Proceedings of the ACL2017, pp. 1073–1083. Cited by: §1, §1, §5.
  • L. Shang, Z. Lu, and H. Li (2015) Neural responding machine for short-text conversation. In Proceedings of the ACL2015, pp. 1577–1586. Cited by: §1.
  • I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to sequence learning with neural networks. In NIPS2014, pp. 3104–3112. Cited by: §1, §5.
  • Z. Tu, Y. Liu, L. Shang, X. Liu, and H. Li (2017) Neural machine translation with reconstruction. In Proceedings of the AAAI2017, pp. 3097–3103. Cited by: §1.
  • C. van der Lee, E. Krahmer, and S. Wubben (2017) PASS: a Dutch data-to-text system for soccer, targeted towards specific audiences. In Proceedings of NLG, Santiago de Compostela, Spain, pp. 95–104. Cited by: §5.
  • O. Vinyals, M. Fortunato, and N. Jaitly (2015) Pointer networks. In NIPS2015, pp. 2692–2700. Cited by: §2.
  • S. Wiseman, S. Shieber, and A. Rush (2017) Challenges in data-to-document generation. In Proceedings of the EMNLP2017, pp. 2253–2263. Cited by: item 3, §1, §1, §2, §3.3, §4.1, §4.4, §5.
  • Y. W. Wong and R. J. Mooney (2007) Generation by inverting a semantic parser that uses statistical machine translation. In Proceedings of NAACL-HLT-07, Rochester, NY, pp. 172–179. Cited by: §5.