Fact-based Text Editing

07/02/2020 ∙ by Hayate Iso, et al. ∙ ByteDance Inc.

We propose a novel text editing task, referred to as fact-based text editing, in which the goal is to revise a given document to better describe the facts in a knowledge base (e.g., several triples). The task is important in practice because reflecting the truth is a common requirement in text editing. First, we propose a method for automatically generating a dataset for research on fact-based text editing, where each instance consists of a draft text, a revised text, and several facts represented in triples. We apply the method to two public table-to-text datasets, obtaining two new datasets consisting of 233k and 37k instances, respectively. Next, we propose a new neural network architecture for fact-based text editing, called FactEditor, which edits a draft text by referring to given facts using a buffer, a stream, and a memory. A straightforward approach to address the problem would be to employ an encoder-decoder model. Our experimental results on the two datasets show that FactEditor outperforms the encoder-decoder approach in terms of fidelity and fluency. The results also show that FactEditor conducts inference faster than the encoder-decoder approach.


1 Introduction

Set of triples
{ (Baymax, creator, Duncan_Rouleau),
(Duncan_Rouleau, nationality, American),
(Baymax, creator, Steven_T._Seagle),
(Steven_T._Seagle, nationality, American),
(Baymax, series, Big_Hero_6),
(Big_Hero_6, starring, Scott_Adsit)}
Draft text
Baymax was created by Duncan_Rouleau, a winner of
Eagle_Award. Baymax is a character in Big_Hero_6 .
Revised text
Baymax was created by American creators
Duncan_Rouleau and Steven_T._Seagle . Baymax is
a character in Big_Hero_6 which stars Scott_Adsit .
Table 1: Example of fact-based text editing. Facts are represented in triples. The facts in green appear in both draft text and triples. The facts in orange are present in the draft text, but absent from the triples. The facts in blue do not appear in the draft text, but in the triples. The task of fact-based text editing is to edit the draft text on the basis of the triples, by deleting unsupported facts and inserting missing facts while retaining supported facts.

Automatic editing of text by computer is an important application, which can help human writers to write better documents in terms of accuracy, fluency, etc. The task is easier and more practical than the automatic generation of texts from scratch and has recently been attracting attention Yang et al. (2017); Yin et al. (2019). In this paper, we consider a new and specific setting of it, referred to as fact-based text editing, in which a draft text and several facts (represented in triples) are given, and the system aims to revise the text by adding missing facts and deleting unsupported facts. Table 1 gives an example of the task.

As far as we know, no previous work has addressed the problem. In text-to-text generation, given a text, the system automatically creates another text, where the new text can be a text in another language (machine translation), a summary of the original text (summarization), or a text in better form (text editing). In table-to-text generation, given a table containing facts in triples, the system automatically composes a text, which describes the facts. The former is a text-to-text problem, and the latter a table-to-text problem. In comparison, fact-based text editing can be viewed as a ‘text&table-to-text’ problem.

First, we devise a method for automatically creating a dataset for fact-based text editing. Recently, several table-to-text datasets have been created and released, consisting of pairs of facts and corresponding descriptions. We leverage this kind of data in our method. We first retrieve facts and their descriptions. Next, we take the descriptions as revised texts and automatically generate draft texts based on the facts using several rules. We build two datasets for fact-based text editing on the basis of WebNLG Gardent et al. (2017) and RotoWire Wiseman et al. (2017), consisting of 233k and 37k instances, respectively. The datasets are publicly available at https://github.com/isomap/factedit.

Second, we propose a model for fact-based text editing called FactEditor. One could employ an encoder-decoder model to perform the task. The encoder-decoder model implicitly represents the actions for transforming the draft text into a revised text. In contrast, FactEditor explicitly represents the actions for text editing, including Keep, Drop, and Gen, which denote retention, deletion, and generation of a word, respectively. The model utilizes a buffer for storing the draft text, a stream for storing the revised text, and a memory for storing the facts. It also employs a neural network to control the entire editing process. FactEditor has a lower time complexity than the encoder-decoder model, and thus it can edit a text more efficiently.

Experimental results show that FactEditor outperforms the encoder-decoder baseline for text editing in terms of fidelity and fluency, and also show that FactEditor can perform text editing faster than the encoder-decoder model.

2 Related Work

2.1 Text Editing

Text editing has been studied in different settings such as automatic post-editing Knight and Chander (1994); Simard et al. (2007); Yang et al. (2017), paraphrasing Dolan and Brockett (2005), sentence simplification Inui et al. (2003); Wubben et al. (2012), grammar error correction Ng et al. (2014), and text style transfer Shen et al. (2017); Hu et al. (2017).

The rise of encoder-decoder models Cho et al. (2014); Sutskever et al. (2014) as well as the attention Bahdanau et al. (2015); Vaswani et al. (2017) and copy mechanisms Gu et al. (2016); Gulcehre et al. (2016) has dramatically changed the landscape, and now one can perform the task relatively easily with an encoder-decoder model such as Transformer, provided that a sufficient amount of data is available. For example, Li et al. (2018) introduce a deep reinforcement learning framework for paraphrasing, consisting of a generator and an evaluator.

Yin et al. (2019) formalize the problem of text editing as learning and utilization of edit representations and propose an encoder-decoder model for the task. Zhao et al. (2018) integrate paraphrasing rules with the Transformer model for text simplification. Zhao et al. (2019) propose a method for English grammar correction using a Transformer and copy mechanism.

Another approach to text editing is to view the problem as sequential tagging instead of encoder-decoder generation. In this way, the efficiency of learning and prediction can be significantly enhanced. Vu and Haffari (2018) and Dong et al. (2019) conduct automatic post-editing and text simplification on the basis of edit operations and employ the Neural Programmer-Interpreter Reed and De Freitas (2016) to predict the sequence of edits given a sequence of words, where the edits include KEEP, DROP, and ADD. Malmi et al. (2019) propose a sequential tagging model that assigns a tag (KEEP or DELETE) to each word in the input sequence and also decides whether to add a phrase before the word. Our proposed approach is also based on sequential tagging of actions. However, it is designed for fact-based text editing rather than text-to-text generation.

Revised template: AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission that was operated by PATIENT-2 .
Reference template: AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission .
Draft template: AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission .
(a) Example for insertion. The revised template and the reference template share subsequences. The differing set of triple templates is {(BRIDGE-1, operator, PATIENT-2)}. Our method removes “that was operated by PATIENT-2” from the revised template to create the draft template.
Revised template: AGENT-1 was created by BRIDGE-1 and PATIENT-2 .
Reference template: The character of AGENT-1 , whose full name is PATIENT-1 , was created by BRIDGE-1 and PATIENT-2 .
Draft template: AGENT-1 , whose full name is PATIENT-1 , was created by BRIDGE-1 and PATIENT-2 .
(b) Example for deletion. The revised template and the reference template share subsequences. The differing set of triple templates is {(AGENT-1, fullName, PATIENT-1)}. Our method copies “whose full name is PATIENT-1” from the reference template to create the draft template.
Table 2: Examples for insertion and deletion, where words in green are matched, words in gray are not matched, words in blue are copied, and words in orange are removed. Best viewed in color.

2.2 Table-to-Text Generation

Table-to-text generation is the task which aims to generate a text from structured data Reiter and Dale (2000); Gatt and Krahmer (2018), for example, a text from an infobox about a term in biology in Wikipedia Lebret et al. (2016) or a description of a restaurant from a structured representation Novikova et al. (2017). Encoder-decoder models can also be employed in table-to-text generation with structured data as input and generated text as output, for example, as in Lebret et al. (2016). Puduppully et al. (2019) and Iso et al. (2019) propose utilizing an entity tracking module for document-level table-to-text generation.

One issue with table-to-text is that the style of generated texts can be diverse Iso et al. (2019). Researchers have developed methods to deal with the problem using other texts as templates Hashimoto et al. (2018); Guu et al. (2018); Peng et al. (2019). The difference between that approach and fact-based text editing is that the former is table-to-text generation based on other texts, while the latter is text-to-text generation based on structured data.

3 Data Creation

In this section, we describe our method of data creation for fact-based text editing. The method automatically constructs a dataset from an existing table-to-text dataset.

3.1 Data Sources

There are two benchmark datasets of table-to-text, WebNLG Gardent et al. (2017) and RotoWire Wiseman et al. (2017). For WebNLG, we utilize version 1.5, available at https://github.com/ThiagoCF05/webnlg. For RotoWire, we utilize the RotoWire-modified data provided by Iso et al. (2019), available at https://github.com/aistairc/rotowire-modified; the authors also provide an information extractor for processing the data. We create two datasets on the basis of them, referred to as WebEdit and RotoEdit, respectively. In the datasets, each instance consists of a table (structured data) and an associated text (unstructured data) describing almost the same content. In RotoWire, we discard redundant box-scores and unrelated sentences using the information extractor and heuristic rules.

For each instance, we take the table as triples of facts and the associated text as a revised text, and we automatically create a draft text. The set of triples is represented as $\mathcal{T}$. Each triple consists of subject, predicate, and object, denoted as $(s, p, o)$. For simplicity, we refer to the nouns or noun phrases of subject and object simply as entities. The revised text is a sequence of words denoted as $y$. The draft text is a sequence of words denoted as $x$.

Given the set of triples $\mathcal{T}$ and the revised text $y$, we aim to create a draft text $x$, such that $x$ is not in accordance with $\mathcal{T}$, in contrast to $y$, and therefore text editing from $x$ to $y$ is needed.

3.2 Procedure

Our method first creates templates for all the sets of triples and revised texts and then constructs a draft text for each set of triples and revised text based on their related templates.

Creation of templates

For each instance, our method first delexicalizes the entity words in the set of triples and the revised text to obtain a set of triple templates $\mathcal{T}'$ and a revised template $y'$. For example, given {(Baymax, voice, Scott_Adsit)} and “Scott_Adsit does the voice for Baymax”, it produces the set of triple templates {(AGENT-1, voice, PATIENT-1)} and the revised template “AGENT-1 does the voice for PATIENT-1”. Our method then collects all the sets of triple templates and revised templates and retains them in a key-value store with $\mathcal{T}'$ being a key and $y'$ being a value.
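As an illustration, the delexicalization step can be sketched in Python as follows. This is a minimal sketch under our own naming: the actual templates use role-based placeholders such as AGENT-1 and PATIENT-1, whereas here entities simply receive sequential placeholders, and the function name is hypothetical.

```python
def delexicalize(triples, text):
    """Replace entity words in the triples and the text with shared
    placeholders, yielding triple templates and a text template."""
    mapping = {}

    def placeholder(entity):
        if entity not in mapping:
            mapping[entity] = "ENTITY-%d" % (len(mapping) + 1)
        return mapping[entity]

    triple_templates = [(placeholder(s), p, placeholder(o))
                        for s, p, o in triples]
    template = " ".join(mapping.get(tok, tok) for tok in text.split())
    return triple_templates, template, mapping


# The example from the text:
templates, revised_template, _ = delexicalize(
    [("Baymax", "voice", "Scott_Adsit")],
    "Scott_Adsit does the voice for Baymax")
# templates        -> [("ENTITY-1", "voice", "ENTITY-2")]
# revised_template -> "ENTITY-2 does the voice for ENTITY-1"
```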

Creation of draft text

Next, our method constructs a draft text using a set of triple templates $\mathcal{T}'$ and a revised template $y'$. For simplicity, it only considers the use of either insertion or deletion in the text editing, and one can easily make an extension of it to a more complex setting. Note that the process of data creation is the reverse of that of text editing.

Given a pair of $\mathcal{T}'$ and $y'$, our method retrieves another pair, denoted as $\hat{\mathcal{T}}'$ and $\hat{y}'$, such that $y'$ and $\hat{y}'$ have the longest common subsequences. We refer to $\hat{y}'$ as a reference template. There are two possibilities: $\hat{\mathcal{T}}'$ is a subset or a superset of $\mathcal{T}'$. (We ignore the case in which they are identical.) Our method then manages to change $y'$ to a draft template, denoted as $x'$, on the basis of the relation between $\mathcal{T}'$ and $\hat{\mathcal{T}}'$. If $\hat{\mathcal{T}}' \subset \mathcal{T}'$, then the draft template created is for insertion, and if $\hat{\mathcal{T}}' \supset \mathcal{T}'$, then the draft template created is for deletion.

For insertion, the revised template $y'$ and the reference template $\hat{y}'$ share subsequences, and the triple templates in $\mathcal{T}' \setminus \hat{\mathcal{T}}'$ are described in $y'$ but not in $\hat{y}'$. Our method keeps the shared subsequences in $y'$, removes the subsequences in $y'$ about $\mathcal{T}' \setminus \hat{\mathcal{T}}'$, and copies the rest of the words in $y'$, to create the draft template $x'$. Table 2(a) gives an example. The shared subsequences “AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission” are kept. The differing set of triple templates is {(BRIDGE-1, operator, PATIENT-2)}. The subsequence “that was operated by PATIENT-2” is removed. Note that the subsequence “served” is not copied because it is not shared by $y'$ and $\hat{y}'$.

For deletion, the revised template $y'$ and the reference template $\hat{y}'$ share subsequences, and the triple templates in $\hat{\mathcal{T}}' \setminus \mathcal{T}'$ are described in $\hat{y}'$ but not in $y'$. Our method retains the shared subsequences in $y'$, copies the subsequences in $\hat{y}'$ about $\hat{\mathcal{T}}' \setminus \mathcal{T}'$, and copies the rest of the words in $y'$, to create the draft template $x'$. Table 2(b) gives an example. The subsequences “AGENT-1 was created by BRIDGE-1 and PATIENT-2” are retained. The differing set of triple templates is {(AGENT-1, fullName, PATIENT-1)}. The subsequence “whose full name is PATIENT-1” is copied. Note that the subsequence “the character of” is not copied because it is not shared by $y'$ and $\hat{y}'$.
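For concreteness, the insertion case can be sketched with a standard longest-common-subsequence matcher as follows. This is a simplified sketch, not the released implementation: it assumes that the spans to remove are exactly the unshared spans of $y'$ that mention an entity of the differing triple templates, and the function name is ours.

```python
import difflib

def create_draft_template(revised, reference, extra_triples):
    """Insertion case: drop the spans of the revised template that are
    not shared with the reference template and that mention an entity
    of the extra triple templates; copy everything else."""
    extra_entities = {e for s, _, o in extra_triples for e in (s, o)}
    y, r = revised.split(), reference.split()
    matcher = difflib.SequenceMatcher(a=y, b=r, autojunk=False)
    draft, prev_end = [], 0
    for block in matcher.get_matching_blocks():
        unshared = y[prev_end:block.a]
        if not extra_entities & set(unshared):  # no extra-fact entity
            draft.extend(unshared)              # copy the span
        draft.extend(y[block.a:block.a + block.size])  # keep shared span
        prev_end = block.a + block.size
    return " ".join(draft)


revised = ("AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission "
           "that was operated by PATIENT-2 .")
reference = ("AGENT-1 served as PATIENT-3 was a crew member of "
             "the BRIDGE-1 mission .")
extra = {("BRIDGE-1", "operator", "PATIENT-2")}
print(create_draft_template(revised, reference, extra))
# -> AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission .
```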

After getting the draft template $x'$, our method lexicalizes it to obtain a draft text $x$, where the lexicons (entity words) are collected from the corresponding revised text $y$.

We obtain two datasets with our method, referred to as WebEdit and RotoEdit, respectively. Table 3 gives the statistics of the datasets.

In the WebEdit data, sometimes entities only appear in the subj's of triples. In such cases, we also make them appear in the obj's. To do so, we introduce an additional triple (ROOT, IsOf, subj) for each such subj, where ROOT is a dummy entity.

                 WebEdit                 RotoEdit
                 Train   Valid   Test    Train   Valid   Test
#Instances       181k    23k     29k     27k     5.3k    4.9k
#Words (draft)   4.1M    495k    624k    4.7M    904k    839k
#Words (revised) 4.2M    525k    649k    5.6M    1.1M    1.0M
#Sentences       403k    49k     62k     209k    40k     36k
Table 3: Statistics of WebEdit and RotoEdit, where #Instances is the number of instances, #Words (draft) and #Words (revised) are the total numbers of words in the draft texts and the revised texts, respectively, and #Sentences is the total number of sentences.

4 FactEditor

In this section, we describe our proposed model for fact-based text editing referred to as FactEditor.

4.1 Model Architecture

FactEditor transforms a draft text into a revised text based on given triples. The model consists of three components, a buffer for storing the draft text and its representations, a stream for storing the revised text and its representations, and a memory for storing the triples and their representations, as shown in Figure 1.

FactEditor scans the text in the buffer, copies parts of the text from the buffer into the stream if they are described in the triples in the memory, deletes parts of the text if they are not mentioned in the triples, and inserts new parts of text into the stream that are present only in the triples.

The architecture of FactEditor is inspired by those of transition-based models for sentence parsing Dyer et al. (2015); Watanabe and Sumita (2015). The actual processing of FactEditor is to generate a sequence of words into the stream from the given sequence of words in the buffer and the set of triples in the memory. A neural network is employed to control the entire editing process.

4.2 Neural Network

Initialization

FactEditor first initializes the representations of content in the buffer, stream, and memory.

There is a feed-forward network associated with the memory, utilized to create the embeddings of triples. Let $M$ denote the number of triples. The embedding $\mathbf{t}_j$ of triple $j$ is calculated as

$\mathbf{t}_j = \mathrm{ReLU}(W_t [\mathbf{s}_j; \mathbf{p}_j; \mathbf{o}_j] + \mathbf{b}_t), \quad j = 1, \dots, M,$

where $W_t$ and $\mathbf{b}_t$ denote parameters, $\mathbf{s}_j$, $\mathbf{p}_j$, and $\mathbf{o}_j$ denote the embeddings of subject, predicate, and object of triple $j$, and $[\cdot;\cdot]$ denotes vector concatenation.

There is a bi-directional LSTM associated with the buffer, utilized to create the embeddings of words of the draft text. The embeddings are obtained as $(\mathbf{h}_1, \dots, \mathbf{h}_N) = \mathrm{BiLSTM}(\mathbf{e}_1, \dots, \mathbf{e}_N)$, where $(\mathbf{e}_1, \dots, \mathbf{e}_N)$ is the list of embeddings of words, $(\mathbf{h}_1, \dots, \mathbf{h}_N)$ is the list of representations of words, and $N$ denotes the number of words.

There is an LSTM associated with the stream for representing the hidden states of the stream. The first hidden state is initialized as

$\mathbf{z}_0 = \tanh(W_z \mathbf{h}_N + \mathbf{b}_z),$

where $W_z$ and $\mathbf{b}_z$ denote parameters and $\mathbf{h}_N$ is the last representation in the buffer.
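A minimal PyTorch sketch of this initialization step is given below. The ReLU feed-forward for triples and the tanh projection of the last buffer state follow our reconstructed equations above; the module names and the use of the last buffer state for $\mathbf{z}_0$ are assumptions.

```python
import torch
import torch.nn as nn

class FactEditorInit(nn.Module):
    """Sketch: triple embeddings for the memory, BiLSTM representations
    for the buffer, and the initial stream state z_0."""
    def __init__(self, emb_size=300, stream_size=600):
        super().__init__()
        self.triple_ffn = nn.Sequential(
            nn.Linear(3 * emb_size, emb_size), nn.ReLU())
        self.buffer_lstm = nn.LSTM(emb_size, emb_size // 2,
                                   bidirectional=True, batch_first=True)
        self.stream_init = nn.Linear(emb_size, stream_size)

    def forward(self, s, p, o, word_embs):
        # Memory: t_j = ReLU(W_t [s_j; p_j; o_j] + b_t), shape (B, M, E)
        triples = self.triple_ffn(torch.cat([s, p, o], dim=-1))
        # Buffer: contextual representations h_1..h_N of the draft text
        buffer_states, _ = self.buffer_lstm(word_embs)
        # Stream: z_0 = tanh(W_z h_N + b_z)
        z0 = torch.tanh(self.stream_init(buffer_states[:, -1]))
        return triples, buffer_states, z0
```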

Action prediction

FactEditor predicts an action at each time $t$ using the LSTM. There are three types of action, namely Keep, Drop, and Gen. First, it composes a context vector $\mathbf{c}_t$ of triples at time $t$ using attention

$\mathbf{c}_t = \sum_{j=1}^{M} \alpha_{t,j} \mathbf{t}_j,$

where $\alpha_{t,j}$ is a weight calculated as

$\alpha_{t,j} \propto \exp(\mathbf{v}_a^{\top} \tanh(W_a [\mathbf{z}_{t-1}; \mathbf{t}_j])),$

where $\mathbf{v}_a$ and $W_a$ are parameters. Then, it creates the hidden state $\mathbf{d}_t$ for action prediction at time $t$,

$\mathbf{d}_t = \mathrm{ReLU}(W_d [\mathbf{z}_{t-1}; \mathbf{h}_t; \mathbf{c}_t] + \mathbf{b}_d),$

where $W_d$ and $\mathbf{b}_d$ denote parameters and $\mathbf{h}_t$ is the representation at the top of the buffer. Next, it calculates the probability of action $a_t$,

$P(a_t \mid \mathbf{d}_t) = \mathrm{softmax}(W_o \mathbf{d}_t),$

where $W_o$ denotes parameters, and chooses the action having the largest probability.
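The prediction step can be sketched as follows, consistent with the reconstructed equations above (the additive attention form and all layer names are our assumptions):

```python
import torch
import torch.nn as nn

class ActionPredictor(nn.Module):
    """Sketch of one step: attention over triple embeddings, hidden
    state d_t from [z_{t-1}; h_t; c_t], softmax over three actions."""
    def __init__(self, emb_size=300, stream_size=600):
        super().__init__()
        self.attn_w = nn.Linear(stream_size + emb_size, emb_size)
        self.attn_v = nn.Linear(emb_size, 1, bias=False)
        self.hidden = nn.Linear(stream_size + 2 * emb_size, stream_size)
        self.action_out = nn.Linear(stream_size, 3)  # Keep, Drop, Gen

    def forward(self, z_prev, buffer_top, triples):
        # alpha_{t,j} ∝ exp(v_a^T tanh(W_a [z_{t-1}; t_j]))
        expanded = z_prev.unsqueeze(1).expand(-1, triples.size(1), -1)
        scores = self.attn_v(torch.tanh(
            self.attn_w(torch.cat([expanded, triples], dim=-1))))
        alpha = torch.softmax(scores, dim=1)        # (B, M, 1)
        context = (alpha * triples).sum(dim=1)      # c_t: (B, E)
        # d_t = ReLU(W_d [z_{t-1}; h_t; c_t] + b_d)
        d = torch.relu(self.hidden(
            torch.cat([z_prev, buffer_top, context], dim=-1)))
        # P(a_t | d_t) = softmax(W_o d_t)
        return torch.softmax(self.action_out(d), dim=-1), context, d
```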

Action execution

FactEditor takes action based on the prediction result at time $t$.

(a) The Keep action, where the top embedding of the buffer is popped and the concatenated vector is pushed into the stream LSTM.
(b) The Drop action, where the top embedding of the buffer is popped and the state in the stream is reused at the next time step .
(c) The Gen action, where the concatenated vector is pushed into the stream, and the top embedding of the buffer is reused at the next time step .
Figure 1: Actions of FactEditor.
Draft text Bakewell_pudding is Dessert that can be served Warm or cold .
Revised text Bakewell_pudding is Dessert that originates from Derbyshire_Dales .
Action sequence Keep Keep Keep Keep Gen(originates) Gen(from) Gen(Derbyshire_Dales)
Drop Drop Drop Drop Drop Drop Keep
Table 4: An example of action sequence derived from a draft text and revised text.

For Keep at time $t$, FactEditor pops the top embedding $\mathbf{h}_t$ in the buffer, and feeds the combination of the top embedding and the context vector of triples $[\mathbf{h}_t; \mathbf{c}_t]$ into the stream, as shown in Figure 1(a). The state of the stream is updated with the LSTM as $\mathbf{z}_t = \mathrm{LSTM}([\mathbf{h}_t; \mathbf{c}_t], \mathbf{z}_{t-1})$. FactEditor also copies the top word in the buffer into the stream.

For Drop at time $t$, FactEditor pops the top embedding in the buffer and proceeds to the next state, as shown in Figure 1(b). The state of the stream is updated as $\mathbf{z}_t = \mathbf{z}_{t-1}$. Note that no word is inputted into the stream.

For Gen at time $t$, FactEditor does not pop the top embedding in the buffer. It feeds the combination of the context vector of triples and the linearly projected embedding of the generated word into the stream, as shown in Figure 1(c). The state of the stream is updated with the LSTM as $\mathbf{z}_t = \mathrm{LSTM}([W_p \mathbf{e}_{w_t}; \mathbf{c}_t], \mathbf{z}_{t-1})$, where $\mathbf{e}_{w_t}$ is the embedding of the generated word $w_t$ and $W_p$ denotes parameters. In addition, FactEditor copies the generated word into the stream.

FactEditor continues the actions until the buffer becomes empty.
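To make the buffer and stream dynamics concrete, the following plain-Python sketch replays a given action sequence over a draft text (no neural scoring; the function name is ours). It reproduces the example of Table 4.

```python
def apply_actions(draft_tokens, actions):
    """Replay (action, argument) pairs: Keep copies the buffer top into
    the stream, Drop discards it, and Gen emits a word while leaving
    the buffer top in place for the next step."""
    buffer = list(draft_tokens)   # front of the list = top of buffer
    stream = []
    for action, arg in actions:
        if action == "Keep":
            stream.append(buffer.pop(0))
        elif action == "Drop":
            buffer.pop(0)
        elif action == "Gen":
            stream.append(arg)    # the buffer top is reused next time
    assert not buffer, "editing continues until the buffer is empty"
    return " ".join(stream)


draft = ("Bakewell_pudding is Dessert that can be served Warm "
         "or cold .").split()
actions = ([("Keep", None)] * 4
           + [("Gen", w) for w in
              ("originates", "from", "Derbyshire_Dales")]
           + [("Drop", None)] * 6 + [("Keep", None)])
print(apply_actions(draft, actions))
# -> Bakewell_pudding is Dessert that originates from Derbyshire_Dales .
```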

Word generation

FactEditor generates a word $w_t$ at time $t$, when the action is Gen,

$P_{\mathrm{gen}}(w_t \mid \mathbf{d}_t) = \mathrm{softmax}(W_w \mathbf{d}_t),$

where $W_w$ is parameters.

To avoid generation of OOV words, FactEditor exploits the copy mechanism. It calculates the probability of copying the object $o_j$ of triple $j$,

$P_{\mathrm{copy}}(o_j \mid \mathbf{d}_t) \propto \exp(\mathbf{v}_c^{\top} \tanh(W_c [\mathbf{t}_j; \mathbf{d}_t])),$

where $\mathbf{v}_c$ and $W_c$ denote parameters, and $o_j$ is the object of triple $j$. It also calculates the probability of gating,

$g_t = \sigma(\mathbf{w}_g^{\top} \mathbf{d}_t + b_g),$

where $\mathbf{w}_g$ and $b_g$ are parameters and $\sigma$ is the sigmoid function. Finally, it calculates the probability of generating a word $w_t$ through either generation or copying,

$P(w_t \mid \mathbf{d}_t) = g_t P_{\mathrm{gen}}(w_t \mid \mathbf{d}_t) + (1 - g_t) P_{\mathrm{copy}}(w_t \mid \mathbf{d}_t),$

where it is assumed that the triples in the memory have the same subject and thus only objects need to be copied.

4.3 Model Learning

The conditional probability of a sequence of actions $a = (a_1, \dots, a_T)$ given the set of triples $\mathcal{T}$ and the sequence of input words $x$ can be written as

$P(a \mid \mathcal{T}, x) = \prod_{t=1}^{T} P(a_t \mid \mathbf{d}_t),$

where $P(a_t \mid \mathbf{d}_t)$ is the conditional probability of action $a_t$ given state $\mathbf{d}_t$ at time $t$ and $T$ denotes the number of actions.

The conditional probability of a sequence of generated words $w = (w_1, \dots, w_T)$ given the sequence of actions $a$ can be written as

$P(w \mid a) = \prod_{t=1}^{T} P(w_t \mid a_t),$

where $P(w_t \mid a_t)$ is the conditional probability of generated word $w_t$ given action $a_t$ at time $t$, which is calculated as $P(w_t \mid a_t) = P(w_t \mid \mathbf{d}_t)$ if $a_t$ is Gen, and $P(w_t \mid a_t) = 1$ otherwise. Note that not all positions have a generated word. In such a case, $w_t$ is simply a null word.

The learning of the model is carried out via supervised learning. The objective of learning is to minimize the negative log-likelihood of $P(a \mid \mathcal{T}, x)$ and $P(w \mid a)$,

$L(\theta) = - \sum_{t=1}^{T} \log P(a_t \mid \mathbf{d}_t) - \sum_{t=1}^{T} \log P(w_t \mid a_t),$

where $\theta$ denotes the parameters.

A training instance consists of a pair of draft text and revised text, as well as a set of triples, denoted as $x$, $y$, and $\mathcal{T}$, respectively. For each instance, our method derives a sequence of actions denoted as $a$, in a similar way as that in Dong et al. (2019). It first finds the longest common subsequence between $x$ and $y$, and then selects an action of Keep, Drop, or Gen at each position, according to how $y$ is obtained from $x$ and $\mathcal{T}$ (cf., Tab. 4). Action Gen is preferred over action Drop when both are valid; a sketch of the derivation is given below.
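A minimal sketch of this derivation using Python's difflib (a longest-common-subsequence style matcher); emitting Gen before Drop within a replaced span realizes the preference of Gen over Drop:

```python
import difflib

def derive_actions(draft, revised):
    """Derive a Keep/Drop/Gen sequence from tokenized draft and revised
    texts via their longest common subsequence."""
    matcher = difflib.SequenceMatcher(a=draft, b=revised,
                                      autojunk=False)
    actions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            actions += [("Keep", None)] * (i2 - i1)
        else:  # 'replace', 'delete', or 'insert'
            actions += [("Gen", w) for w in revised[j1:j2]]
            actions += [("Drop", None)] * (i2 - i1)
    return actions

# On the example of Table 4, this yields Keep x4, Gen(originates),
# Gen(from), Gen(Derbyshire_Dales), Drop x6, Keep.
```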

(a) Table-to-Text
(b) Text-to-Text
(c) EncDecEditor
Figure 2: Model architectures of the baselines. All models employ attention and copy mechanism.

4.4 Time Complexity

The time complexity of inference in FactEditor is $O(NM)$, where $N$ is the number of words in the buffer and $M$ is the number of triples. Scanning of the data in the buffer is of complexity $O(N)$. The generation of an action needs the execution of attention over the triples, which is of complexity $O(M)$. Usually, $N$ is much larger than $M$.

4.5 Baseline

We consider a baseline method using the encoder-decoder architecture, which takes the set of triples and the draft text as input and generates a revised text. We refer to the method as EncDecEditor. The encoder of EncDecEditor is the same as that of FactEditor. The decoder is the standard attention and copy model, which creates and utilizes a context vector and predicts the next word at each time.

The time complexity of inference in EncDecEditor is $O(N^2 + NM)$, since each decoding step attends over both the draft text and the triples (cf., Britz et al. (2017)). Note that in fact-based text editing, usually $N$ is very large. That means that EncDecEditor is less efficient than FactEditor.

Model Fluency Fidelity
Bleu Sari Keep Add Delete EM P% R% F1%
Baselines
No-Editing 66.67 31.51 78.62 3.91 12.02 0.00 84.49 76.34 80.21
Table-to-Text 33.75 43.83 51.44 27.86 52.19 5.78 98.23 83.72 90.40
Text-to-Text 63.61 58.73 82.62 25.77 67.80 6.22 81.93 77.16 79.48
Fact-based text editing
EncDecEditor 71.03 69.59 89.49 43.82 75.48 20.96 98.06 87.56 92.51
FactEditor 75.68 72.20 91.84 47.69 77.07 24.80 96.88 89.74 93.17
(a) WebEdit
Model Fluency Fidelity
Bleu Sari Keep Add Delete EM P% R% F1%
Baselines
No-Editing 74.95 39.59 95.72 0.05 23.01 0.00 92.92 65.02 76.51
Table-to-Text 24.87 23.30 39.12 14.78 16.00 0.00 48.01 24.28 32.33
Text-to-Text 78.07 60.25 97.29 13.04 70.43 0.02 63.62 41.08 49.92
Fact-based text editing
EncDecEditor 83.36 71.46 97.69 44.02 72.69 2.49 78.80 52.21 62.81
FactEditor 84.43 74.72 98.41 41.50 84.24 2.65 78.84 52.30 63.39
(b) RotoEdit
Table 5: Performances of FactEditor and baselines on two datasets in terms of Fluency and Fidelity. EM stands for exact match.

5 Experiment

We conduct experiments to make comparison between FactEditor and the baselines using the two datasets WebEdit and RotoEdit.

5.1 Experiment Setup

The main baseline is the encoder-decoder model EncDecEditor, as explained above. We further consider three baselines, No-Editing, Table-to-Text, and Text-to-Text. In No-Editing, the draft text is directly used. In Table-to-Text, a revised text is generated from the triples using encoder-decoder. In Text-to-Text, a revised text is created from the draft text using the encoder-decoder model. Figure 2 gives illustrations of the baselines.

We evaluate the revised texts given by the models from the viewpoints of fluency and fidelity. We utilize ExactMatch (EM), Bleu Papineni et al. (2002), and Sari Xu et al. (2016) scores as evaluation metrics for fluency; for Sari, we use a modified version available at https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/sari_hook.py. We also utilize precision, recall, and F1 score as evaluation metrics for fidelity. For WebEdit, we extract the entities from the generated text and the reference text and then calculate the precision, recall, and F1 scores. For RotoEdit, we use the information extraction tool provided by Wiseman et al. (2017) for calculation of the scores.
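For the WebEdit fidelity scores, the computation can be sketched as follows; treating the extracted entities as multisets is our assumption.

```python
from collections import Counter

def entity_prf(generated_entities, reference_entities):
    """Precision, recall, and F1 between the entities extracted from a
    generated text and those extracted from the reference text."""
    gen = Counter(generated_entities)
    ref = Counter(reference_entities)
    overlap = sum((gen & ref).values())
    precision = overlap / max(sum(gen.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```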

For the embeddings of subject and object for both datasets and the embedding of the predicate for RotoEdit, we simply use the embedding lookup table. For the embedding of the predicate for WebEdit, we first tokenize the predicate, look up the embeddings of the lower-cased words from the table, and use the averaged embedding to deal with the OOV problem Moryossef et al. (2019).

We tune the hyperparameters based on the Bleu score on a development set. For WebEdit, we set the sizes of embeddings, buffers, and triples to 300, and set the size of the stream to 600. For RotoEdit, we set the size of embeddings to 100 and set the sizes of buffers, triples, and stream to 200. The initial learning rate is 2e-3, and AMSGrad is used for automatically adjusting the learning rate Reddi et al. (2018). Our implementation makes use of AllenNLP Gardner et al. (2018).

5.2 Experimental Results

Quantitative evaluation

We present the performances of our proposed model FactEditor and the baselines on fact-based text editing in Table 5. One can draw several conclusions from the results.

First, our proposed model, FactEditor, achieves significantly better performances than the main baseline, EncDecEditor, in terms of almost all measures. In particular, FactEditor obtains significant gains in Delete scores on both WebEdit and RotoEdit.

Second, the fact-based text editing models (FactEditor and EncDecEditor) significantly improve upon the other models in terms of fluency scores, and achieve similar performances in terms of fidelity scores.

Third, compared to No-Editing, Table-to-Text has higher fidelity scores, but lower fluency scores. Text-to-Text has almost the same fluency scores as No-Editing, but lower fidelity scores on RotoEdit.

Qualitative evaluation

We also manually evaluate 50 randomly sampled revised texts for WebEdit. We check whether the revised texts given by FactEditor and EncDecEditor include all the facts. We categorize the factual errors made by the two models. Table 6 shows the results. One can see that FactEditor covers more facts than EncDecEditor and makes fewer factual errors than EncDecEditor.

Covered facts Factual errors
 CQT  UPara  Rpt  Ms  USup  DRel
EncDecEditor 14 7 16 21 3 12
FactEditor 24 4 9 19 1 3
Table 6: Evaluation results on 50 randomly sampled revised texts in WebEdit in terms of numbers of correct editing (CQT), unnecessary paraphrasing (UPara), repetition (Rpt), missing facts (Ms), unsupported facts (USup) and different relations (DRel)

FactEditor produces a larger number of correct edits (CQT) than EncDecEditor for fact-based text editing. In contrast, EncDecEditor often includes a larger number of unnecessary paraphrasings (UPara) than FactEditor.

There are four types of factual errors: fact repetition (Rpt), missing facts (Ms), unsupported facts (USup), and different relations (DRel). Both FactEditor and EncDecEditor often fail to insert missing facts (Ms), but rarely insert unsupported facts (USup). EncDecEditor often generates the same facts multiple times (Rpt) or facts with different relations (DRel). In contrast, FactEditor seldom makes such errors.

Set of triples
{ (Ardmore_Airport, runwayLength, 1411.0),
(Ardmore_Airport, 3rd_runway_SurfaceType, Poaceae),
(Ardmore_Airport, operatingOrganisation, Civil_Aviation_Authority_of_New_Zealand),
(Ardmore_Airport, elevationAboveTheSeaLevel, 34.0),
(Ardmore_Airport, runwayName, 03R/21L)}
Draft text
Ardmore_Airport , ICAO Location Identifier UTAA . Ardmore_Airport 3rd runway
is made of Poaceae and Ardmore_Airport . 03R/21L is 1411.0 m long and Ardmore_Airport
is 34.0 above sea level .
Revised text
Ardmore_Airport is operated by Civil_Aviation_Authority_of_New_Zealand . Ardmore_Airport
3rd runway is made of Poaceae and Ardmore_Airport name is 03R/21L . 03R/21L is 1411.0 m long
and Ardmore_Airport is 34.0 above sea level .
EncDecEditor
Ardmore_Airport , ICAO Location Identifier UTAA , is operated by
Civil_Aviation_Authority_of_New_Zealand . Ardmore_Airport 3rd runway is made of Poaceae and
Ardmore_Airport . 03R/21L is 1411.0 m long and Ardmore_Airport is 34.0 m long .
FactEditor
Ardmore_Airport is operated by Civil_Aviation_Authority_of_New_Zealand . Ardmore_Airport
3rd runway is made of Poaceae and Ardmore_Airport . 03R/21L is 1411.0 m long and
Ardmore_Airport is 34.0 above sea level .
Table 7: Example of generated revised texts given by EncDecEditor and FactEditor on WebEdit. Entities in green appear in both the set of triples and the draft text. Entities in orange only appear in the draft text. Entities in blue should appear in the revised text but do not appear in the draft text.

Table 7 shows an example of results given by EncDecEditor and FactEditor. The revised texts of both EncDecEditor and FactEditor appear to be fluent, but that of FactEditor has higher fidelity than that of EncDecEditor. EncDecEditor cannot effectively eliminate the description about an unsupported fact (in orange) appearing in the draft text. In contrast, FactEditor can deal with the problem well. In addition, EncDecEditor conducts an unnecessary substitution in the draft text (underlined). FactEditor tends to avoid such unnecessary editing.

Runtime analysis

We conduct runtime analysis on FactEditor and the baselines in terms of number of processed words per second, on both WebEdit and RotoEdit. Table 8 gives the results when the batch size is 128 for all methods. Table-to-Text is the fastest, followed by FactEditor. FactEditor is always faster than EncDecEditor, apparently because it has a lower time complexity, as explained in Section 4. The texts in WebEdit are relatively short, and thus FactEditor and EncDecEditor have similar runtime speeds. In contrast, the texts in RotoEdit are relatively long, and thus FactEditor executes approximately two times faster than EncDecEditor.

WebEdit RotoEdit
Table-to-Text 4,083 1,834
Text-to-Text 2,751 581
EncDecEditor 2,487 505
FactEditor 3,295 1,412
Table 8: Runtime analysis (# of words/second). Table-to-Text always shows the fastest performance (Bold-faced). FactEditor shows the second fastest runtime performance (Underlined).

6 Conclusion

In this paper, we have defined a new task referred to as fact-based text editing and made two contributions to research on the problem. First, we have proposed a data construction method for fact-based text editing and created two datasets. Second, we have proposed a model for fact-based text editing, named FactEditor, which performs the task by generating a sequence of actions. Experimental results show that the proposed model FactEditor performs better and faster than the baselines, including an encoder-decoder model.

Acknowledgments

We would like to thank the reviewers for their insightful comments.

References

  • D. Bahdanau, K. Cho, and Y. Bengio (2015) Neural Machine Translation by Jointly Learning to Align and Translate. In International Conference on Learning Representations, External Links: Link Cited by: §2.1.
  • D. Britz, M. Guan, and M. Luong (2017) Efficient Attention using a Fixed-Size Memory Representation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, pp. 392–400. External Links: Link, Document Cited by: §4.5.
  • K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1724–1734. External Links: Link, Document Cited by: §2.1.
  • W. B. Dolan and C. Brockett (2005) Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing (IWP2005), External Links: Link Cited by: §2.1.
  • Y. Dong, Z. Li, M. Rezagholizadeh, and J. C. K. Cheung (2019) EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 3393–3402. External Links: Link, Document Cited by: §2.1, §4.3.
  • C. Dyer, M. Ballesteros, W. Ling, A. Matthews, and N. A. Smith (2015) Transition-Based Dependency Parsing with Stack Long Short-Term Memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, pp. 334–343. External Links: Link, Document Cited by: §4.1.
  • C. Gardent, A. Shimorina, S. Narayan, and L. Perez-Beltrachini (2017) Creating training corpora for NLG micro-planners. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, pp. 179–188. External Links: Link, Document Cited by: §1, §3.1.
  • M. Gardner, J. Grus, M. Neumann, O. Tafjord, P. Dasigi, N. F. Liu, M. Peters, M. Schmitz, and L. Zettlemoyer (2018) AllenNLP: A Deep Semantic Natural Language Processing Platform. In Proceedings of Workshop for NLP Open Source Software (NLP-OSS), Melbourne, Australia, pp. 1–6. External Links: Link, Document Cited by: §5.1.
  • A. Gatt and E. Krahmer (2018) Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation. Journal of Artificial Intelligence Research (JAIR) 61, pp. 65–170. External Links: Link, https://arxiv.org/abs/1703.09902 Cited by: §2.2.
  • J. Gu, Z. Lu, H. Li, and V. O.K. Li (2016) Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 1631–1640. External Links: Link, Document Cited by: §2.1.
  • C. Gulcehre, S. Ahn, R. Nallapati, B. Zhou, and Y. Bengio (2016) Pointing the unknown words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 140–149. External Links: Link, Document Cited by: §2.1.
  • K. Guu, T. B. Hashimoto, Y. Oren, and P. Liang (2018) Generating Sentences by Editing Prototypes. Transactions of the Association for Computational Linguistics 6, pp. 437–450. External Links: Link Cited by: §2.2.
  • T. B. Hashimoto, K. Guu, Y. Oren, and P. S. Liang (2018) A Retrieve-and-Edit Framework for Predicting Structured Outputs. In Advances in Neural Information Processing Systems, pp. 10052–10062. External Links: Link Cited by: §2.2.
  • Z. Hu, Z. Yang, X. Liang, R. Salakhutdinov, and E. P. Xing (2017) Toward Controlled Generation of Text. In Proceedings of the 34th International Conference on Machine Learning, D. Precup and Y. W. Teh (Eds.), Proceedings of Machine Learning Research, Vol. 70, International Convention Centre, Sydney, Australia, pp. 1587–1596. External Links: Link Cited by: §2.1.
  • K. Inui, A. Fujita, T. Takahashi, R. Iida, and T. Iwakura (2003) Text simplification for reading assistance: a project note. In Proceedings of the Second International Workshop on Paraphrasing, Sapporo, Japan, pp. 9–16. External Links: Link, Document Cited by: §2.1.
  • H. Iso, Y. Uehara, T. Ishigaki, H. Noji, E. Aramaki, I. Kobayashi, Y. Miyao, N. Okazaki, and H. Takamura (2019) Learning to Select, Track, and Generate for Data-to-Text. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, pp. 2102–2113. External Links: Link Cited by: §2.2, §2.2, footnote 3.
  • K. Knight and I. Chander (1994) Automated Postediting of Documents. In Proceedings of the AAAI Conference on Artificial Intelligence., Vol. 94, pp. 779–784. External Links: Link Cited by: §2.1.
  • R. Lebret, D. Grangier, and M. Auli (2016) Neural Text Generation from Structured Data with Application to the Biography Domain. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, pp. 1203–1213. External Links: Link, Document Cited by: §2.2.
  • Z. Li, X. Jiang, L. Shang, and H. Li (2018) Paraphrase Generation with Deep Reinforcement Learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 3865–3878. External Links: Link, Document Cited by: §2.1.
  • E. Malmi, S. Krause, S. Rothe, D. Mirylenka, and A. Severyn (2019) Encode, Tag, Realize: High-Precision Text Editing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 5053–5064. External Links: Link, Document Cited by: §2.1.
  • A. Moryossef, Y. Goldberg, and I. Dagan (2019) Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 2267–2277. External Links: Link, Document Cited by: §5.1.
  • H. T. Ng, S. M. Wu, T. Briscoe, C. Hadiwinoto, R. H. Susanto, and C. Bryant (2014) The CoNLL-2014 Shared Task on Grammatical Error Correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, Baltimore, Maryland, pp. 1–14. External Links: Link, Document Cited by: §2.1.
  • J. Novikova, O. Dušek, and V. Rieser (2017) The E2E Dataset: New Challenges For End-to-End Generation. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, Saarbrücken, Germany, pp. 201–206. External Links: Link, Document Cited by: §2.2.
  • K. Papineni, S. Roukos, T. Ward, and W. Zhu (2002) Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, pp. 311–318. External Links: Link, Document Cited by: §5.1.
  • H. Peng, A. Parikh, M. Faruqui, B. Dhingra, and D. Das (2019) Text Generation with Exemplar-based Adaptive Decoding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 2555–2565. External Links: Link, Document Cited by: §2.2.
  • R. Puduppully, L. Dong, and M. Lapata (2019) Data-to-text Generation with Entity Modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 2023–2035. External Links: Link, Document Cited by: §2.2.
  • S. J. Reddi, S. Kale, and S. Kumar (2018) On the convergence of adam and beyond. In International Conference on Learning Representations, External Links: Link Cited by: §5.1.
  • S. Reed and N. De Freitas (2016) Neural Programmer-Interpreters. In International Conference on Learning Representations, External Links: Link Cited by: §2.1.
  • E. Reiter and R. Dale (2000) Building Natural Language Generation Systems. Studies in Natural Language Processing, Cambridge University Press. External Links: Document Cited by: §2.2.
  • T. Shen, T. Lei, R. Barzilay, and T. Jaakkola (2017) Style Transfer from Non-Parallel Text by Cross-Alignment. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), pp. 6830–6841. External Links: Link Cited by: §2.1.
  • M. Simard, C. Goutte, and P. Isabelle (2007) Statistical Phrase-Based Post-Editing. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, Rochester, New York, pp. 508–515. External Links: Link Cited by: §2.1.
  • I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), pp. 3104–3112. External Links: Link Cited by: §2.1.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is All you Need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), pp. 5998–6008. External Links: Link Cited by: §2.1.
  • T. Vu and G. Haffari (2018) Automatic Post-Editing of Machine Translation: A Neural Programmer-Interpreter Approach. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 3048–3053. External Links: Link, Document Cited by: §2.1.
  • T. Watanabe and E. Sumita (2015) Transition-based neural constituent parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, pp. 1169–1179. External Links: Link, Document Cited by: §4.1.
  • S. Wiseman, S. Shieber, and A. Rush (2017) Challenges in Data-to-Document Generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, pp. 2253–2263. External Links: Link, Document Cited by: §1, §3.1, §5.1.
  • S. Wubben, A. van den Bosch, and E. Krahmer (2012) Sentence simplification by monolingual machine translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jeju Island, Korea, pp. 1015–1024. External Links: Link Cited by: §2.1.
  • W. Xu, C. Napoles, E. Pavlick, Q. Chen, and C. Callison-Burch (2016) Optimizing Statistical Machine Translation for Text Simplification. Transactions of the Association for Computational Linguistics 4, pp. 401–415. External Links: Link, Document Cited by: §5.1.
  • D. Yang, A. Halfaker, R. Kraut, and E. Hovy (2017) Identifying Semantic Edit Intentions from Revisions in Wikipedia. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, pp. 2000–2010. External Links: Link, Document Cited by: §1, §2.1.
  • P. Yin, G. Neubig, M. Allamanis, M. Brockschmidt, and A. L. Gaunt (2019) Learning to Represent Edits. In International Conference on Learning Representations, External Links: Link Cited by: §1, §2.1.
  • S. Zhao, R. Meng, D. He, A. Saptono, and B. Parmanto (2018) Integrating Transformer and Paraphrase Rules for Sentence Simplification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 3164–3173. External Links: Link, Document Cited by: §2.1.
  • W. Zhao, L. Wang, K. Shen, R. Jia, and J. Liu (2019) Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 156–165. External Links: Link, Document Cited by: §2.1.