Improving Truthfulness of Headline Generation
Most studies on abstractive summarization report ROUGE scores between system and reference summaries. However, we have a concern about the truthfulness of generated summaries: whether all facts of a generated summary are mentioned in the source text. This paper explores improving the truthfulness in headline generation on two popular datasets. Analyzing headlines generated by the state-of-the-art encoder-decoder model, we show that the model sometimes generates untruthful headlines. We conjecture that one of the reasons lies in untruthful supervision data used for training the model. In order to quantify the truthfulness of article-headline pairs, we consider the textual entailment of whether an article entails its headline. After confirming quite a few untruthful instances in the datasets, this study hypothesizes that removing untruthful instances from the supervision data may remedy the problem of the untruthful behaviors of the model. Building a binary classifier that predicts an entailment relation between an article and its headline, we filter out untruthful instances from the supervision data. Experimental results demonstrate that the headline generation model trained on filtered supervision data shows no clear difference in ROUGE scores but remarkable improvements in automatic and manual evaluations of the generated headlines.
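The filtering step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the paper trains a binary entailment classifier, whereas here a simple word-overlap heuristic (`overlap_score`, a hypothetical stand-in) plays the role of the classifier's entailment score, and the threshold value is an assumption.

```python
def overlap_score(article: str, headline: str) -> float:
    """Toy stand-in for an entailment probability: the fraction of
    headline tokens that also appear in the article. The actual paper
    uses a trained binary entailment classifier instead."""
    article_tokens = set(article.lower().split())
    headline_tokens = headline.lower().split()
    if not headline_tokens:
        return 0.0
    return sum(t in article_tokens for t in headline_tokens) / len(headline_tokens)


def filter_supervision(pairs, score_fn=overlap_score, threshold=0.5):
    """Keep only article-headline pairs judged truthful, i.e. those
    whose entailment score clears the (assumed) threshold."""
    return [(a, h) for a, h in pairs if score_fn(a, h) >= threshold]


# Hypothetical supervision data: one truthful pair, one untruthful pair.
pairs = [
    ("the mayor opened a new bridge on friday", "mayor opens new bridge"),
    ("the mayor opened a new bridge on friday", "aliens land in city center"),
]
kept = filter_supervision(pairs)
# Only the first, truthful pair survives the filter.
```

A headline generation model would then be trained on `kept` rather than on the full, noisier `pairs`.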