CIKM‘2021: SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary
Sports game summarization aims at generating sports news from live commentaries. However, existing datasets are all constructed through automated collection and cleaning processes, resulting in a lot of noise. Besides, current works neglect the knowledge gap between live commentaries and sports news, which limits the performance of sports game summarization. In this paper, we introduce K-SportsSum, a new dataset with two characteristics: (1) K-SportsSum collects a large amount of data from massive games. It has 7,854 commentary-news pairs. To improve the quality, K-SportsSum employs a manual cleaning process; (2) Different from existing datasets, to narrow the knowledge gap, K-SportsSum further provides a large-scale knowledge corpus that contains the information of 523 sports teams and 14,724 sports players. Additionally, we also introduce a knowledge-enhanced summarizer that utilizes both live commentaries and the knowledge to generate sports news. Extensive experiments on K-SportsSum and SportsSum datasets show that our model achieves new state-of-the-art performances. Qualitative analysis and human study further verify that our model generates more informative sports news.
In recent years, a large number of sports games take place every day, and there is strong demand for news articles reporting on them after they end. Meanwhile, manually writing sports news is labor-intensive for professional editors. Therefore, how to automatically generate sports news has gradually attracted attention from both the research community and industry. As the example in Fig. 1 shows, Sports Game Summarization aims at generating sports news articles based on live commentaries (Zhang et al., 2016). Ideally, the generated sports news should record the core events of a game and help people efficiently catch up on games. Compared to most prior work on traditional text summarization, the challenges of sports game summarization lie in three aspects: (1) The live commentaries record all the events of a game, usually reaching thousands of tokens, which is far beyond the typical 512-token limit of BERT-style pre-trained models; (2) The live commentaries have a different text style from the sports news. Specifically, commentaries are more informal and colloquial; (3) There is a knowledge gap between commentaries and news. Sports news usually contains additional knowledge of sports teams or players, which cannot be obtained from the corresponding commentaries (we discuss this further in Sec. 2.3).
Most existing works and datasets on sports game summarization treat the problem as a single-document summarization task. Zhang et al. (Zhang et al., 2016) discuss this task for the first time and build the first dataset, which has 150 commentary-news pairs. Wan et al. (Wan et al., 2016) also contribute a dataset with 900 samples for the NLPCC 2016 shared task. Early methods design diverse strategies to select key commentary sentences, and then either form the sports news directly (Zhang et al., 2016; Zhu et al., 2016; Yao et al., 2017) or rely on human-constructed templates to generate the final news (Liu et al., 2016; Lv et al., 2020). Recently, Huang et al. (Huang et al., 2020) present SportsSum, the first large-scale sports game summarization dataset with 5,428 samples. They crawl live commentaries and sports news from sports report websites, and further adopt a rule-based cleaning process to clean the data. Specifically, they first remove all HTML tags, and then, for each news article, remove the descriptions before starting keywords which indicate the start of a game (e.g., “at the beginning of the game”). This is because there usually exist descriptions (e.g., match history) at the beginning of news articles, which cannot be directly inferred from the corresponding commentaries. Huang et al. (Huang et al., 2020) also extend early methods (Zhang et al., 2016; Zhu et al., 2016; Yao et al., 2017; Liu et al., 2016; Lv et al., 2020) to a state-of-the-art two-step summarization framework, where each selected key commentary sentence is further rewritten into a news sentence through seq2seq models.
Despite the above progress, there are two shortcomings in previous works. First, the previous datasets are limited in either scale or quality. (Footnote 1: We note that another concurrent work, SportsSum2.0 (Wang et al., 2021), also employs a manual cleaning process on a large-scale sports game summarization dataset. However, they only clean the original SportsSum (Huang et al., 2020) dataset. There are three major differences between our dataset and SportsSum2.0: (1) the scale of our dataset is 1.45 times theirs; (2) our manual cleaning process also removes the history-related descriptions in the middle of news articles, which is neglected by SportsSum2.0; (3) our dataset also provides a large-scale knowledge corpus to alleviate the knowledge gap issue.) The scale of early datasets (Zhang et al., 2016; Wan et al., 2016) is less than 1,000 samples, which cannot be utilized to explore sophisticated supervised models. SportsSum (Huang et al., 2020) is many times larger than early datasets, but as shown in Fig. 2, we find that more than 15% of news articles in SportsSum have noisy sentences due to its simple rule-based cleaning process (detailed in Sec. 2.2). Second, the knowledge gap between live commentaries and sports news is neglected by previous works. Though SportsSum removes a part of the descriptions at the beginning of news to alleviate the knowledge gap to some extent, there are still many other descriptions leading to the knowledge gap. Moreover, their two-step framework neglects the gap.
In this paper, we introduce K-SportsSum, a large-scale human-cleaned sports game summarization dataset which is constructed with the following features: (1) In order to improve both the scale and the quality of dataset, K-SportsSum collects a large amount of data from massive games and further employs a strict manual cleaning process to denoise news articles. It is a large scale sports game summarization dataset, which consists of 7,854 high quality commentary-news pairs; (2) To narrow the knowledge gap between live commentaries and sports news, K-SportsSum also provides an abundant knowledge corpus including the information of 523 sports teams and 14,724 sports players. Additionally, in the aspect of model design, we propose a knowledge-enhanced summarizer that first selects key commentary sentences, and then considers the information of the knowledge corpus during rewriting each selected sentence to a news sentence so as to form final news. The experimental results on K-SportsSum and SportsSum (Huang et al., 2020) datasets show that our model achieves new state-of-the-art performances. We further conduct qualitative analysis and human study to verify that our model generates more informative sports news.
We highlight our contributions as follows:
We introduce a new sports game summarization dataset, i.e., K-SportsSum, which contains 7,854 human-cleaned samples. To the best of our knowledge, K-SportsSum is currently the highest quality and largest sports game summarization dataset. (Footnote 2: We release the data at https://github.com/krystalan/K-SportsSum.)
In order to narrow the knowledge gap between commentaries and news, we also provide an abundant knowledge corpus containing the information of 523 sports teams and 14,724 sports players.
A knowledge-enhanced summarizer is proposed to take the information of knowledge corpus into account when generating sports news. It is the first sports game summarization model which considers the knowledge gap issue.
The experimental results show our model achieves a new state-of-the-art performance on both K-SportsSum and SportsSum datasets. Qualitative analysis and human study further verify that our model generates better sports news.
In this section, we first show how we collect live commentary documents and news articles from Sina Sports Live (§ 2.1). Secondly, we analyze the noise present in the collected text and introduce the manual cleaning process used to denoise the news articles (§ 2.2). Thirdly, we discuss the collection process of the knowledge corpus (§ 2.3). Finally, we give the details of the benchmark settings (§ 2.4).
Following previous works (Zhang et al., 2016; Wan et al., 2016; Huang et al., 2020), we crawl the records of football games from Sina Sports Live (Footnote 3: http://match.sports.sina.com.cn/index.html), the most influential football live service in China. Note that the existing research on sports game summarization is oriented to football games, which are the easiest to collect, but the methods and discussions can trivially generalize to other types of sports games. After crawling all football game data from 2012 to 2020, we remove HTML tags and obtain 8,640 live commentary documents together with the corresponding news articles.
|Knowledge type||%||Example|
|Home or visiting team||14.7||The home team broke the deadlock in the 58th minute, Brian took a free kick and Lucchini scored the goal. In the 77th minute, Nica…|
|Player Information||6.3||Paris exceeded it again in the 26th minute! Motta took a corner kick from the right. Verratti, who is 1.65 meters tall, pushed into the near corner at a small angle 6 meters away from the goal, 2-1. …|
| || ||After another 3 minutes, Keita broke through Campagnaro and Brugman and passed back. Immobile shot to the top of the goal. He faced the previous teams without celebrating the goal. …|
|Team Information||4.7||Purple Lily equalized in the 29th minute, Tomovic broke through Ljajic and Duoduo from the right.|
| || ||The last Champions League runner-up completely controlled the game after the opening. Reus shot higher once with two feet and was rescued by Mandanda once. …|
We find that the live commentaries are of high quality due to their structured form. Nevertheless, the news articles are unstructured text that usually contains noise. Specifically, we divide the noise into four types:
Description of other games: We find that about 12% of the news pages contain multiple news articles belonging to different games. This has been neglected by SportsSum (Huang et al., 2020), so 2.2% (119/5428) of its news articles include descriptions of other games.
Advertisements: Advertisements often appear in news articles. We find that about 9.3% (505/5428) of news articles in SportsSum have such noise.
Irrelevant hyperlink texts: Many news sentences contain hyperlink texts. Some of them can be regarded as part of the news, but others are irrelevant to the current news; we call the latter irrelevant hyperlink texts. About 0.6% (31/5428) of the samples in SportsSum have this kind of noise.
History-related descriptions: A number of news articles contain history-related descriptions which cannot be inferred from the corresponding live commentary document. SportsSum adopts a rule-based approach to remove all news sentences before starting keywords to alleviate part of this noise. However, this approach fails to correctly handle about 4.6% (252/5428) of the news articles, because the rules cannot cover all situations. In addition, history-related descriptions do not only appear at the beginning of the news. Many news articles introduce the match history in the middle, e.g., if a player scores, the news may introduce the recent outstanding performance of the player in the previous rounds, or count his (her) total goals in the current season. We also explain why we consider the history-related descriptions as noise rather than the knowledge gap between commentaries and news in Sec. 2.3.
To reduce the noise of news articles, we design a manual cleaning process with special consideration for sports game summarization. Fig. 3(a) shows the manual cleaning process. We first remove the description of other games. Secondly, we delete advertisements and irrelevant hyperlink text. Finally, the history-related descriptions of the game are identified and removed.
The interface of manual cleaning is shown in Fig. 3(b). We recruit 9 master students who are native Chinese speakers to perform the manual cleaning process. Firstly, we randomly select 50 news articles as test samples and ask all students to clean them at the same time. Based on the results, we designate 7 annotators and 2 senior annotators. After that, each news article is randomly assigned to 2 annotators. If their cleaning results are inconsistent, the final result is determined by a third, senior annotator. Finally, all the cleaning results are checked by another two data experts. If the data experts think that a result does not meet the requirements, the news article is assigned again.
After the manual cleaning process, we discover some news articles do not contain the description of the current game, or contain little information, e.g., only include the results of the game. We discard these news articles and retain 7,854 high-quality manually cleaned news articles which together with the corresponding live commentary documents constitute the K-SportsSum dataset.
As we mentioned in Sec. 2.2, the history-related descriptions in original news articles are regarded as noise and have been removed during the manual cleaning process due to the following reasons: (1) we follow the settings of SportsSum (Huang et al., 2020) which adopt a rule-based approach to remove all sentences before the starting keywords. Most of the removed sentences are history-related descriptions; (2) the goal of sports game summarization is to generate news articles that can record the key events of the current sports games. The history-related descriptions make little contribution to the goal.
To investigate whether there are still other descriptions in the news articles that cannot be inferred from the corresponding live commentary document, we randomly select 300 samples from K-SportsSum and manually analyze whether the news articles contain additional knowledge that cannot be obtained from the commentary documents. Tab. 1 shows statistics of the required additional knowledge: (1) Some news articles replace the name of football teams with “home team” or “visiting team”, which cannot be explicitly obtained from commentary documents; (2) Some news includes the personal information of players, e.g., height, birthday, previous teams; (3) A number of news articles contain prior knowledge of football teams, such as nickname (“Purple Lily” is the nickname of ACF Fiorentina, which is shown in the penultimate line of Tab. 1) and past honors.
To narrow the knowledge gap between commentaries and news, we construct a knowledge corpus whose collection process contains the following three steps:
Step 1: Metadata Collection. For each game, as shown in Fig. 4, Sina Sports Live also provides a metadata page that can link to related player pages and team pages. Note that each team (or player) has a unique corresponding team (or player) page. After crawling 8,640 metadata pages of all games in K-SportsSum, we obtain 559 URLs of team pages and 15591 URLs of player pages.
Step 2: Player Knowledge Collection. The player pages provided by Sina Sports Live contain structured knowledge cards describing players in ten aspects (i.e., name, birthday, age, etc.). We crawl these pages and obtain the 14,724 players’ structured knowledge cards. Then we convert each knowledge card to a passage through several rule-based sentence templates (e.g., an item of knowledge card “Ronaldo”, “birthday”, “September 18, 1976” can be converted to a sentence “Ronaldo’s birthday is September 18, 1976”). In this way, we obtain 14,724 player passages.
Step 3: Team Knowledge Collection. Though Sina Sports Live offers the team pages, we find that most of them are less informative than player pages. To construct an informative knowledge corpus, we decide to manually align these 559 team pages to Wikipedia articles (Footnote 4: https://zh.wikipedia.org/), from which we further extract plain text to form our knowledge corpus. (Footnote 5: We crawl the Wikipedia articles and then use the wikiextractor tool, https://github.com/attardi/wikiextractor, to extract plain text.) Three master students and two data experts are recruited to perform the alignment task. Each team page is assigned to three students; if their results are inconsistent, the final result is decided by a group meeting of these five persons. Eventually, we align 523 out of 559 team pages to corresponding Wikipedia articles.
Finally, our knowledge corpus contains 523 team articles and 14,724 player passages. The corpus also provides the link and alignment relations among metadata pages, player pages, team pages and Wikipedia articles. Thus, for a given game, we can accurately retrieve the related articles and passages.
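The template-based conversion in Step 2 can be sketched as follows. This is a minimal illustration rather than the released collection code: the function name `card_to_passage`, the `age` template and the fallback template are hypothetical, and only the "birthday" pattern follows the example given in the text.

```python
# Hypothetical sentence templates keyed by knowledge-card attribute.
# Only the "birthday" pattern is taken from the example in the text.
TEMPLATES = {
    "birthday": "{name}'s birthday is {value}.",
    "age": "{name} is {value} years old.",
}
# Assumed fallback for attributes without a dedicated template.
DEFAULT_TEMPLATE = "{name}'s {attribute} is {value}."


def card_to_passage(name, card):
    """Convert a structured knowledge card (attribute -> value) to a passage."""
    sentences = []
    for attribute, value in card.items():
        template = TEMPLATES.get(attribute, DEFAULT_TEMPLATE)
        sentences.append(template.format(name=name, attribute=attribute, value=value))
    return " ".join(sentences)


print(card_to_passage("Ronaldo", {"birthday": "September 18, 1976"}))
```

Each card item becomes one sentence, so a card with ten aspects yields a ten-sentence passage under this sketch.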
|Datasets||# Examples||News: # Tokens (Avg.)||News: # Tokens (95th pctl.)||News: # Words (Avg.)||News: # Words (95th pctl.)||News: # Sent. (Avg.)||News: # Sent. (95th pctl.)||Commentary: # Tokens (Avg.)||Commentary: # Tokens (95th pctl.)||Commentary: # Words (Avg.)||Commentary: # Words (95th pctl.)||Commentary: # Sent. (Avg.)||Commentary: # Sent. (95th pctl.)|
|Zhang et al. (Zhang et al., 2016)||150||-||-||-||-||-||-||-||-||-||-||-||-|
|NLPCC 2016 shared task (Wan et al., 2016)||900||-||-||-||-||-||-||-||-||-||-||-||-|
|SportsSum (Huang et al., 2020)||5428||801.11||1558||427.98||924||23.80||39||3459.97||5354||1825.63||3133||193.77||379|
We randomly select 500 samples from K-SportsSum as the development set and another 500 samples as the testing set. The remaining 6,854 samples constitute the training set.
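A minimal sketch of such a random split is shown below. The function name `split_dataset` and the fixed seed are assumptions made for reproducibility of the illustration; the text does not state how the random selection was seeded.

```python
import random


def split_dataset(samples, dev_size=500, test_size=500, seed=0):
    """Randomly partition samples into disjoint train/dev/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, not from the paper
    dev = shuffled[:dev_size]
    test = shuffled[dev_size:dev_size + test_size]
    train = shuffled[dev_size + test_size:]
    return train, dev, test


# Sample indices stand in for the 7,854 commentary-news pairs.
train, dev, test = split_dataset(range(7854))
print(len(train), len(dev), len(test))  # 6854 500 500
```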
In this section, we analyze various aspects of K-SportsSum to provide a deeper understanding of the dataset and task of sports game summarization.
Data Size. Tab. 2 shows the statistics of K-SportsSum and previous datasets. The K-SportsSum dataset is much larger than any other dataset.
News Article. The average number of tokens per news article in K-SportsSum is 606.80, which is less than the counterpart in SportsSum (801.11) due to the manual cleaning process. Fig. 5(a) shows the distribution of token length for news articles. The distribution is positively skewed, which indicates that most news articles are shorter than the average.
Live Commentary Document. The length of a commentary document usually reaches thousands of tokens, which makes the task more challenging. The average number of tokens for commentary documents in SportsSum is 3459.97, which is longer than the counterpart in K-SportsSum, i.e., 2251.62. This is because SportsSum considers every commentary sentence in the document, whereas we only retain the commentary sentences which have timeline information, since most of the commentary sentences without timeline information are irrelevant to the current game. The distribution of token length for live commentary documents in K-SportsSum is shown in Fig. 5(b). This distribution follows a Gaussian mixture because the live service of Sina Sports Live has been updated once, and the length distribution of commentary documents differs before and after the update. Some commentary documents in K-SportsSum are provided by the old live service, while others are offered by the new one.
Knowledge Corpus. The knowledge corpus contains 523 team articles and 14,724 player passages. Each team article has 18.28 sentences or 1341.91 tokens on average. Each player passage contains 15.05 sentences or 283.49 tokens on average.
We propose a knowledge-enhanced summarizer which generates sports news based on both the live commentary document and the knowledge corpus. Formally, a model is given a live commentary document $C = \{c_1, c_2, \dots, c_m\}$ together with a knowledge corpus $K$ and outputs a sports news article $R = \{r_1, r_2, \dots, r_n\}$. $r_i$ represents the $i$-th news sentence and $c_j = (t_j, s_j, z_j)$ is the $j$-th commentary, where $t_j$ is the timeline information, $s_j$ denotes the current scores and $z_j$ is the commentary sentence.
The overview of our knowledge-enhanced summarizer is illustrated in Fig. 6. Firstly, we utilize a selector to extract key commentary sentences from the original commentary document (Sec. 4.1). Secondly, a knowledge retriever is used to obtain related passages and articles from the knowledge corpus for each selected commentary sentence (Sec. 4.2). Lastly, we make use of a seq2seq rewriter to generate news sentences based on the corresponding commentary sentences and retrieved passages/articles (Sec. 4.3). In order to train our selector and rewriter, we need labels to indicate the importance of commentary sentences and aligned commentary sentence, news sentence pairs. Following Huang et al. (Huang et al., 2020), we obtain these importance labels and sentence pairs through a sentence mapping process (Sec. 4.4).
The selector is used to extract key commentary sentences from all commentary sentences of a given live commentary document, which can be modeled as a text classification task. Different from the previous state-of-the-art two-step framework (Huang et al., 2020), which purely utilizes TextCNN (Kim, 2014) as the selector and ignores the context of a commentary sentence, we make use of RoBERTa (Liu et al., 2019) to extract the contextual representation of a commentary sentence, and then predict its importance. Specifically, we input the target commentary sentence with its context to RoBERTa (Liu et al., 2019) in a sliding-window manner. The representation of the target sentence is obtained by averaging the output embeddings of the tokens belonging to the target sentence. Finally, a sigmoid classifier is employed to predict the importance of the target commentary sentence. The cross-entropy loss is used as the training objective for the selector.
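The data flow of the selector (context window, mean pooling over the target sentence's tokens, sigmoid classifier) can be sketched without the encoder itself. In the sketch below the token vectors are plain lists standing in for RoBERTa outputs, and the window `radius`, the function names and the single linear layer are all hypothetical, since the text does not specify the window size or the classifier head.

```python
import math


def context_window(sentences, target_idx, radius=2):
    """Return the sentences around the target and the target's index
    inside that window (radius is a hypothetical setting)."""
    start = max(0, target_idx - radius)
    end = min(len(sentences), target_idx + radius + 1)
    return sentences[start:end], target_idx - start


def mean_pool(token_vectors):
    """Average the output vectors of the tokens in the target sentence."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def importance(sentence_vector, weights, bias=0.0):
    """Sigmoid classifier over the pooled sentence representation."""
    logit = sum(w * x for w, x in zip(weights, sentence_vector)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


window, position = context_window(["s0", "s1", "s2", "s3", "s4", "s5"], 3, radius=1)
print(window, position)  # ['s2', 's3', 's4'] 1
print(mean_pool([[1.0, 3.0], [3.0, 5.0]]))  # [2.0, 4.0]
```

In a real implementation the window would be encoded once and only the target sentence's token positions would be pooled, so that each prediction sees neighbouring commentary as context.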
Given a selected commentary sentence $z_j$ and the knowledge corpus $K$, the knowledge retriever first recognizes the team and player entity mentions in $z_j$, and then links the entity mentions to team articles or player passages from the corpus $K$.
In order to recognize the players and teams mentioned in the commentary sentence $z_j$, we train a FLAT model (Li et al., 2020) (a state-of-the-art Chinese NER model which better leverages the lattice information of Chinese character sequences) on MSRA (Levow, 2006) (a general Chinese NER dataset), and then make use of the trained FLAT model to predict the entity mentions in $z_j$. We only retain the PER and ORG entity mentions predicted by FLAT, because the PER entity mentions indicate the players while the ORG entity mentions indicate the teams.
For a given game, we can accurately retrieve dozens of candidate player passages and several (2 in most cases) candidate team articles through the link and alignment relations provided by the knowledge corpus. The entity linking process needs to link each PER or ORG entity mention to the candidate passages/articles. Here, we employ a simple yet effective linking method: we calculate the normalized Levenshtein distance between each entity mention $m$ and the title $e$ of each passage/article (Footnote 6: The standard name of each player is regarded as the title of the player passage, while the team articles collected from Wikipedia already have corresponding titles.):
$$\mathrm{NLD}(m, e) = \frac{\mathrm{LD}(m, e)}{\max(\mathrm{len}(m), \mathrm{len}(e))}$$
where $\mathrm{NLD}(\cdot)$ means the normalized Levenshtein distance, and $\mathrm{LD}(\cdot)$ represents the standard Levenshtein distance. $\mathrm{len}(\cdot)$ indicates the number of characters within the input sequence.
Each PER (or ORG) entity mention is linked to the corresponding player passage (or team article) whose title has the smallest normalized Levenshtein distance to the entity mention, provided that this distance is within a predefined threshold $\lambda_p$ (or $\lambda_t$). Otherwise, we do not link the entity mention. Eventually, for a given commentary sentence $z_j$, we obtain a number of (linked PER/ORG entity mention, corresponding passage/article) pairs.
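The linking rule can be sketched in a few lines of Python. This is an illustration under stated assumptions: the edit distance is normalized by the length of the longer string, "within the threshold" is read as less than or equal, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Standard Levenshtein (edit) distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_levenshtein(a, b):
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def link_mention(mention, candidate_titles, threshold):
    """Return the closest candidate title, or None if even the closest
    one is farther away than the threshold."""
    if not candidate_titles:
        return None
    best = min(candidate_titles, key=lambda title: normalized_levenshtein(mention, title))
    return best if normalized_levenshtein(mention, best) <= threshold else None


print(link_mention("Suares", ["Suarez", "Semedo"], threshold=0.2))  # Suarez
print(link_mention("Messi", ["Suarez"], threshold=0.2))  # None
```

Because the candidate set for a game contains only dozens of titles, an exhaustive comparison like this one stays cheap.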
As shown in Fig. 7, we make use of mT5 (Xue et al., 2021), a pre-trained multilingual language model (Footnote 7: Since there is no Chinese version of T5 for public use, we choose its multilingual version, i.e., mT5.), as our rewriter to generate a news sentence based on a given commentary sentence $z_j$, its timeline information $t_j$ and the (linked PER/ORG entity mention, corresponding passage/article) pairs. Specifically, we tokenize the temporal phrase “In the $t_j$-th minute” and $z_j$ using mT5’s tokenizer, then form the input as <s> temporal phrase </s> commentary sentence </s>. The input embedding of each token consists of a segment embedding and a knowledge embedding in addition to a token embedding and a position embedding within the input sequence:
$$E_i = \mathrm{LN}\left(E^{tok}_i + E^{pos}_i + E^{seg}_i + E^{know}_i\right)$$
where $E_i$ denotes the fused embedding of the $i$-th token in the input sequence, $E^{tok}_i$, $E^{pos}_i$, $E^{seg}_i$ and $E^{know}_i$ are its token, position, segment and knowledge embeddings, and $\mathrm{LN}(\cdot)$ represents layer normalization.
We explain the two additional embeddings below.
Segment embeddings. To convey the semantics of fine-grained types of tokens, we use four learnable segment embeddings ([Player], [Team], [Time] and [Other]) to indicate whether a token belongs to: (1) linked PER entity mentions; (2) linked ORG entity mentions; (3) the temporal phrase; (4) none of the above.
Knowledge embeddings. For a token belonging to the linked PER/ORG entity mentions, we consider the representation of corresponding passage/article as its knowledge embedding. In detail, we first input each sentence of a given passage/article to a pre-trained RoBERTa (Liu et al., 2019) one by one and use the output embedding of [CLS] token as the sentence embedding. Then, the representation of the whole passage/article is the average of all sentences’ embeddings.
The input embeddings are further passed to the mT5-encoder and generate the news sentence in the sequence-to-sequence learning process with the negative log-likelihood loss. All generated news sentences are concatenated to form final sports news.
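The construction of the rewriter's input embeddings can be illustrated with plain Python lists standing in for tensors. The sketch assumes that the four embeddings share one dimension and are combined by element-wise addition before layer normalization, and it omits the learned scale and shift of a real layer-normalization module; all function names are hypothetical.

```python
import math


def layer_norm(vector, eps=1e-6):
    """Normalize a vector to zero mean and unit variance
    (learned scale and shift parameters are omitted in this sketch)."""
    mean = sum(vector) / len(vector)
    variance = sum((x - mean) ** 2 for x in vector) / len(vector)
    return [(x - mean) / math.sqrt(variance + eps) for x in vector]


def passage_embedding(sentence_embeddings):
    """Knowledge embedding of a passage/article: the average of the
    embeddings of all its sentences."""
    count = len(sentence_embeddings)
    return [sum(column) / count for column in zip(*sentence_embeddings)]


def fuse(token, position, segment, knowledge):
    """Add the four embeddings element-wise (assumed fusion operator),
    then apply layer normalization."""
    summed = [t + p + s + k for t, p, s, k in zip(token, position, segment, knowledge)]
    return layer_norm(summed)


knowledge = passage_embedding([[1.0, 2.0, 0.0, 0.0], [3.0, 4.0, 0.0, 0.0]])
print(knowledge)  # [2.0, 3.0, 0.0, 0.0]
fused = fuse([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0], knowledge)
```

Element-wise addition requires the knowledge embedding to have the same width as the rewriter's token embeddings, which fits the setup in which both RoBERTa-Large and mT5-Large use a hidden size of 1024.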
To train the selector and rewriter, we need (1) labels to indicate the importance of each commentary sentence and (2) a large number of (commentary sentence, news sentence) pairs, respectively. To obtain the above labels and pairs, we map each news sentence to its commentary sentence through the same sentence mapping process as Huang et al. (Huang et al., 2020). Specifically, although there is no explicit timeline information in news articles, we find that about 40% of the news sentences in K-SportsSum begin with “in the n-th minute”. For each news sentence $r_i$ beginning with “in the $h_i$-th minute”, we first obtain its time information $h_i$. Then, we consider the commentary sentences whose corresponding timeline $t_j$ falls within a window defined by $h_i$ as the candidate mapping set of $r_i$. Lastly, we calculate the BERTScore (Zhang et al., 2020) (a metric to measure sentence similarity) of $r_i$ and all the commentary sentences in the candidate mapping set. The commentary sentence with the highest score is paired with $r_i$. With the above process, we can obtain a large number of pairs of a mapped commentary sentence and a news sentence, which can be used to train our rewriter. Furthermore, the commentary sentences appearing in the pairs are regarded as important while the others are regarded as insignificant, which provides the training data for our selector.
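The mapping procedure can be sketched as below. Several parts are stand-ins and should be read as assumptions: a word-overlap score replaces BERTScore so that the example runs without any model, the English regular expression replaces the Chinese time pattern, and the `window` offsets are a hypothetical parameter because the exact candidate window is not recoverable from the text.

```python
import re


def overlap_similarity(a, b):
    """Stand-in for BERTScore: Jaccard overlap of lowercase word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def map_news_sentence(news_sentence, commentary, window=(0, 0),
                      similarity=overlap_similarity):
    """Pair a news sentence with its most similar candidate commentary sentence.

    commentary: list of (minute, sentence) tuples.
    window: hypothetical (before, after) offsets in minutes around the
            minute mentioned at the start of the news sentence.
    """
    match = re.match(r"in the (\d+)(?:st|nd|rd|th)? minute", news_sentence, re.IGNORECASE)
    if match is None:
        return None  # no explicit time information, so the sentence is not mapped
    minute = int(match.group(1))
    low, high = minute - window[0], minute + window[1]
    candidates = [sentence for t, sentence in commentary if low <= t <= high]
    if not candidates:
        return None
    return max(candidates, key=lambda sentence: similarity(news_sentence, sentence))


commentary = [
    (27, "Suarez scored the ball"),
    (27, "Barcelona attacked on the left side"),
    (45, "De Yang was replaced"),
]
print(map_news_sentence("In the 27th minute, Suarez scored in front of the goal", commentary))
```

Every commentary sentence returned by this mapping would be labeled important for the selector, and the returned pairs would serve as training pairs for the rewriter.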
|SportsSUM (Huang et al., 2020)||44.89||19.04||44.16||43.17||18.66||42.27|
|Commentary Sentence||T.||KES (w/o know.)||KES|
|Barcelona attacked on the left side, Semedo passed the ball to the front of the small restricted area, Suarez scored the ball!!!||27’||Barcelona broke the deadlock in the 27th minute, Semedo crossed from the left and Suarez scored in front of the small restricted area.||Defending champion broke the deadlock in the 27th minute, Semedo crossed from the left and Suarez scored in front of the small restricted area.|
|De Yang was replaced and Song Boxuan came on as a substitute||45’||In the 45th minute, De Yang was replaced by Song Boxuan||In the 45th minute, De Yang was replaced by Song Boxuan, who is 1.7 meters height.|
We train all models on one 32GB Tesla V100 GPU. Our knowledge-enhanced summarizer is implemented based on RoBERTa (Liu et al., 2019) and mT5 (Xue et al., 2021) from the huggingface Transformers library (Wolf et al., 2020) with default settings. In detail, we utilize RoBERTa-Large (24 layers with 1024 hidden size) to initialize our selector, and mT5-Large (24 layers with 1024 hidden size) as our rewriter. Another fixed RoBERTa-Large in our rewriter is used to calculate the knowledge embeddings of the input tokens that belong to linked entity mentions. For the knowledge retriever, we train a FLAT (Li et al., 2020) NER model on a general Chinese NER dataset, i.e., MSRA (Levow, 2006). The trained FLAT model achieves a 94.21 F1 score on the MSRA testing set, which is similar to the original paper. The predefined thresholds $\lambda_p$ and $\lambda_t$ used in the entity linking process are 0.2 and 0.25, respectively. We set the hyperparameters based on preliminary experiments on the development set. We perform minimal hyperparameter tuning over learning rates (LRs) in [1e-5, 2e-5, 3e-5] and 3 to 10 epochs. We find that the selector works best with an LR of 3e-5 and 5 epochs. The best configuration for the mT5 rewriter is an LR of 2e-5 and 7 epochs. For all models, we set the batch size to 32, use the Adam optimizer with a default initial momentum and adopt linear warmup in the first 500 steps.
We compare our knowledge-enhanced summarizer (i.e., KES) with several general text summarization models and the current state-of-the-art two-step model proposed by Huang et al. (Huang et al., 2020) (i.e., SportsSUM). Note that the baseline models do not include the state-of-the-art pre-trained encoder-decoder language models (e.g., T5 (Raffel et al., 2020) and BART (Lewis et al., 2020)) due to their limitation with long text. Tab. 3 shows that our model outperforms the baselines on both K-SportsSum and SportsSum datasets in terms of ROUGE scores. Specifically, TextRank (Mihalcea and Tarau, 2004) and PacSum (Zheng and Lapata, 2019) are two typical unsupervised extractive summarization models. RoBERTa-Large is used as a supervised extractive summarization model in the same way as our selector. These three models achieve limited performance due to the different text styles of commentaries and news. Abs-LSTM and Abs-PGNet (See et al., 2017) are two abstractive summarization models which handle sports game summarization in an end-to-end sequence-to-sequence manner. They outperform the extractive models because they take the different text styles into account. Nevertheless, neither LSTM nor PGNet can model long-distance dependencies in the input sequence well. Since the appearance of the transformer model (Vaswani et al., 2017), an encoder-decoder architecture which makes use of the self-attention mechanism to model long-distance dependencies, many pre-trained encoder-decoder language models such as BART and T5 have been proposed. However, they cannot be directly used for sports game summarization because the input limits of T5 and BART are 512 tokens and 1,024 tokens, respectively. The state-of-the-art baseline SportsSUM (Huang et al., 2020) uses a TextCNN selector and a PGNet rewriter to achieve better results than the above models, where the selector effectively handles the long commentary text while the rewriter alleviates the style mismatch. Despite its better performance, SportsSUM neglects the knowledge gap between live commentaries and sports news.
Our knowledge-enhanced summarizer uses the additional corpus to alleviate the knowledge gap, together with the advanced selector and rewriter to achieve a new state-of-the-art performance. Since K-SportsSum and SportsSum are both collected from Sina Sports Live, for each game in SportsSum, we also can accurately retrieve related passages and articles from the corpus, and then train our knowledge-enhanced summarizer.
We run 5 ablations, modifying various settings of our knowledge-enhanced summarizer: (1) remove segment embeddings in knowledge retriever; (2) remove knowledge embeddings in knowledge retriever; (3) remove both segment embeddings and knowledge embeddings; (4) replace mT5 rewriter with PGN rewriter (it is worth noting that the PGN rewriter is based on LSTM, which cannot utilize segment embeddings and knowledge embeddings); (5) replace Roberta-Large selector with TextCNN selector.
The effect of these ablations on the K-SportsSum development set is shown in Tab. 5. In each case, the average ROUGE score is lower than that of our original knowledge-enhanced summarizer, which supports the design of our model.
Tab. 4 shows the news sentences generated by (a) our original knowledge-enhanced summarizer, i.e., KES and (b) the variant model which removes knowledge embeddings in knowledge retriever, i.e., KES (w/o know.). As shown, the news sentences generated by original KES are more informative than the counterpart by KES (w/o know.). Our knowledge-enhanced summarizer implicitly makes use of additional knowledge by fusing the knowledge embedding into the pre-trained language model (mT5 in our experiments). Though this implicit way could help the model to generate informative sports news, we also find that this way may lead to wrong facts. As the second example shown in Tab. 4, the generated news sentence describes the height of De Yang is 1.7 meters. However, the actual height of De Yang is 1.8 meters. This finding implies that KES has learned the pattern of adding additional knowledge to news sentences but it is still challenging to generate correct descriptions.
|Model|Average ROUGE / difference from the original KES|
|---|---|
|KES (w/o seg.)|39.41 / -0.53|
|KES (w/o know.)|39.22 / -0.72|
|KES (w/o seg.&know.)|38.23 / -1.71|
|KES (PGN rewriter)|37.02 / -2.92|
|KES (TextCNN selector)|37.72 / -2.22|
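As a quick consistency check on the ablation rows above, the short sketch below recovers the average ROUGE score of the original model implied by each row (average minus difference) and ranks the ablations by how much they hurt. The implied score is derived from the rows here, not quoted from the source, and the variable names are illustrative.

```python
# (average ROUGE, difference from the original KES) as listed in the ablation rows.
ablations = {
    "KES (w/o seg.)": (39.41, -0.53),
    "KES (w/o know.)": (39.22, -0.72),
    "KES (w/o seg.&know.)": (38.23, -1.71),
    "KES (PGN rewriter)": (37.02, -2.92),
    "KES (TextCNN selector)": (37.72, -2.22),
}

# Each row implies a score for the original model: average - difference.
implied_full = {
    name: round(avg - delta, 2) for name, (avg, delta) in ablations.items()
}

# All rows should agree on one implied score if the table is self-consistent.
consistent = len(set(implied_full.values())) == 1

# Rank ablations from the largest drop to the smallest.
ranked = sorted(ablations, key=lambda name: ablations[name][1])

print(implied_full)
print(consistent)
print(ranked)
```

Under this reading, swapping the rewriter or the selector costs more than removing either embedding alone, which matches the ordering of the differences in the rows.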
We conduct human studies to further evaluate the sports news generated by different methods, i.e., KES, KES (w/o know.) and SportsSUM (Huang et al., 2020). Five master's students are recruited, and each student evaluates 50 samples for each method. Each evaluator scores the generated sports news in terms of informativeness, fluency and overall quality on a 3-point scale.
Fig. 8 shows the human evaluation results. KES outperforms KES (w/o know.) and SportsSUM on all three aspects, which verifies that our original KES performs better at generating sports news. Moreover, the sports news generated by KES is more fluent than that generated by KES (w/o know.), which indicates that taking the knowledge gap into account when generating news can also improve fluency.
We can conclude from the above experiments and analysis that sports game summarization is more challenging than traditional text summarization. We believe the following research directions are worth pursuing: (1) exploring models that utilize knowledge explicitly; (2) leveraging long-text pre-trained models (e.g., Longformer (Beltagy et al., 2020) and ETC (Ainslie et al., 2020)) to deal with the sports game summarization task.
Text summarization aims at preserving the main information of one or multiple documents in a relatively short text (Rush et al., 2015; Chopra et al., 2016; Nallapati et al., 2016). Our paper focuses on sports game summarization, a challenging branch of text summarization. Early literature mainly explores different strategies on limited-scale datasets (Zhang et al., 2016; Wan et al., 2016) to first select key commentary sentences, and then either form the sports news directly (Zhang et al., 2016; Zhu et al., 2016; Yao et al., 2017) or rely on human-constructed templates to generate the final news (Liu et al., 2016; Lv et al., 2020). Specifically, Zhang et al. (Zhang et al., 2016) extract different features (e.g., the number of words, keywords and stop words) of commentary sentences, and then utilize a learning to rank (LTR) model to select key commentary sentences so as to form news. Yao et al. (Yao et al., 2017) take the description style and the importance of the described behavior into account during key commentary sentence selection. Zhu et al. (Zhu et al., 2016) model the sentence selection process as a sequence tagging task and deal with it using a Conditional Random Field (CRF). Lv et al. (Lv et al., 2020) make use of a Convolutional Neural Network (CNN) to select key commentary sentences and further adopt pre-defined sentence templates to generate the final news. Recently, Huang et al. (Huang et al., 2020) present SportsSum, the first large-scale sports game summarization dataset. They also discuss a state-of-the-art two-step framework which first selects key commentary sentences, and then rewrites each selected sentence into a news sentence through seq2seq models. Despite its great contributions, SportsSum contains considerable noise due to its simple rule-based data cleaning process.
In conclusion, we propose K-SportsSum, a large-scale human-cleaned sports game summarization benchmark. In order to narrow the knowledge gap between live commentaries and sports news, K-SportsSum also provides a knowledge corpus containing the information of sports teams and players. Additionally, a knowledge-enhanced summarizer is presented to harness external knowledge for generating more informative sports news. We have conducted extensive experiments to verify the effectiveness of the proposed method on two datasets compared with current state-of-the-art baselines via quantitative analysis, qualitative analysis and human study.
Abstractive sentence summarization with attentive recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, pp. 93–98.
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Suzhou, China, pp. 609–615.
Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1746–1751.
The third international chinese language processing bakeoff: word segmentation and named entity recognition. In SIGHAN@COLING/ACL.
Exploring the limits of transfer learning with a unified text-to-text transformer. ArXiv abs/1910.10683.
A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 379–389.