Growing Story Forest Online from Massive Breaking News

03/01/2018
by Bang Liu, et al.

We describe our experience implementing a news content organization system at Tencent that discovers events from vast streams of breaking news and evolves news story structures in an online fashion. Our real-world system has requirements distinct from those of previous studies on topic detection and tracking (TDT) and on event timeline or graph generation: we 1) need to accurately and quickly extract distinguishable events from massive streams of long text documents that cover diverse topics and contain highly redundant information, and 2) must develop the structures of event stories in an online manner, without repeatedly restructuring previously formed stories, in order to guarantee a consistent user viewing experience. To address these challenges, we propose Story Forest, a set of online schemes that automatically cluster streaming documents into events while connecting related events in growing trees to tell evolving stories. We evaluated Story Forest extensively on 60 GB of real-world Chinese news data through detailed pilot user experience studies, although our ideas are not language-dependent and can easily be extended to other languages. The results demonstrate the superior capability of Story Forest to accurately identify events and organize news text into a logical structure that is appealing to human readers, compared to multiple existing algorithm frameworks.
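The abstract does not spell out the clustering or tree-growing rules, so the following is only a minimal sketch of the general idea: assign each incoming document to an existing event if it is similar enough, otherwise create a new event and attach it to the most related event's tree, never restructuring what already exists. The Jaccard keyword similarity, both thresholds, and the names `Event`, `StoryForest`, and `add` are assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass, field


def jaccard(a: set, b: set) -> float:
    """Keyword-set overlap; a stand-in similarity chosen for this sketch."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Event:
    keywords: set                                  # keywords of the first document
    docs: list = field(default_factory=list)       # ids of documents in this event
    children: list = field(default_factory=list)   # later, related events


class StoryForest:
    """Hypothetical append-only forest: one tree per story, one node per event."""

    def __init__(self, same_event: float = 0.6, same_story: float = 0.2):
        self.same_event = same_event   # assumed threshold for "same event"
        self.same_story = same_story   # assumed threshold for "same story"
        self.roots = []

    def _all_events(self):
        # Depth-first walk over every event in every story tree.
        stack = list(self.roots)
        while stack:
            ev = stack.pop()
            yield ev
            stack.extend(ev.children)

    def add(self, doc_id: str, keywords) -> Event:
        kw = set(keywords)
        best, best_sim = None, 0.0
        for ev in self._all_events():
            sim = jaccard(kw, ev.keywords)
            if sim > best_sim:
                best, best_sim = ev, sim
        # Similar enough: the document joins the existing event.
        if best is not None and best_sim >= self.same_event:
            best.docs.append(doc_id)
            return best
        new = Event(keywords=kw, docs=[doc_id])
        if best is not None and best_sim >= self.same_story:
            # Related but distinct: grow the tree; existing nodes are untouched.
            best.children.append(new)
        else:
            # Unrelated to everything seen so far: start a new story tree.
            self.roots.append(new)
        return new
```

Because `add` only appends documents, children, or roots, stories formed earlier keep their shape as new documents arrive, which is the online property the abstract emphasizes. A real system would need a far richer notion of similarity than keyword overlap.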

research
08/16/2018

Story Disambiguation: Tracking Evolving News Stories across News and Social Streams

Following a particular news story online is an important but difficult t...
research
04/08/2023

Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding

Unsupervised discovery of stories with correlated news articles in real-...
research
09/03/2018

Multilingual Clustering of Streaming News

Clustering news across languages enables efficient media monitoring by a...
research
11/02/2015

Spatial Semantic Scan: Jointly Detecting Subtle Events and their Spatial Footprint

Many methods have been proposed for detecting emerging events in text st...
research
06/04/2019

Coherent Comment Generation for Chinese Articles with a Graph-to-Sequence Model

Automatic article commenting is helpful in encouraging user engagement a...
research
03/12/2014

Adaptive Representations for Tracking Breaking News on Twitter

Twitter is often the most up-to-date source for finding and tracking bre...
research
02/13/2016

Semantic Scan: Detecting Subtle, Spatially Localized Events in Text Streams

Early detection and precise characterization of emerging topics in text ...
