Semantic Scan: Detecting Subtle, Spatially Localized Events in Text Streams

by Abhinav Maurya, et al.
University of Notre Dame
Carnegie Mellon University

Early detection and precise characterization of emerging topics in text streams can be highly useful in applications such as timely and targeted public health interventions and the discovery of evolving regional business trends. Many methods have been proposed for detecting emerging events in text streams using topic modeling. However, these methods have numerous shortcomings that make them unsuitable for rapid detection of locally emerging events in massive text streams. In this paper, we describe Semantic Scan (SS), which was developed specifically to overcome these shortcomings in detecting new spatially compact events in text streams. Semantic Scan integrates novel contrastive topic modeling with online document assignment and principled likelihood ratio-based spatial scanning to identify emerging events with unexpected patterns of keywords hidden in text streams. This enables more timely and accurate detection and characterization of anomalous, spatially localized emerging events. Semantic Scan does not require manual intervention or labeled training data, and it is robust to noise in real-world text data because it identifies anomalous text patterns that occur in a cluster of new documents rather than an anomaly in a single new document. We compare Semantic Scan to alternative state-of-the-art methods such as Topics over Time, Online LDA, and Labeled LDA on two real-world tasks: (i) a disease surveillance task monitoring free-text Emergency Department chief complaints in Allegheny County, and (ii) an emerging business trend detection task based on Yelp reviews. On both tasks, we find that Semantic Scan provides significantly better event detection and characterization accuracy than competing approaches, while providing up to an order of magnitude speedup.
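
To make the "likelihood ratio-based spatial scanning" step concrete, here is a minimal sketch of how a spatial scan might score candidate regions for one topic. It assumes an expectation-based Poisson statistic, which the abstract does not specify. The names `poisson_llr` and `best_region`, the location labels, and all counts are hypothetical and chosen only for illustration.

```python
import math


def poisson_llr(count, baseline):
    """Expectation-based Poisson log-likelihood ratio for one region.

    Assumed form (not taken from the abstract): the score is positive only
    when the observed count exceeds the expected baseline.
    """
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def best_region(counts, baselines, regions):
    """Return (score, region) for the highest-scoring candidate region.

    counts / baselines map a location to the observed / expected number of
    documents assigned to one topic; regions is an iterable of location tuples.
    """
    best = (0.0, None)
    for region in regions:
        c = sum(counts[loc] for loc in region)
        b = sum(baselines[loc] for loc in region)
        score = poisson_llr(c, b)
        if score > best[0]:
            best = (score, region)
    return best


# Hypothetical per-location document counts for a single topic.
counts = {"A": 5, "B": 20, "C": 6}
baselines = {"A": 5.0, "B": 10.0, "C": 5.0}
regions = [("A",), ("B",), ("B", "C"), ("A", "B", "C")]

score, region = best_region(counts, baselines, regions)
print(region, round(score, 3))
```

In this toy input the excess is concentrated at location "B", so the single-location region scores higher than the larger regions that dilute it with near-baseline neighbors.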




Spatial Semantic Scan: Jointly Detecting Subtle Events and their Spatial Footprint

Many methods have been proposed for detecting emerging events in text st...

Machine Learning for Drug Overdose Surveillance

We describe two recently proposed machine learning approaches for discov...

Gaussian Process Subset Scanning for Anomalous Pattern Detection in Non-iid Data

Identifying anomalous patterns in real-world data is essential for under...

Semantic Analysis of Traffic Camera Data: Topic Signal Extraction and Anomalous Event Detection

Traffic Management Centers (TMCs) routinely use traffic cameras to provi...

Growing Story Forest Online from Massive Breaking News

We describe our experience of implementing a news content organization s...

Image Analysis Enhanced Event Detection from Geo-tagged Tweet Streams

Events detected from social media streams often include early signs of a...

t-SS3: a text classifier with dynamic n-grams for early risk detection over text streams

A recently introduced classifier, called SS3, has shown to be well suite...
