Outlier Detection In Large-scale Traffic Data By Naïve Bayes Method and Gaussian Mixture Model Method

12/28/2015
by Philip Lam, et al.

Detecting outliers in traffic data is valuable for traffic management. However, manually identifying outliers in a large-scale database is a massive task. In this paper, we present two methods, the Kernel Smoothing Naïve Bayes (NB) method and the Gaussian Mixture Model (GMM) method, to automatically detect hardware errors as well as abnormal traffic events in traffic data collected at a four-arm junction in Hong Kong. The traffic data were recorded in video format and converted to spatial-temporal (ST) traffic signals by statistics. The ST signals are then projected onto a two-dimensional (2D) (x,y)-coordinate plane by Principal Component Analysis (PCA) for dimension reduction. We assume that inlier data are normally distributed. As such, the NB and GMM methods are successfully applied to outlier detection (OD) for traffic data. The kernel smoothing NB method assumes the existence of kernel distributions in the traffic data and uses Bayes' Theorem to perform OD. In contrast, the GMM method assumes that the traffic data are formed by a mixture of Gaussian distributions and exploits confidence regions for OD. This paper addresses the modeling of each method and evaluates their respective performances. Experimental results show that the NB algorithm with Triangle kernel and GMM method achieve up to 93.78
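
The confidence-region idea behind the GMM method can be illustrated with a minimal sketch. It assumes the data have already been projected to 2D and that the mixture components have already been fitted; the component parameters, the 95% confidence level, and the rule "outlier if outside every component's region" are hypothetical choices made for illustration, not details taken from the paper. For a 2D Gaussian, the squared Mahalanobis distance follows a chi-square distribution with 2 degrees of freedom, whose quantile has the closed form -2 ln(1 - p).

```python
import math

def chi2_threshold_2d(confidence):
    """Chi-square quantile for 2 degrees of freedom: -2 * ln(1 - p)."""
    return -2.0 * math.log(1.0 - confidence)

def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of a 2D point from one Gaussian component."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # x^T * inv(cov) * x with the 2x2 inverse written out explicitly.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def is_outlier(point, components, confidence=0.95):
    """Flag a point lying outside the confidence region of every component.

    This decision rule is an assumption of the sketch.
    """
    threshold = chi2_threshold_2d(confidence)
    return all(mahalanobis_sq(point, m, s) > threshold for m, s in components)

# Hypothetical fitted components as (mean, covariance) pairs in the 2D PCA plane.
components = [
    ((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    ((5.0, 5.0), ((2.0, 0.5), (0.5, 1.0))),
]

for p in [(0.5, 0.5), (5.5, 5.0), (10.0, -10.0)]:
    print(p, "outlier" if is_outlier(p, components) else "inlier")
```

In practice the component means and covariances would come from fitting a mixture model to the projected signals; a higher confidence level enlarges the regions and flags fewer points.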

research
02/16/2016

A Sparse PCA Approach to Clustering

We discuss a clustering method for Gaussian mixture model based on the s...
research
11/18/2020

Skewed Distributions or Transformations? Modelling Skewness for a Cluster Analysis

Because of its mathematical tractability, the Gaussian mixture model hol...
research
09/16/2020

PCA Reduced Gaussian Mixture Models with Applications in Superresolution

Despite the rapid development of computational hardware, the treatment o...
research
03/03/2023

Asymptotic Bayes risk of semi-supervised multitask learning on Gaussian mixture

The article considers semi-supervised multitask learning on a Gaussian m...
research
07/06/2011

Online Vehicle Detection For Estimating Traffic Status

We propose a traffic congestion estimation system based on unsupervised ...
research
05/10/2020

Non-recurrent Traffic Congestion Detection with a Coupled Scalable Bayesian Robust Tensor Factorization Model

Non-recurrent traffic congestion (NRTC) usually brings unexpected delays...
research
02/24/2021

Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering

The article introduces an elementary cost and storage reduction method f...