With the widespread use of surveillance cameras in public places, computer vision-based scene understanding has gained considerable popularity in the CV research community. Visual data contain richer information than other information sources such as GPS, mobile location, radar signals, etc. Thus, they can play a vital role in detecting or predicting congestion, accidents and other anomalies, apart from collecting statistical information about the status of road traffic.
Several computer vision-based studies have been conducted focusing on data acquisition [80, 164], scene learning [124, 14, 67, 36, 181], behavioral understanding [162, 15], etc. These studies primarily discuss aspects such as scene analysis, video processing techniques, anomaly detection methods, vehicle detection and tracking, multi-camera-based techniques and challenges, activity recognition, traffic monitoring, human behavior analysis, emergency management, event detection, etc.
Anomaly detection is a sub-domain of behavior understanding from surveillance scenes. Anomalies are typically aberrations of scene entities (vehicles, humans or the environment) from the normal behavior. With the availability of video feeds from public places, there has been a surge in research output on video analysis and anomaly detection [162, 164, 158, 115]. Typically, anomaly detection methods learn the normal behavior via training. Anything deviating significantly from the normal behavior can be termed anomalous. Vehicle presence on walkways, a sudden dispersal of people within a gathering, a person falling suddenly while walking, jaywalking, signal bypassing at a traffic junction, or U-turns of vehicles during red signals are a few examples of anomalies. Anomaly detection frameworks typically use supervised, semi-supervised or unsupervised learning. In this survey, we mainly explore anomaly detection techniques used in road traffic scenarios, focusing on entities such as vehicles, pedestrians, the environment and their interactions.
We note that the scope of the study should cover the nature of input data and their representations, feasibility of supervised learning, types of anomalies, suitability of the techniques in application contexts, anomaly detection outputs and evaluation criteria. We present this survey from the above perspectives. A typical anomaly detection framework is presented in Fig. 1. Usually, anomaly detection systems work by learning the normal data patterns to build a normal profile. Once the normal patterns are learned, anomalies can be detected with the help of established approaches [137, 97]. The output of the system can be a score, typically in the form of a metric, or a label that indicates whether the data is anomalous or not.
Some examples of anomaly detection results are shown in Fig. 2.
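The framework described above (learn a normal profile, then emit a score or a label) can be illustrated with a minimal sketch. It assumes a deliberately simple profile, namely a per-feature mean and standard deviation, and a hypothetical two-dimensional feature (speed, heading); the function names, the data and the threshold are illustrative assumptions and are not taken from any surveyed method.

```python
import math
import statistics

def fit_normal_profile(samples):
    """Learn a per-feature mean and standard deviation from normal data."""
    columns = list(zip(*samples))
    means = [statistics.fmean(c) for c in columns]
    # Guard against zero spread so the score below never divides by zero.
    stds = [statistics.pstdev(c) or 1e-9 for c in columns]
    return means, stds

def anomaly_score(x, profile):
    """Root-mean-square z-score of x with respect to the normal profile."""
    means, stds = profile
    z2 = [((xi - m) / s) ** 2 for xi, m, s in zip(x, means, stds)]
    return math.sqrt(sum(z2) / len(z2))

def is_anomalous(x, profile, threshold=3.0):
    """Turn the score into a label using an assumed threshold."""
    return anomaly_score(x, profile) > threshold

# Hypothetical normal observations: (speed in km/h, heading in degrees).
normal_features = [(42, 90), (45, 92), (40, 88), (44, 91), (43, 89), (41, 90)]
profile = fit_normal_profile(normal_features)

moving_vehicle = (43, 90)
stopped_vehicle = (0, 90)
```

Here `anomaly_score` corresponds to the score output and `is_anomalous` to the label output; in practice the profile would be one of the learned models discussed in Section II.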
I-A Recent Surveys
During the last 10 years or so, a few interesting surveys have been published in this field of research. Authors of  have explored object detection, tracking, scene modeling and activity analysis using video trajectories. The study presented in  covers vehicle detection, tracking, behavior understanding and incident detection from the purview of intelligent transportation systems (ITS). Authors of  have conducted an in-depth study of traffic analysis frameworks under different taxonomies, with pointers to integrating information from multiple sensors. The review presented in  is possibly the first work covering anomaly detection techniques. It covers sensors, entities, feature extraction methods, learning methods and scene modeling to detect anomalies. In , an object-oriented approach to object detection, tracking and behavior analysis from the perspective of vehicle-mounted sensors has been presented, detailing the progress of the last decade of work. The multi-camera study presented in  covers research related to surveillance in multi-camera setups. Authors of  discuss abrupt events, which are considered a subset of anomalous events: they require immediate attention and occur unintentionally, abruptly and unexpectedly. The research presented in  discusses safety, security and law enforcement related applications from the computer vision perspective. The review presented in  discusses the elements of human activity and behavioral understanding frameworks. Authors of  present research on human behavioral understanding through actions and interactions of human entities. Intelligent video systems covering analytics aspects have been studied in . Surveillance systems with specific application areas have been presented in . Authors of  systematically divide road traffic analysis into four layers, namely image acquisition, dynamic and static attribute extraction, behavioral understanding and ITS services. Datasets used for anomaly detection have been covered in .
Traffic monitoring using different types of sensors has been discussed in . Algorithms used for spatio-temporal interest point detection and their applications in the vision domain have been covered in . Traffic entities have been studied from the perspective of safety in . Authors of  explore studies on video trajectory-based analysis and applications. Authors of  discuss various ways of handling emergency situations by assessing the risks, preparedness, response, recovery and mitigation, using information extracted from visual features with the help of various learning mechanisms. In , the authors have presented work on anomalous human behavior recognition with a focus on behavior representation and modeling, feature extraction techniques, classification and behavior modeling frameworks, performance evaluation techniques, and datasets with examples of video surveillance systems. Table I summarizes the major computer vision-based studies done during the last 10 years. In our survey, we particularly focus on the studies on anomaly detection that are relevant to road traffic scenarios.
Anomalies are contextual in nature. The assumptions used in anomaly detection cannot be applied universally across different traffic scenarios. We analyze the capabilities of anomaly detection methods used in road traffic surveillance from the perspective of data. In the process, we categorize the methods according to scene representation, employed features, used models and approaches.
|Ref.||Focus||Explored research areas|
|Morris (2008) ||Video trajectory-based scene analysis||Scene modeling: Tracking, interest point study, activity path learning; Applications: People movement, traffic, parking lot, and entity interaction; Path learning: Preprocessing (normalization and dimensionality reduction), clustering approaches and used distance measures, path modeling, relevance of path feedback in low level systems; Activity analysis: Virtual fencing, speed profiling, path classification, abnormality detection, online activity analysis, object interaction characterization.|
|Tian (2011) ||Video processing techniques applied for traffic monitoring||Traffic parameters collection; Traffic incident detection; Vehicle detection scenarios: Background modeling and non-background modeling approaches, shadow detection and removal; Vehicle tracking, model-based classification, region, deformable template and feature study, tracking algorithms; Traffic incident detection and behavior understanding.|
|Buch (2011) ||Video analytics system for urban traffic||Applications: Vehicle counting, automatic number plate recognition, incident detection; Analytics system components; Foreground segmentation techniques: Frame differencing, background subtraction (averaging, single Gaussian, mode estimation, Kalman filter, wavelets), GMM, graph cuts, shadow removal, object-based segmentation; Top-down vehicle classification: Features (region based, contour based), machine learning techniques; Bottom-up approaches: Interest point descriptors, object classification; Tracking: Kalman filter, PF, S-T MRF, graph correspondence, event cones; Traffic analytic system: Urban (camera domain, three dimensional modeling), highways (detection and classification).|
|Sodemann (2012) ||Anomaly detection||Study on sensors: Visible-spectrum camera (low-level feature extraction and object level feature extraction), audio and infrared sensors; Learning methods: Unsupervised, supervised and a priori modeling; Classification algorithms: Dynamic Bayesian networks, Bayesian topic models, artificial neural networks, clustering, decision trees, fuzzy reasoning.|
|Sivaraman (2013) ||Vision-based vehicle detection, tracking and behavior analysis||Sensors: radar, lidar, camera; Vehicle detection: Monocular vision (camera placement, appearance features and classification, motion based approaches, vehicle pose). Stereo vision (matching, motion-based approaches); Vehicle tracking: Monocular and stereo tracking, vision cue fusion, real-time challenges and system architecture, fusion with other modalities; Behavior analysis: context, vehicle maneuvers, trajectories, behavioral classification; Future direction of vehicle detection, tracking, their on-road behavior and public benchmarks.|
|Wang (2013) ||Multi-camera based surveillance||Multi-camera calibration; Topology computation; Multi-camera object tracking: Calibration, appearance cues, correspondence-based methods; Object re-identification: Feature studies, learning methods; Multi-camera activity analysis: Correspondence free methods, activity models, human action recognition; Cooperative video surveillance using active and static cameras; Background modeling and object tracking with active cameras.|
|Suriani (2013) ||Abrupt event detection||Human centered, vehicle centered and small area centered studies; Methods of detection: Single person, multiple person, vehicles, multi-view camera based.|
|Loce (2013) ||Traffic management||Vehicle mounted camera-based safety applications: Lane departure warning and lane change assistance, pedestrian detection, driver monitoring, adaptive warning systems; Efficiency studies: Traffic flow management, incident management, video based tolling; Security management: Alert and warning systems, traffic surveillance, recognizing and tracking vehicles of interest; Law enforcement: Studies on speed enforcement, violation detection at road intersections, vehicle mounted mobile camera based vehicle identification.|
|Vishwakarma (2013) ||Human activity recognition and behavior analysis||Application areas: Behavioral biometrics, content-based video analysis, security and surveillance, interactive applications, animation and synthesis; Object detection methods: Motion segmentation methods (background subtraction based, statistical, temporal differencing and optical flow-based) and object classification; Object tracking methods (region, contour, feature, model, hybrid and optical flow-based); Action recognition techniques: Hierarchical (statistical, syntactic and description based) and non-hierarchical approaches; Human behavior understanding: Supervised, semi-supervised and unsupervised models; Dataset description: Controlled and realistic environments and its realistic impact on video-based surveillance market.|
|Borges (2013) ||Human behavior analysis||Human detection methods: Appearance, motion and hybrid approaches; Action recognition approaches: Low-level and spatio-temporal interest points, mid and high-level, silhouettes features; Interaction recognition: One-to-one, group interactions, models; Datasets.|
|Liu (2013) ||Intelligent video systems and analytics||Video systems: Architecture (distributed/centralized), quality diagnosis, system adaptability (configuration, calibration, capability and scalability) analysis, data management and transmission methods; Analytics: Object attributes, motion pattern recognition, event and behavior analysis; Analytic methods: Intelligence and cooperative aspects, multi-camera view selections, statistical and networked analysis, learning and classification, 3-D sensing; Application areas: Management, traffic control, transportation, intelligent vehicles, health-care, life sciences, security and military.|
|Zablocki (2013) ||Characteristics of intelligent video surveillance systems||System classification: Object detection, tracking and movement analysis technologies; Anomaly detection, identification and warning/alarming systems; Vehicle detection, traffic and parking lot analysis systems; Object counting systems; Integrated camera view handling systems; Privacy preserving systems; Cloud-based systems.|
|Tian (2015) ||Vehicle surveillance||Dynamic and static attribute extraction: Appearance and motion-based detection, tracking, recognition (license plate, type, color and logo), networked tracking of vehicles; Behavior understanding: Single camera study, trajectory (clustering, modeling and retrieval) and networked multi-camera-based, interesting region discovery; Image acquisition: Traffic scene characteristics, imaging technologies; ITS service study: Illegal activity and anomaly detection, security monitoring, electronic toll collection, traffic flow analysis, transportation planning and road construction, environment impact assessment.|
|Patil (2016) ||Video datasets for anomaly detection||Dataset classification: Traffic, subway, panic driven, pedestrian, abnormal activity, campus, train, sea, crowd.|
|Datondji (2016) ||Traffic monitoring at intersections||Camera based classification: Mono vision, omni vision and stereo vision; Vehicle sensing: Methodologies and datasets; Challenges: Initialization and preprocessing, vehicle detection and tracking; Vehicle detection methods: Candidate localization, verification; Vehicle tracking: Representation and tracking approaches: Region, contour, feature and model-based; Vehicle tracking algorithms: Matching, Bayesian; Challenges for intersection; Monitoring systems: Monocular vision and omni-directional vision-based, in-vehicle monitoring; Vehicle tracking: Roadside monitoring systems, in-vehicle monitoring systems; Vehicle behavior analysis.|
|Li (2017) ||Spatio-temporal interest point (STIP) detection algorithms||STIPs algorithms; Detection challenges; Applications: Human activity detection, anomaly detection, video summarization and content based video retrieval.|
|Shirazi (2017) ||Intersections analysis from safety perspective||Vehicular behavior: Trajectories, vehicle speed, acceleration, turn recognition; Driver behavior: Turning intention, aggression, perception reaction time; Pedestrian behavior: Motion prediction, waiting time, walking speed, crossing speed, and choices; Safety assessment: Gap analysis, threat, risk, conflict, accident; Intersection safety systems: Driver assistance systems (driver perception enhancement, action suggestion and human driver interface, advanced vehicle motion control delegation), infrastructure-based systems (roadside warning systems, dilemma zone protection systems, decision support systems).|
|Ahmed (2018) ||Trajectory-based analysis||Trajectory analysis: Datasets, extraction, representation, applications; Clustering algorithms; Event detection: Methods and learning procedures; Localization of abnormal events: Methods and learning procedures; Video summarization and synopsis generation.|
|Lopez-Fuentes (2018) ||Emergency management using computer vision||Emergency classification: Natural, human made (road accident, crowd related, weapon threat, drowning, injured person, falling person); Monitoring objective: Prevention, detection, response and understanding; Acquisition methods: Sensor location, sensor types, acquisition rate and sensor cost; Feature extraction algorithms: Color, shape and texture, temporal (wavelet, optical flow, background modeling and subtraction, tracking) and convolution features; Semantic information extraction using machine learning: Artificial neural networks, deep learning, support vector machines (SVMs), hidden Markov models (HMMs), fuzzy logic.|
|Mabrouk (2018) ||Abnormal behavior recognition||Behavior representation; Anomalous behavior recognition methods: Modeling frameworks and classification methods, scene density and moving object interaction in crowded and uncrowded scenes; Performance evaluation: Datasets and metrics; Existing surveillance systems.|
The rest of the paper is organized as follows. First, the background and the terminologies used in the paper are introduced in Section II-A. Anomaly detection related visual scene learning methods are presented in Section II-B. Anomaly detection approaches and classification are elaborated in Section II-C. Features used for anomaly detection and application areas are presented in Sections II-D and II-E, respectively. A critical analysis of the existing methods, followed by discussions on the challenges and future possibilities of anomaly detection, is presented in Section III. We conclude the paper in Section IV.
II Computer Vision Guided Anomaly Detection Studies
II-A Background and Terminologies
Features are treated as data in the present context and are represented in the form of feature descriptors. Each data point occupies a position in a multi-dimensional space whose dimensionality depends on the feature descriptor length.
Anomalies are data patterns that do not conform to a well-defined notion of normal behavior. Other terms, such as outlier and novelty, are used as synonyms of anomaly in various application areas. In this paper, we use anomaly or outlier in the subsequent parts.
II-A1 Anomaly Classification
Traditionally, anomalies are classified as point anomalies [152, 96, 73], contextual anomalies [165, 210] and collective anomalies [192, 34]. Data correspond to a point anomaly if they are far away from the usual distribution. For example, a non-moving car on a busy road can be termed a point anomaly. Contextual anomalies correspond to data that may be termed normal in a different context. For example, in slow-moving traffic, if a biker rides faster than others, we may term it an anomaly. Conversely, on a less dense road it may be normal behavior. A group of data instances together may constitute an anomaly even though individually they may be normal. For example, a group of people dispersing within a short span of time can be termed a collective anomaly.
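The difference between point and contextual anomalies can be made concrete with a small sketch. It assumes a one-dimensional speed feature, two hypothetical traffic contexts and a simple z-score test; all names, values and the threshold are illustrative assumptions. Collective anomalies, which require reasoning over groups of instances, are not covered here.

```python
import statistics

def zscore(value, reference):
    """Absolute z-score of value with respect to a reference sample."""
    mean = statistics.fmean(reference)
    std = statistics.pstdev(reference) or 1e-9
    return abs(value - mean) / std

# Hypothetical speeds (km/h) observed under two traffic contexts.
speeds_by_context = {
    "congested": [12, 15, 10, 14, 13, 11],
    "free_flow": [55, 60, 58, 62, 57, 59],
}
all_speeds = [s for speeds in speeds_by_context.values() for s in speeds]

def is_point_anomaly(speed, threshold=3.0):
    """Far from the overall distribution, regardless of context."""
    return zscore(speed, all_speeds) > threshold

def is_contextual_anomaly(speed, context, threshold=3.0):
    """Far from the distribution observed in the given context only."""
    return zscore(speed, speeds_by_context[context]) > threshold
```

With these data, a speed of 58 km/h is unremarkable overall and in free-flowing traffic, yet it is flagged in the congested context, mirroring the biker example above.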
In the context of visual surveillance, it is common to see anomalies classified as local and global anomalies [57, 68, 139, 207, 138, 154]. Global anomalies can be present in a frame or a segment of the video without specifying where exactly they have happened [57, 68, 139]. Local anomalies usually happen within a specific area of the scene, but may be missed by global anomaly detection algorithms [207, 138, 154]. Some methods can detect both global and local anomalies [190, 5, 34, 78, 222].
II-A2 Challenges and Scope of Study
The key challenges in anomaly detection are: (i) defining a representative normal region, (ii) boundaries between the normal and anomalous regions that may not be crisp or well defined, (iii) a notion of anomaly that is not the same in all application contexts, (iv) limited availability of data for training and validation, (v) data that are often noisy due to inaccurate sensing, and (vi) normal behavior that evolves over time.
II-B Learning Methods
Learning the normal behavior is relevant not only for anomaly detection, but also for diverse other use cases. Pattern analysis, classification, prediction, density estimation, and behavior analysis are a few amongst them.
Learning methods can be classified as supervised, unsupervised or semi-supervised. In supervised learning, the normal profile is built using labeled data [79, 74, 81, 159]. It is typically applied to classification and regression problems. In unsupervised learning, the normal profile is structured from the relationships between elements of the unlabeled dataset. Semi-supervised learning primarily uses unlabeled data, with supervision from a small amount of labeled data that specifies example classes known a priori [170, 106]. If learning happens through interactive labeling of data as and when the label information becomes available, it is called active learning [179, 42, 109, 134]. Such methods are used when unlabeled data are abundant and manual labeling is expensive. Reinforcement learning, a relatively new learning paradigm in computer vision, is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward [195, 191, 215]. Some of the important works are summarized in Table II.
|Supervised||[163, 143, 161, 111, 37, 113, 59, 62, 79, 74, 81, 159, 220, 32, 92]|
|Unsupervised||[50, 131, 2, 149, 72, 117, 107, 152, 166, 182, 203, 199]|
|Semi-supervised||[185, 27, 114, 170, 106, 138]|
Learned models are used not only in feature extraction, but also in object detection, classification, activity recognition, segmentation, tracking, entity re-identification, object interaction analysis, anomaly detection, etc. Table III presents some important learning methods used in anomaly detection.
|Learning Method||Method||Applied context|
|Supervised||Hidden Markov Model (HMM) ||A supervised statistical Markov model where the system modeled is assumed to be a Markov process with hidden states: Used for anomaly detection in [20, 189].|
|Support Vector Machine (SVM) ||A representation of data points in space, mapped such that separate categories are divided by a clear separation between them: A special class of SVM, namely one-class SVM (OCSVM), has been used extensively for anomaly detection.|
|Gaussian Regression (GR) ||A generic supervised learning method designed to solve regression and probabilistic classification problems: Used in [34, 153] for anomaly detection from videos.|
|Convolutional Neural Networks (CNN) ||A class of deep neural networks, applied usually to analyze visual imagery: Due to its applicability in extracting semantic level features from the input, it has become popular in many applications including anomaly detection [68, 118].|
|Multiple Instance Learning (MIL) ||A special learning framework which deals with uncertainty of instance labels: Instead of receiving a set of instances which are individually labeled, the learner receives a set of labeled bags, each containing many instances. If all the instances in it are negative, the bag may be labeled negative. If there is at least one positive instance, the bag is labeled positive. It has been used for anomaly detection in [207, 168].|
|Long short-term memory (LSTM) networks ||A special kind of recurrent neural network (RNN) used in time series applications: In [113, 112, 118, 166], it has been used for anomaly detection.|
|Fast Region-based-CNN (Fast R-CNN) ||A variant of deep neural networks (DNN) that works more efficiently in object classification than conventional CNNs: Used for anomaly detection in .|
|Unsupervised||Latent Dirichlet Allocation (LDA) ||A topic model using statistical analysis to retrieve the underlying topic distribution in documents: Used for modeling visual words of videos for anomaly detection.|
|Probabilistic Latent Semantic Analysis (pLSA) ||A model for representing co-occurrence information under a probabilistic framework: Used in  for anomaly detection.|
|Hierarchical Dirichlet Process (HDP) ||A nonparametric Bayesian approach, built based on LDA, to cluster data: Used in data modeling and anomaly detection .|
|Gaussian Mixture Model (GMM) ||A probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters: Used for anomaly detection in [99, 200].|
|Density-based spatial clustering of applications with noise (DBSCAN) ||A density based non-parametric clustering algorithm used extensively for modeling and learning data patterns: Used for anomaly detection in .|
|Fisher kernel method ||A function to measure similarity of two objects on the basis of sets of measurements for each object and a statistical model: Used to obtain trajectory feature representation in .|
|Principal component analysis (PCA) ||A statistical procedure of orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables: Used for dimensionality reduction in .|
|Particle Swarm Optimization ||A population based stochastic optimization technique: Used in  to obtain optimized motion descriptor from a set of particles having individual motion characteristics.|
|Generative Adversarial Networks (GAN) ||A class of artificial intelligence algorithms used in unsupervised machine learning, implemented by a system of two neural networks (generator and discriminator) contesting with each other in a zero-sum game framework: Used for anomaly detection in .|
|Hybrid||HDP+HMM||A hybrid model: Used for representing sub-trajectories in  for anomaly detection using MIL.|
|GAN-LSTM ||A hybrid model: Fake frames required for adversarial learning used in  are generated using bidirectional Conv-LSTM .|
|CNN-LSTM ||A hybrid model: Prediction-based anomaly detection with the help of CNN-LSTM.|
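The bag-labeling rule stated for multiple instance learning in Table III can be written down directly. The sketch below assumes that a video is a bag and its segments are instances; scoring a bag by its highest-scoring instance is a common consequence of the MIL assumption and is used here as an illustrative choice, not as the procedure of any cited work. The scores are hypothetical.

```python
def bag_label(instance_labels):
    """A bag is positive (anomalous) if at least one instance is positive;
    it is negative only if every instance is negative."""
    return any(instance_labels)

def bag_score(instance_scores):
    """Assumed scoring rule: the highest-scoring instance decides the bag."""
    return max(instance_scores)

# Hypothetical per-segment anomaly scores for two videos (bags).
normal_video = [0.05, 0.10, 0.08, 0.12]
accident_video = [0.07, 0.09, 0.93, 0.11]

threshold = 0.5  # assumed decision threshold
normal_is_anomalous = bag_label(s > threshold for s in normal_video)
accident_is_anomalous = bag_label(s > threshold for s in accident_video)
```

Only the bag-level label is needed for training in this setting, which is why MIL suits videos that are annotated as a whole rather than per segment.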
II-C Anomaly Detection Approaches
Anomaly detection approaches can be classified as depicted in Fig. 3.
Model-based approaches learn the normal behavior of data by representing them in terms of a set of parameters. Statistical approaches are generally used to learn the parameters of the model, as they try to fit the data to a stochastic model. Statistical approaches may be either parametric or non-parametric. Parametric methods assume that the normal data are generated from a parametric distribution with a known probability density function. Examples are Gaussian mixture models, regression models, etc. In non-parametric statistical models, the structure is not defined a priori; instead, it is determined dynamically from the data. Examples are histogram-based models, Dirichlet process mixture models (DPMM), Bayesian network-based models, etc. A Bayesian network estimates the posterior probability of observing a class label from a set of normal class labels and the anomaly class labels, given a test data instance. The class label with the largest posterior is regarded as the predicted class for the given test instance. Typically, topic model-based anomaly detection methods use Bayesian non-parametric approaches [126, 84]. DNN-based models can also be categorized under parametric models, where the parameters are the weights and biases of the neural networks [154, 28, 112]. However, some researchers consider them classification approaches, while many approaches (statistical, classification, information theoretic, reconstruction-based) are used in anomaly detection. Neural network-based methods also adopt an information theoretic approach, reducing the cross entropy between the expected and the predicted outputs during model learning. Hence, they may also be categorized under hybrid approaches.
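As an illustration of the non-parametric case, the following sketch builds a histogram-based model in which the bin structure is determined by the data and a value is scored by how rarely its bin was observed. The bin width, the data and the scoring rule (one minus the relative bin frequency) are assumptions made for this example.

```python
from collections import Counter

def fit_histogram(values, bin_width):
    """Relative frequency of each occupied bin; bins arise from the data."""
    counts = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return {b: c / total for b, c in counts.items()}

def histogram_score(value, hist, bin_width):
    """Anomaly score = 1 - relative frequency of the bin the value falls in.
    Values in bins never seen during training receive the maximum score 1.0."""
    return 1.0 - hist.get(int(value // bin_width), 0.0)

# Hypothetical normal vehicle speeds (km/h).
normal_speeds = [40, 42, 45, 47, 41, 44, 52, 38]
hist = fit_histogram(normal_speeds, bin_width=10)
```

A parametric counterpart would instead fix the form of the density in advance, for instance a Gaussian, and estimate only its mean and variance.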
In proximity-based approaches, anomalies are decided by how close they are to their neighbors. In distance-based approaches, the assumption is that normal data have dense neighborhoods. Density-based approaches compare the density around a point with the density around its local neighbors. The relative density of a point compared to its neighbors is computed as an outlier score.
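Both proximity-based variants can be sketched in a few lines. The distance-based score below is the distance to the k-th nearest neighbor; the density-based score is a simplified ratio in the spirit of the local outlier factor, not the exact LOF definition. The data points, the function names and the choice k = 2 are illustrative assumptions, and the query point is assumed not to be part of the reference data.

```python
import math

def knn_distance(point, data, k):
    """Distance from point to its k-th nearest neighbor in data."""
    dists = sorted(math.dist(point, other) for other in data)
    return dists[k - 1]

def relative_density_score(point, data, k):
    """Ratio of the point's k-NN distance to the mean k-NN distance of its
    k nearest neighbors. Values well above 1 indicate a sparser neighborhood
    than that of the neighbors (simplified LOF-style score)."""
    neighbors = sorted(data, key=lambda o: math.dist(point, o))[:k]
    neighbor_dists = [
        knn_distance(n, [o for o in data if o is not n], k) for n in neighbors
    ]
    return knn_distance(point, data, k) / (sum(neighbor_dists) / k)

# Hypothetical 2-D positions of normal observations.
normal_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
near_score = relative_density_score((0.4, 0.6), normal_points, k=2)
far_score = relative_density_score((5.0, 5.0), normal_points, k=2)
```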
Classification-based anomaly detection methods assume that a classifier can distinguish between normal and anomalous classes in a given feature space. Classification-based anomaly detection techniques can be divided into two categories: one-class and multi-class. Multi-class classification-based anomaly detection techniques assume that the training data contain labeled instances of normal and anomalous classes. A data point is assumed anomalous if it falls in the anomalous class. One-class classification (OCC)-based anomaly detection techniques assume that all training data have one label [190, 192, 139, 205]. Such techniques learn a discriminative boundary around the normal instances using a one-class classification algorithm. Support vector machines (SVMs) in the one-class setting have been used extensively for anomaly detection in visual surveillance [29, 139]. Rule-based approaches learn rules that capture the normal behavior of a system. A test instance that is not covered by any such rule is considered an anomaly.
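The idea of learning a boundary around the normal instances can be illustrated without a full SVM solver. The sketch below is a deliberately crude stand-in, not a one-class SVM: it encloses the training data in a hypersphere centered at their mean, with the radius set by an assumed quantile of the training distances. The class name, the quantile and the data are hypothetical.

```python
import math
import statistics

class HypersphereOneClass:
    """Toy one-class classifier: a sphere around the mean of normal data."""

    def __init__(self, quantile=0.95):
        self.quantile = quantile  # assumed fraction of training data to enclose

    def fit(self, normal_data):
        self.center = [statistics.fmean(c) for c in zip(*normal_data)]
        radii = sorted(math.dist(x, self.center) for x in normal_data)
        index = min(len(radii) - 1, int(self.quantile * len(radii)))
        self.radius = radii[index]
        return self

    def predict(self, x):
        """Return True if x falls outside the learned boundary (anomalous)."""
        return math.dist(x, self.center) > self.radius

# Hypothetical 2-D features of normal instances only (single label).
normal_data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
model = HypersphereOneClass().fit(normal_data)
```

A one-class SVM replaces the fixed sphere with a boundary learned in a kernel-induced feature space, which allows non-spherical normal regions.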
In reconstruction-based techniques, the assumption is that normal data can be embedded into a lower-dimensional subspace in which normal instances and anomalies appear different. The anomaly is measured based on the data reconstruction error. Some examples are sparse coding [172, 218, 208], autoencoder, and principal component analysis (PCA)-based approaches.
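A minimal sketch of the PCA-based variant follows. It assumes that a single principal axis is a sufficient subspace, estimates that axis by power iteration on the covariance matrix of the normal data, and uses the distance between a point and its projection onto the axis as the reconstruction error. The data and function names are illustrative assumptions.

```python
import math
import statistics

def fit_principal_axis(data, iterations=100):
    """Return the mean and the first principal direction of the data,
    estimated by power iteration on the covariance matrix."""
    dim = len(data[0])
    mean = [statistics.fmean(c) for c in zip(*data)]
    centered = [[x - m for x, m in zip(row, mean)] for row in data]
    cov = [
        [sum(r[i] * r[j] for r in centered) / len(data) for j in range(dim)]
        for i in range(dim)
    ]
    v = [1.0] * dim  # assumed start vector, not orthogonal to the main axis
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def reconstruction_error(x, mean, axis):
    """Distance between x and its reconstruction from the 1-D subspace."""
    centered = [xi - m for xi, m in zip(x, mean)]
    coeff = sum(c * a for c, a in zip(centered, axis))
    reconstructed = [coeff * a for a in axis]
    return math.dist(centered, reconstructed)

# Hypothetical normal data lying close to a line in 2-D.
normal_data = [(1, 2.1), (2, 3.9), (3, 6.0), (4, 8.1), (5, 9.9)]
mean, axis = fit_principal_axis(normal_data)
```

Points that follow the learned linear pattern reconstruct almost perfectly, while points away from the axis yield a large error; autoencoders and sparse coding follow the same principle with a learned, possibly non-linear, representation.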
II-C6 Other Approaches
There are two types of clustering approaches. The first relies on the assumption that normal data lie in a cluster, while anomalous data do not get associated with any cluster. The latter type is based on the assumption that normal data instances belong to big and dense clusters, while anomalies belong to small or sparse clusters. Fuzzy inference systems take a fuzzy data point and use the rules related to membership, together with the strength at which the data point fires those rules, to decide whether the data is anomalous or not [201, 98]. Heuristic methods intuitively decide on anomalies based on feature values, spatial location, and contextual information. However, many practical systems do not entirely depend on one technology; rather, hybrid approaches are used for anomaly detection [187, 33, 123]. Table IV presents the aforementioned categorization.
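The second clustering assumption (anomalies fall into small clusters) can be sketched as follows. The example uses a simple greedy leader clustering as a stand-in for any clustering algorithm; the radius, the minimum cluster size and the data are hypothetical.

```python
import math

def leader_clustering(data, radius):
    """Greedy clustering: a point joins the first cluster whose leader lies
    within radius; otherwise it starts a new cluster as its leader."""
    clusters = []  # list of (leader, members) pairs
    for point in data:
        for leader, members in clusters:
            if math.dist(point, leader) <= radius:
                members.append(point)
                break
        else:
            clusters.append((point, [point]))
    return clusters

def small_cluster_anomalies(clusters, min_size):
    """Flag every member of a cluster with fewer than min_size points."""
    return [p for _, members in clusters if len(members) < min_size for p in members]

# Hypothetical 2-D feature points, e.g. trajectory end positions.
points = [(0, 0), (0.2, 0.1), (0.1, 0.3), (5, 5), (5.1, 4.9), (5.2, 5.1), (9, 0)]
clusters = leader_clustering(points, radius=1.0)
anomalies = small_cluster_anomalies(clusters, min_size=2)
```

Under the first assumption, a density-based algorithm such as DBSCAN would instead leave the isolated point unassigned and report it as noise.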
|SVM||[163, 143, 2, 190, 70, 16, 160]|
|Sparse||[208, 21, 111, 113, 185, 149]|
|Autoencoder||[161, 59, 37, 150]|
|Clustering-based||[51, 131, 179]|
|Statistical methods||[72, 35, 99]|
|Prediction||[20, 118, 88, 10]|
|Fuzzy logic-based||[201, 98]|
|Hybrid||[207, 150, 107, 206, 123, 62, 177, 5, 66]|
|Heuristic||[199, 30, 211, 93, 167, 117, 69]|
II-D Features Used in Anomaly Detection