We approach the problem of distinguishing between ‘native’ events and intrusions in an event stream arriving over time. This problem arises in multiple applications. Consider, for example, an online payment service where the users connect with their credentials and pay for goods or services. A thief can illegally obtain access to another user’s account and steal money by sending payments on behalf of the legitimate user. Is there a way to identify illegal payments by looking at a sequence of payments even if each individual payment looks legitimate?
We turn to the renewal process as the basis for the probabilistic generative model of the problem. Renewal processes C70 are used to model arrival of events where hold times between events are independently distributed. We assume that regular events come from a renewal process with known parameters, which can be given or inferred. We then consider a sequence of recent events and reason about the likelihood that some of the events are ‘foreign’ rather than belong to the process. We adopt the Bayesian approach to infer the probability of an intrusion in a given event sequence, a maximum a posteriori probability (MAP) subsequence of events constituting the intrusion, and the marginal probability of each event to belong to the intrusion.
We show that the inference can be performed in polynomial time. We implement the inference algorithms and evaluate the inference on synthetic data and on anonymized data from an online payment system.
The paper brings the following contributions:
A probabilistic generative model for inference about intrusions in renewal processes.
Polynomial-time algorithms for computing the probability of an intrusion, a MAP subsequence of events constituting the intrusion, and the marginal probability of each event to belong to the intrusion.
An evaluation of applicability of the algorithms to intrusion detection in online payment systems.
2 Related Work
The problem of detecting an intrusion in a sequence of events belongs to the field of anomaly (or novelty) detection. An extensive review of novelty detection in general is provided in MS03 ; PCC+14. Anomaly detection in discrete sequences is reviewed in CBK12, and in temporal data in GGA+14.
A generative probabilistic model E84 is used to reason about intrusion probabilities. Much of the recent fundamental and applied research on unsupervised learning in general and anomaly detection in particular involves generative probabilistic models RG99 ; PER00 ; XPS+11 ; XPS11.
A renewal process is a discrete stochastic process G14 . Discrete stochastic processes arise as models in many applications S09 ; G14 ; SP13 . Depending on the nature of the phenomenon being modelled, different discrete stochastic processes are used, such as Poisson processes LM03 ; SJ10 , Cox processes L98 , interacting point processes and in particular Hawkes processes O88 ; CM12 , Markov processes Y20 , and other variations WP96 ; TA06 .
The present work differs from earlier research in the following aspects:
A specific type of novelty, namely an intrusion, is considered. The sequence of events is viewed as a mixture of normal activity and an intrusion.
The generative model is used to predict both the probability of an intrusion and the marginal probability of each event to belong to the intrusion (rather than just the probability of an intrusion).
No prior assumption is made about the stochastic process realised by the intrusion.
A renewal process is an arrival process whose interarrival intervals are non-negative, independent, and identically distributed random variables. A renewal process can be characterized in several ways: by the distributions of either arrival times, interarrival intervals, or the number of arrivals during a unit time interval. In this paper, we characterize renewal processes by the distribution of interarrival intervals, and describe a renewal process by naming that distribution and its parameter. For example, the Poisson process is a renewal process with exponentially distributed interarrival intervals.
Renewal processes are used as a simple model for systems that repeatedly return to a state probabilistically equivalent to the initial state.
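To make the characterization by interarrival intervals concrete, the following minimal sketch draws arrival times of a renewal process with Gamma-distributed intervals (the family used later in Section 7.1). The function name and its parameters are illustrative assumptions rather than part of the paper's code; with shape 1 the intervals are exponential and the process is a Poisson process.

```python
import random


def sample_renewal(n_events, shape, scale, seed=None):
    """Return arrival times of a renewal process with Gamma interarrival intervals.

    Hypothetical helper: intervals are i.i.d. Gamma(shape, scale),
    and the arrival times are their cumulative sums.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(n_events):
        t += rng.gammavariate(shape, scale)  # one independent interarrival interval
        times.append(t)
    return times


# Shape 1 gives exponentially distributed intervals, i.e. a Poisson process.
poisson_times = sample_renewal(5, shape=1.0, scale=2.0, seed=0)
```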
4 Probabilistic Generative Model of Intrusion
In the problem of intrusion detection in a renewal process we are given:
A renewal process with known parameters.
The probability that an individual event in the sequence belongs to the intrusion.
A time interval of known duration.
A sequence of events within the time interval.
Based on this, we need to determine:
The posterior probability of an intrusion in the sequence.
Maximum a posteriori subsequence of events constituting the intrusion.
The marginal probability of each event to belong to the intrusion.
To solve the problem, we construct a generative model that produces a sequence of events, taking the possibility of an intrusion into account, and then perform posterior inference on the model. There are two essential observations C70 :
A renewal process is infinite in both directions.
The probability density of the interarrival interval of a renewal process is fully determined by the time interval passed from the last event.
Based on these observations, just two more events, one before the first event and one after the last event in the sequence, fully define the context of the given sequence of events. Also, the times can be shifted arbitrarily by the same offset, e.g. so that the earliest event takes place at time 0. Hence, the generative model must draw the number of events belonging to the intrusion, and then generate events from the renewal process, starting with an event at time 0 (Algorithm 1).
5 Posterior Inference
In what follows, stands for the probability density of distribution , stands for unnormalized probability. For brevity, we drop explicit conditioning of probabilities on problem parameters. The proofs are provided in the supplementary material.
5.1 Probability of Intrusion Subsequence
5.2 Maximum a Posteriori Subsequence
Each of the factors in Equation (3) is independent of the rest of the events given two adjacent process events. Therefore, finding a MAP subsequence of intrusion events can be formulated as the shortest path problem in a directed acyclic graph.
Let us construct a weighted directed acyclic graph where the set of vertices is the set of events , and the set of edges contains a weighted edge for each pair of vertices, from the smaller to the greater index: . The edge weights are:
Then, a MAP subsequence of events is the set of events in a shortest path from to with the extra events removed.
Theorem 1 implies that a MAP subsequence can be computed in time .
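The shortest-path formulation can be illustrated with a short sketch. It is a simplification under stated assumptions, not the paper's algorithm: the first and last observed events stand in for the extra boundary events, intervals are Gamma-distributed, and the weight of an edge from event i to event j is assumed to be the negative log of the interval density times a prior factor for each skipped (intrusion) event and for the process event j. The names `gamma_logpdf`, `map_intrusion`, and `p_event` are hypothetical.

```python
import math


def gamma_logpdf(x, shape, scale):
    """Log-density of the Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def map_intrusion(times, shape, scale, p_event):
    """Return indices of events left off a shortest path from first to last event.

    Assumed edge weight (i -> j): minus the log of
    density(t_j - t_i) * p_event^(j - i - 1) * (1 - p_event).
    Event times must be strictly increasing.
    """
    n = len(times)
    cost = [math.inf] * n
    prev = [None] * n
    cost[0] = 0.0
    # Vertices are already in topological order, so one pass over all pairs suffices.
    for j in range(1, n):
        for i in range(j):
            weight = -(gamma_logpdf(times[j] - times[i], shape, scale)
                       + (j - i - 1) * math.log(p_event)
                       + math.log(1 - p_event))
            if cost[i] + weight < cost[j]:
                cost[j] = cost[i] + weight
                prev[j] = i
    # Walk back from the last event to recover the process events on the path.
    on_path = set()
    j = n - 1
    while j is not None:
        on_path.add(j)
        j = prev[j]
    return [k for k in range(n) if k not in on_path]
```

For example, with near-regular unit spacing and one extra event at time 2.3, `map_intrusion([0, 1, 2, 2.3, 3, 4, 5], shape=8, scale=0.125, p_event=0.1)` flags the event with index 3. The double loop visits each pair of events once.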
5.3 Intrusion Probabilities
The probability of an intrusion and the marginal probability of each event to belong to the intrusion can be computed in polynomial time. We first present an algorithm for computing the posterior probability of intrusion. Then, we show how the same algorithm can be generalized to also compute the marginal probability of any given event to belong to the intrusion. Finally, we introduce an algorithm for computing simultaneously, and hence more efficiently, the probability of an intrusion and the marginal probability of each event in the sequence to belong to the intrusion.
Both equations involve computing the unnormalized marginal likelihood. Lemma 2 gives an algorithm for computing it in polynomial time.
Algorithm 2 computes in time .
Being able to compute the unnormalized marginal likelihood, we can immediately obtain the intrusion probability (5). Marginal probabilities (6) involve the unnormalized probability that a particular event does not belong to the intrusion, which can be computed similarly to the unnormalized marginal likelihood. However, much of the computation would be reused between different marginal probabilities; in particular, the computations for two events are the same up to the earlier of the two. Theorem 2 gives an algorithm that computes the intrusion probability and all posterior probabilities simultaneously, reusing computations. (Algorithm 3 bears similarity to the forward-backward algorithm for Markov chains BPS+70, but computes marginal probabilities of occurrence of a node in the sequence rather than of states in the sequence of nodes.)
Algorithm 3 computes and in time .
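The idea of reusing computations between marginals can be sketched as a forward and a backward pass over the same graph of events. The sketch below is not Algorithm 3: it assumes Gamma intervals, treats the first and last observed events as process events, and uses a hypothetical per-transition factor (interval density times prior factors). It only illustrates how the probability that a node occurs on a path follows from two passes.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of the Gamma(shape, scale) distribution at x > 0."""
    return math.exp((shape - 1) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def factor(times, i, j, shape, scale, p_event):
    """Assumed unnormalized factor of a transition i -> j skipping j - i - 1 events."""
    return (gamma_pdf(times[j] - times[i], shape, scale)
            * p_event ** (j - i - 1) * (1 - p_event))


def intrusion_marginals(times, shape, scale, p_event):
    """Marginal probability of each event to be skipped, i.e. to belong to the intrusion."""
    n = len(times)
    alpha = [0.0] * n  # alpha[k]: total weight of paths from the first event to k
    beta = [0.0] * n   # beta[k]: total weight of paths from k to the last event
    alpha[0] = 1.0
    for j in range(1, n):
        alpha[j] = sum(alpha[i] * factor(times, i, j, shape, scale, p_event)
                       for i in range(j))
    beta[n - 1] = 1.0
    for i in range(n - 2, -1, -1):
        beta[i] = sum(factor(times, i, j, shape, scale, p_event) * beta[j]
                      for j in range(i + 1, n))
    z = alpha[n - 1]  # unnormalized marginal likelihood under these assumptions
    # A path through k splits into a prefix and a suffix, hence the product.
    return [1.0 - alpha[k] * beta[k] / z for k in range(n)]
```

Both passes visit each pair of events once, so all marginals are obtained at the cost of two quadratic sweeps instead of one sweep per event.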
6 Process Parameters
6.1 Estimation from Past Data
The most straightforward approach is to estimate the parameters from the past data under the assumption that the data do not contain any intrusions. This assumption is adequate either if intrusions are detected and removed from the data, or if they are rare, such that their influence on estimation of the process parameters is negligible.
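As an illustration of estimation from intrusion-free past data, the sketch below fits a Gamma interval distribution by the method of moments. Both the Gamma family and the method of moments are assumptions made for this example (a maximum-likelihood fit would be an equally valid choice), and `estimate_gamma` is a hypothetical name.

```python
import statistics


def estimate_gamma(times):
    """Estimate (shape, scale) of Gamma interarrival intervals from event times.

    Method of moments: shape = mean^2 / variance, scale = variance / mean.
    Requires at least three events with non-constant intervals.
    """
    intervals = [b - a for a, b in zip(times, times[1:])]
    mean = statistics.mean(intervals)
    var = statistics.variance(intervals)  # sample variance
    return mean * mean / var, var / mean


shape, scale = estimate_gamma([0.0, 1.0, 3.0, 4.0, 6.0])
```

The product of the estimated shape and scale equals the mean interval, which gives a quick sanity check on the fit.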
6.2 Maximum Likelihood Estimation by Expectation-Maximization
The process parameters can be chosen to maximize the likelihood of the MAP subsequence. This yields an expectation-maximization (EM) algorithm (Algorithm 4) alternating between finding the MAP subsequence of intrusion events and estimating parameters from the remaining subsequence.
The initial parameter values are set under the assumption that there is no intrusion, i.e. from the whole sequence (line 1). Given a sequence of events, the parameters are estimated as in Subsection 6.1 (line 3). The algorithm terminates either when the MAP subsequence stays the same in two successive iterations (line 9), thus reaching a fixed point, or after a pre-defined maximum number of iterations (line 5).
A pitfall of this EM scheme is that the process parameters cannot be estimated reliably if the remaining subsequence becomes too small. Hence, the algorithm must also be interrupted when the size of the intrusion subsequence exceeds a certain threshold (line 11).
6.3 Bayesian Inference of Posterior Distribution of Parameters
In the Bayesian setting, a prior can be imposed upon the process parameters. The posterior inference is performed on the joint distribution of the process parameters conditioned on the marginal likelihood of the event sequence (Algorithm 2). A drawback of this approach is that the inference may be too expensive computationally. As the problem of detecting intrusions in online event streams often arises in settings that require fast response, maximum-likelihood estimation from past data (Subsection 6.1) or from the given event sequence (Subsection 6.2) may be a better choice.
7 Empirical Evaluation
In the case studies that follow we evaluate the algorithms of Sections 5 and 6 on both synthetic and real-world data. Evaluation on synthetic data provides evidence that the algorithms work on data generated by a renewal process. Evaluation on real-world data examines the performance of the algorithms when the properties of the generating process are unknown, and assesses their applicability to practical intrusion detection.
The data, the algorithms, and the code to run the experiments are available at https://github.com/dtolpin/rmi-case-studies.
7.1 Evaluation on Synthetic Data
We generate data from a renewal process with Gamma-distributed interarrival intervals for shapes 1, 2, 4, and 8. The dataset is balanced so that half of the dataset entries contain an intrusion. Intrusion events are uniformly distributed over a subinterval of each entry with intrusion, chosen uniformly with average length ofof the total entry duration. 10 000 entries of 20 events are generated for each intrusion probability.
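One way to produce a synthetic entry of this kind is sketched below. The split between normal and intrusion events, the bound `max_fraction` on the subinterval length, and the function name are hypothetical choices made for illustration; the actual settings are those of the experiment code in the repository.

```python
import random


def make_entry(n_normal, n_intrusion, shape, scale, max_fraction, rng):
    """Return a time-sorted list of (time, is_intrusion) pairs.

    Normal events come from a renewal process with Gamma intervals;
    intrusion events are uniform over a randomly placed subinterval
    whose length is at most max_fraction of the entry duration.
    """
    t = 0.0
    events = []
    for _ in range(n_normal):
        t += rng.gammavariate(shape, scale)
        events.append((t, False))
    duration = t
    length = rng.uniform(0.0, max_fraction) * duration
    start = rng.uniform(0.0, duration - length)
    for _ in range(n_intrusion):
        events.append((rng.uniform(start, start + length), True))
    events.sort()
    return events


entry = make_entry(15, 5, shape=2.0, scale=0.5, max_fraction=0.5,
                   rng=random.Random(1))
```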
Figure 2 shows the average posterior intrusion probability as a function of the shape of the interval distribution for both negative and positive entries. The probability is computed either for known process parameters, or for parameters estimated through the EM algorithm (Section 6.2). For shapes greater than 1, the posterior intrusion probabilities of positive and negative samples differ sufficiently to reliably distinguish between samples with and without intrusion in both cases.
Shape 1 corresponds to the Poisson process. In a Poisson process the joint density of a sequence of events is independent of intermediate intervals given the interval between the last and the first event. Hence, intrusion probabilities in positive and negative samples are close to 0.5 and to each other.
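The property can be checked directly. Assuming rate $\lambda$, event times $t_0 < t_1 < \dots < t_n$, and intervals $x_i = t_i - t_{i-1}$ (symbols introduced here for illustration only), the joint density of the intervals is

```latex
\prod_{i=1}^{n} \lambda e^{-\lambda x_i}
  = \lambda^{n} \exp\Bigl(-\lambda \sum_{i=1}^{n} x_i\Bigr)
  = \lambda^{n} e^{-\lambda (t_n - t_0)} .
```

For a fixed number of events the density depends only on the total span $t_n - t_0$, so the positions of intermediate events carry no information about which of them are foreign.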
Figure 3 shows area under the ROC curve (AUC) as a function of the prior intrusion probability for each combination of data and algorithm parameters. AUC reflects the classification accuracy for all combinations of false negative and false positive rates. When interarrival intervals are used for intrusion detection, for both known process parameters and parameters estimated by the EM algorithm, AUC stays above 0.6 for shapes greater than 1, with the highest values of .
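For reference, AUC can be computed directly from posterior intrusion probabilities and ground-truth labels as the fraction of (positive, negative) pairs ranked correctly, with ties counted as one half. This is a generic sketch with a hypothetical function name, not the evaluation code used for the figures.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: posterior intrusion probabilities; labels: truthy for intrusion.
    Both classes must be present.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    # A correctly ordered pair counts 1, a tie counts 1/2.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranking gives 1.0 and a random one about 0.5, which is the scale on which the reported AUC values should be read.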
Figure 4 shows the average Jaccard similarity score between the MAP intrusion subsequence (Section 5.2) and the actual intrusion. Jaccard similarity score stays above for shapes greater than 1. The score is low for shape 1, because the Poisson process implies that in the presence of an intrusion any subsequence of intermediate events of given size has the same probability to belong to the intrusion.
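The Jaccard similarity score between two sets of event indices is the size of their intersection divided by the size of their union. A minimal sketch follows; the function name and the convention of returning 1 for two empty sets are assumptions made for this example.

```python
def jaccard(predicted, actual):
    """Jaccard similarity of two collections of event indices."""
    predicted, actual = set(predicted), set(actual)
    union = predicted | actual
    # Two empty sets agree perfectly (assumed convention).
    return len(predicted & actual) / len(union) if union else 1.0
```

For instance, a MAP subsequence {3, 4} compared with an actual intrusion {4, 5} shares one of three distinct events, giving a score of 1/3.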
7.2 Evaluation on Anonymized Real-World Data
We obtained anonymized data from an online payment system, consisting of 1000 log fragments. The data contains time stamps and amounts of payments. The data is anonymized in the following way. Each entry (log fragment) contains 50 events. The event times are rescaled so that the events fall within interval . The payment amounts are normalized to have the mean of 1.
The renewal process model of intrusion is straightforwardly extended to a renewal process with independent marks by multiplying each term in (1) by the corresponding mark density. We evaluate intrusion detection based on intervals alone, marks alone as a baseline, and marks and intervals combined.
Neither the parameters of the normal processes generating the events nor the prior intrusion probability are known. To estimate the prior intrusion probability, we split the dataset into the training (20%) and test (80%) datasets. We choose the probability to maximize AUC on the training dataset, and then run the inference on the test dataset. For both training and test datasets, we estimate process parameters with the EM algorithm (Section 6.2).
Table 1: Intrusion detection metrics on the test dataset, by detection basis: intervals, marks, marks and intervals.
Intrusion detection metrics on the test dataset are shown in Table 1. Detection based on marks only serves as a comparison baseline. While marks alone provide some information about intrusion, the detection accuracy is much higher when interarrival intervals are taken into account through the renewal process model.
Figure 5 shows the ROC curves of intrusion detection on the test dataset. According to the curves, when the renewal process is used for detection, of entries with intrusion are among topmost entries ordered by posterior intrusion probability. of intrusion events are among topmost events ordered by posterior marginal probability of belonging to an intrusion. Compared to that, when the detection is based on marks alone, the intrusion detection accuracy is only slightly better than random guess.
We introduced a probabilistic generative model for inference about intrusions in renewal processes. Posterior inference in this model can be performed in polynomial time to obtain the posterior intrusion probability, the marginal probability of each event to belong to an intrusion, and a MAP subsequence of intrusion events. When process parameters are unknown, they can be efficiently estimated using an expectation-maximization algorithm.
We evaluated the inference algorithms, including parameter estimation, on both synthetic and anonymized real-world data. In both cases the inference algorithms yielded results suggesting their suitability for intrusion detection. Due to their low runtime complexity, the algorithms are well suited to online applications, such as fraud detection in online payment systems.
Application of the algorithms is based on the assumption that the process generating normal events is sufficiently well described by a renewal process. Evaluation on the anonymized real-world data from an online payment system supports the feasibility of this model. However, one may envision cases where a renewal process is inadequate, such as when multiple past events affect the distribution of future event times and marks. In such cases, a model based on interacting point processes, in particular on the Hawkes process, should be considered; however, exact or approximate inference algorithms in such a model may have higher computational complexity. On the other hand, if the event series are well described by a Poisson process, intrusions cannot be reliably identified based on interarrival intervals.
References
-  Leonard E. Baum, Ted Petrie, George Soules, and Norman Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1):164–171, 1970.
-  Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection for discrete sequences: A survey. IEEE Trans. on Knowl. and Data Eng., 24(5):823–839, May 2012.
-  V. Chavez-Demoulin and J.A. McGill. High-frequency financial data modeling using Hawkes processes. Journal of Banking & Finance, 36(12):3415 – 3426, 2012. Systemic risk, Basel III, global financial stability and regulation.
-  Thomas H. Cormen, Clifford Stein, Ronald L. Rivest, and Charles E. Leiserson. Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition, 2001.
-  D.R. Cox. Renewal Theory. Methuen science paperbacks. Methuen, 1970.
-  B. S. Everitt. An Introduction to Latent Variable Models. Monographs on Statistics and Applied Probability. Springer Netherlands, 1984.
-  Robert G. Gallager. Stochastic Processes: Theory for Applications. Cambridge University Press, 2014.
-  Manish Gupta, Jing Gao, Charu Aggarwal, and Jiawei Han. Outlier Detection for Temporal Data. Morgan & Claypool Publishers, 2014.
-  David Lando. On Cox processes and credit risky securities. Review of Derivatives Research, 2(2):99–120, Dec 1998.
-  Filip Lindskog and Alexander J. McNeil. Common Poisson shock models: Applications to insurance and credit risk modelling. ASTIN Bulletin, 33(2):209–238, 2003.
-  Markos Markou and Sameer Singh. Novelty detection: A review — part 1: Statistical approaches. Signal Process., 83(12):2481–2497, December 2003.
-  Yosihiko Ogata. Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, 83(401):9+, March 1988.
-  William D. Penny, Richard M. Everson, and Stephen J. Roberts. Hidden Markov independent component analysis. In Mark Girolami, editor, Advances in Independent Component Analysis, pages 3–22. Springer London, London, 2000.
-  Marco A. F. Pimentel, David A. Clifton, Lei Clifton, and Lionel Tarassenko. Review: A review of novelty detection. Signal Process., 99:215–249, June 2014.
-  Sam Roweis and Zoubin Ghahramani. A unifying review of linear Gaussian models. Neural Comput., 11(2):305–345, February 1999.
-  Z. Schuss. Theory and Applications of Stochastic Processes: An Analytical Approach. Applied Mathematical Sciences. Springer New York, 2009.
-  Aleksandr Simma and Michael I. Jordan. Modeling events with cascades of Poisson processes. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, UAI’10, pages 546–555, Arlington, Virginia, United States, 2010. AUAI Press.
-  Dan Stowell and Mark D. Plumbley. Segregating event streams and noise with a Markov renewal process. Journal of Machine Learning Research, 14:2213–2238, 2013.
-  Curtis L Tomasevicz and Sohrab Asgarpoor. Preventive maintenance using continuous-time semi-Markov processes. In North American Power Symposium, pages 3–8. IEEE, 2006.
-  Hongzhou Wang and Hoang Pham. A quasi renewal process and its applications in imperfect maintenance. International Journal of Systems Science, 27(10):1055–1062, 1996.
-  C. F. J. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 11(1):95–103, 1983.
-  Liang Xiong, Barnabás Póczos, and Jeff Schneider. Group anomaly detection using flexible genre models. In Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS’11, pages 1071–1079, USA, 2011. Curran Associates Inc.
-  Liang Xiong, Barnabás Póczos, Jeff G. Schneider, Andrew Connolly, and Jake Vanderplas. Hierarchical probabilistic models for group anomaly detection. In Geoffrey J. Gordon and David B. Dunson, editors, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS-11), volume 15, pages 789–797. Journal of Machine Learning Research - Workshop and Conference Proceedings, 2011.
-  Nong Ye. A Markov chain model of temporal behavior for anomaly detection. In IEEE Workshop on Information Assurance and Security, pages 171–174, 2000.
To define for any on the same reference probability measure, we extend by unobserved non-intrusion events so that contains exactly non-intrusion events. Since intervals are mutually independent, the expected joint probability density of the unobserved events is the product of expected probability densities of each interval:
is computed as the product of probability densities of each transition in and of expected joint probability density of the unobserved events:
Here, accounts for the case when all events in are intrusion events. The probability density of a randomly chosen interarrival interval of duration is (known as ‘observation paradox’ ). The probability density of an interarrival interval of duration covering is for , 0 otherwise. and for account for intervals from first and last events in to the corresponding extra events. account for intervals between events in .
is unnormalized, hence can be scaled by any factor that does not depend on or . Equation (3) is obtained as
Provided that the probability density of can be computed in fixed time, a MAP subsequence of intrusion events can be computed in time .
The proof uses similar reasoning to the proof of Lemma 1. Any event belonging to the process can be reached from any event preceding it, . Line 3 accounts for transitions from the extra event at the beginning to . Line 5 — for transitions from earlier events in to . , , are the marginal likelihoods of subsequences over time intervals . Similarly, lines 8–11 account for transitions from any event to the extra event at the end. is the marginal likelihood of over time interval .
A renewal process is a Markov process: the arrival time of an event is independent of earlier events given the last event. Consequently, the likelihood of a sequence stays the same if the times of events and the interval bounds are reversed. Therefore, , , are the marginal likelihoods of subsequences over time intervals , and are the marginal likelihoods of over . The loop in lines 20–23 computes the marginal probabilities of events in to belong to the intrusion. Line 21 computes the probability of with by multiplying and and dividing by the probability of independently of other events, because this probability appears twice, both in and in . The expressions for returned values in lines 5 and 22 are due to (5) and (6).
Algorithm 2 runs in time and is called twice. The rest of the algorithm runs in time . Hence, the algorithm runs in time . ∎