Contextual Outlier Detection in Continuous-Time Event Sequences

12/19/2019 ∙ by Siqi Liu, et al. ∙ University of Pittsburgh

Continuous-time event sequences represent discrete events occurring in continuous time. Such sequences arise frequently in real life and cover a wide variety of natural events, such as earthquakes, or events corresponding to human actions, such as medication administrations. Usually we expect the event sequences to follow some regular pattern over time. However, sometimes these regular patterns may be interrupted by the unexpected absence or unexpected occurrence of events. Identification of these unexpected cases can be very important as they may point to abnormal situations that need human attention. In this work, we study and develop methods for detecting outliers in continuous-time event sequences, including unexpected absence and unexpected occurrences of events. Since the patterns that event sequences tend to follow may change in different contexts, we develop outlier detection methods based on point processes that take into account different contexts. Our outlier scoring methods are based on Bayesian decision theory and hypothesis testing with theoretical guarantees. To test the performance of the methods, we conduct experiments on both synthetic data and real-world clinical data and show the effectiveness of the proposed methods.

1 Introduction

Continuous-time event sequences are defined by occurrences of various types of events in time. Event sequences may represent many real-world processes and observations including, e.g., arrival of customers to the store, arrival of packets or requests to servers in network systems, administration of drugs to patients, or occurrences of natural events like earthquakes.

The focus of this work is on anomaly/outlier detection methods in event sequences. The problem we want to solve is: given the current time and the history of the event sequence, is there anything anomalous about the recent occurrence or absence of events? There are two types of outliers that may arise here. First, given the history of past events and the recent absence of events, an event may be overdue. We refer to these as omission outliers. Second, given the history of past events, an event that has just arrived may be unexpected in terms of its timing, that is, it has arrived too early or was not expected at all. We refer to these as commission outliers. Both types of outliers are often related to problems of practical importance. Take, for example, a person suffering from a disease and taking specific medications on a regular schedule to treat it. Given the schedule, the history of past events, and the current time, we may infer that the person has not taken the medication yet and that the medication is overdue (omission). The detection of the overdue medication can then be used to generate a reminder alert. Another important problem is related to communication failures. Assume messages arrive with some frequency (that may vary to a certain degree), but there is currently a long period of silence. This, when compared to expected/normal behavior, may indicate a communication failure (disconnection), and its detection can be very important in practice. To illustrate the second problem (commission) and its importance, consider a patient who takes a medication too early compared to the normal schedule. The detection of this event, or its prevention (if we have information to detect the intention prior to the administration of the medication), is extremely important and may prevent adverse situations such as a high concentration of the drug and its possible toxic effects. Similar situations may arise when a patient receives a medication that is unrelated to his/her condition. The occurrence of this event may indicate a medical error, and once again its timely detection, which can prevent or alleviate the consequences, is extremely important.

In order to solve the above outlier detection problems, we study models that are able to accurately represent the event sequences, and outlier detection methods based on the model that are able to detect both omission and commission outliers. To build a flexible model of event sequences, we explore general point process models that permit inclusion of context and past event history to model the occurrence of the next event. Briefly, the occurrences of events may, in many problems of practical importance, depend on context. For example, a medication is administered only to a patient who suffers from a specific disease or a condition, so the disease defines a context inducing the occurrences of the events in time. Similarly, the patient who does not have the disease should not take the medication, so medication administration events under normal circumstances do not occur. In this work, we rely on and experiment with a context-enabled model adapted from the Neural Hawkes process based on continuous-time LSTM [13]. It allows us to represent the dependencies of the chance of the event occurring on the history and context. To detect outliers in event sequences, we propose and develop probabilistic outlier detection methods based on the distributions defined by the point processes. We develop decision rules and scoring methods for detecting these outliers based on Bayesian decision theory and hypothesis testing with theoretical guarantees.

To demonstrate the performance of our outlier detection methods and their abilities to detect outliers in event sequences, we conduct experiments on both synthetic and real-world data. We show our methods can successfully detect omission and commission outliers, even when the occurrences of events depend on the context, and the context may change dynamically in time.

2 Related work

In this section we review two research directions related to our work: outlier detection and point processes.

2.1 Outlier detection

In general, outlier detection [2, 1] aims to identify data instances that are unusual when compared to other instances in data. It has been successfully applied in a variety of areas to identify rare and interesting data patterns, which may be associated with beneficial or malicious events, such as fraud identification [5], network intrusion surveillance [6], disease outbreak detection [19], and medical error detection [8]. It is also utilized as a primary data preprocessing step that helps to remove noisy or irrelevant signals in data [11].

There exist various approaches for defining and detecting outliers in data. Excellent reviews of these methods can be found in [2, 1]. The outlier detection methods can be applied to detect outliers either on all data dimensions or on a subset of dimensions given the rest of dimensions. The latter case refers to contextual (or conditional) outlier detection [9, 16, 1]. Our objective in this work is to detect unusual absence (omission outliers) and occurrences (commission outliers) of events in event sequences that depend on other types of events defining the context. Unlike outlier detection in general time series [7], (contextual) outlier detection on continuous-time event sequences is a new problem that has not been explored to sufficient depth.

2.2 Point processes

In this work, we use point processes as probabilistic models for continuous-time event sequences. Point processes [3] are probabilistic models for discrete points in continuous domains. They have been widely used to model continuous-time event sequences. An event sequence can consist of a single type or multiple types of events. Hawkes processes [10] are point processes that can model both self-excitation (events of the same type in the past can increase the rate of occurrences of events in the future) and mutual-excitation (events of other types in the past can increase the rate of occurrences of events in the future). Most research effort has been spent on developing different models that improve upon the original Hawkes processes (e.g., [22, 18, 20, 4, 13]). The main focus has been centered around the development of more flexible models to better fit the data and make more accurate predictions.

Recently, researchers have started to develop methods to deal with noisy data, such as incomplete data [21, 15, 14] and desynchronized data [17]. They assume the data has been corrupted by some source (e.g., censoring or noise), and the goal is to recover the original data and/or learn a model nonetheless. Although the omission outliers we address in this work are related to missing data, the goal is completely different. Our goal is to detect these outliers as accurately as possible, i.e., to distinguish them from normal data. Moreover, our method is assumed to be executed in an online manner. That is, we must decide whether there is an outlier based only on the history, without access to future data. Finally, detecting commission outliers is not related to any of these works.

3 Method

3.1 Problem formulation

First, we formally define the problem of contextual outlier detection in continuous-time event sequences. Everything we observe can be formulated as a marked event sequence: , where is the number of all types of events in the sequence, and are the time and mark of the -th event, is the domain of time, and is the finite set of distinct marks for different types of events.

We make the distinction between context events and target events. The target events are the type of events from which we try to detect outliers. We denote this type as . The other types of events with type are the context events. The (sub)sequence of the target events is , where is the number of the target events in the sequence. We denote the full history of the sequence up to time as , where we omit in the notation, because it should always be clear from the context.

Now, we are ready to define two types of outlier detection problems we want to solve. The first one aims to detect commission outliers (unexpected events). Given an observed target event (of type ) at time in a marked event sequence , and the full history of that sequence up to time , , the goal is to assign a label to indicating whether it is a commission outlier. Notice that is only defined if is the time of an event of type . In this work, instead of hard labels, we consider outputting a commission outlier score for to indicate how likely it is a commission outlier.

The second problem aims to detect omission outliers. Given a blank interval in the target event sequence of a full sequence , i.e., there is no event of type in , and the full history of up to time , , the goal is to assign a label to indicating whether there are any omission outliers in . Notice that is only defined when is a blank interval for (no target event within). In this work, instead of hard labels, we consider outputting an omission outlier score for to indicate how likely it contains any omission outliers.

3.2 Probabilistic models

We develop algorithms for detecting outliers in continuous-time event sequences based on probabilistic models, specifically (temporal) point processes. Point processes are probabilistic models for discrete points in continuous domains. For a continuous-time event sequence, the points are the events, and the domain is time. In this case, the models are also called temporal point processes. A temporal point process can be defined as a counting process over time that records the number of points falling in a given interval. We make the common assumption that at most one event can happen at a given time.

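For a temporal point process, the conditional intensity function (CIF), $\lambda(t \mid \mathcal{H}_t)$, characterizes the probability of observing an event in an infinitesimal time interval $[t, t + dt)$ given the history $\mathcal{H}_t$ up to time $t$. That is,

$$\lambda(t \mid \mathcal{H}_t)\, dt = P(\text{an event occurs in } [t, t + dt) \mid \mathcal{H}_t). \tag{1}$$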

For our problem, we model the target events using a point process, while the history contains all types of events including both the target events and the context events. Because the CIF, which we write as $\lambda(t)$, is conditioned on the history by definition, we omit the history from the notation for the rest of the paper and always condition on it implicitly.

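For a sequence of target events at times $t_1 < \dots < t_n$ in the observation window $[0, T]$, generated from the point process with CIF $\lambda(t)$, the probability density is

$$p(t_1, \dots, t_n) = \prod_{i=1}^{n} \lambda(t_i) \, \exp\left(-\int_0^T \lambda(s)\, ds\right). \tag{2}$$

An intuitive interpretation of the equation is that, for the observed events at times $t_1, \dots, t_n$, $\prod_{i=1}^{n} \lambda(t_i)$ is the probability of observing the events at those specific time points. Meanwhile, $\exp\left(-\int_0^T \lambda(s)\, ds\right)$ corresponds to the fact that there are no events at any other time points in $[0, T]$.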

When detecting outliers, we assume that we already have a point-process model for the target events in normal cases. The model may be specified by an expert or, more generally, learned from existing data. If the model is learned from data, we assume that the training data is outlier-free or that the outliers in the training data are insignificant for learning a model to detect outliers in the test data. The choice of the model can have an influence on the performance of the outlier detection. In this work, we use a flexible model adapted from the neural Hawkes process built on top of the continuous-time LSTM [13]. Next, we briefly describe the continuous-time LSTM and how we apply it for our purpose.

The input to the continuous-time LSTM consists of the marked events in the full sequence, . That is, we not only use the target events but also the context events as input, although we only model the CIF of the target events, . The output consists of the hidden states corresponding to the input. It is a non-linear mapping from the content in the memory cell of the LSTM at time , . As in a traditional LSTM, each continuous-time LSTM unit also has an input gate , an output gate , and a forget gate . The relations between the memory cells, the hidden states, the input, and these gates are summarized as follows.

Let

be a vector representation of the mark

, which is a learnable embedding. For , is a continuous function changing over time from to , and, for and , there are separate input gates and forget gates:

(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)

where is the concatenation of the vectors and , is the element-wise product, is the logistic function, and is the scaled softplus function with parameter . All the , and with/without different subscripts and bars are learnable parameters of the continuous-time LSTM.

Finally, the output of the continuous-time LSTM is converted to the CIF of the target events through a final transformation with learnable parameters. The model is learned by maximizing the likelihood (Eq 2) for all sequences in the training data. Monte-Carlo integration is used to evaluate the integral of the CIF that appears in the likelihood.
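The likelihood evaluation with Monte-Carlo integration can be sketched as follows. This is a minimal illustration rather than the training code used in the paper: `log_likelihood` and `cif` are hypothetical names, and `cif` is a plain function of time standing in for a history-dependent model such as the continuous-time LSTM.

```python
import math
import random

def log_likelihood(event_times, cif, t_end, n_samples=1000, seed=0):
    """Log-density of a point process on [0, t_end]:
    sum_i log cif(t_i) - integral_0^t_end cif(s) ds."""
    rng = random.Random(seed)
    log_term = sum(math.log(cif(t)) for t in event_times)
    # Monte Carlo estimate of the integral: window length times the
    # average CIF value at uniformly sampled time points.
    samples = [rng.uniform(0.0, t_end) for _ in range(n_samples)]
    integral = t_end * sum(cif(s) for s in samples) / n_samples
    return log_term - integral

# Toy usage with a constant CIF (assumption: rate 2.0 on the window [0, 4]).
rate = 2.0
ll = log_likelihood([0.5, 1.2, 3.0], lambda t: rate, t_end=4.0)
```

For a constant CIF the Monte Carlo estimate is exact, which makes this case a convenient sanity check before plugging in a learned model.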

3.3 Detecting commission outliers

To derive an outlier scoring method for commission outliers (unexpected events), we first describe a generative process for defining normal points and outliers. Then based on the generative process, we derive a Bayes decision rule, from which we derive the outlier scoring method for commission outliers.

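Suppose we are given a target event at time $t$ (and implicitly the history up to time $t$). Define a random variable $Z \in \{0, 1\}$, such that $Z = 1$ if the event is a commission outlier, and $Z = 0$ otherwise. We are interested in calculating $P(Z = 1 \mid t)$.

Assume the process that generates outliers is independent from the process that generates normal points. Then, the generative process for the normal points and outliers can be viewed together as a marked point process. That is, for each event there is a hidden mark $Z$ associated indicating whether it is an outlier. The overall CIF is $\lambda(t) = \lambda_0(t) + \lambda_1(t)$, where $\lambda_0$ is the CIF for the normal point process, and $\lambda_1$ for the point process that generates outliers.

Suppose we are at time $t'$. The generative process for a new event given the history is

  1. Sample a new point $t > t'$ from the overall marked point process with CIF $\lambda(t)$.

  2. Sample a mark $Z$ for $t$ from the Bernoulli distribution with success probability $\lambda_1(t) / \lambda(t)$.

Based on the definition of the CIF of a marked point process, conditioned on the history,

$$\lambda_0(t) = \lambda(t)\, P(Z = 0 \mid t), \tag{11}$$
$$\lambda_1(t) = \lambda(t)\, P(Z = 1 \mid t). \tag{12}$$

From the above, we can derive the posterior

$$P(Z = 1 \mid t) = \frac{\lambda_1(t)}{\lambda_0(t) + \lambda_1(t)}. \tag{13}$$

Therefore, the Bayes decision rule is

$$\hat{z} = \mathbb{1}\left[\frac{\lambda_1(t)}{\lambda_0(t) + \lambda_1(t)} > \frac{1}{2}\right] = \mathbb{1}\left[\lambda_0(t) < \lambda_1(t)\right], \tag{14}$$

where $\mathbb{1}[x] = 1$ if $x$ is true, and 0 otherwise. Assuming $\lambda_1(t)$ is a constant, the decision rule becomes $\mathbb{1}[-\lambda_0(t) > \eta]$, where $\eta$ is a threshold. This justifies ranking by

$$s_c(t) = -\lambda_0(t) \tag{15}$$

across all $t$, so we use $s_c(t)$ as the commission outlier score: the higher the score, the more likely it is a commission outlier.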
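A small sketch may help show why ranking by the negated normal intensity is enough. It assumes the posterior takes the ratio form that follows from superposing an independent normal process and a constant-rate outlier process; the function names and the intensity values are hypothetical.

```python
def commission_posterior(lam_normal, lam_outlier):
    """Posterior probability that an event is a commission outlier when
    normal and outlier events come from independent processes."""
    return lam_outlier / (lam_normal + lam_outlier)

def commission_score(lam_normal):
    """Commission outlier score: negated normal intensity at the event time."""
    return -lam_normal

# Hypothetical normal intensities evaluated at four observed target events.
normal_intensities = [0.2, 1.5, 0.05, 3.0]
lam_outlier = 0.1  # assumed constant outlier intensity

indices = range(len(normal_intensities))
by_score = sorted(
    indices, key=lambda i: commission_score(normal_intensities[i]), reverse=True
)
by_posterior = sorted(
    indices,
    key=lambda i: commission_posterior(normal_intensities[i], lam_outlier),
    reverse=True,
)
```

Both orderings agree, so the unknown outlier intensity never has to be estimated in order to rank events.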

3.4 Detecting omission outliers

To derive an outlier scoring method for omission outliers, we first describe a generative process. Based on the process, we derive a Bayes decision rule, from which we derive the outlier scoring method for omission outliers. Finally, we provide an alternative justification for the scoring method based on hypothesis testing.

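Assume we have generated a sequence of normal points with the normal CIF $\lambda(t)$. To generate omission outliers, we assume that each point can be removed independently with probability $p$. After the removal, we have a new sequence of points with (unobservable) omission outliers. Then, given any blank interval $B$, we can derive the probability of at least one removal occurring in the interval.

To derive the method, we first define some notation. For any interval $I$, let $N_o(I)$ be the number of points observed, and $N_n(I)$ be the number of points generated by the normal point process with CIF $\lambda(t)$, so $N_o$ is the result of combining $N_n$ with random removal, and we can observe $N_o$ but not $N_n$. Furthermore, we define an auxiliary random variable that counts the number of points removed in a blank interval as $N_r(B) = N_n(B) - N_o(B)$.

For any blank interval $B$, we observe $N_o(B) = 0$, but $N_r(B)$ can take different values $k = 0, 1, 2, \dots$. The joint probabilities are

$$P(N_r(B) = k, N_o(B) = 0) = P_k\, p^k, \tag{16}$$

where $P_k$ is the probability that $k$ points are generated by the normal point process in $B$ for $k = 0, 1, 2, \dots$. These probabilities depend on the normal CIF $\lambda(t)$. Then the posterior probability of $N_r(B) = k$ can be calculated as

$$P(N_r(B) = k \mid N_o(B) = 0) = \frac{P_k\, p^k}{\sum_{j=0}^{\infty} P_j\, p^j}. \tag{17}$$

Define a random variable $Z_B$ to indicate whether there are any omission outliers in the blank interval $B$: $Z_B = 1$ is equivalent to $N_r(B) > 0$; $Z_B = 0$ is equivalent to $N_r(B) = 0$.

$$P(Z_B = 1 \mid N_o(B) = 0) = 1 - P(N_r(B) = 0 \mid N_o(B) = 0). \tag{18}$$

Then the Bayes decision rule is

$$\hat{z}_B = \mathbb{1}\left[P(Z_B = 1 \mid N_o(B) = 0) > \frac{1}{2}\right]. \tag{19}$$

The only thing we know about $P_k$ is that $\sum_{k=0}^{\infty} P_k = 1$. Without further assumptions, $P(N_r(B) = 0 \mid N_o(B) = 0)$ cannot be evaluated in closed form, but we can get a lower bound from Eq. 17

$$P(N_r(B) = 0 \mid N_o(B) = 0) \ge P_0 = \exp\left(-\int_B \lambda(s)\, ds\right), \tag{20}$$

because $\sum_{j=0}^{\infty} P_j\, p^j \le \sum_{j=0}^{\infty} P_j = 1$.

Then the posterior probability of $B$ containing any omission outliers is

$$P(Z_B = 1 \mid N_o(B) = 0) \le 1 - \exp\left(-\int_B \lambda(s)\, ds\right). \tag{21}$$

Therefore, we propose to use

$$s_o(B) = \int_B \lambda(s)\, ds \tag{22}$$

as the omission outlier score. When we rank the blank intervals by $s_o(B)$, we essentially rank them by an upper bound of $P(Z_B = 1 \mid N_o(B) = 0)$.

We note that, if the normal point process is an inhomogeneous Poisson process, then

$$P_k = \frac{\left(\int_B \lambda(s)\, ds\right)^k}{k!} \exp\left(-\int_B \lambda(s)\, ds\right) \tag{23}$$

for $k = 0, 1, 2, \dots$. The posterior becomes

$$P(N_r(B) = k \mid N_o(B) = 0) = \frac{\left(p \int_B \lambda(s)\, ds\right)^k}{k!} \exp\left(-p \int_B \lambda(s)\, ds\right). \tag{24}$$

Therefore, the posterior probability of $B$ containing any omission outliers is

$$P(Z_B = 1 \mid N_o(B) = 0) = 1 - \exp\left(-p \int_B \lambda(s)\, ds\right). \tag{25}$$

This justifies scoring the interval by $s_o(B)$, because if we rank the intervals by their scores, the result will be the same as ranking by the posterior probabilities of containing omission outliers, $P(Z_B = 1 \mid N_o(B) = 0)$.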
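The Poisson case can be checked numerically. The sketch below assumes the setup described above: the number of normally generated points in a blank interval is Poisson with mean equal to the integrated intensity, and each point is removed independently with a fixed probability. It sums the posterior series directly and compares the result with the closed form one minus the exponential of minus the removal probability times the integrated intensity. The function names are hypothetical.

```python
import math

def posterior_any_omission(integrated_intensity, p_remove, k_max=100):
    """P(at least one removed point | no observed point) obtained by summing
    the unnormalized terms P(N_n = k) * p_remove**k for k = 0..k_max-1."""
    term = math.exp(-integrated_intensity)  # k = 0 term
    first = term
    total = term
    for k in range(1, k_max):
        # Poisson recursion: multiply by mean * p_remove / k to get term k.
        term *= integrated_intensity * p_remove / k
        total += term
    return 1.0 - first / total

def closed_form(integrated_intensity, p_remove):
    """Closed-form posterior for the inhomogeneous Poisson case."""
    return 1.0 - math.exp(-p_remove * integrated_intensity)

series_value = posterior_any_omission(3.0, 0.1)
exact_value = closed_form(3.0, 0.1)
```

Because the closed form is increasing in the integrated intensity for any fixed removal probability, ranking intervals by the integrated intensity gives the same order as ranking by the posterior.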

Without assuming that the process is an inhomogeneous Poisson process, an alternative justification for using $s_o(B)$ as the omission outlier score can be given based on hypothesis testing for inter-event time, i.e., the time interval between two consecutive events $t_i$ and $t_{i+1}$. Let $W$ be the random variable for the inter-event time. Assume $B = (t_i, t_{i+1})$ is an observed inter-event interval of length $|B|$. The null hypothesis is that $B$ contains no omission outliers, and the alternative hypothesis is that $B$ contains omission outliers.

Assuming the null hypothesis is true, i.e., $B$ is an inter-event interval generated by the normal point process with CIF $\lambda(t)$, the probability that the inter-event time is at least as long as $|B|$ is

$$P(W \ge |B|) = \exp\left(-\int_B \lambda(s)\, ds\right), \tag{26}$$

which is the p-value. A lower p-value means that the observation is more extreme, given that the null hypothesis is true, which means it is more likely to contain omission outliers. This justifies scoring by $s_o(B) = \int_B \lambda(s)\, ds$, where a higher score means that $B$ is more likely to contain omission outliers.
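As a rough sketch of how the omission score and its p-value might be computed in practice, the code below integrates a CIF over a blank interval with the trapezoidal rule and converts the result to a p-value by taking the exponential of the negated integral. The names `omission_score` and `omission_p_value` are hypothetical, and `cif` is a plain function of time standing in for a history-dependent model.

```python
import math

def omission_score(cif, start, end, n_steps=1000):
    """Trapezoidal approximation of the integral of cif over (start, end)."""
    h = (end - start) / n_steps
    total = 0.5 * (cif(start) + cif(end))
    for i in range(1, n_steps):
        total += cif(start + i * h)
    return total * h

def omission_p_value(score):
    """Probability, under the normal model, of seeing no event in the interval."""
    return math.exp(-score)

# Toy usage: constant CIF 0.5 on the blank interval (2, 6).
score = omission_score(lambda t: 0.5, 2.0, 6.0)
p_val = omission_p_value(score)
```

A longer blank interval, or one over which the intensity is higher, yields a larger score and a smaller p-value.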

3.5 Bounds on FDR and FPR

In this section, we prove some bounds on the performance of the proposed scoring methods. We recall the definitions of false discovery rate (FDR) and false positive rate (FPR). Let $y$ denote the true label (1=outlier, 0=normal) of an object (a target event or a blank interval) and $\hat{y}$ denote the predicted label. Then FDR and FPR are defined as

$$\mathrm{FDR} = P(y = 0 \mid \hat{y} = 1), \qquad \mathrm{FPR} = P(\hat{y} = 1 \mid y = 0).$$

Given the above definitions, we can prove the following theorems (see Appendix for proofs).
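Empirically, both rates are estimated from counts of true and predicted labels. The following sketch uses the standard count-based estimates (false positives over all predicted positives for FDR, false positives over all true negatives for FPR); the helper name `fdr_fpr` and the toy labels are hypothetical.

```python
def fdr_fpr(true_labels, predicted_labels):
    """Return (FDR, FPR) estimated from binary labels (1=outlier, 0=normal)."""
    pairs = list(zip(true_labels, predicted_labels))
    fp = sum(1 for y, y_hat in pairs if y == 0 and y_hat == 1)
    tp = sum(1 for y, y_hat in pairs if y == 1 and y_hat == 1)
    tn = sum(1 for y, y_hat in pairs if y == 0 and y_hat == 0)
    # Convention used here: a rate with an empty denominator is reported as 0.
    fdr = fp / (fp + tp) if fp + tp > 0 else 0.0
    fpr = fp / (fp + tn) if fp + tn > 0 else 0.0
    return fdr, fpr

y_true = [1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
fdr, fpr = fdr_fpr(y_true, y_pred)
```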

Theorem 3.1.

If we use the commission outlier score , where is the time of a target event, with a threshold , such that the decision rule is , and let denote the CIF of the independent process generating commission outliers, then we have an upper bound on the FDR as .

Theorem 3.2.

If we use the omission outlier score for an inter-event interval , with a threshold , such that the decision rule is , then we have an upper bound on the FPR as .

Theorem 3.3.

If we use the omission outlier score for a blank interval , with a threshold , such that the decision rule is , and assuming that the normal point process is an inhomogeneous Poisson process and the probability of omission is , we have an upper bound on the FDR as .

4 Experiments

We perform experiments on both synthetic and real-world event sequences. First, we briefly describe the compared methods. Next, we conduct experiments on synthetic data. Finally, we experiment with real-world clinical data.

4.1 Compared methods

We compare the following methods in the experiments.

rand: A baseline that generates outlier scores by sampling from a uniform distribution.

len: A baseline based on the empirical distribution of the inter-event time lengths.

NH: Our method based on the neural Hawkes process but only using the history of the target events as the context.

CNH: Our method based on the neural Hawkes process using the history of both the target events and the context events as the context.

true: Our method using the true CIF to calculate the outlier scores (only available on synthetic data).

Next we briefly describe the method len. For training, the lengths of all the inter-event times of the target events, $w_1, \dots, w_n$, are collected. Here, for simplicity, we describe the method as if we only had one sequence in the training data, but it is easy to see how it works for multiple sequences, which is the case in our experiments. Then, an empirical distribution of the inter-event time can be formulated as $\hat{F}(w) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[w_i \le w]$.

For testing, len outputs a commission outlier score for a target event at time $t$ as $-\min(\hat{F}(w), 1 - \hat{F}(w))$, where $w$ is the inter-event time between the current and previous target events. Intuitively, if the inter-event time is too small ($\hat{F}(w)$ is small) or too big ($1 - \hat{F}(w)$ is small), it is likely that the event has occurred at an atypical time and therefore is a commission outlier. The negation makes sure that a higher score indicates that it is more likely to be an outlier. For a blank interval $B$, len outputs an omission outlier score as the length of $B$, $|B|$. Intuitively, the longer the blank interval, the more likely it contains omission outliers.
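The len baseline can be sketched in a few lines. This is an illustrative reading of the description above, assuming the commission score is the negated smaller tail of the empirical distribution of inter-event times; all function names and the toy training sequence are hypothetical.

```python
import bisect

def fit_len(train_event_times):
    """Collect and sort the inter-event times of one training sequence."""
    return sorted(b - a for a, b in zip(train_event_times, train_event_times[1:]))

def empirical_cdf(sorted_gaps, w):
    """Fraction of training inter-event times that are <= w."""
    return bisect.bisect_right(sorted_gaps, w) / len(sorted_gaps)

def len_commission_score(sorted_gaps, w):
    """Higher (closer to 0) when w lies in either tail of the distribution."""
    f = empirical_cdf(sorted_gaps, w)
    return -min(f, 1.0 - f)

def len_omission_score(start, end):
    """Length of the blank interval."""
    return end - start

gaps = fit_len([0.0, 1.0, 2.0, 3.5, 4.5, 6.5])  # gaps: 1, 1, 1.5, 1, 2
very_short = len_commission_score(gaps, 0.1)  # shorter than every training gap
typical = len_commission_score(gaps, 1.2)     # in the middle of the distribution
```

An event arriving after an unusually short gap receives a higher commission score than one arriving after a typical gap.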

For NH and CNH, we tune the number of hidden units in , use 20 percent of the training set as the validation set, and choose the number according to the likelihood on the validation set. We stress that, for training and validation, we do not use labeled outlier data but only use normal data.

4.2 Experiments on synthetic event sequences

We generate synthetic event sequences using two different types of point processes. One is the inhomogeneous Poisson process. The other one is the Gamma process. For each type of processes, there is a set of parameters that determine the distribution of the points. We allow the parameters to vary according to a context state, .

To keep things simple, we allow two different values for the state . Associated with each value of the state is a set of values of the parameters for the point process.

For the inhomogeneous Poisson process, the CIF is a piecewise constant function with the value , where is the context state. In the experiments, we set and .

For the Gamma process, the inter-event time follows a Gamma distribution

( shape, rate), where is the context state. In the experiments, we set and .

The changes of the context state $x$ are driven by a continuous-time Markov chain with a transition matrix $Q = (q_{ij})$ such that

$$P(x(t + dt) = j \mid x(t) = i) = \mathbb{1}[i = j] + q_{ij}\, dt,$$

where $dt$ is infinitesimal time. Each change of the state generates a context event.
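A simulation of this kind of data can be sketched as below for the piecewise-constant Poisson case. It assumes a two-state chain that always starts in state 0 and exponential holding times, and it relies on the memoryless property to restart the event clock at each state change. The function name and all rate values are hypothetical and are not the settings used in the experiments.

```python
import random

def simulate_sequence(target_rates, switch_rates, t_end, seed=0):
    """Simulate target events and context events on [0, t_end).
    target_rates[x]: target event rate while the context state is x (0 or 1).
    switch_rates[x]: rate of leaving context state x."""
    rng = random.Random(seed)
    t, state = 0.0, 0
    targets, contexts = [], []
    while t < t_end:
        # Holding time of the current state is exponential.
        next_switch = min(t + rng.expovariate(switch_rates[state]), t_end)
        # Within a constant-rate segment, gaps between events are exponential.
        s = t + rng.expovariate(target_rates[state])
        while s < next_switch:
            targets.append(s)
            s += rng.expovariate(target_rates[state])
        t = next_switch
        if t < t_end:
            state = 1 - state
            contexts.append((t, state))  # each state change is a context event
    return targets, contexts

targets, contexts = simulate_sequence([0.1, 1.0], [0.05, 0.05], 100.0, seed=1)
```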

For each point process type, we simulate 40 sequences. Each sequence is simulated over the same time range. We use 50 percent of the sequences for training and the other 50 percent for testing.

4.2.1 Simulation of commission and omission outliers

To define outliers, we simulate commission and omission outliers on top of the existing data. In this way, we can obtain ground-truth labels for testing.

To define commission outliers, we simulate a new sequence of target events independently from the existing data, and then merge the new events with the existing events. We use a Poisson process with a parameter to generate the outliers. controls the rate of such outliers. In the experiments, for each dataset, we set , where and is the empirical rate of the target events calculated from the original test data. We also vary to study the effect.

To define omission outliers, we randomly remove target events in the original sequences according to independent Bernoulli trials. That is, each event is removed with probability and kept with probability . We always keep the event if it marks the start time of the sequence. In the experiments, we set and . We also vary to study the effect.
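The two injection procedures can be sketched as follows. The code assumes a homogeneous Poisson process for the commission outliers and treats the first event of a sequence as the one that marks its start; the function names, rate, and removal probability are hypothetical values chosen only for illustration.

```python
import random

def add_commission_outliers(event_times, outlier_rate, t_end, rng):
    """Merge events from an independent Poisson process into the sequence.
    Returns a time-sorted list of (time, label), label 1 = commission outlier."""
    outliers = []
    t = rng.expovariate(outlier_rate)
    while t < t_end:
        outliers.append(t)
        t += rng.expovariate(outlier_rate)
    return sorted([(s, 0) for s in event_times] + [(s, 1) for s in outliers])

def remove_events(event_times, p_remove, rng):
    """Drop each event independently with probability p_remove, always
    keeping the first event. Returns (kept, removed) lists of times."""
    kept, removed = [], []
    for i, s in enumerate(event_times):
        if i > 0 and rng.random() < p_remove:
            removed.append(s)
        else:
            kept.append(s)
    return kept, removed

rng = random.Random(0)
original = [0.0, 1.0, 2.5, 4.0, 7.0]
merged = add_commission_outliers(original, 0.3, 10.0, rng)
kept, removed = remove_events(original, 0.5, rng)
```

A blank interval between two consecutive kept events can then be labeled as containing an omission outlier whenever a removed time falls inside it.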

4.2.2 Detection of commission and omission outliers

We detect the presence of commission and omission outliers differently. To test for commission outliers, each method outputs an outlier score at the time of each target event. That is, whenever there is a new target event, we ask the question: is this event a commission outlier or not?

Testing for omission outliers is trickier, because we need to decide the checkpoints more carefully, i.e., when to ask for outlier scores. The simplest thing to do is to only check at the target event times. That is, whenever there is a new target event, we ask the question: is there any omission outlier starting from the previous target event till now?

However, this may become unsatisfactory in real-world applications, because there could be cases when the target events just stop occurring for a long period of time or even forever (potentially due to malfunctions of the underlying system). These are interesting and important cases we are supposed to detect, but the above testing method will not work. Therefore, we use a combined approach. We still have a checkpoint at each target event time, but on top of that, we also randomly generate checkpoints in long blank intervals.

Specifically, we have a parameter $w$ set to $2 / \hat{\lambda}$, where $\hat{\lambda}$ is the empirical rate of the target events estimated from the training data for each dataset, so within an interval of length $w$, on average, we should see two events normally. Then, whenever the blank interval from the previous checkpoint till now is longer than $w$, we generate a new checkpoint within the interval by uniform sampling, and set the previous checkpoint to the generated checkpoint. We keep generating checkpoints until we reach the next target event or the end of the sequence.
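One possible reading of this checkpoint procedure is sketched below. It assumes that each random checkpoint is drawn uniformly from the window of the given length that starts at the previous checkpoint, which is an interpretation rather than a detail stated in the text; the function names and example values are hypothetical.

```python
import random

def _fill_gap(prev, boundary, window, rng, out):
    """Add random checkpoints while the remaining blank stretch exceeds window."""
    while boundary - prev > window:
        prev = rng.uniform(prev, prev + window)
        out.append((prev, "random"))

def omission_checkpoints(target_times, t_start, t_end, window, rng):
    """Return time-ordered checkpoints as (time, kind) pairs, where kind is
    "event" for target event times and "random" for generated checkpoints."""
    out = []
    prev = t_start
    for t in target_times:
        _fill_gap(prev, t, window, rng, out)
        out.append((t, "event"))
        prev = t
    _fill_gap(prev, t_end, window, rng, out)  # tail after the last event
    return out

rate_hat = 1.0           # assumed empirical rate of target events
window = 2.0 / rate_hat  # two expected events per window
rng = random.Random(0)
checkpoints = omission_checkpoints([1.0, 1.5, 9.0], 0.0, 12.0, window, rng)
times = [t for t, _ in checkpoints]
```

Each blank interval to be scored runs from one checkpoint to the next, so no scored interval is longer than the window.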

4.2.3 Results

Figures 1 and 2 show the receiver operating characteristic (ROC) curves of the outlier detection methods on the synthetic data generated from inhomogeneous Poisson processes and Gamma processes. The number after each method name is the area under the ROC curve (AUROC). We note that the curves of true and CNH are almost identical.
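For reference, AUROC can be computed directly from outlier scores and ground-truth labels as the probability that a randomly chosen outlier receives a higher score than a randomly chosen normal object, with ties counted as one half. The sketch below uses this pairwise definition; `auroc` is a hypothetical helper and the toy scores are made up.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of outlier
    (label 1) and normal (label 0) scores; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

example_auroc = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise form is quadratic in the number of objects, which is fine for small checks; a rank-based computation is preferable for large test sets.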

Both true and CNH achieve the best performance for both commission and omission outliers, showing the effectiveness of our outlier scoring methods. The fact that CNH has almost the same performance as true is evidence that the model based on the neural Hawkes process is flexible enough to model the different processes that generate the data. NH being worse than CNH shows the importance of the context events in these cases. Although len performs much better than rand, it is worse than the other methods, because it neither uses the context events nor models the dependencies of the target events on the past events. We also vary the parameters used for simulating the outliers, and the results are similar (see Appendix).

Figure 1: ROC curves on synthetic data (Poisson process). Left: commission. Right: omission.

Figure 2: ROC curves on synthetic data (Gamma process). Left: commission. Right: omission.

4.2.4 Empirical verification of the bounds on FDR and FPR

To empirically verify the bounds on FDR and FPR presented in Section 3.5, we repeat the experiments using true on the synthetic data 10 times, with the same training data but different test data. Each time, we calculate the FDR and FPR for different thresholds on the scores. For verifying FPR, we only test the inter-event time intervals for omission outliers. The means and standard deviations over all repetitions are shown, along with the theoretical bounds, in Figures 3 and 4. For FPR, the bounds overlap the empirical rates. FDR (omission outlier) on Poisson processes is in the Appendix.

Figure 3: FDR (commission outlier) and FPR (omission outlier) on synthetic data (Poisson process).

Figure 4: FDR (commission outlier) and FPR (omission outlier) on synthetic data (Gamma process).

4.3 Experiments on real-world clinical data

In this part, we use real-world clinical data derived from the MIMIC III dataset [12]. The dataset consists of de-identified electronic health records of ICU patients. We pick four types of events as our targets and form four separate datasets by collecting the target events and corresponding context events. The target events and their context events are listed in Table 1. The medical category (medication, lab, or vital sign) of each type of event is in brackets following the type. For example, Potassium Chloride is a type of medication, and Potassium (Blood) is a type of lab test. The latter is used as the context for the former, as the administration of the medication can be triggered by observing an abnormally low value in the lab test.

For every event type in the table, we record that type in the sequence data. However, for Potassium (Blood) and Total Calcium (Blood), we further split the events into three subtypes depending on whether the value in the lab test is low, normal, or high. For Arterial Blood Pressure systolic (ABPs) and Non-invasive Blood Pressure systolic (NBPs), we split the events into two subtypes depending on whether the value is normal or low. These event subtypes help us define better contexts influencing the target events, since depending on their value, the target events can be more or less likely to occur.

All target and context events for one patient admission form one event sequence. For the first three datasets (first three targets), we randomly selected 2000 sequences. For the last one, we randomly selected 500 sequences, because each sequence contains many more events than those in the previous three. For each dataset, we use 50 percent of the sequences for training and the other 50 percent for testing.

We generate commission and omission outliers on top of the existing sequences with the same processes described for synthetic data. This allows us to obtain ground-truth labels for analyses. Similarly, we detect commission and omission outliers using the same approaches applied to synthetic data.

Target | Context
Potassium Chloride [Medication] | Potassium (Blood) [Lab]
Calcium Gluconate [Medication] | Total Calcium (Blood) [Lab]
INR(PT) [Lab] | Heparin [Medication]; Warfarin [Medication]
Norepinephrine [Medication] | Arterial Blood Pressure systolic [Vital Sign]; Non-invasive Blood Pressure systolic [Vital Sign]
Table 1: Names of target and context events from MIMIC. INR=international normalized ratio; PT=prothrombin time.

4.3.1 Results

Table 2 shows the AUROC of the methods for the datasets derived from MIMIC data. The results vary more across datasets in this case, which can be seen by examining the performance of len. Omission outliers appear to be more challenging than commission outliers, except for the INR(PT) lab test. Comparing the methods, CNH and NH outperform rand and len on all the datasets for both commission and omission outliers.

In all cases, CNH is either the best or very close to the best. When it is not the best, the best method is always NH, and the difference is very small. These are the cases where the additional context events are not as influential as the history of the target events themselves for the occurrences of the target events, so NH performs as well as CNH while being simpler. However, for Potassium Chloride and Calcium Gluconate, we can see a clear advantage of CNH over NH from using the additional context events.

| Dataset | rand | len | NH | CNH |
| --- | --- | --- | --- | --- |
| Pot (C) [0.1] | 0.498 | 0.733 | 0.839 | 0.878 |
| Pot (C) [0.05] | 0.488 | 0.707 | 0.827 | 0.878 |
| Pot (O) [0.1] | 0.495 | 0.533 | 0.735 | 0.749 |
| Pot (O) [0.05] | 0.503 | 0.539 | 0.727 | 0.744 |
| Cal (C) [0.1] | 0.504 | 0.739 | 0.830 | 0.866 |
| Cal (C) [0.05] | 0.470 | 0.753 | 0.843 | 0.885 |
| Cal (O) [0.1] | 0.493 | 0.526 | 0.760 | 0.775 |
| Cal (O) [0.05] | 0.513 | 0.531 | 0.760 | 0.761 |
| INR (C) [0.1] | 0.496 | 0.596 | 0.682 | 0.687 |
| INR (C) [0.05] | 0.486 | 0.613 | 0.702 | 0.701 |
| INR (O) [0.1] | 0.498 | 0.726 | 0.748 | 0.746 |
| INR (O) [0.05] | 0.487 | 0.736 | 0.777 | 0.782 |
| Nor (C) [0.1] | 0.494 | 0.864 | 0.890 | 0.897 |
| Nor (C) [0.05] | 0.506 | 0.868 | 0.899 | 0.899 |
| Nor (O) [0.1] | 0.510 | 0.468 | 0.834 | 0.832 |
| Nor (O) [0.05] | 0.506 | 0.489 | 0.830 | 0.826 |
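Table 2: AUROC on MIMIC data. The first column gives the abbreviated target name (Pot=Potassium Chloride, Cal=Calcium Gluconate, INR=INR(PT), Nor=Norepinephrine), the outlier type (C=commission, O=omission), and, in square brackets, the value of the parameter used for simulating the outliers.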
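For readers who want to reproduce this kind of evaluation, the AUROC can be computed directly from outlier scores and ground-truth labels. The sketch below uses the pairwise (Mann-Whitney) formulation and assumes that higher scores mean "more likely an outlier" and that labels are 1 for outliers and 0 for normal points. The function name and these conventions are assumptions for illustration, not the evaluation code used in the experiments.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random outlier outscores a random
    normal point, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one normal point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Under this definition a scorer that assigns scores at random is expected to reach about 0.5, which matches the behavior of the rand baseline in the table. The quadratic loop is kept for clarity; a rank-based implementation would be preferable for large test sets.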

5 Conclusion

In this work, we have studied the new problem of detecting commission and omission outliers in continuous-time event sequences. We have proposed outlier scoring methods based on Bayesian decision theory and hypothesis testing with theoretical guarantees. The proposed methods depend on a probabilistic model for normal data, which we adapt from the neural Hawkes process based on continuous-time LSTM and combine with the outlier scoring methods. We have conducted experiments on both synthetic and real-world event sequences. The results show the flexibility of the adapted model and the effectiveness of the proposed outlier scoring methods.

Appendix A Proofs of the bounds on FDR and FPR

A.1 Theorem 3.1

Proof.

From Eq. 13 and implicitly conditioned on the event and the history

Given that , i.e., , we get

A.2 Theorem 3.2

Proof.

Let be the random variable for the inter-event time corresponding to the observed inter-event interval , assuming it is generated from the normal point process. From Eq. 26

The last equality is because , and is the cumulative distribution function of , implying it follows a uniform distribution. ∎
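The final step of this proof relies on the probability integral transform: applying the cumulative distribution function of a continuous random variable to the variable itself yields a Uniform(0, 1) variable. The following sketch checks this numerically under a simplifying assumption made only for illustration, namely that inter-event times are exponential (a homogeneous Poisson process with a hypothetical rate of 2.0). It is not the model used in the paper.

```python
import math
import random


def exp_cdf(t, rate):
    """CDF of an exponential inter-event time with the given rate."""
    return 1.0 - math.exp(-rate * t)


def ks_distance_to_uniform(values):
    """Kolmogorov-Smirnov distance between the empirical CDF of values
    and the Uniform(0, 1) CDF."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(0)
rate = 2.0  # hypothetical rate, chosen only for this illustration
samples = [rng.expovariate(rate) for _ in range(20_000)]

# Transform each inter-event time with its own CDF.
u = [exp_cdf(t, rate) for t in samples]

# A small distance indicates that the transformed values look uniform.
ks = ks_distance_to_uniform(u)
```

With 20,000 samples the distance `ks` should be small, consistent with the transformed inter-event times being uniformly distributed.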

A.3 Theorem 3.3

Proof.

From Eq. 25 and implicitly conditioned on and the history

Given that , i.e., , we get

Appendix B Empirical verification of the bounds on FDR and FPR

We show the results of empirically verifying the bounds proved in Section 3.5, continuing the results in Section 4.2.4. Figure 5 shows the FDR (omission outlier) with means and standard deviations on data simulated from inhomogeneous Poisson processes, along with the theoretical bounds. As we can see, the FDR has high variance when the threshold is high, because fewer samples lie above a higher threshold. Nonetheless, the empirical FDR is well aligned with the bound.

Figure 5: FDR (omission outlier) on synthetic data (Poisson process).
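The empirical FDR at a given threshold can be sketched as follows. The sketch assumes that a point is flagged when its outlier score exceeds the threshold and that labels are 1 for true outliers and 0 for normal points. The function name and the convention of returning 0 when nothing is flagged are choices made here for illustration.

```python
def empirical_fdr(scores, labels, threshold):
    """Fraction of flagged points (score > threshold) that are not outliers."""
    flagged = [y for s, y in zip(scores, labels) if s > threshold]
    if not flagged:
        # Convention: no discoveries means no false discoveries.
        return 0.0
    false_discoveries = sum(1 for y in flagged if y == 0)
    return false_discoveries / len(flagged)
```

Because the denominator is the number of flagged points, a high threshold leaves few points in the estimate, which is why the empirical FDR fluctuates more at high thresholds.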

Appendix C Varying the outlier simulation parameter on synthetic data

We vary the parameter used for simulating commission and omission outliers and examine its effect. Table 3 shows the AUROC of the methods. As we can see, changing this parameter does not affect the relative advantages and disadvantages of each method.

| Dataset | rand | len | NH | CNH |
| --- | --- | --- | --- | --- |
| Poi (C) [0.1] | 0.500 | 0.601 | 0.684 | 0.711 |
| Poi (C) [0.05] | 0.493 | 0.627 | 0.684 | 0.716 |
| Poi (O) [0.1] | 0.503 | 0.650 | 0.737 | 0.778 |
| Poi (O) [0.05] | 0.491 | 0.650 | 0.736 | 0.776 |
| Gam (C) [0.1] | 0.485 | 0.754 | 0.816 | 0.871 |
| Gam (C) [0.05] | 0.479 | 0.776 | 0.840 | 0.897 |
| Gam (O) [0.1] | 0.505 | 0.799 | 0.901 | 0.956 |
| Gam (O) [0.05] | 0.503 | 0.803 | 0.919 | 0.961 |
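Table 3: AUROC on synthetic data. The first column gives the abbreviated dataset name, the outlier type (C=commission, O=omission), and, in square brackets, the value of the parameter used for simulating the outliers.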

References

  • [1] C. C. Aggarwal (2013) Outlier analysis. Springer New York. ISBN 9781461463955, LCCN 2012956186.
  • [2] V. Chandola, A. Banerjee, and V. Kumar (2009-07) Anomaly detection: a survey. ACM Comput. Surv. 41 (3), pp. 15:1–15:58. ISSN 0360-0300.
  • [3] D. J. Daley and D. Vere-Jones (2003) An introduction to the theory of point processes: volume I: elementary theory and methods. Springer, New York.
  • [4] N. Du, H. Dai, R. Trivedi, U. Upadhyay, M. Gomez-Rodriguez, and L. Song (2016) Recurrent marked temporal point processes: Embedding event history to vector. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1555–1564.
  • [5] T. Fawcett and F. Provost (1997) Adaptive fraud detection. Data Mining and Knowledge Discovery 1 (3), pp. 291–316.
  • [6] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Maciá-Fernández, and E. Vázquez (2009) Anomaly-based network intrusion detection: techniques, systems and challenges. Computers & Security 28 (1), pp. 18–28.
  • [7] M. Gupta, J. Gao, C. C. Aggarwal, and J. Han (2013) Outlier detection for temporal data: a survey. IEEE Transactions on Knowledge and Data Engineering 26 (9), pp. 2250–2267.
  • [8] M. Hauskrecht, I. Batal, M. Valko, S. Visweswaran, G. F. Cooper, and G. Clermont (2013-02) Outlier detection for patient monitoring and alerting. Journal of Biomedical Informatics 46 (1), pp. 47–55. ISSN 1532-0464.
  • [9] M. Hauskrecht, M. Valko, B. Kveton, S. Visweswaran, and G. Cooper (2007-11) Evidence-based anomaly detection. In Annual American Medical Informatics Association Symposium, pp. 319–324.
  • [10] A. G. Hawkes (1971) Spectra of some self-exciting and mutually exciting point processes. Biometrika 58 (1), pp. 83–90. ISSN 0006-3444.
  • [11] V. Hodge and J. Austin (2004) A survey of outlier detection methodologies. Artif. Intell. Rev. 22 (2), pp. 85–126. ISSN 0269-2821.
  • [12] A. E. W. Johnson, T. J. Pollard, L. Shen, L. H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. A. Celi, and R. G. Mark (2016-05) MIMIC-III, a freely accessible critical care database. Scientific Data 3, pp. 160035. ISSN 2052-4463.
  • [13] H. Mei and J. M. Eisner (2017) The neural Hawkes process: A neurally self-modulating multivariate point process. In Advances in Neural Information Processing Systems, pp. 6757–6767.
  • [14] H. Mei, G. Qin, and J. Eisner (2019-05) Imputing missing events in continuous-time event streams. arXiv:1905.05570 [cs, stat].
  • [15] C. R. Shelton, Z. Qin, and C. Shetty (2018-04) Hawkes process inference with missing data. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • [16] X. Song, M. Wu, C. Jermaine, and S. Ranka (2007) Conditional anomaly detection. IEEE Transactions on Knowledge and Data Engineering 19 (5), pp. 631–645.
  • [17] W. Trouleau, J. Etesami, M. Grossglauser, N. Kiyavash, and P. Thiran (2019-05) Learning Hawkes processes under synchronization noise. In International Conference on Machine Learning, pp. 6325–6334.
  • [18] Y. Wang, B. Xie, N. Du, and L. Song (2016) Isotonic Hawkes processes. In International Conference on Machine Learning, pp. 2226–2234.
  • [19] W. K. Wong, A. Moore, G. Cooper, and M. Wagner (2003-08) Bayesian network anomaly pattern detection for disease outbreaks. In International Conference on Machine Learning, pp. 808–815.
  • [20] H. Xu, M. Farajtabar, and H. Zha (2016) Learning Granger causality for Hawkes processes. In International Conference on Machine Learning, pp. 1717–1726.
  • [21] H. Xu, D. Luo, and H. Zha (2017-07) Learning Hawkes processes from short doubly-censored event sequences. In PMLR, pp. 3831–3840.
  • [22] K. Zhou, H. Zha, and L. Song (2013) Learning triggering kernels for multi-dimensional Hawkes processes. In International Conference on Machine Learning, pp. 1301–1309.