I Introduction
Event detection has many real-world applications such as surveillance [1, 2], border security using unattended ground sensors (UGS) [3], crime hotspot detection for law enforcement [4], cyberinfrastructure monitoring [5], real-time traffic monitoring [6], and environmental and natural disaster monitoring [7, 8]. We address the problem of real-time event detection for gathering tactical intelligence, which is critical for military and law-enforcement missions. For instance, in tactical scenarios, like cordon and search, there is a need for gathering real-time intelligence that can help Soldiers at the squad level gain situational understanding of a scene and to quickly make mission-oriented decisions. To help gain such actionable information, Soldiers may deploy a variety of sensors such as cameras for imagery and video. The squad may also have access to auxiliary information such as SIGINT, SPOT reports, Blue Force Tracking data, and local social network feeds. Currently, relevant information is processed and analyzed in a far-off rear position, but there can be significant delays in receiving important decisions at forward positions. Some of the automated decisions that could help the squad carry out the mission successfully include the detections and locations of enemy entities such as personnel and vehicles. The goal is to provide real-time threat indicators from all available information sources at the point of need.
We tackle two fundamental questions that arise in such scenarios. First, we have to process and fuse the information from traditional physics-based sensing systems, such as video sensors, and non-traditional sensing systems, such as social networks, to provide indications and warnings. Second, we have to push the processing to the operational environment where the information is needed most, and hence there is a need for real-time event detection. Motivated by these two questions, in this paper, we consider the problem of real-time event or anomaly detection using multimodal data.
To solve the complex multimodal event detection problem, we need to develop useful mathematical models as well as efficient algorithms and validate them on real-world data collected during tactical scenarios. However, access to such data for research is severely restricted. To overcome this, we instead use publicly available data sources as surrogates for tactical data sources. In the case of imagery, we use New York City (NYC) traffic CCTV cameras as surrogates for low altitude UAVs with video sensors. The video sensors onboard tactical UAVs typically are low-resolution and have a wide range of ground sample distances. Similar image qualities are present in the publicly available CCTV traffic camera imagery. Instagram has medium to high-resolution imagery, which can be viewed as a surrogate for imagery collected by Soldier-worn cameras. In scenarios where social media posts are not available, the social media posts in this data collection could be viewed as surrogates for SIGINT data (e.g., counts of communications packets through local nodes).
In this work, we are interested in the subtle information available in the dynamics of sequences of sub-events, e.g., changes in the counts of persons and vehicles in a spatial region and changes in the corresponding social network posts in the same region. As a result, we utilize the images from the CCTV cameras to extract counts of persons and vehicles in a spatial region. We also utilize the social media posts to generate count sequences of Twitter and Instagram posts in the constrained region. We develop a theoretical framework and a novel algorithm for sequential detection of changes in count statistics. The developed algorithm is then applied to data collected from the NYC CCTV cameras and social media feeds to detect a 5K race. The proposed mathematical framework and the developed algorithm can also be adapted to other event detection problems. For example, in cyberinfrastructure monitoring, the types and counts of intrusion attempts can indicate the onset of a coordinated attack [9].
Towards developing a mathematical model for the problem, we first study the statistical behavior of the count data on the day of the event (a 5K race in NYC) and also on the non-event days (see Section II). We observe that the count sequences have nonstationary rates, i.e., the average counts of persons, vehicles, or social media posts change over time on each day. Thus, the event detection problem of interest in this paper is a problem of detecting changes in the levels of nonstationarity of rates. We use the framework of partially observed Markov decision processes (POMDP) to model the rate level change detection problem as detection of time to absorption in a hidden Markov model (HMM) [10], [11] (see Section III). Our POMDP problem is more general than the one studied in [12] as we detect both increases and decreases in rates. As a result, it is not apparent if our POMDP solution has the threshold structure that was found in the problem in [12]. However, in this paper, we show that under certain assumptions on the transition structure of the HMM, the solution to our POMDP problem also has a simple threshold structure (see Section IV). We then apply the resulting belief sum algorithm to detect the event (see Section V).
II Data Collection and Modeling
We collected imagery from CCTV cameras and social networks around the Tunnel to Towers 5K run that occurred on September 24th, 2017, in NYC. Data were also collected on two weekends before the run, on September 10th and 17th, and a weekend after the run, on October 1st. CCTV imagery and social media posts were collected over a geographic region from the Red Hook village in Brooklyn on the south end to the Tribeca village on the north end of the collection area. Data were collected between 8:30 am and 2 pm local time on each of the 4 days. On average, the frame rate from 7 CCTV cameras was roughly 0.5 frames per second, while the average post rates from Twitter and Instagram for the geographic region and collection period were 1.4 and 0.7 posts per second, respectively. Note that for this initial modeling and analysis work, no other filtering of social posts was applied (e.g., hashtag clustering or content analysis).
The objective is to detect the event in terms of location and time of the 5K run from the multimodal data. It is to be expected that the run would increase the number of persons on the streets overlapping with the route followed for the run. The run would also cause a sudden decrease in the number of cars on the same streets. It is also expected that the event would cause a surge in the number of tweets or Instagram posts pertaining to the event. Motivated by these observations, we approach this problem through the framework of quickest detection in count data. The multimodal data is used to obtain counts of objects (persons and vehicles) per frame and counts of tweets and Instagram posts per second. Fig. 1 illustrates the block diagram of the event detection system.
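As an illustration of how a raw stream of post timestamps could be turned into a count sequence, the following minimal sketch bins event times into fixed-length slots. The function name `bin_counts`, its parameters, and the sample timestamps are hypothetical; the default slot length of six seconds simply mirrors the time unit used on the horizontal axes of the figures in this paper.

```python
from typing import List, Sequence


def bin_counts(timestamps: Sequence[float], start: float, end: float,
               slot_seconds: float = 6.0) -> List[int]:
    """Count events (e.g., posts) that fall into consecutive time slots.

    timestamps   -- event times in seconds (any order)
    start, end   -- observation window in seconds
    slot_seconds -- slot length; an illustrative default, not a prescribed value
    """
    if end <= start or slot_seconds <= 0:
        raise ValueError("need end > start and slot_seconds > 0")
    n_slots = int((end - start) // slot_seconds)
    counts = [0] * n_slots
    for t in timestamps:
        idx = int((t - start) // slot_seconds)
        if 0 <= idx < n_slots:  # ignore events outside the window
            counts[idx] += 1
    return counts


# Made-up timestamps, for illustration only.
post_times = [0.5, 1.2, 5.9, 6.0, 13.1, 17.9, 30.0]
print(bin_counts(post_times, start=0.0, end=18.0))  # [3, 1, 2]
```

Each entry of the returned list is one observation of the count sequence to which a sequential detector can be applied.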
To obtain the counts of persons and vehicles, first, we use a Convolutional Neural Network (CNN) based object detector to detect persons and vehicles in each frame coming from the CCTV cameras. Specifically, we use Faster R-CNN [13], which uses the VGG-16 architecture [14] as the base CNN with region proposal networks, to perform real-time object detection. The Faster R-CNN is trained on the PASCAL Visual Object Classes (VOC) dataset [15]. The dataset has labeled training data for the person class as well as vehicle classes that include bus, car, and motorbike. The counts of persons and vehicles are generated by simply counting the number of detected objects belonging to the corresponding class in each frame.
In Fig. 2, we have plotted the total person counts, summed across the seven CCTV cameras of interest, for each of the four separate dates. Similar data for the total number of cars are shown in Fig. 3. In this figure and the figures in the rest of this paper, the horizontal axis is time in multiples of six seconds. In Fig. 4, we have plotted the person counts for two cameras: camera C1 is on the path of the race while camera C2 is off the path. In Fig. 5, we have similarly plotted individual car counts for camera C1 and C2. We see a clear increase in the rate of the number of persons and a slight decrease in the number of cars on the day of the event between the time slots and . Finally, in Fig. 6, we have plotted cumulative counts of Instagram posts in geographical vicinity of camera C1 and C2 for the four days. We see an increase in the cumulative Instagram counts just around time slot , just before the person and car counts return to their normal rates. We hypothesize that the latter is due to the fact that the participants started posting on social media after completing the race.
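The per-frame counting step described above can be sketched as follows. This is only an illustration: it assumes the detector output for a frame is available as a list of `(label, score)` pairs, and the function name `count_objects` and the score threshold of 0.5 are hypothetical choices, not values taken from this work. The vehicle class names are the ones listed in the text.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Vehicle classes named in the text (PASCAL VOC labels).
VEHICLE_CLASSES = ("bus", "car", "motorbike")


def count_objects(detections: Iterable[Tuple[str, float]],
                  score_threshold: float = 0.5) -> Dict[str, int]:
    """Count detected persons and vehicles in one frame.

    detections      -- (class label, confidence score) pairs for one frame
    score_threshold -- assumed cut-off below which detections are discarded
    """
    per_class = Counter(label for label, score in detections
                        if score >= score_threshold)
    return {
        "person": per_class["person"],
        "vehicle": sum(per_class[c] for c in VEHICLE_CLASSES),
    }


# Made-up detections for a single frame.
frame = [("person", 0.92), ("person", 0.41), ("car", 0.88),
         ("bus", 0.73), ("dog", 0.95)]
print(count_objects(frame))  # {'person': 1, 'vehicle': 2}
```

Summing these per-frame dictionaries over cameras would give total counts of the kind plotted in Fig. 2 and Fig. 3.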
From these figures, one can make an observation that the rates and counts are nonstationary in nature. Thus, the problem of event detection here can be posed as a detection problem in nonstationary environments. Since the event detection has to be performed in real time, this would translate to sequential detection of changes in rate from one nonstationary level to another. In the next section, we formulate this problem in a POMDP framework and solve it to obtain structural results on the optimal solution. The resulting optimal algorithm will then be applied to the collected data to detect the event.
III Problem Formulation
We note that the count data generated from any modality is a sequence of nonnegative integers. For simplicity, we model the count data as a sequence of Poisson random variables. The results below are also valid for any single parameter probability distribution, discrete or continuous, with sums replacing integrals, where appropriate. Also, for simplicity, we develop the theory for event detection in a single stream of count data here. The resulting algorithm is then trained and applied to every count sequence generated from every modality. However, the mathematical model can easily be extended to a vector stream of observations to detect an event jointly across modalities.
As observed in Fig. 2–Fig. 6, count sequences are nonstationary in nature, on both the event day and the non-event days. In order to capture this nonstationarity, we model the count data as an HMM. In this HMM, there are a finite number of hidden states, and for each hidden state, the rate or mean of the observed count or Poisson random variable is different. Thus, if represents the observed count variable, represent the hidden state variable, and if are possible rates for pre-event data, then
When the event starts, which we call a change point, the rate of counts either decreases (as in the case of cars) or increases (as in the case of persons). In practice, the post-change rates may not be known or may be hard to learn due to a lack of enough training data of a rare event. Motivated by this, we model the pre-change and post-change rates by boundary rates (to capture a decrease in rates) and (to capture an increase in rates) such that
In other words, and represent the minimum amount of change the designer of the system is interested in detecting. Note that while the number of cars decreases during the event in this data, one may also observe an increase in the numbers due to congestion of traffic. Thus, both increase and decrease of rates are of interest to us, and our model allows for both these possibilities.
We have states, normal states with corresponding Poisson rates , and abnormal states corresponding to Poisson rates . Our aim is to observe the Poisson count data , and optimally detect the change of the hidden rate from normal to abnormal rates. Specifically, we want to detect this change as quickly as possible subject to a constraint on the false alarm rate. This leads us to the realm of quickest change detection [16], [17], [18]. Here, we solve the rate change detection problem by formulating it as a POMDP [10], [11].
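To make the model above concrete, the following sketch simulates a count sequence whose rate hops among a set of normal Poisson rates and then switches to one of the two boundary rates. Several things here are assumptions made only for illustration: the state indexing (0 for the low abnormal state, 1 to M for the normal states, M+1 for the high abnormal state), the uniform hopping among normal states, the fixed change time (in the formulation the change time is random), and all numerical values. Only the standard library is used, with Knuth's multiplication method as the Poisson sampler.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Draw one Poisson variate (Knuth's method; fine for moderate rates)."""
    limit = math.exp(-rate)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(normal_rates: Sequence[float], low_rate: float,
                    high_rate: float, change_time: int, n_steps: int,
                    increase: bool = True,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (hidden states, observed counts) for one simulated stream."""
    rng = random.Random(seed)
    m_normal = len(normal_rates)
    states, counts = [], []
    for k in range(n_steps):
        if k < change_time:
            # Before the change: pick one of the normal states uniformly (assumed).
            state = 1 + rng.randrange(m_normal)
            rate = normal_rates[state - 1]
        else:
            # After the change: stay in one abnormal (boundary-rate) state.
            state = m_normal + 1 if increase else 0
            rate = high_rate if increase else low_rate
        states.append(state)
        counts.append(sample_poisson(rate, rng))
    return states, counts


states, counts = simulate_counts([2.0, 4.0, 6.0], low_rate=0.5, high_rate=12.0,
                                 change_time=50, n_steps=100, seed=1)
print(counts[45:55])  # counts around the simulated change point
```

Setting `increase=False` produces a drop to the low boundary rate instead, which corresponds to the decrease seen in the car counts.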
III-A POMDP Formulation

States
Let be the sequence of states with values. The state process is a finite state Markov chain taking values
. The state is a special absorbing state introduced for mathematical convenience in a stopping time POMDP [10]. Its role will be clear when we define the cost structure below. The transition probability matrix of the Markov chain is a function of the control, and will also be defined below. 
Control
The control sequence taking values is binary valued: . The control is used to continue the observation process and is used to stop it. At the time of stopping, an alarm is raised indicating that a change in the rate of the data has occurred. 
Observations
The observations are Poisson distributed with rate
, if the state and if the control is to continue:The distribution of observations, if the state is or if the control is to stop, is irrelevant. We use to denote the law of when the state is and the control is . We also assume that the variable is independent of the past states and controls, given the current state and last control. That is

Transition Structure
The transition structure depends on the control . Let be the transition matrix for the Markov chain from time to , given the control is . Then,we haveHere,
and
(1) To understand these two transition structures, we first define the initial distribution for the Markov chain as
which satisfies . Thus, the Markov chain starts in one of the states . As long as the control , which means to continue, the states evolve according to the transition probability matrix . The transition probabilities
(2) that are part of the matrix in (1) control the transition of the Markov chain within the states , and its jump to the absorbing states and . We assume that absorption to the states and is inevitable. Once in these two states, the Markov chain jumps between these two states with probabilities controlled by and . We are especially interested in the case when . This is because the states correspond to the normal states for the counts before the change. After the change, we expect that either the rate will increase, corresponding to absorption of the Markov chain to the state , or it will decrease, corresponding to absorption to the state . Once the rate increases or decreases, it is unnatural to expect that rate will transition between too low and too high rates. However, the case , is of mathematical interest, and its role and importance will be briefly discussed below.

Cost
Our objective is to detect a change in the rate of counts from normal rates to abnormal rates and . This is equivalent to detecting the time to absorption of the Markov chain from the states to the states . We now define a cost structure for the POMDP to capture the sequential event detection framework. Let be the cost associated with state , and control and defined asHere, is a unit column vector with value at the th position. The constant captures the cost of false alarm, and is incurred when the control is to stop and the state is in . Similarly, captures the cost of delay, and is incurred when the control is to continue even if the Markov chain is absorbed in either of the states . Note that the cost of being in state is zero independent of the choice of control.
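The cost structure described in words above can be sketched as a small function. The names `stage_cost`, `CONTINUE`, `STOP`, and the string labels for the three kinds of state are hypothetical. The sketch also assumes that the two combinations not discussed explicitly in the text (stopping in an abnormal state, continuing in a normal state) incur zero cost.

```python
CONTINUE, STOP = "continue", "stop"


def stage_cost(state_kind: str, control: str,
               false_alarm_cost: float, delay_cost: float) -> float:
    """Per-step cost for a state class and a control.

    state_kind -- "normal", "abnormal", or "terminal" (the extra absorbing state)
    control    -- CONTINUE or STOP
    """
    if state_kind not in ("normal", "abnormal", "terminal"):
        raise ValueError(f"unknown state kind: {state_kind}")
    if control not in (CONTINUE, STOP):
        raise ValueError(f"unknown control: {control}")
    if state_kind == "terminal":
        return 0.0  # zero cost regardless of the control
    if control == STOP:
        # Stopping before the change is a false alarm.
        return false_alarm_cost if state_kind == "normal" else 0.0
    # Continuing after the change incurs a delay penalty.
    return delay_cost if state_kind == "abnormal" else 0.0


print(stage_cost("normal", STOP, false_alarm_cost=10.0, delay_cost=1.0))        # 10.0
print(stage_cost("abnormal", CONTINUE, false_alarm_cost=10.0, delay_cost=1.0))  # 1.0
```

The ratio of the two constants controls the trade-off between false alarms and detection delay.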

Policy
Let be the information at time . Also define a policy to be a sequence of mappings such that .
We want to find a control policy so as to optimize the long term cost, which is
where . Let . Then,
(3) 
Thus, the cost is finite if . The role of the extra state is now clear. After the stopping control is applied, the Markov chain’s transition is governed by the transition matrix . As a result, the Markov chain gets absorbed into the state immediately. From here, due to the cost structure, the cost to go is zero, no matter what control is chosen. In conclusion, we search over policies for which . This is hardly a concern since any open-loop policy, where we always stop at a fixed time, satisfies this condition. We are looking for policies better than that, i.e., for closed-loop control that allows us to stop dynamically after observing the system.
IV Structure of the Optimal Policy
Let be the belief at time defined as . Note that the belief is a vector of length . By standard Bayes arguments, this belief can be computed recursively as
(4) 
Here, is a diagonal matrix of emission probabilities
and is a vector of all ones of length .
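A minimal sketch of one step of this Bayes recursion, for Poisson emissions under the continue control, is given below. It follows the standard HMM filter: predict with the transition matrix, weight by the emission probabilities, and normalize. The function names, the row-stochastic convention `P[i][j] = Pr(next = j | current = i)`, and the two-state example are assumptions for illustration. The extra absorbing state used by the stopping control is omitted.

```python
import math
from typing import List, Sequence


def poisson_pmf(y: int, rate: float) -> float:
    """Poisson probability of count y for a strictly positive rate."""
    return math.exp(-rate + y * math.log(rate) - math.lgamma(y + 1))


def belief_update(belief: Sequence[float], P: Sequence[Sequence[float]],
                  rates: Sequence[float], y: int) -> List[float]:
    """One step of the HMM filter: predict, weight by emissions, normalize."""
    n = len(belief)
    # Prediction: multiply the belief by the transposed transition matrix.
    predicted = [sum(P[i][j] * belief[i] for i in range(n)) for j in range(n)]
    # Correction: diagonal matrix of emission probabilities for the new count.
    unnormalized = [poisson_pmf(y, rates[j]) * predicted[j] for j in range(n)]
    total = sum(unnormalized)  # inner product with the all-ones vector
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


# Two-state toy example: state 0 is normal (rate 2), state 1 is abnormal (rate 10).
P = [[0.9, 0.1],
     [0.0, 1.0]]
new_belief = belief_update([1.0, 0.0], P, rates=[2.0, 10.0], y=9)
print(new_belief)  # most of the mass moves to the abnormal state
```

A large count is far more likely under the high rate, so a single observation already shifts the belief strongly in this toy case.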
It is a standard result in the POMDP literature that the cost to go in (3) satisfies Bellman’s equation
where . It can be shown that the optimal policy is stationary and is a function of only the belief state. Furthermore, the value function can be computed using value iteration [10], [11]. That is, the optimal policy is of the form . In addition, the following result can also be shown.
Theorem IV.1 ([19], [10])
Let
be the region of the belief space on which the control is chosen, or the stopping decision is made. Then is convex.
A standard approach to solving POMDP problems, which are typically hard to solve due to the high dimensionality of the belief space, is to establish additional structural results on the policy . Specifically, it is of interest to show that the optimal stopping time in a POMDP has a threshold structure, or the policy is, in some sense, monotone in . The threshold structure motivates the use of policies that are linear in the belief state. See [10] for a detailed discussion.
Unfortunately, not all the conditions needed to establish the threshold structure are satisfied in our problem. For example, the transition structure and the emission probabilities satisfy the so-called total positivity conditions, but the cost structure does not have the required monotonicity and submodularity structure [10]. Even a transformation of the problem, as suggested in [10], does not help. The main issue is that, in comparison with the results in [12], in this paper we have two absorbing states, one for the low rate and another for the high rate. However, we now establish that under some additional assumptions, the optimal policy can be shown to be only a function of the probabilities and .
Theorem IV.2
Proof:
Note that even without the additional assumptions that are made in the theorem statement, the value function satisfies the condition
This is because of the special structure of the cost function assumed in the paper. Now, we can show that under the assumptions of the theorem, the belief recursion (4) can be computed just based on the values of and . Hence, the fixed point equation of the value function is only a function of these two values. The rest follows by using the standard value iteration arguments.
The condition that the rows of be the same is easily satisfied in the following special case:
(5) 
Thus, with this choice of , the Markov chain moves around the states to randomly and can get absorbed to and , all with equal probability. Numerical evaluation of the optimal policy for this case, under some parameters of choice, shows that the optimal policy is a function of and only through . In fact, according to this numerical study, the optimal stopping rule is
(6) 
We note that although the marginal costs are a function of and only through , the belief recursion cannot be computed just using this sum. To compute (4) we need both and individually. This can be verified by explicitly writing the belief recursion under the stated assumptions. Thus, it is not clear to us at this moment if this threshold structure, the optimal policy being only a function of the sum , holds for more general cases.
However, if we make the assumption that , then in this case, we can show that the optimal policy is only a function of the sum .
Theorem IV.3
Proof:
Under the stated assumptions, the belief recursion can be shown to be a function of only the sum . The rest of the proof is identical to that of Theorem IV.2.
Note that this means that under the assumptions made in Theorem IV.3, and in Theorem IV.2, the quickest change detection problem studied here reduces to the classical quickest change detection problem [20] in some sense. In the classical problem, there are two hidden states, one before the change and another after the change. The hidden Markov chain starts with one state and gets absorbed into the other. The objective in the classical change point problem is to detect this time to absorption. The optimal algorithm for the classical problem is to stop the first time the belief that the Markov chain is in the post-change state exceeds a threshold.
In the change point problem considered in this paper, we have two classes of states, one class consisting of pre-change or normal states , and another class consisting of post-change or abnormal states . The objective here is to detect the time at which the Markov chain moves from the pre-change class to the post-change class. The above theorems suggest that under the stated assumptions, the optimal stopping rule has a similar structure. That is, it is optimal to stop the first time the probability that the Markov chain is in the post-change class of states is above a threshold, such as that in stopping rule (6).
V The Belief SUM Algorithm and its Applications
In the previous section, we established conditions on the transition structure of the HMM under which the algorithm,
(7) 
is optimal. However, note that under more general cost structures, it is not obvious if this is still the optimal algorithm. As a result, and motivated by the optimality of (7) under some assumptions, we use a more general class of algorithms that are linear in the beliefs and . That is, we use the sum belief algorithm using a convex combination of the beliefs
(8) 
where , and optimize over the choice of . Due to a paucity of space, a detailed delay and false alarm analysis of this algorithm will be reported elsewhere. In this section, however, we apply the algorithm to real data to show its effectiveness. In the following, we often use . In those cases, we actually report values of the sum statistic in (7).
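The stopping rule based on a convex combination of the two abnormal-state beliefs can be sketched as below. Everything concrete in this example is an assumption for illustration: the state ordering (index 0 for the low abnormal state, the last index for the high abnormal state, normal states in between), the weight name `alpha`, the transition matrix, the threshold, and the count sequence. Note that with `alpha = 0.5` the statistic is half the sum of the two beliefs, so the threshold must lie below 0.5 in that case.

```python
import math
from typing import List, Optional, Sequence, Tuple


def poisson_pmf(y: int, rate: float) -> float:
    """Poisson probability of count y for a strictly positive rate."""
    return math.exp(-rate + y * math.log(rate) - math.lgamma(y + 1))


def belief_sum_detector(counts: Sequence[int], rates: Sequence[float],
                        P: Sequence[Sequence[float]], prior: Sequence[float],
                        alpha: float, threshold: float
                        ) -> Tuple[Optional[int], List[float]]:
    """Run the filter and stop when the weighted abnormal belief is large.

    Returns (index of the stopping time or None, statistic at each step).
    """
    n = len(rates)
    belief = list(prior)
    stats: List[float] = []
    for k, y in enumerate(counts):
        # HMM filter step: predict with P, weight by Poisson emissions, normalize.
        predicted = [sum(P[i][j] * belief[i] for i in range(n)) for j in range(n)]
        unnorm = [poisson_pmf(y, rates[j]) * predicted[j] for j in range(n)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        # Convex combination of the low-rate and high-rate abnormal beliefs.
        stat = alpha * belief[0] + (1.0 - alpha) * belief[-1]
        stats.append(stat)
        if stat >= threshold:
            return k, stats
    return None, stats


# Illustrative model: low abnormal rate, two normal rates, high abnormal rate.
rates = [0.5, 3.0, 5.0, 12.0]
P = [[1.00, 0.00, 0.00, 0.00],   # low abnormal state is absorbing
     [0.01, 0.49, 0.49, 0.01],   # normal states mix and may jump out
     [0.01, 0.49, 0.49, 0.01],
     [0.00, 0.00, 0.00, 1.00]]   # high abnormal state is absorbing
prior = [0.0, 0.5, 0.5, 0.0]
counts = [3, 4, 5, 3, 4, 13, 14, 12, 15, 13]  # rate increases from index 5 on
stop, stats = belief_sum_detector(counts, rates, P, prior,
                                  alpha=0.5, threshold=0.45)
print(stop, [round(s, 3) for s in stats])
```

Choosing `alpha` away from 0.5 makes the detector more sensitive to rate decreases or to rate increases, which is the degree of freedom that is optimized over.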
V-A Global Event Detection
We now apply the algorithm to data collected around the 5K run. The details on the data are provided in Section II. In practice, the algorithm should be applied to data collected from each individual source: to outputs of each camera and also to outputs of social media data in each geographical region. A high value of the sum statistic would indicate an abnormal behavior in a stream. This can be used to both detect and isolate the event. This is done in the next subsection. However, we may also wish to apply the algorithm to the global sum of data collected to detect global trends, in case they generate a collaborative effect.
We applied the algorithm to total count data from all cameras for global detection of the event. We first used the data from the first recorded day, Sept. 10th, to learn the Poisson rates. We then applied the trained algorithm to data from other days. The parameters used were , , , , , , and , i.e., . The transition matrix used was assumed fixed as in (5) and (1) with . Note that the rate parameters we learn from the data are to . The values and are chosen to be the boundary of the learned rates based on multiple standard deviations from normal rates.
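One possible reading of this boundary choice is sketched below. It uses the fact that a Poisson variable with rate r has standard deviation sqrt(r), and places the low and high boundary rates a chosen number of standard deviations below the smallest and above the largest learned rate. The function name, the multiplier `num_std`, the floor `min_rate`, and the sample rates are all assumptions; the exact rule used for the reported experiments is not specified here.

```python
import math
from typing import Sequence, Tuple


def boundary_rates(normal_rates: Sequence[float], num_std: float = 3.0,
                   min_rate: float = 1e-3) -> Tuple[float, float]:
    """Pick low and high boundary rates around a set of learned normal rates.

    num_std  -- assumed number of Poisson standard deviations (sqrt of the rate)
    min_rate -- assumed floor that keeps the low boundary strictly positive
    """
    lowest, highest = min(normal_rates), max(normal_rates)
    low_boundary = max(lowest - num_std * math.sqrt(lowest), min_rate)
    high_boundary = highest + num_std * math.sqrt(highest)
    return low_boundary, high_boundary


# Made-up learned rates, for illustration only.
print(boundary_rates([4.0, 9.0, 16.0], num_std=1.0))  # (2.0, 20.0)
```

A larger `num_std` widens the band of rates treated as normal, which lowers the false alarm rate at the price of missing smaller changes.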
In Fig. 7 and Fig. 8, we report the results on application to total person count data. The data for the event day and a non-event day is shown in Fig. 2. Note that the statistic is sporadically large on the non-event days, but consistently fires around the event on the day of the event.
Similar results are reported in Fig. 9 and Fig. 10 for the total car count data from Fig. 3. The learned values of rates from Sept. 10th data are , , and , with the rest of the parameters kept the same. As seen in Fig. 9, the statistic fires sometimes even on the non-event days. This is because we are applying the algorithm to the sum of count data from all cameras, and this may reduce the quality of the count sequence.
V-B Event Localization
In this section, we report results on the application of the algorithm (8) to individual data streams. In Fig. 11, we have plotted the evolution of the sum statistic for data from the camera C1. The count data is shown in Fig. 4. The parameters learned from the data on Sept. 10th are , , , , , and . We have also used (8) with . In Fig. 12, we have plotted the sum statistic corresponding to data from camera C1 on the non-event days, and from camera C2. As can be seen in the figures, the statistic corresponding to C1 on the event day fires around the event, while we see almost no activity in other streams.
In Fig. 13, we report results for the Instagram count data. We have again used (8) with . The parameters learned from the data on Sept. 10th are , , and . The sum statistic fires and stays close to one for counts from the area close to camera C1 on the event day, while there is sporadic activity for data from other streams. Thus, qualitatively, the sum statistic or sum belief algorithm successfully detects the 5K event.
VI Conclusions
We proposed a theoretical framework for event detection in nonstationary environments using multimodal data. Motivated by the statistical behavior of count data extracted from CCTV images and social network posts, we formulated the event detection problem as a quickest change detection problem for detecting changes in count rates from one family of rates to another. We then obtained structural results for the optimal policy for the resulting POMDP and motivated a belief sum algorithm. We then applied the algorithm to real data collected around a 5K run in NYC to detect the event. For simplicity, we developed the framework for a single stream of count data here. However, the mathematical model can easily be extended to a vector stream of observations to detect an event jointly across modalities. The POMDP model studied in this paper is a Bayesian model. In the future, we will explore detection in non-Bayesian models. We will also explore more general parametric and nonparametric models for count data for wider applicability.
Acknowledgment
The work of Taposh Banerjee and Vahid Tarokh was supported by a grant from the Army Research Office, W911NF1510479.
References
 [1] R. Panda and A. K. Roy-Chowdhury, “Multi-view surveillance video summarization via joint embedding and sparse optimization,” IEEE Transactions on Multimedia, vol. 19, no. 9, pp. 2010–2021, 2017.
 [2] S. C. Lee and R. Nevatia, “Hierarchical abnormal event detection by real time and semi-real time multitasking video surveillance system,” Machine Vision and Applications, vol. 25, no. 1, pp. 133–143, 2014.
 [3] R. Szechtman, M. Kress, K. Lin, and D. Cfir, “Models of sensor operations for border surveillance,” Naval Research Logistics (NRL), vol. 55, no. 1, pp. 27–41, 2008.
 [4] D. B. Neill and W. L. Gorr, “Detecting and preventing emerging epidemics of crime,” Advances in Disease Surveillance, vol. 4, no. 13, 2007.
 [5] R. Mitchell and I. R. Chen, “Effect of intrusion detection and response on reliability of cyber physical systems,” IEEE Transactions on Reliability, vol. 62, pp. 199–210, March 2013.
 [6] E. D’Andrea, P. Ducange, B. Lazzerini, and F. Marcelloni, “Real-time detection of traffic from Twitter stream analysis,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, pp. 2269–2283, Aug 2015.
 [7] E. W. Dereszynski and T. G. Dietterich, “Probabilistic models for anomaly detection in remote sensor data streams,” arXiv preprint arXiv:1206.5250, 2012.
 [8] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes Twitter users: Real-time event detection by social sensors,” in Proceedings of the 19th Int. Conf. on World Wide Web, pp. 851–860, ACM, 2010.
 [9] R. Harang and A. Kott, “Burstiness of intrusion detection process: Empirical evidence and a modeling approach,” IEEE Transactions on Information Forensics and Security, vol. 12, pp. 2348–2359, Oct 2017.
 [10] V. Krishnamurthy, Partially Observed Markov Decision Processes. Cambridge University Press, 2016.
 [11] D. P. Bertsekas and S. Shreve, Stochastic optimal control: the discrete-time case. Academic Press, 1978.
 [12] V. Krishnamurthy, “Bayesian sequential detection with phase-distributed change time and nonlinear penalty: a POMDP lattice programming approach,” IEEE Transactions on Information Theory, vol. 57, no. 10, pp. 7096–7124, 2011.
 [13] S. Ren, K. He, R. B. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” CoRR, vol. abs/1506.01497, 2015.
 [14] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014.

 [15] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL visual object classes (VOC) challenge,” International Journal of Computer Vision, vol. 88, pp. 303–338, June 2010.
 [16] V. V. Veeravalli and T. Banerjee, Quickest Change Detection. Academic Press Library in Signal Processing: Volume 3 – Array and Statistical Signal Processing, 2014. http://arxiv.org/abs/1210.5552.
 [17] H. V. Poor and O. Hadjiliadis, Quickest detection. Cambridge University Press, 2009.
 [18] A. G. Tartakovsky, I. V. Nikiforov, and M. Basseville, Sequential Analysis: Hypothesis Testing and Change-Point Detection. Statistics, CRC Press, 2014.
 [19] W. S. Lovejoy, “On the convexity of policy regions in partially observed systems,” Operations Research, vol. 35, no. 4, pp. 619–621, 1987.
 [20] A. N. Shiryayev, Optimal Stopping Rules. New York: Springer-Verlag, 1978.