I. Introduction
Numerous studies within the past decade have focused on connected and automated vehicle (CAV) technology, which is considered an integral part of the future of the intelligent transportation system (ITS) field [33]. CAVs have the potential to transform the ITS field by introducing numerous safety, mobility, and environmental sustainability benefits. A CAV system combines connected vehicle (CV) and automated vehicle (AV) technologies, creating a synergistic impact that goes well beyond the benefits that each of these technologies can offer in isolation. It is envisioned that CAVs, with diverse degrees of connectivity and automation, will lead the path toward the next generation of transportation systems, one that is more intelligent, efficient, and sustainable [21, 19].
CAVs use wireless technologies to enable communication and cooperation not only among vehicles but also between vehicles and the transportation infrastructure. By using dedicated short-range communication (DSRC) [17], or other types of communication technologies, vehicles and roadside units (RSUs) are able to continuously transmit and receive information such as speed, position, acceleration, braking status, traffic signal status, etc., through what is called a Basic Safety Message (BSM). These communication messages have a range of about 400 meters and can detect high-risk situations that may not be observable otherwise due to traffic, terrain, or weather [17]. CAV technology extends and enhances currently available crash avoidance systems that use radars and cameras to detect collision threats by enabling CAVs to warn their surrounding vehicles of collisions and potentially hazardous circumstances. In addition, CAVs provide mobility and sustainability benefits by enabling platoon formation, which can increase road capacity and reduce fuel consumption. However, as vehicles and infrastructure become more interconnected and automated, the vulnerability of their components to faults and/or deliberate malicious attacks increases. This vulnerability is exacerbated by the increase in vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications, which increase a vehicle's external connection interfaces. At the system level, CAVs and the infrastructure can be viewed as individual nodes in a large interconnected network, where a single anomaly or malicious attack can easily propagate through this network, affecting other network components (e.g., other vehicles, traffic control devices, etc.). Therefore, there is an increasing demand for cyber security solutions, e.g., anomaly detection methods, in CAV sensor systems to enhance the safety and reliability of CAVs and the entire network.
Anomaly detection in CAV sensors is an important but also challenging task. A traveling CAV could use the most recent history of data to detect anomalies. The presence of an anomaly in the pattern of data collected from a CAV sensor system can imply that (i) a subset of sensors is faulty, or (ii) there has been a malicious attack. In both cases, it is vital to detect the anomalies and exclude the anomalous data from the decision-making process.
An anomaly detection scheme introduces two types of errors – false negatives and false positives. It is easy to see that a false negative error can allow falsified data to affect trajectory planning, which could lead to fatal consequences. Although less apparent, a false positive error can have consequences that are just as severe. Consider a situation where an actual event in the network (e.g., an unexpected braking from a downstream vehicle) has led to an abrupt change in the pattern of observed data. If the vehicle falsely detects such an unexpected change as a fault/attack and discards the information, it may lead to the CAV not reacting to such abrupt changes in the network appropriately and in a timely manner, creating dangerous, and potentially fatal, scenarios. In order to prevent this type of false positive error, it is necessary for vehicles to incorporate networklevel information in their anomaly detection scheme.
In addition to distinguishing between real changes in network conditions and anomalies, anomaly detection methods should be able to handle the noise introduced by sensors and the communication channel, exacerbated by potential communication delay, as well as missing values in the collected data. Moreover, due to resource constraints on each vehicle, anomaly detection techniques in CAVs need to be lightweight and implementable in real time.
Anomalous sensor behavior could manifest itself in various forms and representations. Several faulty sensor behaviors are discussed in [32]. Petit et al. [27] summarize the taxonomy of intrusions or attacks on automated vehicles, among which the false injection attack is considered the most dangerous. In this paper, we consider five types of anomalous sensor behavior resulting from both sensor faults and false injection attacks. We base this paper on the sensor failure and/or attack taxonomy provided by Sharma et al. [32] and Van Wyk et al. [37]:

Short: A single, sharp and abrupt change in the observed data between two successive sensor readings.

Noise: An increase in the variance of the sensor readings. Different from Short, the Noise anomaly type occurs across multiple successive sensor readings.
Bias: A temporary constant offset in the sensor readings.

Gradual drift: A small and gradual drift in observed data during a time period. Over time, a gradual drift can result in a large discrepancy between the observed and the true state of the system.

Miss: Lack of available data during a time period.
In this paper, we do not explicitly account for ‘miss,’ which can result from DoS attacks preventing the exchange of information. However, note that ‘miss,’ depending on its duration, can be viewed as a ‘short’ or ‘bias’ behavior, where the sensor reading is nonexistent instead of showing a wrong value. Hence, it can partially be addressed using the same methods used for detecting ‘short’ or ‘bias’ behaviors. For examples of specific scenarios that could lead to these anomalies, refer to [25].
In order to successfully detect different types of anomalies, avoid falsely identifying unexpected changes in the network as anomalies, and mitigate the impact of random noise/missing values, we develop a novel and comprehensive framework that combines the adaptive extended Kalman filter (AEKF) with a car-following motion model, and employs a data-driven fault detector. Our framework is capable of accounting for delay in observing the environment, introduced by a congested communication channel and/or delayed sensor observation. Specifically, we use a car-following model to govern the motion of the vehicle in order to capture the interaction between the subject vehicle and its immediate leading vehicle. We demonstrate that the time delay incorporated in the motion model renders the traditional $\chi^2$ fault detector [5] inappropriate for anomaly detection, and instead propose and implement a one-class support vector machine (OCSVM) model for anomaly detection [31] together with AEKF. We demonstrate the power of the proposed framework in detecting various types of anomalies.
Our main objective in this study is to detect sensor anomalies and to recover the corrupt signals by utilizing the surrounding vehicles’ information. To this end, the following assumptions are made:

Vehicles move according to a car-following model (i.e., under adaptive cruise control mode), have access to the location and velocity of their leader (either through BSMs or using their onboard sensors), and are able to control their own acceleration rates.

A known time delay (e.g., communication, sensing, and/or reaction delay) is applied to the input vector of the car-following model.
The rest of the paper is organized as follows: Section II provides a brief review of the existing related work in the field of anomaly detection in CAVs. Section III introduces the formulation of the problem and our method. In Section IV we conduct a case study based on a well-known car-following model. Finally, in Section V, we conclude the paper.
II. Related Work
Anomaly detection research has generated a substantial volume of literature over the past few years, as it is an important and challenging problem in many disciplines, including but not limited to automotive systems [23, 24], wireless networks [29], and environmental engineering [13, 14]. Anomaly detection methods are used in a variety of applications, including fault diagnosis, intrusion detection, and monitoring applications. In some cases, if the source of an anomaly can be quickly identified, appropriate reconfiguration control actions can be taken in order to avoid or minimize potential loss.
In the past few years, a variety of methods have been developed to detect anomalous behavior and/or identify the source of an anomaly [16, 15]. Examples of anomaly detection methods include observer-based methods [8, 40], parity relation methods [9, 12], and parameter estimation methods [3]. Among them, observer-based (quantitative model-based) fault detection is a common fault detection approach, as discussed in [15]. Observer-based fault detection is based on the residual (or innovation) sequence obtained from using a mathematical model and (adaptive) thresholding. In this paper we study anomaly detection in CAVs using an observer-based approach.
Anomalous sensor behavior in CAVs could result from both sensor failures and malicious cyber attacks. Sensor readings may be influenced by a variety of factors, leading to the collection of faulty information [6, 30, 28]. For example, environmental perturbations and sensor age may result in higher probabilities of failure. A short circuit, loose wire connection, or low battery supply are among other reasons that may cause inaccurate data reporting, including an unexpectedly high variation of sensor readings, or noise [25]. Additionally, malicious attacks may cause anomalies in sensor readings. CAVs have several internal and external cyber attack surfaces through which they can be accessed and compromised by ill-intended actors [27, 6, 18, 39, 41]. Petit and Shladover [27] showed that false injection of information and map database poisoning are two of the most dangerous potential attacks on CAVs. For example, the infrastructure (i.e., an RSU) or a neighboring vehicle can transmit fake messages (e.g., a WAVE Service Advertisement or BSM), which may in turn generate wrong, and potentially harmful, reactions (e.g., spurious braking), placing CAV occupants and other road users in life-threatening situations. There are several existing studies that illustrate the vulnerability of CAV sensors, e.g., speed, acceleration, and location sensors, to cyber attacks or faults. For in-vehicle speed and acceleration sensors, a false injection attack mentioned in [27] through the CAN bus or the on-board diagnostics (OBD) system could induce any of the four types of anomalies considered in this paper. As another example, Trippel et al. [36] demonstrate that an acoustic injection attack could lead to anomalous sensor values for the in-vehicle acceleration sensor. Lastly, for the location measurement from the GPS, both the operating environment of the vehicle and GPS spoofing/jamming attacks may result in anomalous sensor values [10]. Note that in this study we only consider false injection attacks whose manifestations can be described by the four types of anomaly defined in Section I. As such, the study leaves out any types of attacks that do not impact sensor readings.
Despite the severe consequences of failing to detect sensor anomalies in CAVs, there is a scarcity of anomaly detection techniques in the ITS literature. Only a limited number of studies have focused on cyber security in CAVs, or more generally in ITS. In [26], Park et al. use graph theory based on a transient fault model to detect transient faults in CAVs. Christiansen et al. [7] combine background subtraction and convolutional neural networks to detect anomalies/obstacles. Muter et al. [23] and Marchetti et al. [20] use entropy-based methods to detect anomalies (attacks) in in-vehicle networks. Faughnan et al. [10] measure the discrepancy between redundant sensor readings to detect hijacking in unmanned aerial vehicles. Van Wyk et al. [37] use a hybrid method combining a convolutional neural network (CNN) with a Kalman filter and a $\chi^2$ detector to detect and identify sensor anomalies in a CAV system. In this paper, we focus on the detection of anomalous sensor readings and recovery of the corrupt signals. We propose an observer-based anomaly detection method, which combines a well-known filtering technique, namely AEKF, to smooth the CAV sensor values, and a machine learning method, i.e., OCSVM, to learn the normal vehicle behavior, with the objective of detecting anomalous behavior. Specifically, we utilize a car-following model to take into account the information from the leading vehicle, so as to better detect anomalies by reducing the false positive error rate. Additionally, to make our methodology robust to practical network conditions and improve its anomaly detection performance, we account for time delay in perceiving the environment, which could arise from communication delay or sensor observation delay.
One of the major differences of this paper from our past work is that in [37] we examine multiple sensor readings for each type of sensor at the same time by feeding multiple sensor readings into a CNN network. In this paper, however, for each type of sensor we rely on readings from a single sensor only and propose a novel anomaly filtering and detection technique accordingly. Another major difference is that this work takes into account the state of the leading vehicle when conducting anomaly detection for the subject vehicle. These two major differences make the two frameworks fundamentally different, and applicable to different scenarios. Finally, in this paper we have replaced the traditional $\chi^2$ detector, which was used in [37], with an OCSVM model. Our experiments show that by incorporating the leading vehicle's information and using OCSVM, we achieve better detection performance compared to the traditional $\chi^2$ detector. To the best of our knowledge, this is the first study that detects CAV sensor anomalies by utilizing the leading vehicle's information, i.e., by incorporating a car-following model into a continuous state-space model with time delay. Additionally, given the fact (demonstrated in the paper) that the model noise does not follow a Gaussian distribution, rendering the traditional $\chi^2$ test inapplicable, we propose an OCSVM model in order to deal with the bias and abnormal distribution of the innovation caused by time delay.

III. Methods
In this section, we first discuss how a car-following model with time delay can be used to describe the motion (also known as state-transition) model in AEKF. Next, we formulate a new continuous nonlinear state-space model with discrete measurements based on a car-following motion model. The continuous state-transition model represents the intrinsic nature of a vehicle's response to the actions of its immediate downstream traffic, and the discrete measurement model represents the mechanics of sensor sampling, as is the case in practice. Based on the proposed state-space model, we propose an anomaly detection method that combines AEKF and OCSVM. A traditional $\chi^2$ detector is also discussed and its performance is compared with that of OCSVM.
III-A. Car-Following Model with Time Delay
Consider the car-following model in [35]:

$$\ddot{x}_i(t) = f\big(s_i(t-\tau),\, \Delta v_i(t-\tau),\, v_i(t-\tau)\big), \qquad (1)$$

where $\ddot{x}_i(t)$, $v_i(t)$, and $x_i(t)$ are respectively the acceleration, speed, and location of the $i$-th vehicle, to which we refer as the ‘subject vehicle’, and $s_i(t)$ and $\Delta v_i(t)$ are the distance gap and the speed difference between the subject vehicle and its leading vehicle, the $(i-1)$-th vehicle, respectively. Parameter $\tau$ denotes time delay, also known as the ‘perception-reaction time’, i.e., the period of time lapsed from the moment the leading vehicle performs an action, to the moment the subject vehicle executes an action in response. Function $f$ is the stimulus function. The acceleration $\ddot{x}_i(t)$ in Equation (1) can be recast in the following form:
$$\dot{v}_i(t) = f\big(s_i(t-\tau),\, \Delta v_i(t-\tau),\, v_i(t-\tau)\big). \qquad (2)$$
We define a state vector in continuous time as:

$$z(t) = \big[x_i(t),\, v_i(t)\big]^\top, \qquad (3)$$

where $x_i(t)$ and $v_i(t)$ are the location and speed of the subject vehicle, respectively. Note that, without loss of generality, $x_i(t)$ and $v_i(t)$ can be extended to vector form to allow for incorporating historical location and speed observations, respectively, into the state-space model, when desired.
Recasting equation (2) as a function of $z(t)$ produces a car-following model that maps the state into an actionable decision for the subject vehicle:

$$\dot{v}_i(t) = g\big(z(t-\tau),\, u(t-\tau)\big), \qquad (4)$$

where $u(t) = [x_{i-1}(t),\, v_{i-1}(t)]^\top$ is the input vector containing information received from the leading vehicle, and $g$ denotes the stimulus function describing velocity in a continuous state space.
III-B. Continuous State-Discrete Measurement State-Space Model
We now define a state-space model with a continuous state-transition model and discrete measurements. Using the previous definition of the state vector $z(t)$, the state-transition model satisfies the following differential equation:

$$\dot{z}(t) = \begin{bmatrix} v_i(t) \\ g\big(z(t-\tau),\, u(t-\tau)\big) \end{bmatrix}, \qquad (5)$$

where $\dot{z}(t)$ denotes the time derivative of the state vector.
When $\tau = 0$, the state-space model in equation (5) satisfies the Markovian property, allowing for applying AEKF. However, in practice, a variety of factors, including the time required for data processing and computations as well as delays in the communication network, can cause $\tau$ to be nonzero. As such, in practice AEKF cannot be applied to equation (5), since the derivative of the state vector is determined by multiple previous state vectors.
In order to apply AEKF, we approximate equation (5) in the following way: we assume the acceleration of each vehicle is bounded within the interval $[-b_{\max},\, a_{\max}]$, where $b_{\max}$ and $a_{\max}$ indicate the magnitude of the maximum deceleration and acceleration rates, respectively. Based on the assumption of bounded acceleration, we can obtain lower and upper bounds on the approximation of the delayed state $z(t-\tau)$ by the current state $z(t)$; for instance, for the speed component, $v_i(t) - a_{\max}\tau \le v_i(t-\tau) \le v_i(t) + b_{\max}\tau$.
Then a delay differential equation (DDE), describing the delayed state-transition model, can be used to approximate equation (5):

$$\dot{z}(t) = \begin{bmatrix} v_i(t) \\ g\big(z(t),\, u(t-\tau)\big) \end{bmatrix}, \qquad (6)$$

where $g\big(z(t),\, u(t-\tau)\big)$ approximates the acceleration of the $i$-th vehicle at time $t$, and $\dot{z}(t)$ denotes the derivative of the continuous-time state. Finally, we obtain a continuous-time state-transition model with discrete-time measurements as the following:

$$\dot{z}(t) = \begin{bmatrix} v_i(t) \\ g\big(z(t),\, u(t-\tau)\big) \end{bmatrix} + w(t), \qquad y_k = h\big(z(kT)\big) + \epsilon_k, \qquad (7)$$

where $h$ is the measurement function, $y_k$ denotes the sensor reading of the $i$-th vehicle at the $k$-th sampling instant, $w(t)$ and $\epsilon_k$ are the process noise and the observation noise, respectively, which are assumed to be mutually independent with covariances $Q$ and $R_k$, and $T$ is the sampling time interval for sensors. Note that $w(t)$ also accounts for the error introduced by the approximation steps in equation (6).
III-C. Adaptive Extended Kalman Filter with Fault Detector
The extended Kalman filter (EKF) is a well-established method used for timely and accurate estimation of the dynamic state of a nonlinear system [38]. One important issue that needs to be addressed in EKF is how to properly set up the covariance matrices of the process noise (i.e., $Q$) and measurement noise (i.e., $R$). The performance of EKF is highly affected by proper tuning of $Q$ and $R$ [22], while in practice these parameters are usually unknown a priori. Therefore, we apply an adaptive extended Kalman filter (AEKF) to approximate these matrices.
An EKF is used to estimate the state vector $z$ from the sensor readings $y_k$. Let $\hat{z}_{k|k-1}$ and $P_{k|k-1}$ denote the state prediction and state covariance prediction at time $k$, given the estimate at time $k-1$, respectively. Note that for ease of notation, we omit subscript $i$. Hence, considering the state-space model in equation (7), the EKF consists of the following 3 steps:
Step 0 - Initialize: To initialize the EKF, the mean values and covariance matrix of the states are set up at $k = 0$ as the following:

$$\hat{z}_{0|0} = \mathbb{E}\big[z(0)\big], \qquad P_{0|0} = \operatorname{Cov}\big[z(0)\big]. \qquad (8)$$
Step 1 - Predict: The state and its covariance matrix at $k-1$ are projected one step forward in order to obtain the a priori estimates at time $k$:

Solve
$$\dot{\hat{z}}(t) = \begin{bmatrix} \hat{v}_i(t) \\ g\big(\hat{z}(t),\, u(t-\tau)\big) \end{bmatrix}, \qquad \dot{P}(t) = F(t)\,P(t) + P(t)\,F(t)^\top + Q, \qquad (9)$$

with initial conditions $\hat{z}\big((k-1)T\big) = \hat{z}_{k-1|k-1}$ and $P\big((k-1)T\big) = P_{k-1|k-1}$ over the interval $\big[(k-1)T,\, kT\big]$,

where $F(t)$ is the first-order approximation of the Jacobian matrix of the state-transition function.
Step 2 - Update:

$$\begin{aligned} K_k &= P_{k|k-1} H_k^\top \big(H_k P_{k|k-1} H_k^\top + R_k\big)^{-1}, \\ \hat{z}_{k|k} &= \hat{z}_{k|k-1} + K_k \gamma_k, \\ P_{k|k} &= (I - K_k H_k)\, P_{k|k-1}, \end{aligned} \qquad (10)$$

where $H_k$ is the Jacobian matrix of the measurement function $h$, $Q_k$ is the covariance matrix of the process noise at time $k$, $R_k$ is the covariance matrix of the measurement noise at time $k$, and $\gamma_k = y_k - h(\hat{z}_{k|k-1})$ is the innovation (i.e., the difference between the measurement and the prediction) at time $k$.
Since in practice $Q_k$ and $R_k$ are usually unknown, based on the work in [1] with slight modifications, we apply an AEKF to estimate these two matrices by using a moving estimation window of size $M$, as follows:

$$\begin{aligned} R_k &= \alpha R_{k-1} + (1-\alpha)\big(\varepsilon_k \varepsilon_k^\top + H_k P_{k|k-1} H_k^\top\big), \\ Q_k &= \beta Q_{k-1} + (1-\beta)\, K_k \gamma_k \gamma_k^\top K_k^\top, \end{aligned} \qquad (11)$$

where $\varepsilon_k = y_k - h(\hat{z}_{k|k})$ is the residual at time $k$, which is the difference between the actual measurement and its estimated value using the information available at time $k$, and $\alpha, \beta \in [0, 1)$ are forgetting factors. Note that using a moving window, as we place more weight on previous estimates, less fluctuation of $Q_k$ and $R_k$ will incur, and it takes longer for the model to capture changes in the system. Additionally, note that we replace $Q$ in equation (9) with $Q_{k-1}$ during the time interval $\big[(k-1)T,\, kT\big]$.
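As an illustration, one predict/update cycle with this style of adaptive covariance estimation can be sketched as follows. This is a minimal linearized sketch, assuming a linear motion model `F` and measurement model `H` for brevity, and a single forgetting factor `alpha` standing in for both factors; it is not the paper's exact implementation.

```python
import numpy as np

def aekf_step(z, P, y, Q, R, F, H, dt, alpha=0.3):
    """One predict/update cycle of a (linearized) Kalman filter with
    adaptive estimation of Q and R via exponential forgetting."""
    # Predict: project state and covariance one step forward
    z_pred = F @ z
    P_pred = F @ P @ F.T + Q * dt
    # Innovation (pre-fit discrepancy) and its covariance
    gamma = y - H @ z_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    # Update
    z_new = z_pred + K @ gamma
    P_new = (np.eye(len(z)) - K @ H) @ P_pred
    # Residual (post-fit discrepancy) drives the R update
    eps = y - H @ z_new
    R_new = alpha * R + (1 - alpha) * (np.outer(eps, eps) + H @ P_pred @ H.T)
    Q_new = alpha * Q + (1 - alpha) * (K @ np.outer(gamma, gamma) @ K.T)
    return z_new, P_new, Q_new, R_new, gamma, S
```

The returned innovation `gamma` and its covariance `S` are exactly the quantities the fault detector consumes at each epoch.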
One of the traditional fault detectors used in conjunction with the Kalman filter is the $\chi^2$ detector [5, 2, 11]. Since AEKF is a special type of Kalman filter, the $\chi^2$ detector can be seamlessly applied to AEKF as well. Specifically, it constructs a test statistic to determine whether the new measurement falls into the gate region with the probability determined by the gate threshold $\lambda$, as shown in the following:

$$\big(y_k - \hat{y}_{k|k-1}\big)^\top S_k^{-1} \big(y_k - \hat{y}_{k|k-1}\big) \le \lambda, \qquad (12)$$

where $\hat{y}_{k|k-1} = h(\hat{z}_{k|k-1})$ is the predicted value of the measurement at time $k$ and $S_k = H_k P_{k|k-1} H_k^\top + R_k$ is the innovation covariance. The test statistic for the $\chi^2$ fault detector is defined as

$$\chi^2_k = \gamma_k^\top S_k^{-1} \gamma_k. \qquad (13)$$
For the test to provide meaningful results, the innovation should be zero-mean Gaussian distributed with covariance $S_k$. However, in reality, the innovation can follow a non-zero-mean Gaussian distribution if there is bias in the background (e.g., due to non-zero-mean process noise or an imperfect model), as shown in figure 1. This figure displays a scatter plot of the normalized innovation generated from the training dataset in our experiments, when there exists a time delay and the approximation error in equation (6) is not zero. Moreover, in practice the normalized innovation is not normally distributed when the noise does not follow a Gaussian distribution. In the 2-dimensional case, the $\chi^2$ detector defines a circular boundary centered at the origin, i.e., the blue lines in figure 1, which correspond to the thresholding boundary of the detector for a given gate threshold. Therefore, in order to correctly detect anomalies, the $\chi^2$ detector requires the data to be zero mean and normally distributed. When this requirement is violated, the $\chi^2$ detector is no longer a good detector, since it will generate higher rates of false positives and false negatives. As such, the boundary should be shifted toward the true mean in order to achieve a ‘fair’ boundary on both sides. In the context of our problem, the approximation introduced in equation (6) can generate such a bias, since it effectively assumes the change in the state over the delay period to be zero. As such, unless the acceleration and deceleration rates during the period $\tau$ sum to zero, or the time delay is equal to zero, the resulting bias will degrade the performance of the $\chi^2$ detector. Consequently, we propose a novel approach that uses one-class support vector machines (OCSVMs) [31] to adaptively learn the normal boundary of the innovation sequence. Specifically, we train several OCSVM models using normal (i.e., non-anomalous) sensor data with different parameter values (i.e., anomaly percentages). We use the trained OCSVM models for detecting anomalies in real time.
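For concreteness, the gate test in equations (12)-(13) can be sketched as follows. The default threshold 5.991 is the 0.95 quantile of a chi-square distribution with 2 degrees of freedom, an illustrative choice for a 2-dimensional measurement vector.

```python
import numpy as np

def chi2_detect(gamma, S, threshold=5.991):
    """Chi-square gate test on the innovation: flag the measurement as
    anomalous when gamma' S^{-1} gamma exceeds the gate threshold.
    gamma : innovation vector, S : innovation covariance matrix."""
    stat = float(gamma.T @ np.linalg.inv(S) @ gamma)
    return stat, stat > threshold
```

A constant background bias in `gamma` inflates the statistic on one side of the gate only, which is exactly the failure mode that motivates replacing this detector with the OCSVM.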
III-D. One-Class Support Vector Machine
Let us define the normalized innovation at time $k$ as:

$$q_k = S_k^{-1/2}\, \gamma_k. \qquad (14)$$
Assume we have a training set $\{q_1, \ldots, q_n\}$, with $n$ data points, sampled from a normal (i.e., non-anomalous) set. Let us define $\Phi$ as a kernel mapping function from the input space to a feature space. OCSVM solves the following quadratic program:

$$\min_{\omega,\, \xi,\, \rho} \ \frac{1}{2}\|\omega\|^2 + \frac{1}{\nu n}\sum_{j=1}^{n} \xi_j - \rho \quad \text{s.t.} \quad \langle \omega, \Phi(q_j) \rangle \ge \rho - \xi_j, \ \ \xi_j \ge 0, \ \ j = 1, \ldots, n, \qquad (15)$$

where $\nu$ is a constant parameter in the range $(0, 1]$, denoting the false positive rate of the decision boundary that classifies normal and anomalous sensor readings. Decision variables $\omega$ and $\rho$ in model (15) define the most generalizable linear decision boundary in an infinite-dimensional space (created by the Gaussian kernel) to determine a region in the input space that encompasses at least a $(1-\nu)$ fraction of the data points. Decision variables $\xi_j$ are slack variables introduced to penalize the degree of violation of the constraint $\langle \omega, \Phi(q_j) \rangle \ge \rho$. According to Proposition 3 in [31], parameter $\nu$ provides an upper bound on the fraction of outliers in the training dataset, and asymptotically equals the fraction of outliers out-of-sample, with probability 1, under certain conditions. We train $m$ different OCSVM models, each model with a parameter $\nu_l$ selected from the set $\{\nu_1, \ldots, \nu_m\}$ with $\nu_1 > \nu_2 > \cdots > \nu_m$. For a measurement sequence with dimension $d$, we compute the average of the normalized innovation sequence over a window of $M$ time intervals, up to time $k$:

$$\bar{q}_k = \frac{1}{Md} \sum_{j=k-M+1}^{k} \|q_j\|_1, \qquad (16)$$

where $\|\cdot\|_1$ is the L1-norm. For small values of $\bar{q}_k$, i.e., when the average of the normalized innovation within the current time window is small, suggesting that AEKF performs well, we choose a trained OCSVM model with a large value of $\nu$ to ensure that we would detect even small variations and mark them as outliers. Following the same line of logic, when $\bar{q}_k$ is large, we choose an OCSVM model with a small $\nu$ to avoid unnecessary dismissal of data points where we are not certain enough that a point is truly an outlier. To that end, in order to determine the parameter $\nu$ properly, we use the histogram of innovation values constructed from normal data to approximate the distribution of the innovation. Without loss of generality, we assume the innovation is zero-mean Gaussian distributed. For the case when it is not zero mean, we first subtract the mean value, and add it back after determining the value of $\nu$.
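A minimal sketch of such a model bank, using scikit-learn's `OneClassSVM` with a Gaussian kernel, might look as follows. The $\nu$ values and the window cut points below are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def train_ocsvm_bank(q_train, nus=(0.2, 0.1, 0.05, 0.01)):
    """Train one OCSVM per candidate nu on normal innovation samples.
    q_train: array of shape (n_samples, d)."""
    return [OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit(q_train)
            for nu in nus]

def select_model(models, q_window, cuts=(0.5, 1.0, 2.0)):
    """Pick a model from the bank using the windowed mean absolute
    normalized innovation: a small average selects a large-nu (tight)
    boundary, a large average selects a small-nu (loose) boundary.
    The cut points are illustrative only."""
    q_bar = np.mean(np.abs(q_window))
    idx = int(np.searchsorted(cuts, q_bar))
    return models[idx]
```

At run time, the selected model's `predict` returns +1 for points inside the learned normal region and -1 for suspected anomalies.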
The tail of the probability density function (PDF) in figure 2 represents drastic changes in data, and the center of the PDF represents smooth changes. The histogram of the normalized innovation is an approximation of the PDF of the normal training data. As shown in figure 2, parameter $\nu$ controls the area of the shaded tail region of the PDF of the normal data, approximated by the histogram. We treat the absolute value of the normal data as a random variable, denoted as $Z$, with a certain distribution. Then, given the number of training samples $n$ and a tail threshold $z_{th}$, the number of outliers $n_o$ can be computed from

$$n_o = n\,\big(1 - F(z_{th})\big), \qquad (17)$$

where $F$ is the CDF of $Z$. As such, we have:

$$\nu = \frac{n_o}{n} = 1 - F(z_{th}). \qquad (18)$$
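Since $F$ is not known in closed form, a simple empirical counterpart of equation (18) can be computed directly from the histogram of normal training data. In this sketch the threshold `z_th` is a hypothetical tuning choice.

```python
import numpy as np

def estimate_nu(q_normal, z_th):
    """Empirical version of nu = 1 - F(z_th): the fraction of normal
    training samples whose absolute normalized innovation exceeds z_th."""
    z = np.abs(np.asarray(q_normal, dtype=float))
    return float(np.mean(z > z_th))
```

Sweeping `z_th` over a few tail thresholds yields the set of $\nu$ values used to train the OCSVM bank.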
Finally, for OCSVM models with parameters $\{\nu_1, \ldots, \nu_m\}$, when testing on a new data point $q_k$, the selected model labels the point as

$$\hat{y}_k = \operatorname{sign}\big(\langle \omega, \Phi(q_k) \rangle - \rho\big), \qquad (19)$$

where $\hat{y}_k = -1$ indicates an anomalous point and $\hat{y}_k = +1$ a normal point.
In summary, assuming we have $N$ measurements, figure 3 summarizes an implementation flowchart of the proposed algorithm, which combines AEKF and OCSVM to detect anomalies and recover the corrupt sensor readings. Specifically, at each time epoch, the following vehicle receives the measurements from both the leading vehicle and its own onboard sensors. The AEKF smooths the following vehicle's speed and location signals based on the motion model. Meanwhile, the AEKF generates the innovation, which measures the discrepancy between the measurement and the prediction, and sends the innovation to the fault detector for anomaly detection. The fault detector consists of several OCSVM models, and it dynamically chooses which one to use based on the average innovation. If no sensor anomaly is detected, the innovation is combined with the measurement at the current time in order to generate an estimate. Otherwise, we do not trust the current sensor measurement and replace the estimate with the prediction, which will be used in the next time epoch.
III-E. Anomaly Model
The dataset for this study is generated by randomly adding anomalies to normal trajectory data, since there is no publicly available dataset on CAV trajectories that includes anomalies in sensor measurements. Specifically, we account for the four major anomaly types: short, noise, bias, and gradual drift. The detailed construction of the anomalies is as follows:

Short: The short anomaly type is simulated as a random Gaussian variable with mean 0 and variance $\sigma^2$.

Noise: The noise anomaly type is simulated as a sequence of i.i.d. random Gaussian variables with length $l$, mean 0, and variance $\sigma^2$.

Bias: The bias anomaly type is simulated by adding a temporary offset to the observation. We simulate the magnitude of the anomaly as Gaussian distributed with mean 0 and variance $\sigma^2$. The duration of the bias anomaly sequence is $l$.

Gradual Drift: The gradual drift anomaly type is simulated by adding a linearly increasing/decreasing set of values to the base values of the sensors. Specifically, first we use a vector of linearly increasing values from 0 to $\theta$, where $\theta$ is a uniformly distributed random variable in the range $(0, \sigma]$. The duration of the sequence is $l$. We then use a Bernoulli random variable with probability $0.5$ to generate one of the two outcomes of $1$ and $-1$, by which we scale the sequence to generate an increasing or decreasing drift, respectively.

Here, $\sigma$ is the parameter of the distribution for each type of anomaly.
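The four anomaly types above can be injected into a clean signal along the following lines. This is a sketch only; `sigma` and `length` are illustrative parameters rather than the calibrated values used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_anomaly(signal, kind, start, sigma=0.5, length=10):
    """Return a copy of a clean 1-D signal with one anomaly injected."""
    s = signal.copy()
    if kind == "short":                        # single-sample spike
        s[start] += rng.normal(0.0, sigma)
    elif kind == "noise":                      # elevated variance over a window
        s[start:start + length] += rng.normal(0.0, sigma, length)
    elif kind == "bias":                       # constant offset over a window
        s[start:start + length] += rng.normal(0.0, sigma)
    elif kind == "drift":                      # linear ramp, random direction
        theta = rng.uniform(0.0, sigma)
        direction = 1 if rng.random() < 0.5 else -1
        s[start:start + length] += direction * np.linspace(0.0, theta, length)
    return s
```

Applying this to the location and speed channels at random onsets and durations reproduces the spirit of Algorithm 1.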
We inject all four types of anomalies into each sensor reading. We assume the onset of anomalous values in sensors occur independently. That is, we do not explicitly train OCSVM on a dataset containing interdependent sensor failures or systemic cyber attacks on vehicle sensors. However, this assumption does not preclude scenarios under which multiple sensors are under simultaneous attack.
Similar to our previous work [37], we generate various datasets for our experiments with a given anomaly rate. In addition, we simulate anomalies to start at randomly selected times, lasting for random durations (if applicable to the anomaly type), affecting randomly selected sensors. These anomalies are then used to adjust the corresponding sensors' normal readings in the original dataset, which indicate the traveling location and speed of the CAV, making them anomalous. The pseudo code describing random generation of anomalies is presented in Algorithm 1. In the algorithm, $\mathcal{U}\{1, n\}$ denotes a discrete uniform distribution over the integers from 1 to $n$. Note that each anomaly type is equally likely to be selected when an anomaly is generated.
IV. Case Study Based on the Intelligent Driver Model
In this section we use a well-known car-following model, namely the Intelligent Driver Model (IDM), proposed by Treiber et al. [34], to compare the anomaly detection performance of the traditional $\chi^2$ detector and the OCSVM model. As mentioned in [35], since the IDM has no explicit reaction time and its driving behavior is given in terms of a continuously differentiable acceleration function, it describes more closely the characteristics of semi-automated driving by adaptive cruise control (ACC) than that of a human driver. However, it can easily be extended to capture the communication delay as described in the previous section. We also evaluate the impact of using the IDM motion model on the anomaly detection performance. In order to evaluate system performance, we assume that the input vector containing the leading vehicle's information is not anomalous.
Using the definition of state and input in the previous section, the IDM model with time delay can be described as the following:

$$\dot{v}_i(t) = a\left[1 - \left(\frac{v_i(t-\tau)}{v_0}\right)^{\delta} - \left(\frac{s^*\big(v_i(t-\tau),\, \Delta v_i(t-\tau)\big)}{s_i(t-\tau)}\right)^{2}\right],$$

with

$$s^*(v, \Delta v) = s_0 + \max\left(0,\; v\,T_g + \frac{v\, \Delta v}{2\sqrt{ab}}\right),$$

where $v_0$, $s_0$, $T_g$, $\delta$, $a$, and $b$ are model parameters. The state vector and the input vector both have dimension 2. For detailed information on the IDM refer to [34]. Following the typical parameter values of city traffic used in [35], we set the model parameters accordingly, and define the measurement function as the direct observation of the state:
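A small sketch of the IDM acceleration function is given below. The default parameter values are commonly cited city-traffic defaults and are an assumption here, not necessarily the exact values used in this study.

```python
import numpy as np

def idm_accel(v, gap, dv, v0=15.0, T=1.0, s0=2.0, a=1.0, b=1.5, delta=4.0):
    """IDM acceleration for the subject vehicle.
    v   : subject speed (m/s)
    gap : distance gap to the leader (m)
    dv  : approaching rate, v_subject - v_leader (m/s)
    Under a time delay tau, all three arguments would be evaluated
    at t - tau, as in the delayed model above."""
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * np.sqrt(a * b)))
    return a * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)
```

For example, a stopped vehicle with a very large gap accelerates at roughly the maximum rate `a`, while a vehicle already at its desired speed behind a close leader decelerates.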
The data for this study is obtained from the research data exchange (RDE) database constructed as part of the Safety Pilot Model Deployment (SPMD) program [4]
funded by the US department of Transportation, and collected in Michigan. This program was conducted with the primary objective of demonstrating CAVs, with the emphasis on implementing and testing V2V and V2I communications technologies in realworld conditions. The program recorded detailed and highfrequency (10 Hz) data for more than 2,500 vehicles over a period of two years. The data features extracted from the SPMD dataset used in this study include the invehicle speed for one of the test vehicles with a trip length of 400 seconds (4000 samples) for training data, and 200 seconds (2000 samples) for testing data. As mentioned in section I, we assume that the vehicles are in ACC mode according to a carfollowing model, i.e. the IDM model. Therefore, as shown in figure
4, we use the extracted speed data as the leading vehicle's speed, and generate the leading vehicle's location and the following vehicle's state (position and speed) as the baseline according to the following rules:

\[ x_l(t+\Delta t) = x_l(t) + v_l(t)\,\Delta t, \quad v_f(t+\Delta t) = v_f(t) + a_{\mathrm{IDM}}(t)\,\Delta t + w(t), \quad x_f(t+\Delta t) = x_f(t) + v_f(t)\,\Delta t, \tag{20} \]
where \(w\) is a random term that describes the uncertainty of the following vehicle's state. In our study, we generate \(w\) from a uniformly distributed random variable over a fixed range. Furthermore, we add Gaussian white noise with variance 0.02 to the leading vehicle's baseline data. Since we want to test the detection performance, the noise variance should be smaller than the anomaly variance, so that the anomalies are not overpowered by the white noise. Note that adding white noise to the leading vehicle's baseline data is equivalent to using a Gaussian-distributed random time delay factor. To demonstrate the importance of incorporating the leading vehicle's information into the following vehicle's anomaly detection procedure, we implement our framework twice: once using the IDM car-following model, and once without it, using the state-space model expressed in the following:
\[ x_f(t+\Delta t) = x_f(t) + v_f(t)\,\Delta t + w_x(t), \qquad v_f(t+\Delta t) = v_f(t) + w_v(t), \tag{21} \]

where the process noise \((w_x, w_v)\) accounts for the introduced modeling error.
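The baseline-generation procedure above can be sketched as follows. This is a minimal Python sketch under assumed choices: Euler integration at the 10 Hz sampling rate, a hypothetical initial gap `gap0`, and a hypothetical uniform-noise half-width `noise_scale`; the IDM helper is inlined so the snippet is self-contained.

```python
import numpy as np

def idm_accel(v, v_lead, gap, v0=15.0, T=1.0, s0=2.0, a=1.0, b=1.5, delta=4.0):
    # IDM acceleration with typical city-traffic parameters [35]
    s_star = s0 + v * T + v * (v - v_lead) / (2.0 * np.sqrt(a * b))
    return a * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

def generate_baseline(v_lead, dt=0.1, gap0=10.0, noise_scale=0.1, seed=0):
    """Generate the leading vehicle's position and the following vehicle's
    baseline state from a recorded leading-vehicle speed trace.
    gap0 and noise_scale are illustrative assumptions, not the paper's values."""
    rng = np.random.default_rng(seed)
    n = len(v_lead)
    x_lead = np.cumsum(v_lead) * dt                 # Euler-integrated leader position
    x_f = np.empty(n)
    v_f = np.empty(n)
    x_f[0], v_f[0] = x_lead[0] - gap0, v_lead[0]
    for t in range(n - 1):
        acc = idm_accel(v_f[t], v_lead[t], x_lead[t] - x_f[t])
        w = rng.uniform(-noise_scale, noise_scale)  # follower-state uncertainty
        v_f[t + 1] = max(v_f[t] + acc * dt + w, 0.0)
        x_f[t + 1] = x_f[t] + v_f[t] * dt
    # Gaussian white noise with variance 0.02 on the leader's baseline speed
    v_lead_noisy = v_lead + rng.normal(0.0, np.sqrt(0.02), n)
    return x_lead, v_lead_noisy, x_f, v_f
```

For a constant-speed leader, the follower settles near the leader's speed at roughly the IDM equilibrium gap, which is the behavior the baseline relies on.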
To measure the effectiveness of our two main contributions, i.e., incorporating a car-following model into the AEKF motion model and using an OCSVM fault detector, we conduct a sensitivity analysis over the motion model (i.e., with and without the IDM motion model), the anomaly detection methodology (i.e., the χ² detector and the OCSVM), and the time delay τ. To evaluate the impact of changing models/parameters, we compute the Area Under the Curve (AUC) for each receiver operating characteristic (ROC) curve. The ROC curve is a graphical tool that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied; it is created by plotting the true positive rate (sensitivity) against the false positive rate (1 − specificity) at various threshold settings. More specifically, we vary the threshold of the χ² detector and the threshold vector of the OCSVM. Note that the latter is a three-dimensional vector, as we train and utilize four different OCSVM models.
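The AUC obtained by sweeping a detection threshold can be computed directly from the detector's anomaly scores via the rank (Mann–Whitney U) statistic; the sketch below is illustrative and not the paper's code.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC of the ROC curve via the rank (Mann-Whitney U) statistic:
    the probability that a randomly chosen anomalous sample scores higher
    than a randomly chosen normal one, which equals the area obtained by
    sweeping the detection threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # rank all scores (1-based), then average the ranks of tied scores
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    u = ranks[labels].sum() - len(pos) * (len(pos) + 1) / 2.0
    return u / (len(pos) * len(neg))
```

A perfectly separating detector gives AUC = 1.0, while scores uninformative about the labels give AUC near 0.5.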
The experiments are implemented separately in three scenarios: scenario 1 uses the χ² detector without the IDM motion model, scenario 2 uses the χ² detector with the IDM model, and scenario 3 uses the OCSVM with the IDM model. Each scenario is implemented under three experiment settings generated by decreasing the value of the anomaly magnitude parameter, so that anomalous readings become more subtle, and generally more difficult to detect, from setting 1 to setting 3. Lastly, the maximum duration of an anomaly is set to 20 for each setting.
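For reference, the classical χ² detector used in scenarios 1 and 2 flags a sample when its squared, covariance-normalized innovation exceeds a threshold (e.g., a chi-square quantile with dim(r) degrees of freedom). A minimal sketch, assuming innovation vectors with covariance S taken from the filter:

```python
import numpy as np

def chi2_detector(innovations, S, threshold):
    """Classical chi-square fault detector on filter innovations:
    flag sample t as anomalous when r_t^T S^{-1} r_t exceeds a threshold.

    innovations : (n, d) array of innovation vectors r_t
    S           : (d, d) innovation covariance from the filter
    threshold   : scalar, e.g. a chi-square quantile with d dof
    """
    innovations = np.atleast_2d(np.asarray(innovations, dtype=float))
    S_inv = np.linalg.inv(np.atleast_2d(S))
    # quadratic form r_t^T S^{-1} r_t for every sample t
    stat = np.einsum("ti,ij,tj->t", innovations, S_inv, innovations)
    return stat > threshold, stat
```

With S = I and the common 95% threshold 3.84 (1 dof) or 5.99 (2 dof), a small innovation passes while a gross one is flagged.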
Tables I–III present the AUC values of the three scenarios in our three experiment settings, for three increasing values of the time delay (0.5 and 1.5 seconds being the larger two). The experiments indicate that the IDM observer-based fault detection method provides a significant improvement over the AEKF without the IDM model, regardless of the value of the time delay. Additionally, the OCSVM consistently achieves better fault detection performance than the χ² detector. Results also indicate a degradation in performance for each method as the anomaly parameter becomes smaller. This observation is in line with intuition, since a smaller anomaly magnitude makes the anomaly more subtle and therefore harder to detect. Additionally, the trends of the AUC values indicate that as we increase the time delay, the overall detection performance systematically deteriorates. This suggests that the time delay of the car-following model may have a negative impact on the detection performance.
TABLE I: AUC values for the smallest time delay.

Setting   χ² without IDM   χ² with IDM   OCSVM with IDM
1         0.9059           0.9723        0.9806
2         0.7764           0.9453        0.9470
3         0.7294           0.9228        0.9357

TABLE II: AUC values for a time delay of 0.5 s.

Setting   χ² without IDM   χ² with IDM   OCSVM with IDM
1         0.9024           0.9703        0.9793
2         0.7637           0.9402        0.9452
3         0.7258           0.9118        0.9260

TABLE III: AUC values for a time delay of 1.5 s.

Setting   χ² without IDM   χ² with IDM   OCSVM with IDM
1         0.8939           0.9701        0.9782
2         0.7681           0.9201        0.9294
3         0.7208           0.8875        0.8940
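The OCSVM detector of scenario 3 learns the support of the normal innovation distribution [31] and flags points outside it. A hedged sketch using scikit-learn's `OneClassSVM` (a tooling assumption; synthetic Gaussian data stands in here for the AEKF innovations):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Treat the innovation sequence on normal data as the training set:
# the OCSVM learns the support of the normal innovations [31].
normal = rng.normal(0.0, 1.0, size=(500, 2))
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal)

# At test time, +1 = normal, -1 = anomalous
test = np.array([[0.1, -0.2],   # typical innovation
                 [8.0, 8.0]])   # grossly inconsistent innovation
pred = ocsvm.predict(test)
```

Because the decision boundary is learned from data, this detector makes no normality assumption on the innovations, which is the motivation given in the text for preferring it over the χ² detector under the AEKF.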
V Conclusion
This paper proposes an anomaly detection method to protect CAVs against anomalous sensor readings and/or malicious cyber attacks. We use an adaptive extended Kalman filter (AEKF), informed not only by the vehicle's on-board sensors but also by the leading vehicle's trajectory, to detect anomalous information. The well-known IDM car-following model is used to incorporate the leading vehicle's information into the AEKF. Lastly, to improve the anomaly detection performance, and given that under the AEKF the innovation is not normally distributed, we replace the traditionally used χ² detector with an OCSVM model. We quantify the effect of these contributions in isolation, as well as in a combined model, by conducting experiments under three scenarios: (i) the χ² detector without the IDM model, (ii) the χ² detector with the IDM model, and (iii) the OCSVM with the IDM model. Results show that the AEKF enhanced with the OCSVM and the IDM model outperforms the traditional χ²-detector-based anomaly detection used in conjunction with the AEKF. Furthermore, our results indicate that a model-based anomaly detection method that incorporates the status of the leading vehicle can further improve the detection performance. More specifically, by utilizing the leading vehicle's information to inform the AEKF and using the OCSVM for anomaly detection, the proposed method can not only effectively filter out the sensor noise in CAVs, but also detect anomalous sensor values in real time with better performance than methods that do not utilize the leading vehicle's information. This performance is reflected in the high AUC values in our experiments. Moreover, we study the general relationship between the delay in receiving information and the anomaly detection performance, and show that as the time delay of signal transmission (i.e., communication channel delay or sensor delay) grows, the overall detection performance deteriorates.
The current study can be improved and expanded in multiple ways. First, the following vehicle's state and the anomalous sensor values used in Section IV are simulated, due to the paucity of ACC datasets with anomalies for CAVs. Multiple car-following datasets exist, but most of them were collected from human drivers. Although our study mainly focuses on the detection performance of the proposed method, it may be beneficial to collect ACC data directly from CAVs and calibrate the car-following model on a real dataset, since a potential discrepancy between the car-following model and the true traveling behavior of the vehicle may introduce new challenges. Second, in Section IV we assume that the input vector containing the leading vehicle's information is not anomalous. However, our proposed method can still detect anomalies without this assumption, i.e., as long as the discrepancy between the input vector and the measurement is large enough. Note that we also assume that anomalous input vectors are caused either by sensor failures or by false injection attacks that can be described by the four types of anomaly. Third, in this study we only utilize a single leading vehicle's information for each vehicle, whereas a connected vehicle can benefit from information shared by any number of connected vehicles within its communication range, as well as by the infrastructure. In future work, we plan to study the impact of incorporating multiple sources of information (e.g., multiple vehicles or RSUs) on the overall anomaly detection performance. Furthermore, we plan to expand our work to identify the source of an anomaly after detection.
References
 [1] (2017) Adaptive adjustment of noise covariance in Kalman filter for dynamic state estimation. In Power & Energy Society General Meeting, 2017 IEEE, pp. 1–5. Cited by: §III-C.
 [2] (1995) Multitarget-multisensor tracking: principles and techniques. Vol. 19, YBS, Storrs, CT. Cited by: §III-C.
 [3] (1979) Parameter identification and discriminant analysis for jet engine mechanical state diagnosis. In Decision and Control including the Symposium on Adaptive Processes, 1979 18th IEEE Conference on, Vol. 2, pp. 648–650. Cited by: §II.
 [4] (2014) Safety Pilot Model Deployment: test conductor team report. Report No. DOT HS 812 171. Cited by: §IV.
 [5] (1987) A chi-square test for fault detection in Kalman filters. IEEE Transactions on Automatic Control 32 (6), pp. 552–554. Cited by: §I, §III-C.
 [6] (2011) Comprehensive experimental analyses of automotive attack surfaces. In USENIX Security Symposium, pp. 77–92. Cited by: §II, §II.
 [7] (2016) DeepAnomaly: combining background subtraction and deep learning for detecting obstacles and anomalies in an agricultural field. Sensors 16 (11), pp. 1904. Cited by: §II.
 [8] (1975) Detecting instrument malfunctions in control systems. IEEE Transactions on Aerospace and Electronic Systems (4), pp. 465–473. Cited by: §II.
 [9] (1977) F-8 DFBW sensor failure identification using analytic redundancy. IEEE Transactions on Automatic Control 22 (5), pp. 795–803. Cited by: §II.
 [10] (2013) Risk analysis of unmanned aerial vehicle hijacking and methods of its detection. In Systems and Information Engineering Design Symposium (SIEDS), 2013 IEEE, pp. 145–150. Cited by: §II, §II.
 [11] (2008) Adaptive estimation of multiple fading factors in Kalman filter for navigation applications. GPS Solutions 12 (4), pp. 273–279. Cited by: §III-C.
 [12] (1997) Fault detection and isolation using parity relations. Control Engineering Practice 5 (5), pp. 653–661. Cited by: §II.
 [13] (2007) Real-time Bayesian anomaly detection for environmental sensor data. In Proceedings of the Congress-International Association for Hydraulic Research, Vol. 32, pp. 503. Cited by: §II.
 [14] (2010) Anomaly detection in streaming environmental sensor data: a data-driven modeling approach. Environmental Modelling & Software 25 (9), pp. 1014–1022. Cited by: §II.
 [15] (2010) A survey of fault detection, isolation, and reconfiguration methods. IEEE Transactions on Control Systems Technology 18 (3), pp. 636–653. Cited by: §II.
 [16] (1984) Process fault detection based on modeling and estimation methods—a survey. Automatica 20 (4), pp. 387–404. Cited by: §II.
 [17] (2011) Dedicated short-range communications (DSRC) standards in the United States. Proceedings of the IEEE 99 (7), pp. 1162–1182. Cited by: §I.
 [18] (2010) Experimental security analysis of a modern automobile. In Security and Privacy (SP), 2010 IEEE Symposium on, pp. 447–462. Cited by: §II.
 [19] (2017) Autonomous vehicle implementation predictions. Victoria Transport Policy Institute, Victoria, Canada. Cited by: §I.
 [20] (2016) Evaluation of anomaly detection for in-vehicle networks through information-theoretic algorithms. In Research and Technologies for Society and Industry Leveraging a better tomorrow (RTSI), 2016 IEEE 2nd International Forum on, pp. 1–6. Cited by: §II.
 [21] (2014) Road vehicle automation. Springer. Cited by: §I.
 [22] (1999) Adaptive Kalman filtering for INS/GPS. Journal of Geodesy 73 (4), pp. 193–203. Cited by: §III-C.
 [23] (2011) Entropy-based anomaly detection for in-vehicle networks. In Intelligent Vehicles Symposium (IV), 2011 IEEE, pp. 1110–1115. Cited by: §II, §II.
 [24] (2010) A structured approach to anomaly detection for in-vehicle networks. In Information Assurance and Security (IAS), 2010 Sixth International Conference on, pp. 92–98. Cited by: §II.
 [25] (2009) Sensor network data fault types. ACM Transactions on Sensor Networks (TOSN) 5 (3), pp. 25. Cited by: §I, §II.
 [26] (2015) Sensor attack detection in the presence of transient faults. In Proceedings of the ACM/IEEE Sixth International Conference on Cyber-Physical Systems, pp. 1–10. Cited by: §II.
 [27] (2015) Potential cyberattacks on automated vehicles. IEEE Transactions on Intelligent Transportation Systems 16 (2), pp. 546–556. Cited by: §I, §II.
 [28] (2017) Intelligent vehicle embedded sensors fault detection and isolation using analytical redundancy and nonlinear transformations. Journal of Control Science and Engineering 2017. Cited by: §II.
 [29] (2008) Anomaly detection in wireless sensor networks. IEEE Wireless Communications 15 (4). Cited by: §II.
 [30] (2015) Sensor fault detection and diagnosis for autonomous vehicles. In MATEC Web of Conferences, Vol. 30, pp. 04003. Cited by: §II.
 [31] (2001) Estimating the support of a high-dimensional distribution. Neural Computation 13 (7), pp. 1443–1471. Cited by: §I, §III-C, §III-D.
 [32] (2010) Sensor faults: detection methods and prevalence in real-world datasets. ACM Transactions on Sensor Networks (TOSN) 6 (3), pp. 23. Cited by: §I.
 [33] (2018) Connected and automated vehicle systems: introduction and overview. Journal of Intelligent Transportation Systems 22 (3), pp. 190–200. Cited by: §I.
 [34] (2000) Congested traffic states in empirical observations and microscopic simulations. Physical Review E 62 (2), pp. 1805. Cited by: §IV, §IV.
 [35] (2014) Traffic flow dynamics: data, models and simulation. Physics Today 67 (3), pp. 54. Cited by: §III-A, §IV, §IV.
 [36] (2017) WALNUT: waging doubt on the integrity of MEMS accelerometers with acoustic injection attacks. In Security and Privacy (EuroS&P), 2017 IEEE European Symposium on, pp. 3–18. Cited by: §II.
 [37] (2019) Real-time sensor anomaly detection and identification in automated vehicles. IEEE Transactions on Intelligent Transportation Systems. Cited by: §I, §II, §II, §III-E.
 [38] (2006) Sigma-point filters: an overview with applications to integrated navigation and vision assisted control. In Nonlinear Statistical Signal Processing Workshop, 2006 IEEE, pp. 201–202. Cited by: §III-C.
 [39] (2015) An overview of automotive cybersecurity: challenges and solution approaches. In TrustED@CCS, pp. 53. Cited by: §II.
 [40] (1987) Sensor fault detection via robust observers. In System fault diagnostics, reliability and related knowledge-based approaches, pp. 147–160. Cited by: §II.
 [41] (2016) Can you trust autonomous vehicles: contactless attacks against sensors of self-driving vehicle. DEF CON 24. Cited by: §II.