Multi-IF : An Approach to Anomaly Detection in Self-Driving Systems

04/27/2020 ∙ by Kun Cheng, et al.

Autonomous driving vehicles (ADVs) are implemented with rich software functions and equipped with many sensors, which in turn brings a broad attack surface. Moreover, the execution environment of ADVs is often open and complex. Hence, ADVs are always at risk of safety and security threats. This paper proposes a fast method called Multi-IF, using multiple invocation features of system calls to detect anomalies in self-driving systems. Since self-driving functions take most of the computation resources and are upgraded frequently, Multi-IF is designed to work under such resource constraints and to support frequent updates. Given the collected sequences of system calls, a combination of different syntax patterns is used to analyze those sequences and construct their feature vectors. Taking the feature vectors as inputs, a one-class support vector machine, trained only on feature vectors from normal sequences, is adopted to determine whether the current sequence of system calls is abnormal. The evaluations on both simulated and real data show that the proposed method is effective in identifying abnormal behavior after only minutes of feature extraction and training. Further comparisons with existing methods on the ADFA-LD data set also validate that the proposed approach achieves higher accuracy with less time overhead.


1 Introduction

Automobiles have become smarter than ever, culminating in the emergence of autonomous driving vehicles (ADVs). Current ADVs are equipped with many intelligent devices, such as dozens of electronic control units (ECUs), a variety of sensors, and powerful computing platforms.

However, with a higher degree of autonomy, safety and security problems escalate due to the complex software and the increased exposure of functionality to adversaries. On one hand, potential software bugs may lead to runtime errors, which put pedestrians, passengers, and vehicles at risk. According to the open Autonomous Vehicle Disengagement Reports Department of Motor Vehicles, California [2017], functional errors and system failures are the main causes of disengagement handling. On the other hand, the broad attack surface makes intrusion possible. For example, the vehicular networks, such as vehicle-to-everything (V2X) networks, Bluetooth, Wi-Fi hot spots, cellular networks, keyless entry systems, etc., make it possible for attackers to break into the ADV system. Back in 2015, researchers hacked a running Jeep Cherokee by embedding a trojan program into one of its ECUs Greenberg [2015]. In 2016, another group of researchers intruded into a vehicle through a malicious application installed in the Android-based car-play system Cho and Shin [2016], which could directly take control of the running car by fabricating control messages over the vehicular bus. In the same year, Keen Security Lab at Tencent remotely compromised a Tesla Model S by exploiting a vulnerability in the embedded web browser of the central information display Liu et al. [2017], and in 2018 they breached a BMW i3 electric vehicle by compromising its telematics unit Keen Security Lab of Tencent [2018].

Anomaly detection thus becomes a vital task for an ADV to guarantee its safe motion. Several mitigation measures have been proposed for specific attacks, such as GPS spoofing and in-car communication fabrication. However, little work addresses the self-driving software system, one of the most critical parts of an ADV. The self-driving software system (often simply referred to as the self-driving system) conducts the software control logic of the vehicle and performs most of the tasks during motion, such as sensing the surrounding environment, planning the route, and controlling the trajectory. More importantly, it is the only component generating control inputs to the ADV's actuators/ECUs. Thus, anomaly detection, together with the related protection, becomes crucial and urgent to secure the self-driving system.

However, compared with general software systems, the anomaly detection for self-driving systems remains a challenging task for the following reasons.

  • Large-scale and complex software architecture. Self-driving systems are usually massive, e.g., the open-source project Autoware (http://github.com/CPFL/Autoware) contains over 270k lines of code, and Baidu Apollo (http://github.com/ApolloAuto/apollo) currently has over 240k lines in its main functional modules. Together with the ECUs and other systems (e.g., the car-play system), the software scale of an ADV becomes tremendous. The complexity of the software is also inherent to the massive functionality of ADVs. To guarantee safe driverless motion, a self-driving system needs to perform many tasks, such as object detection and tracking, localization, motion planning, and data fusion, each of which is complicated on its own; their integration makes the whole system even more complex. Moreover, the self-driving system needs to continuously interact with actuators and the environment.

  • Complex and open environment. The execution environment of an ADV, which is also the input space of the self-driving system, is usually highly complex. On one hand, the environment is open and contains many unsafe factors, such as obstacles. On the other hand, the environment is dynamic and partially unknown due to the existence of dynamic surrounding objects, such as other vehicles and pedestrians, whose occurrence and motion are often challenging to predict and track accurately.

  • Non-deterministic behavior. In a self-driving system, different (machine) learning algorithms/models are applied to complete tasks such as object detection. Given the same data, different models may be obtained after the training phase. The outputs of those models are challenging to predict, and the related components may react differently to the same input at different time instances. Besides, due to the limited accuracy of the vehicle's mechanical components, the vehicle may execute the output of the self-driving system with some tolerable disturbance. Hence, the motion of a vehicle controlled by the self-driving system may not be repeated exactly under the same (road) environment. Due to this non-determinism, the validation of system behaviors is also challenging (especially for logic-/specification-based detection), as it is difficult to determine the tolerable disturbance between the vehicle's outputs and those of the validation system, or between the real output and the specification.

In this paper, we propose a fast and “inexpensive” approach called Multi-IF, which utilizes multiple invocation features of system calls (syscalls) to detect anomalies in self-driving systems. First, the sequence of syscalls is used to model the program behavior. Syscall invocation is the essential programmatic way for user programs to request privileged operations from the operating system kernel, and it has long been used in host-based intrusion detection systems, program profiling, malware analysis, etc. Second, the syntax features of syscall traces are extracted and analyzed to build an SVM-based classifier. Anomaly detection is essentially a classification problem: the classifier should ideally learn the normal behavior to set up a baseline, against which a pending trace can be examined for compliance. Thus, the one-class SVM is chosen due to its wide application in solving such problems. Finally, experiments are conducted on real data from both our self-driving car and the ADFA-LD data set Creech [2013]. The evaluation shows that our approach achieves high detection accuracy with minimal false alerts when monitoring self-driving functions. Comparisons with existing works on the ADFA-LD data set prove that such an approach achieves high detection accuracy and efficiency while reducing the time overhead.

Multi-IF is inexpensive because 1) it combines cheap syntax features to obtain a more accurate detection engine, and 2) the training is fast since it adopts an SVM-based solution (described in Section 4.3). Note that an inexpensive approach is necessary: it must not only work within the limits of the onboard computer, since most computation capability and resources should be reserved for the basic functionalities of safe motion, but also keep pace with rapid system/program updates (e.g., Tesla Autopilot received hundreds of build updates within a single year), which demand that the detection engine be upgraded quickly.

The contributions of this paper are as follows:

  • We propose a syscall-based method to model self-driving systems and demonstrate its effectiveness, without having to look into the complex software architecture and environment.

  • By extracting multiple syntax features from syscall sequences, we show that they can be used by a one-class SVM based classifier to perform anomaly detection in self-driving systems.

  • The proposed approach achieves better performance and lower training costs than existing similar methods on the ADFA-LD data set.

The rest of this paper is organized as follows. Section 2 states the related work. Section 3 introduces the preliminaries as well as the motivation of this work. Section 4 presents the details of our approach. Section 5 shows the evaluation results. Finally, we conclude this paper in Section 6. For review purposes, all code and data can be found at https://bitbucket.org/chengkunbuaa/avdetection_code/.

2 Related Work

Attacks and protection in cyber-physical systems. Mitchell et al. Mitchell and Chen [2014] proposed an adaptive specification-based intrusion detection system (IDS) to detect malicious unmanned aerial vehicles (UAVs) in an airborne system; the IDS monitored the output of embedded sensors and actuators and defined behavior rules from their threat model. Vuong et al. Vuong et al. [2015] developed a decision-tree-based method to detect cyber attacks on a small robotic vehicle and tested it with both cyber and physical attacks, concluding that adding physical features could help improve detection accuracy. Moosbrugger et al. Moosbrugger et al. [2017] performed runtime monitoring on UAVs to detect threats such as denial of service (DoS) or GPS spoofing by monitoring commands, signals, software behaviors, and so on. Choi et al. Choi et al. [2018] proposed to use control invariants to detect external physical attacks, using a proportional-integral-derivative (PID) controller to examine whether the runtime behavior matches the controller. However, those solutions require either pre-defined errors (e.g., the decision-tree solution) or detailed analysis of source code and binary executables (e.g., control-invariant modeling), which hinders their application in large systems. Besides, F. Guo et al. Guo et al. [2019] used sensor data consistency and frequency to detect abnormal execution in autonomous driving networks. K. Zhu et al. Zhu et al. [2019] proposed an anomaly detection approach based on the long short-term memory (LSTM) network to check CAN bus traffic in the time and data dimensions, which resulted in a satisfactory accuracy.

Moreover, a variety of methods have been proposed to counter attacks against sensors or communication in vehicles. For example, Park et al. Park et al. [2015] used pairwise inconsistencies between sensors to detect transient attacks or faults in GPS receivers. Kar et al. Kar et al. [2014] proposed an automated detection and vehicle identification system to mitigate GPS interference on vehicle tracking systems. However, neither method is readily applicable, as either multiple sensors or specific roadside units are required. Cho and Shin Cho and Shin [2016] revealed a new type of DoS attack, caused by a vulnerability in in-vehicle networks, which could disable ECUs via the error-handling mechanism; however, its detection and mitigation require accurate time synchronization. Bouard et al. Bouard et al. [2013] proposed a decentralized information control approach to enhance the security and privacy of in-car communication, based on an authentication framework deployed for each ECU. Woo et al. Woo et al. [2015] proposed an encryption and authentication protocol to protect the CAN bus. However, the performance overhead of authentication in Bouard et al. [2013], Woo et al. [2015] limits their practical adoption. Recently, Steger et al. proposed a framework for secure and dependable wireless software updates on ECUs Steger et al. [2018], which utilizes an IEEE 802.11s mesh network and deploys a cryptographic solution; however, internal computing-platform security concerns, such as system intrusion, were not considered.

Syscall-based anomaly detection. Syscalls have long been used for anomaly detection or signature-based detection in host-based intrusion detection systems Forrest et al. [2008]. Anomaly detection often suffers from a high false-positive rate, since it is difficult to establish a perfect baseline: software execution is highly dynamic, and the complexity of modern computer systems makes it even harder to gather and process all normal execution data. Nevertheless, anomaly detection still plays an important role in defense, as it assumes no prior knowledge of potential attacks, which clearly distinguishes it from signature-based detection. Signature-based detection usually offers a low false alarm rate and high accuracy for attacks that match the pre-collected data templates, but it cannot deal with unseen attacks. Moreover, it relies on accurate signatures gathered from attack evidence, which increases the difficulty of applying it to a new system.

For syscall-based intrusion detection, language modeling techniques are widely used Forrest et al. [2008], Bridges et al. [2019]. Among the recent anomaly detection works, Marteau Marteau [2019] defined the covering similarity to measure a testing symbolic trace against a set of (normal) traces, which was used to identify abnormal sequences. However, since the test on ADFA-LD reached its best result only when all 833 training traces plus an additional 1000 traces from the validation set were used, such an approach requires a rich feature set. Besides, Marteau [2019] tended to extract and build the optimal covering sets using all subsequences in the evaluation (though optimized algorithms were proposed). By contrast, our approach reduces the pattern and feature sets by adopting the 3-step method proposed in Section 4.3, which cuts down the storage and training overhead. Khreich et al. Khreich et al. [2018] combined different detection methods, namely Sequence Time-Delay Embedding (STIDE), the Hidden Markov Model (HMM), and the one-class SVM, to improve the accuracy. Although computing the combination of different detectors was fast, training all detectors was computation-intensive and time-consuming. Creech et al. Creech and Hu [2014] used context-free semantic features in syscall traces, together with an extreme learning machine, to build a neural-network-based classifier. They achieved almost perfect performance on the KDD98 data set, but the approach was computationally heavy, taking weeks to extract the semantic features and days for training. With the semantic features and an SVM, they also achieved a good result on the ADFA-LD data set. Xie et al. Xie et al. [2014], Xie and Hu [2013] used syntax features, such as the length of a syscall trace and the relative frequencies of individual calls, with k-NN and k-means clustering models to achieve acceptable results. In another work Xie et al. [2014], they used short sequences and frequencies to train a one-class SVM classifier, which improved the accuracy. Haider et al. Haider et al. [2015] used four statistical features of a trace, i.e., the least/most repeated and the minimum/maximum values, to detect attacks; three learning algorithms, namely SVM with a linear kernel, SVM with a radial basis kernel, and k-NN, were applied to improve the performance over Xie's works while achieving fast training. Huorong et al. Ren et al. [2017] segmented sequence data with a sliding window to build a dynamic Markov model for their simulated data set and an airport traffic data set, which improved adaptability and stability compared with classical Markov approaches. On the UNM data set, Hoang et al. Hoang et al. [2009] proposed to use a Hidden Markov model to examine normal syscall sequences and generate four pattern sets; fuzzy rules were then applied to check whether a tested trace is normal by considering the produced probability and pattern frequencies, and the results showed a reduction of the false alarm rate by almost a half. Khreich et al. Khreich et al. [2017] used various n-grams and their frequencies as features, which is exactly the complete large pattern set (all L-k clusters) extracted in Section 4.2. However, this leads to an enormous feature vector space (e.g., there were 142,190 features when N=6), and since the average Euclidean distance was used to determine the similarity of a testing trace to all normal sequences, it involves a large amount of computation. Although the above studies have made remarkable contributions to the host-based intrusion detection system (HIDS), they are either time/resource-consuming or less accurate, which makes them unsuitable for ADVs. Thus, a faster and more accurate detection method is required for the current defense system of an ADV.

3 Preliminaries and Motivation

3.1 The Self-Protection Framework for ADVs

The Robot Operating System (ROS), built on top of the Linux kernel, is an open-source and flexible framework for developing robot control systems and is prevalent in robotics. We have been working on a self-protection framework with great flexibility and extensibility for ROS-based self-driving systems Cheng et al. [2020]. In this framework, hardware-assisted virtualization is used to isolate the different software components of a self-driving system. Each isolated software component, referred to as a partition, is equipped with a self-protection subsystem to inspect the partition's execution and plan mitigation measures.

As an important part of the efforts made to secure the cyber world of autonomous driving platforms, the anomaly detection is designed to handle possible threats from both inside and outside, such as malicious intrusion (e.g., a compromised partition image during a cloud update) or runtime faults in a partition (which may lead to system failure or malfunctioning). Together with the efforts made in securing the physical world (the motion-based detection), we hope to build a complete model to better explain the overall status of the vehicle, and to locate the possible root cause of any detected abnormal functioning of the whole CPS system.

To secure the cyber world, anomaly detection becomes vital. The main steps in detecting abnormal execution are selecting the proper information to monitor and proposing a proper analysis method to check whether the monitored information is correct. Due to the complexity of the architecture and program logic in self-driving systems, monitoring and analyzing the execution of each partition directly may be challenging. Monitoring critical execution paths may serve well, but it is also important to remain stealthy and not alert the intruder; thus, code instrumentation solutions are not suitable for our purpose. Inspired by virtual machine introspection (VMI) based techniques Payne et al. [2007], Lengyel et al. [2014], the sequence of syscalls invoked by a partition is a good choice for developing new methods to extend and improve the current MAPE loop.

3.2 Syscall and Anomaly Detection

The modern Linux kernel provides over 300 syscalls [22], which are used by running processes to interact with the operating system kernel. On one hand, the usage of syscalls is determined by the program source code and the libraries it relies on; on the other hand, the invocation sequence of syscalls is highly dynamic and determined by the program logic and input data. Hence, syscall sequences are often used for (host-based) intrusion detection Bridges et al. [2019], under the assumption that only running processes can harm the system and any damage can happen only through privileged operations. Language models are prevalent in syscall-based intrusion detection, where sequential features, such as n-grams, are often used to build various detection engines.

Anomaly detection plays an important role in defense as it does not assume prior knowledge of potential attacks. It usually works by first establishing a baseline for the behavior of the protected targets. Ideally, if such a baseline is sufficiently accurate, any anomaly will be identified as a real threat. But establishing such a perfect baseline is difficult and sometimes impractical, since 1) the software execution environment is highly open, dynamic, and complex, and 2) the complexity of modern software systems makes it even harder to gather and process all normal execution data. It is often necessary to upgrade the detection engine regularly with more evidence (training data), either online (e.g., semi-supervised learning) or offline (e.g., retraining). However, the online upgrade is challenging, as it is difficult to distinguish a rarely seen normal trace from a real threat. Thus, we try to accelerate the offline upgrade by reducing the training overhead, which is also what “inexpensive” stands for here.

3.3 Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning model for classification and regression. The basic idea of SVM for linear classification is to determine optimal hyperplanes that maximize the margin between different groups of data. In case the data cannot be separated linearly, SVM uses kernel functions to map it into a higher-dimensional space such that it can be separated in the new space. A one-class SVM is essentially a regular binary-class SVM where all training data belongs to the same class. There are two popular kinds of one-class SVMs: the one according to Schölkopf, where the boundary is a hyperplane Schölkopf et al. [2000], and the one according to Tax and Duin, where the boundary is a sphere Tax and Duin [2004]. One-class SVM is popular for anomaly detection, as we conclude in Section 2.

In a self-driving platform such as the one described in Sec. 3.1, the GPU is exclusively used by the object detection function (in our platform this is done via device passthrough). Moreover, it is tedious to maintain and update traditional intrusion detection systems installed on every host, which calls for approaches that reduce the time overhead while keeping acceptable detection accuracy Liu et al. [2019]. As it is critical to develop a lightweight detection engine that does not consume too many resources or introduce too much overhead, SVM-based approaches are preferred in this work.

4 Methodology

The detection goal here is to identify abnormal execution traces among normal ones. As described in Creech and Hu [2014], given the set of valid subsequences, called patterns, extracted from normal traces, the occurrence of these subsequences in a new normal trace is significantly greater than in an abnormal one. Hence, the appearance of syscall patterns in a pending trace is vital for detecting anomalies. There are two main steps for such pattern-based detection. The first is pattern extraction; in this work, subsequences of contiguous syscalls are used as patterns, and finding a proper set of subsequences with different lengths is the first key issue for accurate prediction. The second is building the detection engine. As abnormal behavior is rare but catastrophic in self-driving systems, the training data is collected from the clean and safe execution of the target system; hence, how to use only normal data to obtain a classifier that detects abnormal data is another challenge. In this section, the details of the proposed method are presented.

4.1 The Framework of the Proposed Method

Let S be the set of all syscalls allowed by the kernel. Given the kernel source code, S is a finite and deterministic set. For example, as defined in /usr/include/x86_64-linux-gnu/asm/unistd_64.h, the Ubuntu kernel 4.4.108 x64 contains 326 syscalls, such as ‘0’ for ‘read’, ‘1’ for ‘write’, and ‘96’ for ‘gettimeofday’. An invocation trace, denoted as T, is a time-ordered sequence of syscalls, i.e., T = ⟨s_1, s_2, …, s_n⟩, where s_i ∈ S.

Fig. 1: Anomaly detection procedure.

Fig. 1 shows the whole structure of the proposed method. Suppose D is a set of normal syscall traces, and D′ is the testing set containing both normal and abnormal syscall traces. Our method includes two steps. The first is pattern discovery and feature extraction from D. This step performs the preprocessing of the raw syscall sequences in D and generates the inputs for decision making: we first extract and build the set of patterns from the normal data, then group those patterns into clusters based on their length, and compute their frequencies in a trace as the features of that trace. The second step is to train a detection engine. Since the amount of normal data is usually much larger than that of abnormal data, we adopt the one-class SVM to construct the decision engine, whose inputs are the extracted feature vectors of the syscall traces.

4.2 Feature Extraction

Before diving into the details, we first give some basic definitions used in this paper. Based on the n-gram model, patterns are defined in Definition 1.

Definition 1 (Pattern)

Given a set of normal syscall traces D, a syscall pattern is a k-gram in D, denoted as p = ⟨s_1, …, s_k⟩, where s_i ∈ S. The set of all patterns is denoted as P.

The number of patterns extracted from the normal traces may be very large. Directly using all of them may lead to a high input dimension for training and detection, which could result in unnecessary overhead. Hence, further clustering is applied to reduce the number of patterns used.

Definition 2 (L-k Cluster)

Given the pattern set P, its L-k cluster, denoted as C_k, is the set of all k-grams in P, i.e., C_k = {p ∈ P : |p| = k}. All the clusters of P are denoted as C = {C_1, C_2, …}.

Based on this definition, all patterns in a cluster C_k have the same sequence length k.

Definition 3 (Feature)

Given a cluster C_k ∈ C, the feature of C_k is defined as the frequency of C_k in a trace T, denoted as f_k(T) = n_k / |T|, where n_k is the total number of occurrences of the patterns of C_k in T, and |T| is the number of syscalls in T.

For example, suppose the L-2 cluster extracted from a set of normal syscall traces is C_2 = {⟨1, 2⟩, ⟨2, 3⟩}. If there exists a trace T = ⟨1, 2, 3, 1, 2, 3, 1, 2, 3, 1⟩, then ⟨1, 2⟩ occurs 3 times and ⟨2, 3⟩ occurs 3 times in T, so f_2(T) = 6/10 = 0.6.

Pattern cluster building. The first step of feature extraction is to build the set of L-k clusters. Given a value k, the construction of C_k is described by function BUILD_SET(k, D) (Lines 6–13) in Algorithm 1, where Lines 8–12 show the extraction of k-grams from each trace T. More specifically, the algorithm searches for all k-grams in one trace, from the first k contiguous syscalls to the last k contiguous syscalls. Hence, there are |T| − k + 1 iterations of Lines 9–12 for a trace T. Once it completes the current trace, the algorithm switches to the next one, until all traces in D are processed.

1: Input:
2:      k: an integer denoting the pattern length
3:      D: the training data set
4:      D′: the testing data set
5:
6: function BUILD_SET(k, D)
7:      C_k ← ∅
8:      for each T ∈ D do
9:            for i = 1 to |T| − k + 1 do
10:                 if T[i..i+k−1] not in C_k then
11:                       ▷ T[i..i+k−1] is the subsequence of syscalls from the i-th syscall to the (i+k−1)-th syscall in T.
12:                       add T[i..i+k−1] into C_k
13:      return C_k
14:
15: ▷ function to compute the frequency of C_k in a given trace T
16: function EVAL_TRACE(C_k, T)
17:      f_k ← 0
18:      for each p ∈ C_k do
19:            n_p ← the number of times that p appears in T
20:            f_k ← f_k + n_p
21:      return f_k / |T|
Algorithm 1 Feature extraction for the L-k cluster.

Take a normal trace T = ⟨1, 2, 3, 4, 1, 2, 3⟩ as an example to illustrate Lines 9–12. Suppose k = 3 and currently C_3 = ∅. Since |T| − k + 1 = 5, Lines 9–12 will be iterated 5 times to generate C_3 from T. In the first iteration, the 3-gram is ⟨1, 2, 3⟩. Since C_3 = ∅, ⟨1, 2, 3⟩ is added to C_3, resulting in C_3 = {⟨1, 2, 3⟩}. The next 3-gram is ⟨2, 3, 4⟩. Since ⟨2, 3, 4⟩ is not in C_3, C_3 = {⟨1, 2, 3⟩, ⟨2, 3, 4⟩}. Similarly, ⟨3, 4, 1⟩ and ⟨4, 1, 2⟩ are added to C_3 sequentially, generating C_3 = {⟨1, 2, 3⟩, ⟨2, 3, 4⟩, ⟨3, 4, 1⟩, ⟨4, 1, 2⟩}. The last iteration checks the final 3-gram in T, i.e., ⟨1, 2, 3⟩. As it is already in C_3, there is no need to add it again.

Feature extraction. In this paper, the frequency of each cluster is used as a feature to characterize a trace. Based on Definition 3, given a trace T, the frequency of C_k can be computed by function EVAL_TRACE(C_k, T) (Lines 16–21) in Algorithm 1. For each pattern p in C_k, the function first counts the number of its occurrences in T (Line 19) and then updates f_k (Line 20). Since different traces may have different lengths, the absolute numbers of occurrences of C_k patterns in different traces may differ significantly, which can cause bias during training and prediction. Hence, we normalize the occurrence count of C_k in a trace by dividing it by the trace length, i.e., Line 21 in Algorithm 1.
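To make Algorithm 1 concrete, the following is a minimal Python sketch of the two functions. It is our own illustration rather than the authors' released implementation; it scans sliding windows instead of iterating over patterns, which gives the same totals because each window position matches at most one k-gram in C_k.

```python
from typing import List, Sequence, Set, Tuple

Trace = Sequence[int]        # a syscall trace, e.g. [1, 2, 3, 4, 1, 2, 3]
Pattern = Tuple[int, ...]    # a k-gram of syscall numbers


def build_set(k: int, traces: List[Trace]) -> Set[Pattern]:
    """Build the L-k cluster C_k: all distinct k-grams over the normal traces."""
    c_k: Set[Pattern] = set()
    for trace in traces:
        # slide a window of length k over the trace: |T| - k + 1 positions
        for i in range(len(trace) - k + 1):
            c_k.add(tuple(trace[i:i + k]))
    return c_k


def eval_trace(c_k: Set[Pattern], trace: Trace) -> float:
    """Frequency feature f_k: total occurrences of C_k patterns in the trace,
    normalized by the trace length."""
    if not c_k or not trace:
        return 0.0
    k = len(next(iter(c_k)))
    hits = sum(1 for i in range(len(trace) - k + 1)
               if tuple(trace[i:i + k]) in c_k)
    return hits / len(trace)


# Quick checks against the toy examples used in this section.
assert build_set(3, [[1, 2, 3, 4, 1, 2, 3]]) == {
    (1, 2, 3), (2, 3, 4), (3, 4, 1), (4, 1, 2)}
assert eval_trace({(1, 2), (2, 3)}, [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]) == 0.6
```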

By checking every C_k in a selected set of m clusters, we can count their frequencies in a trace. Hence, given an arbitrary trace T, we can measure it via the following m-dimensional vector:

f(T) = (f_{k_1}(T), f_{k_2}(T), …, f_{k_m}(T))    (1)

where each f_{k_j}(T) is computed by EVAL_TRACE(C_{k_j}, T). Clearly, with (1), we can translate the trace space into a subset of R^m, where R^m is the m-dimensional Euclidean space. Suppose the translated spaces of D and D′ are X and X′, respectively.

4.3 One-Class SVM based model training

In this paper, we use a popular tool called LIBSVM Chang and Lin [2011]. As pointed out by Khan and Madden [2014], the method of Schölkopf et al. Schölkopf et al. [2000] and the SVDD method of Tax and Duin Tax and Duin [2004] perform comparably, and both perform best when the Gaussian kernel is used in practical implementations; we therefore choose the one-class SVM (OCSVM) with the (Gaussian) radial basis function (RBF) kernel to train the prediction model on the normal data. The one-class SVM applied here follows Schölkopf's formulation. By walking through the SVM model, we explain which parameters are required and how they are determined.

In LIBSVM Chang and Lin [2011], training an OCSVM model with the RBF kernel amounts to solving (2):

min_α  (1/2) α^T Q α
s.t.  0 ≤ α_i ≤ 1/(νl), i = 1, …, l,   e^T α = 1    (2)

where the α_i are Lagrange multipliers (dual variables), Q_ij = K(x_i, x_j) = exp(−γ‖x_i − x_j‖²), l is the number of training vectors, and e is the vector of all ones; γ is kept at its default value. Thus, ν is the only parameter that needs to be tuned, because different ν values generate different optimization problems (2), which in turn affect the solution α. Here, we use grid search to determine ν from the set {0.5 (default), 0.2, 0.1, 0.005, 0.001, 0.0005, 0.0001}, which was collected in early attempts.
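A minimal sketch of this training step is given below. It uses scikit-learn's OneClassSVM (which wraps LIBSVM) instead of the LIBSVM command-line tools, and the selection criterion (maximizing DR − FAR over a held-out split) is an illustrative assumption rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def grid_search_nu(X_train, X_norm, X_abn, nus=(0.5, 0.2, 0.1, 0.05, 0.01)):
    """X_train: feature vectors of normal traces; X_norm / X_abn: normal and
    abnormal testing vectors, all shaped [n_traces, n_clusters]."""
    best = None
    for nu in nus:
        model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X_train)
        # predict() returns +1 for inliers (normal) and -1 for outliers (abnormal)
        far = float(np.mean(model.predict(X_norm) == -1))  # false alarms on normal data
        dr = float(np.mean(model.predict(X_abn) == -1))    # detections on abnormal data
        if best is None or dr - far > best[0]:
            best = (dr - far, nu, far, dr, model)
    return best  # (score, nu, FAR, DR, fitted model)
```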

Fig. 2: Procedure of training the detection model.
ν 0.5 0.2 0.1 0.05 0.01
k FAR DR FAR DR FAR DR FAR DR FAR DR
1 30.695 77.748 12.557 28.150 5.993 11.528 5.672 10.456 4.414 8.847
2 65.027 66.890 8.669 7.775 4.071 4.155 2.653 4.558 1.212 3.619
3 40.096 78.954 15.256 41.957 6.725 33.780 5.833 33.110 4.140 30.831
4 42.864 92.627 27.196 67.024 21.523 62.064 19.511 61.260 17.383 60.188
5 72.210 98.257 34.149 93.566 31.084 89.678 28.454 87.265 26.189 84.316
6 74.909 99.330 40.393 98.794 36.002 95.040 34.218 92.895 31.999 92.493
7 77.493 99.598 43.207 98.928 38.701 95.174 36.848 93.164 35.522 93.164
8 80.764 99.866 48.605 98.525 42.978 95.040 40.714 93.164 38.426 93.164
9 60.476 99.732 53.088 96.381 48.399 94.370 45.334 93.700 43.435 93.700
10 61.528 99.732 53.728 96.381 50.114 94.504 46.935 93.834 44.831 93.834
11 63.701 99.464 54.140 95.576 51.189 94.102 49.062 93.834 47.027 93.834
12 64.478 97.185 54.872 94.906 52.196 93.834 50.160 93.834 47.850 93.834
13 65.599 97.051 56.176 94.504 52.722 93.834 50.595 93.834 48.673 93.700
14 65.759 98.257 58.326 94.906 55.627 94.102 51.715 93.968 49.611 93.700
15 67.109 98.257 58.875 96.113 55.855 94.102 53.317 93.834 50.069 93.834
Table 1: ADFA-LD early results on different clusters (step 1) [%].

In this paper, each input to the OCSVM is a vector in X or X′. Specifically, the data in X is used to train the SVM model, and that in X′ is used to test the trained model. For each trace T, its feature vector is f(T). If T is a normal trace, then f(T) is labeled +1, otherwise −1. Thus, given f(T) and its label y, the input for LIBSVM can be written as (y, f(T)), where y = +1 or −1.
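As an illustration of this input format, the helper below writes a labeled feature vector in LIBSVM's sparse text format ("<label> index:value ..."); the function name and example values are ours.

```python
def to_libsvm_line(label: int, features) -> str:
    """Format one feature vector as a LIBSVM text line: '<label> 1:v1 2:v2 ...'."""
    pairs = " ".join(f"{i + 1}:{v:.6f}" for i, v in enumerate(features))
    return f"{label} {pairs}"

# +1 for normal traces (the training data), -1 for abnormal testing traces.
print(to_libsvm_line(+1, [0.6, 0.25, 0.1]))  # -> "1 1:0.600000 2:0.250000 3:0.100000"
print(to_libsvm_line(-1, [0.9, 0.05, 0.0]))
```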

Given a set of normal syscall traces, the simplest way to determine the clusters is to select all possible lengths, from 1 to the length of the longest trace. However, the number of clusters may then become very large. Hence, to balance the training cost and detection effectiveness, we need to deal with the following issues in practice.

  • The dimension of the input data. The number of clusters selected for feature extraction has to be decided properly, since it determines the dimension of the SVM's data space, which significantly affects the training cost.

  • The combination of different clusters. The feature vector is based on the selected cluster set. Different combinations of clusters result in different training data, which directly affects the classification performance. Thus, a proper set of clusters has to be determined.

To solve the above issues, a 3-step method is proposed to choose the clusters and find a proper combination to train the detection model. The whole procedure is shown in Fig. 2. In the sequel, we take the ADFA Linux data set (ADFA-LD) Creech [2013] as an illustrative example to show the heuristic process of cluster selection.

4.3.1 Probing potential clusters

To find a proper set of clusters, the first step is to test the classification performance by taking the features of a single cluster (as shown in Fig. 2), whose length varies from 1 to 20, as the input and checking it with different values of ν. The results are shown in Table 1. As our goal is to distinguish abnormal traces from normal ones, and to allow an easy comparison of the final results with those published by similar research Creech and Hu [2014], Marteau [2019], Xie et al. [2014], Xie and Hu [2013], Xie et al. [2014], Haider et al. [2015], the detection focuses on the false alarm rate (FAR) and the detection rate (DR), whose definitions are given in (3):

FAR = FP / (FP + TN),    DR = TP / (TP + FN)    (3)

where TP, TN, FP, and FN stand for the numbers of true positives, true negatives, false positives, and false negatives, respectively, regarding an abnormal trace as a positive result, as shown in Table 2.

Actual \ Detected	Abnormal (Positive)	Normal (Negative)	Total
Abnormal	TP	FN	TP + FN
Normal	FP	TN	FP + TN
Table 2: Meanings of TP, TN, FP, and FN.
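For clarity, the helper below computes FAR and DR from true and predicted labels according to (3), using the same label convention as the SVM output (+1 normal, −1 abnormal); it is a small illustration, not part of the original toolchain.

```python
def far_dr(y_true, y_pred):
    """FAR = FP / (FP + TN), DR = TP / (TP + FN); abnormal (-1) is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == +1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == +1)
    return fp / (fp + tn), tp / (tp + fn)

# Example: far_dr([+1, +1, -1, -1], [+1, -1, -1, -1]) -> (0.5, 1.0)
```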

4.3.2 Determining the optimal clusters

The second step is to choose the candidate set and test each possible combination, as shown in Fig. 2. The key here is to decide the maximum pattern length k_max based on the results from the first step. The goal of increasing k is to achieve better performance: on one hand, increasing k yields more feature data, which generally benefits the training as long as it increases DR while suppressing FAR (i.e., FAR decreases, or at least does not increase quickly); on the other hand, k should be capped to lower the storage and computation overhead once further increasing k does little to help differentiate DR from FAR.

From the observation of the early results of step 1 (e.g., Table 1), a general trend holds over the probed range of k: for each given ν, 1) both DR and FAR increase with the growth of k before they reach their utmost; 2) the increment becomes slow as they approach the utmost. In such a situation, k_max can be decided by analyzing the change rate Δ of DR and FAR, which can be described as (4):

Δ(k) = [DR(k) − DR(k−1)] − [FAR(k) − FAR(k−1)],  k ∈ K    (4)

where K is the range of k probed in the first step. Thus, k_max can be decided by (5):

k_max = argmax_{k ∈ K} [Δ(k) − Δ(k+1)]    (5)

According to (5), k_max is the point where Δ decreases most, which means that keeping increasing k (while k > k_max) helps little in differentiating DR and FAR within the range of k probed in the first step.

Back to the example: as shown in Table 1, FAR and DR almost stop increasing (or the increment is small) when k exceeds 9. Thus, the clusters from L-1 to L-9 form the candidate set for potential feature extraction, as they yield better detection performance. When searching for a proper set of clusters, a combination can contain from 2 up to 9 clusters in this case; in general, for n candidate clusters there are 2^n − n − 1 such combinations. Since the computation of each feature (i.e., a cluster's frequency) is independent, evaluating a trace with multiple clusters can be done directly by collecting the frequency of every single cluster from step 1 and concatenating them to obtain the new input data for the SVM model. For example, suppose the frequencies of the L-3, L-4, and L-7 clusters in a given trace are f_3, f_4, and f_7 individually; we can obtain a new 3-dimensional data point (f_3, f_4, f_7) describing this trace by directly concatenating them.

As concatenating precomputed features is a cheap operation, we can try the different combinations by brute force. Given that training and evaluating one SVM takes only seconds, as shown in our evaluation, the cost of trying all possibilities depends only on the number of cluster combinations; thus, the training with multiple sets requires at most 2^{k_max} − k_max − 1 SVM runs. Back to the example, when this approach was applied to ADFA-LD, it took minutes (wall time) to finish the whole training process, from feature extraction to obtaining all multi-set evaluation results.
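The brute-force search over cluster combinations can be sketched as follows. The per-cluster feature columns are assumed to have been precomputed in step 1, and score_fn is a placeholder for whatever criterion is used to rank a combination (the paper applies the rules in Sec. 4.3.3).

```python
from itertools import combinations
import numpy as np

def search_combinations(per_cluster_feats, candidate_ks, score_fn):
    """per_cluster_feats[k] is the column of f_k values (one entry per trace).

    Tries every combination of at least two candidate clusters, concatenates the
    corresponding columns into one SVM input matrix, and keeps the combination
    with the highest score_fn(X_combined, ks)."""
    best_score, best_ks = None, None
    for r in range(2, len(candidate_ks) + 1):
        for ks in combinations(candidate_ks, r):
            X = np.column_stack([per_cluster_feats[k] for k in ks])
            score = score_fn(X, ks)
            if best_score is None or score > best_score:
                best_score, best_ks = score, ks
    return best_ks, best_score
```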

4.3.3 Deriving the best cluster combination

However, different combinations may perform similarly. In such a case, two rules are proposed to choose the best one, which is also the final step of building the detection model, as shown in Fig. 2. First, when multiple candidate combinations are available, the one with the shortest combination length is preferred. Here, the combination length is defined as the sum of the lengths of all patterns used in the combination; for example, for {L-3, L-4, L-7}, the combination length is 3 + 4 + 7 = 14. Then, if more than one candidate remains, we use the F-measure to choose the best one.

The F-measure is the harmonic average of the precision and recall, which is defined in (6):

F = 2 · precision · recall / (precision + recall)    (6)

where precision = (DR · N_a) / (DR · N_a + FAR · N_n) and recall = DR, with N_n the number of normal testing traces and N_a the number of abnormal testing sequences. In such a case, the combination with the largest F-measure is considered the best.
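Under the reconstruction of (6) above, the F-measure can be computed directly from FAR, DR, and the sizes of the normal and abnormal testing sets, as in this sketch (variable names are ours):

```python
def f_measure(far, dr, n_normal, n_abnormal):
    """Harmonic mean of precision and recall, with the abnormal class as positive:
    TP = dr * n_abnormal, FP = far * n_normal, recall = dr."""
    tp = dr * n_abnormal
    fp = far * n_normal
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    return 2 * precision * dr / (precision + dr)

# f_measure(0.00417, 1.0, 239, 479) ≈ 0.999, consistent with the localization
# {1, 8} result reported in Sec. 5.2.
```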

5 Evaluations

5.1 Experiment Setups

The testbed was equipped with an Intel Xeon E5-1650 v3 CPU, 32 GB of memory, and a 1 TB disk. The host operating system was Ubuntu 16.04.3 x64 with Linux kernel version 4.4.108. We isolated each of the localization and mapping components in a Xen virtual machine (VM) with 2 virtual CPUs and 2 GB of memory, and deployed the VMI-based monitor to capture the execution traces for analysis.

The evaluation contains two parts. In the first one (Secs. 5.2 and 5.3), we demonstrate how our approach performs in securing the components of Autoware, which is the self-driving system of our ACRONIS Self-Driving Car (modified from a Toyota COMS electric vehicle with a Velodyne VLP-16 lidar, a Delphi ESR 2.5 radar, an MTi-G-710-2A8G4 GPS/IMU module, and a BFS-PGE-31S4C-C camera). In our early work, we separated Autoware into 8 partitions: sensing, localization, data loading (mapping), fusion, object tracking, path planning, motion planning, and path following, where sensing, localization, and mapping are isolated by virtualization-based partitions. Experiments were conducted on real data gathered from field tests.

In the second part (Sec. 5.4), we used the ADFA-LD data set to prove the generalization of our method. ADFA-LD was released for host-based anomaly detection, replacing existing benchmarks such as the KDD-98 and UNM data sets, whose applicability to modern computer systems has been questioned. Besides, ADFA-LD has a much larger degree of similarity between attack data and normal data than the KDD collections, which makes it more complex and harder to analyze Creech and Hu [2013]. Thus, testing with ADFA-LD can further validate our method.

5.2 Testing on GNSS Localization Partition

In this test, we used the sample data recorded in Japan and provided by the Autoware project as the input of the self-driving system. To get valid execution traces, we first ran the system and recorded all syscalls issued by the GNSS localization partition (mainly its GNSS localizer program), which provided us with 479 normal traces under a 1-second monitoring window. Each normal trace contained 1,274 syscalls on average. We divided those traces into a training set (240 traces for model training) and a normal testing set (239 traces).

Then, we ran a malicious program to act as an adversary. This program was coded to work as follows: it stealthily gathered and sent resource usage data to a remote server periodically, and issued other critical syscalls trying to disrupt normal execution or crash other critical services. The whole process simulated a hijacked program (e.g., a trojan), which was based on a legitimate system monitor agent embedded in each partition. After the test, an abnormal data set of 479 invalid traces (used as abnormal testing data) was gathered. Each abnormal trace contained 1,479 syscalls on average.

We first trained the detection model with the training set, where all data was normal, and then tested the model with both normal and abnormal testing data. To train the model, we first extracted the features of the pattern clusters from L-1 to L-15 based on Algorithm 1. We recorded the time overhead of our Python implementation: the total time spent extracting the features of these clusters in parallel was 6 minutes and 55.0 seconds.

ν 0.5 0.2 0.1 0.05 0.01
k FAR DR FAR DR FAR DR FAR DR FAR DR
1 50.833 100.000 19.167 99.791 12.917 99.582 5.833 99.374 0.000 36.326
2 45.417 99.791 18.333 99.791 12.500 99.791 7.083 99.582 0.417 36.534
3 50.833 100.000 27.083 99.791 13.750 99.791 6.250 99.791 2.917 37.161
4 54.167 100.000 27.500 100.000 13.750 100.000 7.083 100.000 1.250 99.791
5 61.667 100.000 33.333 100.000 24.583 100.000 16.250 100.000 8.333 100.000
6 75.833 100.000 55.417 100.000 45.417 100.000 35.833 100.000 22.083 100.000
7 89.583 100.000 80.833 100.000 75.417 100.000 65.000 100.000 43.750 100.000
8 96.250 100.000 94.167 100.000 91.667 100.000 87.083 100.000 75.000 100.000
9 99.167 100.000 96.667 100.000 95.417 100.000 94.583 100.000 92.500 100.000
10 99.167 100.000 97.917 100.000 97.500 100.000 94.167 100.000 94.583 100.000
11 99.583 100.000 99.583 100.000 99.167 100.000 98.333 100.000 99.583 100.000
12 100.000 100.000 100.000 100.000 100.000 100.000 98.750 100.000 99.167 100.000
13 100.000 100.000 100.000 100.000 100.000 100.000 99.583 100.000 99.583 100.000
14 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000
15 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000
Table 3: Early results of the evaluation on Localization with single cluster [%].

As shown in Table 3, after applying the method proposed in Sec. 4.3, the evaluation results indicated that the L-1 to L-8 clusters could form a potential candidate set, as they provided better performance than the others when ν = 0.01.

Fig. 3: F-measure of the evaluations on the localization partition with different ν.

Then, we used the feature data sets extracted in the previous evaluation to generate the SVM inputs under various combinations of different clusters. Training and testing all combinations of at least two candidate clusters in parallel took 8.2 s (wall time). Among all trials, we found that combinations such as {1, 8}, {1, 4, 8}, {1, 5, 8}, {1, 6, 8}, {1, 3, 5, 8}, {1, 3, 6, 8}, {1, 3, 7, 8}, {1, 4, 5, 8}, {1, 4, 6, 8}, {1, 4, 7, 8}, and {1, 5, 6, 8} offered the best performance (F-measure = 0.999), as shown in Fig. 3. Note that 10-fold cross validation was adopted to reduce over-fitting during the training of each combination. We chose {1, 8} since it was the shortest combination. The ROC (Receiver Operating Characteristic) curves of different combinations are shown in Fig. 4, which confirms that using the combination of the L-1 and L-8 clusters, i.e., {1, 8}, significantly improved the classification performance.

We compared our approach with single-cluster classifiers and a multi-voter classifier. A single-cluster classifier is a one-class SVM trained on the features of a single cluster, while the multi-voter classifier combines several single-cluster classifiers: the voter collects the results from the individual classifiers and returns their sum; if the sum is negative, the voter returns “abnormal”, otherwise “normal”. The results are shown in Table 4. Compared with the single-cluster classifiers, the multi-cluster classifier decreased FAR while maintaining a high DR, whereas the multi-voter classifier still suffered from a FAR close to that of the worst single-cluster classifier.

Classifier FAR(%) DR(%)
{1, 8} 0.417 100.000
{1} 5.833 99.374
{8} 75.000 100.000
{1} {8} 73.000 100.000
Table 4: Comparison on Localization Partition
Fig. 4: ROC curves for the evaluations of the execution of the localization partition (ν = 0.01).

5.3 Testing on Mapping Partition

During the test, we used the mapping data gathered from our field test in Singapore as the workload for the target software. The mapping partition broadcast vector maps and point-cloud maps of the driving area, as shown in Fig. 5(a), which were collected by our self-driving car in a field test, as shown in Fig. 5(b). Our self-driving car circled the area following the waypoints in the map. Meanwhile, all syscalls issued by the mapping partition were captured and recorded. As in Sec. 5.2, the model was built with the training data (normal traces) and evaluated with both normal and abnormal testing data.

(a) Map of the testing car park area.
(b) Our Toyota COMS AV.
Fig. 5: Our ADV and the map generated by the car in the testing car park.

Like in the previous setup, we collected a normal data set consisting of 300 traces with a 1-second monitoring window. Those sequences were divided evenly into model training and normal testing data. After another run with the aforementioned malicious program embedded, 300 abnormal traces were gathered in 5 minutes and used as abnormal testing data. Among those data, each normal trace contained 617 syscalls and each abnormal one 724, on average.

ν 0.5 0.2 0.1 0.05 0.01
k FAR DR FAR DR FAR DR FAR DR FAR DR
1 56.000 100.000 17.333 99.667 10.000 99.667 2.667 95.333 0.667 93.667
2 33.333 99.667 14.000 99.667 10.000 99.333 2.000 98.000 0.000 97.333
3 48.667 100.000 20.667 100.000 16.667 100.000 8.667 100.000 6.000 100.000
4 52.667 100.000 24.000 100.000 16.000 100.000 14.000 100.000 7.333 100.000
5 65.333 100.000 40.000 100.000 27.333 100.000 22.000 100.000 12.667 100.000
6 87.333 100.000 73.333 100.000 63.333 100.000 47.333 100.000 36.667 100.000
7 93.333 100.000 86.000 100.000 83.333 100.000 80.667 100.000 62.000 100.000
8 99.333 100.000 98.667 100.000 98.000 100.000 97.333 100.000 93.333 100.000
9 99.333 100.000 99.333 100.000 99.333 100.000 98.667 100.000 98.667 100.000
10 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000
11 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000
12 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000 100.000
Table 5: Early results of using single pattern set on Mapping [%].

The total time for the parallel feature extraction of the L-1 to L-12 clusters was 1 minute and 29.0 seconds. Among the results shown in Table 5, the L-1 to L-8 clusters can form the pattern set, as the change rate Δ reaches its minimum at k = 8. Thus, the number of possible combinations was 2^8 − 8 − 1 = 247, which took 6.31 s to test using the grid search. The F-measure results are shown in Fig. 6, from which we can conclude that each combination achieved its best performance when ν = 0.01. Among the achieved results, we found that the combinations {1, 2, 5} and {1, 2, 6} yielded better results, achieving the best F-measure of 0.998. Thus, {1, 2, 5} was chosen as it was the shortest combination, as shown in Fig. 7. The comparison in Table 6 suggests that the multi-cluster classifier achieved better results than the single-cluster ones.

Fig. 6: F-measure of the evaluations on the mapping partition with different ν.
Fig. 7: ROC curves for the evaluations of the execution of the mapping partition (ν = 0.01).
Classifier FAR(%) DR(%)
{1, 2, 5} 0.667 100.00
{1} 10.000 99.667
{2} 0.000 97.333
{5} 12.667 100.000
{1} {2} {5} 8.00 100.00
Table 6: Comparison on Mapping partition.

From the above two tests, we can see that combining multiple clusters does improve detection accuracy. In the first test, the combination {1, 8} reduced FAR by 92.85% (compared with using the L-1 cluster only) and achieved 100% DR. In the second, {1, 2, 5} reduced FAR by 93.33% (compared with using the L-1 cluster only) and achieved 100% DR. Even when compared with the multi-voter classifier, it reduced FAR by over 90.0%. The comparison shows that combining multiple features achieves the minimum FAR and the maximum DR among all the single clusters used.

5.4 Comparison with Other Methods on ADFA-LD

In both of the above cases, most self-driving programs run periodically and their control logic is not very complex, which might lead to uniformity of the syscalls and raise a similarity concern, i.e., that normal and abnormal data might be distributed so differently that detection becomes easy. Although we have shown that the data is hard to distinguish with single features (Tables 3 and 5, similar to the methods in Xie et al. [2014]), we further evaluate our approach on the public ADFA-LD data set and compare the results with other similar methods.

ADFA-LD contains 833 training traces, 4372 validation (normal) traces, and 746 attack (abnormal) traces. On average, each training trace contains about 370 syscalls, each validation (normal testing) trace 485, and each attack (abnormal) trace 426. In the test, we used the training set to tune the classifier and tested it on the validation and attack sets.

From the early results in Table 1, we set k_max = 9 and used the L-1 to L-9 clusters to find a proper combination. The recorded time for the parallel feature extraction, including building the clusters and computing their frequencies, was 24 minutes and 7 seconds. The training and testing trials took 1 minute and 19.8 seconds. Among all attempts, we found that the combinations {1, 2, 6}, {1, 3, 6}, {1, 2, 5, 6}, {1, 3, 4, 6}, {1, 2, 3, 4, 7}, and {1, 2, 3, 5, 6} had better performance (higher F-measures), as shown in Fig. 9. However, {1, 2, 5, 6}, {1, 3, 4, 6}, and {1, 2, 3, 5, 6} were removed since they can be regarded as extensions of {1, 2, 6} and {1, 3, 6}. We compared {1, 2, 6}, {1, 3, 6}, and {1, 2, 3, 4, 7} and show their ROC curves in Fig. 8. Even though their AUCs (Area Under the Curve) were almost the same, {1, 2, 6} provided better classification performance than the others since it had the largest F-measure, with a FAR of 18.9% and a DR of 83.6%.

Fig. 8: ROC curves of the evaluations on ADFA-LD.
Fig. 9: F-measure of the evaluations on ADFA-LD with different ν.

We compared the achieved results with those reported by the following works, in which either an SVM-based solution or another fast detection method was used, and the same amount of data was used for training and testing. Creech et al. Creech and Hu [2014] used a one-class SVM with semantic features, achieving a DR of 80% at 15% FAR, but the training overhead was expensive, as extracting the semantic features took weeks. Marteau Marteau [2019] used the covering similarity with only the training data set (no additional training data from the validation set), achieving a DR of 80% at 19% FAR. Xie et al. Xie et al. [2014], Xie and Hu [2013] combined k-nearest neighbors (k-NN) and k-means clustering and achieved a DR of 60% at a FAR of 20%; in another work Xie et al. [2014], they used short sequences and frequency features to train a one-class SVM and achieved a DR of 70% at a FAR of 20%. Haider et al. Haider et al. [2015] used four statistical features of a trace, i.e., the least/most repeated and the minimum/maximum values, to represent a trace and detect attacks; they used three supervised learning algorithms (SVMs with linear and radial basis kernels, and k-NN), and the best result was k-NN with a 78% DR at 21% FAR. As shown in Table 7, compared with the best results quoted from each publication, our proposed method achieves good detection performance while taking only minutes of training. For example, compared with Marteau [2019], our approach achieves better results while reducing the size of the pattern set with the method proposed in Sec. 4.3.2, and it runs much faster than Creech and Hu [2014] while maintaining similar performance. Please note that some papers only reported the feature extraction time (the overall execution time would be longer).

Methods FAR(%) DR(%) Training Time
Marteau Marteau [2019]: covering similarity, with only the training data set 19.0 80.0 minutes
Creech et al Creech and Hu [2014]: SVM with semantic features 15.0 80.0 weeks
Xie et al Xie et al. [2014], Xie and Hu [2013]: k-NN and k-mean clustering 20.0 60.0 seconds
Xie et al Xie et al. [2014]: frequencies of short sequences and one-class SVM 20.0 70.0 seconds
Haider et al Haider et al. [2015]: statistical features and SVM 21.0 78.0 seconds
Proposed method 18.9 83.6 minutes
Table 7: Comparison of different methods on ADFA-LD.

5.5 Overhead analysis

The achieved results show that feature extraction takes more time than training the SVM models. The feature extraction process traverses the entire training data set and calculates the feature vector of each trace in both the training and testing data sets; hence, its cost depends on the amount of data. However, since each extraction attempt is independent, the process can run in parallel, with each worker extracting a single L-k cluster and calculating its frequency features; hence, only the maximum extraction time matters. The extraction is essentially a string search process, so the number of traces, the maximal trace length, and the number of desired clusters greatly affect the processing time. Training a one-class SVM does not take much time in our evaluation: although the input data size matters, training an SVM model took less than a second on average in our experiments.
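Since each L-k extraction is independent, the parallelism described above can be expressed with a process pool, as in the sketch below; build_set and eval_trace refer to the sketch in Sec. 4.2, and the scheduling is a simplification of whatever the authors' implementation actually does.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def extract_cluster_features(k, train_traces, all_traces):
    """Build C_k from the training traces, then compute f_k for every trace."""
    c_k = build_set(k, train_traces)
    return k, [eval_trace(c_k, t) for t in all_traces]

def parallel_extract(ks, train_traces, all_traces, workers=8):
    job = partial(extract_cluster_features,
                  train_traces=train_traces, all_traces=all_traces)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # {k: [f_k(trace) for each trace]}, e.g. ks = range(1, 16)
        return dict(pool.map(job, ks))
```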

6 Conclusion and Future Work

In this paper, we propose an inexpensive detection method that uses syscalls to monitor the execution of critical software functions of self-driving systems. The method extracts the syscall patterns of normal traces and uses the invocation frequencies of multiple pattern clusters as feature inputs to train a one-class SVM based detection model. The evaluation on real data obtained from the self-driving system Autoware shows that the proposed method further reduces the false alarm rate while maintaining high accuracy. A further comparison against existing works on the ADFA-LD data set demonstrates that the method improves the detection performance with a short training time.

This work shows that combining multiple features can improve detection performance, and it is possible to apply such a method in other domains. In the future, we plan to extend the approach by stealthily intercepting other critical APIs to trace programs' behaviors (e.g., computation) more accurately, or by mapping several contiguous syscalls to an API invocation (e.g., an API call and its corresponding set of syscalls). In such a way, we may better understand the execution, which could help model the self-driving functions more precisely.

References

  • A. Bouard, B. Weyl, and C. Eckert (2013) Practical information-flow aware middleware for in-car communication. In Proc. 2013 ACM Workshop on Security, Privacy & Dependability for Cyber Vehicles, pp. 3–8. Cited by: §2.
  • R. A. Bridges, T. R. Glass-Vanderlan, M. D. Iannacone, M. S. Vincent, and Q. (. Chen (2019) A survey of intrusion detection systems leveraging host data. ACM Comput. Surv. 52 (6), pp. 128:1–128:35. Cited by: §2, §3.2.
  • C. Chang and C. Lin (2011) LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, pp. 27:1–27:27. Note: Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Cited by: §4.3, §4.3.
  • K. Cheng, Y. Zhou, B. Chen, R. Wang, Y. Bai, and Y. Liu (2020) Guardauto: a decentralized runtime protection system for autonomous driving. External Links: Link Cited by: §3.1.
  • K. T. Cho and K. G. Shin (2016) Error handling of in-vehicle networks makes them vulnerable. In Proc. 2016 ACM SIGSAC Conf. Comput. Commun. Security, pp. 1044–1055. Cited by: §1, §2.
  • H. Choi, W. Lee, Y. Aafer, F. Fei, Z. Tu, X. Zhang, D. Xu, and X. Deng (2018) Detecting attacks against robotic vehicles: a control invariant approach. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS ’18, pp. 801–816. Cited by: §2.
  • G. Creech and J. Hu (2013) Generation of a new ids test dataset: time to retire the kdd collection. In 2013 IEEE Wireless Communications and Networking Conference (WCNC), pp. 4487–4492. Cited by: §5.1.
  • G. Creech and J. Hu (2014) A semantic approach to host-based intrusion detection systems using contiguous and discontiguous system call patterns. IEEE Transactions on Computers 63 (4), pp. 807–819. Cited by: §2, §4.3.1, §4, §5.4, Table 7.
  • G. Creech (2013) The ADFA Intrusion Detection Datasets. External Links: Link Cited by: §1, §4.3.
  • Department of Motor Vehicles, California (2017) Autonomous vehicle disengagement reports 2017. Department of Motor Vehicles, California. Note: Accessed on Dec. 13, 2019 External Links: Link Cited by: §1.
  • S. Forrest, S. Hofmeyr, and A. Somayaji (2008) The evolution of system-call monitoring. In 2008 Annual Computer Security Applications Conference (ACSAC), pp. 418–430. Cited by: §2, §2.
  • A. Greenberg (2015) Hackers remotely kill a jeep on the highway With me in it. Note: Accessed on Dec. 13, 2019 External Links: Link Cited by: §1.
  • F. Guo, Z. Wang, S. Du, H. Li, H. Zhu, Q. Pei, Z. Cao, and J. Zhao (2019) Detecting vehicle anomaly in the edge via sensor consistency and frequency characteristic. IEEE Transactions on Vehicular Technology 68 (6), pp. 5618–5628. Cited by: §2.
  • W. Haider, J. Hu, and M. Xie (2015) Towards reliable data feature retrieval and decision engine in host-based anomaly detection systems. In 2015 IEEE 10th Conference on Industrial Electronics and Applications (ICIEA), pp. 513–517. Cited by: §2, §4.3.1, §5.4, Table 7.
  • X. D. Hoang, J. Hu, and P. Bertok (2009) A program-based anomaly intrusion detection scheme using multiple detection engines and fuzzy inference. Journal of Network and Computer Applications 32 (6), pp. 1219 – 1228. Cited by: §2.
  • G. Kar, H. Mustafa, Y. Wang, et al. (2014) Detection of on-road vehicles emanating GPS interference. In Proc. 2014 ACM SIGSAC Conf. Comput. Commun. Security, pp. 621–632. Cited by: §2.
  • Keen Security Lab of Tencent (2018) Experimental security assessment of BMW cars: a summary report. Keen Security Lab of Tencent. Note: https://keenlab.tencent.com/en/Experimental_Security_Assessment_of_BMW_Cars_by_KeenLab.pdf Cited by: §1.
  • S. S. Khan and M. G. Madden (2014) One-class classification: taxonomy of study and review of techniques. The Knowledge Engineering Review 29 (3), pp. 345–374. Cited by: §4.3.
  • W. Khreich, B. Khosravifar, A. Hamou-Lhadj, and C. Talhi (2017) An anomaly detection system based on variable n-gram features and one-class svm. Information and Software Technology 91, pp. 186 – 197. Cited by: §2.
  • W. Khreich, S. S. Murtaza, A. Hamou-Lhadj, and C. Talhi (2018) Combining heterogeneous anomaly detectors for improved software security. Journal of Systems and Software 137, pp. 415 – 429. Cited by: §2.
  • T. K. Lengyel, S. Maresca, B. D. Payne, G. D. Webster, S. Vogl, and A. Kiayias (2014) Scalability, fidelity and stealth in the drakvuf dynamic malware analysis system. In Proceedings of the 30th Annual Computer Security Applications Conference, ACSAC ’14, New York, NY, USA, pp. 386–395. External Links: ISBN 978-1-4503-3005-3 Cited by: §3.1.
  • [22] Linux Syscall Reference. Note: Accessed on Dec. 12, 2019 External Links: Link Cited by: §3.2.
  • L. Liu, S. Nie, and Y. Du (2017) FREE-fall: hacking tesla from wireless to can bus. In BlackHat USA 2017, Keen Security Lab of Tencent, pp. 1–16. Cited by: §1.
  • M. Liu, Z. Xue, X. Xu, C. Zhong, and J. Chen (2019) Host-based intrusion detection system with system calls: review and future trends. ACM Comput. Surv. 51 (5), pp. 98. Cited by: §3.3.
  • P. Marteau (2019) Sequence covering for efficient host-based intrusion detection. IEEE Transactions on Information Forensics and Security 14 (4), pp. 994–1006. Cited by: §2, §4.3.1, §5.4, Table 7.
  • R. Mitchell and I. Chen (2014) Adaptive intrusion detection of malicious unmanned air vehicles using behavior rule specifications. IEEE Transactions on Systems, Man, and Cybernetics: Systems 44 (5), pp. 593–604. Cited by: §2.
  • P. Moosbrugger, K. Y. Rozier, and J. Schumann (2017) R2U2: monitoring and diagnosis of security threats for unmanned aerial systems. Formal Methods in System Design 51 (1), pp. 31–61. Cited by: §2.
  • J. Park, R. Ivanov, J. Weimer, M. Pajic, and I. Lee (2015) Sensor attack detection in the presence of transient faults. In Proc. ACM/IEEE 6th Int. Conf. Cyber-Physical Syst., pp. 1–10. Cited by: §2.
  • B. D. Payne, M. D. P. D. A. Carbone, and W. Lee (2007) Secure and flexible monitoring of virtual machines. In Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007), pp. 385–397. External Links: ISSN 1063-9527 Cited by: §3.1.
  • H. Ren, Z. Ye, and Z. Li (2017) Anomaly detection based on a dynamic markov model. Information Sciences 411, pp. 52 – 65. Cited by: §2.
  • B. Schölkopf, R. C. Williamson, A. J. Smola, J. Shawe-Taylor, and J. C. Platt (2000) Support vector method for novelty detection. In Advances in neural information processing systems, pp. 582–588. Cited by: §3.3, §4.3.
  • M. Steger, C. A. Boano, T. Niedermayr, M. Karner, J. Hillebrand, K. Roemer, and W. Rom (2018) An efficient and secure automotive wireless software update framework. IEEE Trans. Ind. Informat. 14 (5), pp. 2181–2193. Cited by: §2.
  • D. M. Tax and R. P. Duin (2004) Support vector data description. Machine learning 54 (1), pp. 45–66. Cited by: §3.3, §4.3.
  • [34] (2019) Tesla firmware upgrade tracker. Note: Accessed on Feb. 21, 2020 External Links: Link Cited by: §1.
  • T. P. Vuong, G. Loukas, and D. Gan (2015) Performance evaluation of cyber-physical intrusion detection on a robotic vehicle. In 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 2106–2113. Cited by: §2.
  • S. Woo, H. J. Jo, and D. H. Lee (2015) A practical wireless attack on the connected car and security protocol for in-vehicle CAN. IEEE Trans. Intell. Transp. Syst. 16 (2), pp. 993–1006. Cited by: §2.
  • M. Xie, J. Hu, and J. Slay (2014) Evaluating host-based anomaly detection systems: application of the one-class SVM algorithm to ADFA-LD. In 2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pp. 978–982. Cited by: §2, §4.3.1, §5.4, §5.4, Table 7.
  • M. Xie and J. Hu (2013) Evaluating host-based anomaly detection systems: a preliminary analysis of ADFA-LD. In 2013 6th International Congress on Image and Signal Processing (CISP), Vol. 03, pp. 1711–1716. Cited by: §2, §4.3.1, §5.4, Table 7.
  • M. Xie, J. Hu, X. Yu, and E. Chang (2014) Evaluating host-based anomaly detection systems: application of the frequency-based algorithms to ADFA-LD. In Network and System Security, M. H. Au, B. Carminati, and C.-C. J. Kuo (Eds.), pp. 542–549. Cited by: §2, §4.3.1, §5.4, Table 7.
  • K. Zhu, Z. Chen, Y. Peng, and L. Zhang (2019) Mobile edge assisted literal multi-dimensional anomaly detection of in-vehicle network using lstm. IEEE Transactions on Vehicular Technology 68 (5), pp. 4275–4284. Cited by: §2.