Reliability Validation of Learning Enabled Vehicle Tracking

02/06/2020 ∙ by Youcheng Sun, et al.

This paper studies the reliability of a real-world learning-enabled system, which conducts dynamic vehicle tracking based on a high-resolution wide-area motion imagery input. The system consists of multiple neural network components – to process the imagery inputs – and multiple symbolic (Kalman filter) components – to analyse the processed information for vehicle tracking. It is known that neural networks suffer from adversarial examples, which make them lack robustness. However, it is unclear if and how the adversarial examples over learning components can affect the overall system-level reliability. By integrating a coverage-guided neural network testing tool, DeepConcolic, with the vehicle tracking system, we found that (1) the overall system can be resilient to some adversarial examples thanks to the existence of other components, and (2) the overall system presents an extra level of uncertainty which cannot be determined by analysing the deep learning components only. This research suggests the need for novel verification and validation methods for learning-enabled systems.


I Introduction

The wide-scale deployment of Deep Neural Networks (DNNs) in safety-critical applications, such as self-driving cars, healthcare and Unmanned Air Vehicles (UAVs), increases the demand for tools that can test, validate, verify, and ultimately certify such systems [10]. Normally, autonomous systems, or more specifically learning-enabled systems, contain both data-driven learning components and model-based non-learning components [23], and a certification needs to consider both categories of components, and the system as a whole.

Structural code coverage combined with requirements coverage has been the primary approach for measuring test completeness as assurance evidence to support safety arguments for the certification of safety-critical software. Testing techniques for machine learning components, primarily DNNs, are comparatively new and have only been actively developed in the past few years, e.g., [18, 26].

Unlike model-based software systems, DNNs are usually considered as black boxes, and therefore it is difficult to understand their behaviour by means of inspection. In [28], we developed a tool, DeepConcolic, to work with a number of extensions to the MC/DC coverage metric, targeting DNNs. The MC/DC coverage metric [8] is recommended in a number of certification documents and standards, such as RTCA's DO-178C and ISO 26262, for the development of safety-critical software. Its extension to DNN testing has been shown to be successful in testing DNNs by utilising white-box testing methods, i.e., by exercising the known structure with the parameters of a DNN to gather assurance evidence. It is, however, unclear whether such a testing tool can still be effective when working with learning-enabled systems containing both learning and model-based components. Primarily, we want to answer the following two research questions:

  • (Q1) Can the system as a whole be resilient against the deficits discovered over the learning components?

  • (Q2) Is there new uncertainty that needs to be considered in terms of the interaction between learning and non-learning components?

The key motivation for considering Q1 is to understand whether, when generating a test suite, a DNN testing tool needs to consider the existence of other components in order to assess the safety of the system. The key motivation for considering Q2 is to understand whether a DNN testing tool can take advantage of the uncertainty presented in the interaction between learning and non-learning components in generating test cases.

Specifically, for the learning-enabled systems, we consider in this paper a tracking system in Wide Area Motion Imagery (WAMI) [39], where the vehicle detection is implemented by a few DNNs and the tracking is implemented with a Kalman filter based method. Using this system, we consider its reliability when running in an adversarial environment, where an adversary can have limited access to the system by intercepting the inputs to the perception unit.

Our experiments provide affirmative answers to the above research questions and point out the urgent need to develop system-level testing tools to support the certification of learning-enabled systems.

II Preliminaries

A (feedforward and deep) neural network $N$ is a function that maps an input $x$ to an output. Depending on the task, the output can take different formats. For example, for a classification task, the DNN computes a label, which is denoted by $N(x).label$.

Adversarial examples [29] are two very similar inputs with different labels. The existence of such pairs has been used as a proxy metric for the training quality of a DNN. Given an input $x$ that is correctly labeled by a DNN $N$, another input $x'$ is said to be an adversarial example if $x$ and $x'$ are “close enough”, i.e., $\|x - x'\|_p \le \epsilon$, and $N(x).label \ne N(x').label$. Here, $\|\cdot\|_p$ denotes the $L^p$-norm distance metric and $\epsilon$ quantifies what “close enough” means.
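The definition above can be checked mechanically for a candidate pair of inputs. The following is a minimal sketch, not part of the original tooling: the function name `is_adversarial` and the toy classifier are assumptions made purely for illustration, and any callable that returns a label could be substituted.

```python
import numpy as np

def is_adversarial(classify, x, x_adv, eps, p=np.inf):
    """True if x_adv lies within the L^p ball of radius eps around x
    and the classifier assigns the two inputs different labels."""
    distance = np.linalg.norm((x_adv - x).ravel(), ord=p)
    return bool(distance <= eps and classify(x) != classify(x_adv))

# Toy classifier (hypothetical): the label is the sign of the mean value.
def toy_classify(x):
    return int(x.mean() > 0.0)

x = np.array([0.05, -0.02, 0.01])
x_adv = x - 0.03  # a small uniform shift that flips the toy label

print(is_adversarial(toy_classify, x, x_adv, eps=0.05))
```

With a tighter budget such as `eps=0.01` the same pair no longer qualifies, because the perturbation exceeds the allowed distance.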

A number of algorithms have been proposed to find adversarial examples. However, such methods are not able to quantify progress, and thus, are not useful as a stopping condition for testing a DNN. This has motivated the coverage criteria recently developed for DNNs.

II-A The DeepConcolic Tool

Several structural coverage criteria have been designed for DNNs, including neuron coverage [18] and its extensions [15], and variants of Multiple Condition/Decision Coverage (MC/DC) for DNNs [27]. These coverage criteria quantify the exhaustiveness of a test suite for a DNN. Neuron coverage requires that, for every neuron in the network, there must exist at least one test case in the generated test suite that lifts its activation value above some threshold; the criteria in [15] generalise the neuron coverage from a single neuron to a set of neurons. The MC/DC variants for DNNs capture the fact that a (decision) feature (a set of neurons) in a layer is directly decided by its connected (condition) features in the previous layer, and thus require test conditions such that every condition feature must be exhibited regardless of its effect on the decision feature.
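As an illustration of the simplest of these criteria, neuron coverage can be computed from a matrix of recorded activations. This is a sketch under the assumption that the activations of all neurons have already been collected for every test case; the function name and the example values are hypothetical and do not come from DeepConcolic.

```python
import numpy as np

def neuron_coverage(activations, threshold=0.0):
    """activations: array of shape (num_tests, num_neurons).
    A neuron counts as covered if at least one test case lifts its
    activation above the threshold; the result is the covered fraction."""
    covered = (np.asarray(activations) > threshold).any(axis=0)
    return covered.mean()

# Two test cases, four neurons (illustrative values).
acts = np.array([[0.2, 0.0, 0.9, 0.0],
                 [0.0, 0.0, 0.1, 0.7]])

print(neuron_coverage(acts, threshold=0.5))  # neurons 2 and 3 are covered
```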

DeepConcolic (https://github.com/TrustAI/DeepConcolic) implements a concolic analysis that examines the set of behaviors of a DNN, and is able to identify potentially problematic input/output pairs. DeepConcolic generates test cases and adversarial examples for DNNs following the specified test conditions. Developers can use the test results to compare different DNN models, and the adversarial examples can be used to improve and re-train the DNN or to develop an adversarial example mitigation strategy. Moreover, a major safety challenge for the use of DNNs is the lack of understanding about how a decision is made by a DNN. The DeepConcolic tool is able to test each particular component of a DNN, and this aids human analysis of the internal structures of a DNN. By understanding these structures, we improve the confidence in DNN behaviour; this is different from providing guarantees (for example [12, 35, 19, 20, 36]), just as testing conventional software does not provide guarantees of correctness.

III Detection from Wide-Area Motion Imagery

This section describes the technical details of the tracking system. The tracking system requires continuous imagery input from, e.g., airborne high-resolution cameras. The input to the tracking system is a video, which consists of a finite sequence of images. Each image contains a number of vehicles. Similar to [39], we use the WPAFB 2009 [1] dataset. The images were taken by a camera system with six optical sensors and had already been stitched to cover a wide area of around . The frame rate is 1.25Hz. This dataset includes 1025 frames, which is around 13 minutes of video. It is divided into training video ( frames) and testing video ( frames). All the vehicles and their trajectories are manually annotated. There are multiple resolutions of videos in the dataset. For the experiment, we chose to use the images, in which the size of vehicles is smaller than pixels. We use $I_k$ to denote the $k$-th frame and $I_k(i,j)$ the pixel on the intersection of the $i$-th column and $j$-th row of $I_k$.

In the following, we explain how the tracking system works by having a video as input. In general, this is done in two steps: detection and tracking. In Section III-A through to Section III-D, we explain the detection steps, i.e., how to detect a vehicle with CNN-based perception units. This is followed by the tracking step in Section III-E.

III-A Background Construction

Vehicle detection in WAMI video is a challenging task due to the lack of vehicle appearances and the existence of frequent pixel noises. It has been discussed in [24, 13] that an appearance-based object detector may cause a large number of false alarms. For this reason, in this paper, we only consider detecting moving objects for tracking.

Background construction is a fundamental step in extracting pixel changes from the input image. The background is built for the current environment from a number of previous frames that were captured by a moving camera system. It proceeds in the following steps.

Image Registration

Image registration compensates for the camera motion by aligning all the previous frames to the current frame. The key is to estimate a transformation matrix, $h_k^{k-t}$, which transforms frame $k-t$ to frame $k$ using a given transformation function. For the transformation function, we consider the projective transformation (or homography), which has been widely applied in multi-perspective geometry, an area where WAMI camera systems are already utilised.

The estimate of $h_k^{k-t}$ is generated by applying feature-based approaches. First, feature points from the images at frames $k-t$ and $k$, respectively, are extracted by feature detectors (e.g., Harris corner or SIFT-like [14] approaches). Second, feature descriptors, such as SURF [2] and ORB [21], are computed for all detected feature points. Finally, pairs of corresponding feature points between the two images are identified and the matrix $h_k^{k-t}$ is estimated by using RANSAC [6], which is robust against outliers.
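The core of the last step, fitting a homography to matched feature points, can be sketched with the direct linear transform (DLT). This is an illustrative simplification rather than the pipeline used in the paper: it assumes clean correspondences and omits feature detection, descriptor matching and the RANSAC loop that would repeatedly call such a fit on random subsets to reject outliers. The function names and the example matrix are hypothetical.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear transform: 3x3 matrix H with dst ~ H @ src,
    fitted to at least four point correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def apply_homography(h, pts):
    """Map 2D points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ h.T
    return mapped[:, :2] / mapped[:, 2:3]

# Synthetic check: recover a known camera motion from six matched points.
h_true = np.array([[1.0, 0.02, 5.0],
                   [-0.01, 1.0, -3.0],
                   [1e-3, 2e-3, 1.0]])
src = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [5, 2], [3, 7]], dtype=float)
dst = apply_homography(h_true, src)
h_est = estimate_homography(src, dst)
```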

Background Modeling

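We generate the background, $I_k^{bg}$, for each time $k$, by computing the median image of the $L$ previously-aligned frames, i.e.,

$$I_k^{bg}(i,j) = \operatorname{median}\{\hat{I}_{k-1}(i,j), \ldots, \hat{I}_{k-L}(i,j)\} \tag{1}$$

where $\hat{I}_{k-t}$ denotes frame $k-t$ after alignment to the current frame $k$.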

In our experiments, we take either or .
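A per-pixel median over the aligned frames can be sketched as follows. The sketch assumes the previous frames have already been warped into the coordinate system of the current frame; the function name and the synthetic data are illustrative only.

```python
import numpy as np

def median_background(aligned_frames):
    """aligned_frames: array of shape (L, height, width) holding the
    previous frames after alignment to the current frame.
    Returns the per-pixel median image."""
    return np.median(np.asarray(aligned_frames, dtype=float), axis=0)

# Five aligned frames of a static scene (intensity 10) with a bright
# vehicle (intensity 200) that occupies a different pixel in each frame.
frames = np.full((5, 4, 4), 10.0)
for t in range(4):
    frames[t, t, t] = 200.0

background = median_background(frames)
```

Because the vehicle covers any given pixel in only one of the five frames, the median suppresses it and the static scene is recovered.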

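Note that, to align the $L$ previous frames to the newly received frame, only one image registration process is performed. Writing $h_a^b$ for the matrix that transforms frame $b$ to frame $a$, after obtaining the matrices $h_{k-1}^{k-2}, \ldots, h_{k-1}^{k-L}$ by processing previous frames, we perform image registration once to get $h_k^{k-1}$, and then let

$$h_k^{k-t} = h_k^{k-1}\, h_{k-1}^{k-t}, \quad t = 2, \ldots, L \tag{2}$$

Extraction of Potential Moving Objects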

By comparing the difference between the background $I_k^{bg}$ and the current frame $I_k$, we can extract a set of potential moving objects by first computing the following set of pixels

$$P_k = \{(i,j) \in \Gamma : |I_k(i,j) - I_k^{bg}(i,j)| > \delta\} \tag{3}$$

and then applying an image morphology operation on $P_k$, where $\Gamma$ is the set of pixels and $\delta$ is a threshold value that determines which pixels should be considered.
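The thresholding step followed by a morphological clean-up can be sketched in plain NumPy. The text does not say which morphology operation is applied, so the 3x3 binary opening below (erosion followed by dilation) is an assumption chosen for illustration, as are the function names and the threshold value.

```python
import numpy as np

def change_mask(frame, background, threshold):
    """Pixels whose absolute difference from the background exceeds the threshold."""
    return np.abs(frame.astype(float) - background.astype(float)) > threshold

def _neighbourhoods(mask):
    """Stack the nine 3x3-shifted copies of a boolean mask (False outside the image)."""
    h, w = mask.shape
    padded = np.pad(mask, 1, constant_values=False)
    return np.stack([padded[r:r + h, c:c + w] for r in range(3) for c in range(3)])

def binary_opening_3x3(mask):
    """Erosion then dilation with a 3x3 square: removes isolated noise pixels."""
    eroded = _neighbourhoods(mask).all(axis=0)
    return _neighbourhoods(eroded).any(axis=0)

# A 3x3 moving blob plus one isolated noisy pixel on an empty background.
background = np.zeros((8, 8))
frame = background.copy()
frame[2:5, 2:5] = 50.0
frame[6, 6] = 50.0

mask = change_mask(frame, background, threshold=20.0)
opened = binary_opening_3x3(mask)
```

After opening, the blob survives intact while the single noisy pixel is removed.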

III-B CNN for Detection Refinement

After obtaining the set of potential moving objects, we develop a Convolutional Neural Network (CNN), a type of DNN, to detect vehicles. We highlight a few design decisions. The major causes of false alarms generated by the background subtraction are poor image registration, light changes and the apparent displacement of high objects (e.g., buildings and trees) caused by parallax. We emphasise that the objects of interest (e.g., vehicles) mostly, but not exclusively, appear on roads. Moreover, we observe that a moving object generates a temporal pattern (e.g., a track) that can be exploited to discern whether or not a detection is an object of interest. Thus, in addition to the shape of the vehicle in the current frame, we assert that the historical context of the same place can help to distinguish the objects of interest from false alarms.

By the above observations, we create a binary classification CNN to predict whether a pixels window contains a moving object given aligned image patches generated from the previous frames. The pixels window is identified by considering the image patches from the set . We suggest in this paper, as it is the maximum time for a vehicle to cross the window. The input to the CNN is a matrix and the convolutional layers are identical to the traditional 2D CNNs except that the three colour channels are substituted with grey-level frames.

Essentially, the classification CNN acts as a filter that removes, from the set of potential moving objects, those that are unlikely to be vehicles. If the size of an image patch in the resulting set of moving objects is similar to that of a vehicle, we directly label it as a vehicle. On the other hand, if the image patch is larger than a vehicle, i.e., it may contain multiple vehicles, we pass it to the location prediction for further processing.

III-C CNN for Location Prediction

We take a regression CNN to process the image patches passed over from the detection refinement phase. As in [13], a regression CNN can predict the locations of objects given spatial and temporal information. The input to the regression CNN is similar to that of the classification CNN described in Section III-B, except that the size of the window is enlarged to . The output of the regression CNN is a -dimensional vector, equivalent to a down-sampled image ( ) for reducing computational cost.

For each image, we apply a filter to obtain those pixels whose values are greater than not only a threshold value but also the values of its adjacent pixels. We then obtain another image with a few bright pixels, each of which is labelled as a vehicle. Let be the set of moving objects updated from after applying location prediction.
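The filter described above amounts to picking local maxima above a threshold. A minimal sketch follows; it assumes that "adjacent pixels" means the eight surrounding pixels and that a strict inequality is used, neither of which is stated in the text. The function name and the example heat map are hypothetical.

```python
import numpy as np

def local_peaks(heatmap, threshold):
    """Return (row, col) positions whose value exceeds the threshold
    and is strictly greater than all eight neighbouring pixels."""
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    is_peak = heatmap > threshold
    for dr in range(3):
        for dc in range(3):
            if dr == 1 and dc == 1:
                continue  # skip the pixel itself
            is_peak &= heatmap > padded[dr:dr + h, dc:dc + w]
    return [tuple(map(int, p)) for p in np.argwhere(is_peak)]

# Illustrative regression output with two strong responses.
heat = np.zeros((5, 5))
heat[1, 1] = 0.9
heat[1, 2] = 0.4   # weaker neighbour of the first peak
heat[3, 3] = 0.7
heat[4, 0] = 0.2   # below the threshold

peaks = local_peaks(heat, threshold=0.5)
```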

III-D Detection Framework

Fig. 1: (a) The architecture of the vehicle detector. (b) Workflow for testing the WAMI tracking system.

The processing chain of the detector is shown in Figure 1(a). At the beginning of the video, the detector takes the first frames to construct the background, thus the detections from frame can be generated. After the detection process finishes in each iteration, it is added to the template of previous frames. The updating process substitutes the oldest frame with the input frame. This is to ensure that the background always considers the latest scene, since the frame rate is usually low in WAMI videos such that parallax effects and light changes can be pronounced. As we wish to detect very small and unclear vehicles, we apply a small background subtraction threshold and a minimum blob size. This, therefore, leads to a huge number of potential blobs. The classification CNN is used to accept a number of blobs. As mentioned in Section III-B, the CNN only predicts if the window contains a moving object or not. According to our experiments, the cases where multiple blobs belong to one vehicle and one blob includes multiple vehicles, occur frequently. Thus, we design two corresponding scenarios: the blob is very close to another blob(s); the size of the blob is larger than . If any blob follows either of the two scenarios, we do not consider the blob for output. The regression CNN (Section III-C) is performed on these scenarios to predict the locations of the vehicles in the corresponding region, and a default blob will be given. If the blob does not follow any of the scenarios, this blob will be outputted directly as a detection. Finally, the detected vehicles include the output of both sets.

III-E Object Tracking

III-E1 Problem Statement

We consider a single-target tracker (i.e., a Kalman filter) to track a vehicle given all the detection points over time in the field of view. The track is initialised by manually giving a starting point and a zero initial velocity, such that the state vector is defined as $s_0 = [x, y, 0, 0]^T$, where $(x, y)$ is the coordinate of the starting point. We define the initial covariance of the target, $P_0$, which is the initial uncertainty of the target's state. (Footnote: With this configuration it is not necessary for the starting point to be a precise position of a vehicle, and the tracker will find a proximate target to track. However, it is possible to define a specific velocity and reduce the uncertainty in $P_0$, so the tracker can track a particular target.)

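A near-constant velocity model is applied as the dynamic model in the Kalman filter, which is defined in (4).

$$\hat{s}_k = F s_{k-1} + \omega_k, \quad F = \begin{bmatrix} I_{2\times 2} & \Delta t\, I_{2\times 2} \\ O_{2\times 2} & I_{2\times 2} \end{bmatrix} \tag{4}$$

where $I_{2\times 2}$ is a $2\times 2$ identity matrix, $O_{2\times 2}$ is a $2\times 2$ zero matrix, $s_{k-1}$ is the state vector in the previous timestep, $\hat{s}_k$ is the predicted state vector and $\omega_k$ is the process noise, which can be further defined as $\omega_k \sim \mathcal{N}(0, Q)$, where $\mathcal{N}(0, Q)$ denotes a Gaussian distribution whose mean is zero and whose covariance is $Q$, defined in (5).

$$Q = \sigma_q^2 \begin{bmatrix} \frac{\Delta t^3}{3} I_{2\times 2} & \frac{\Delta t^2}{2} I_{2\times 2} \\ \frac{\Delta t^2}{2} I_{2\times 2} & \Delta t\, I_{2\times 2} \end{bmatrix} \tag{5}$$

where $\Delta t$ is the time interval between two frames and $\sigma_q$ is a configurable constant. is suggested for the aforementioned WAMI video.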
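The prediction step of a near-constant velocity Kalman filter can be sketched as below. The block forms of the transition matrix and process noise covariance are the textbook ones for a state ordered as (x, y, v_x, v_y) and are assumed here rather than taken from the paper. The noise scale `sigma_q=1.0` and the initial covariance are illustrative placeholders, while `dt=0.8` simply follows from the 1.25Hz frame rate.

```python
import numpy as np

def constant_velocity_model(dt, sigma_q):
    """Transition matrix F and process noise covariance Q for the
    state (x, y, v_x, v_y), using the textbook near-constant velocity model."""
    i2, o2 = np.eye(2), np.zeros((2, 2))
    f = np.block([[i2, dt * i2],
                  [o2, i2]])
    q = sigma_q ** 2 * np.block([[dt ** 3 / 3 * i2, dt ** 2 / 2 * i2],
                                 [dt ** 2 / 2 * i2, dt * i2]])
    return f, q

def predict(state, cov, f, q):
    """Kalman prediction: propagate the state and grow its uncertainty."""
    return f @ state, f @ cov @ f.T + q

f, q = constant_velocity_model(dt=0.8, sigma_q=1.0)
state = np.array([100.0, 50.0, 2.0, -1.0])   # position and velocity
cov = 4.0 * np.eye(4)                        # illustrative initial uncertainty

pred_state, pred_cov = predict(state, cov, f, q)
```

Repeated prediction without an update keeps adding Q, which is why the state uncertainty accumulates when no measurement is available.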

Next, we define the measurement model as (6).

$$z_k = H s_k + \epsilon_k, \quad H = \begin{bmatrix} I_{2\times 2} & O_{2\times 2} \end{bmatrix} \tag{6}$$

where $z_k$ is the measurement (which is the position of the tracked vehicle), $s_k$ denotes the true state of the vehicle and $\epsilon_k \sim \mathcal{N}(0, R)$ denotes the measurement noise, which models the uncertainty involved in the detection process. R is defined as , where we suggest for the WAMI video.

Since the camera system is moving, the position should be compensated for such motion using the identical transformation function for image registration. However, we ignore the influence to the velocity as it is relatively small, but consider integrating this into the process noise.

III-E2 Measurement Association

During the update step of the Kalman filter, the measurement residual is calculated by subtracting the predicted measurement ($H\hat{s}_k$) from the measurement ($z_k$). In the tracking system, a gated nearest neighbour approach is used to obtain the measurement from a set of detections. K-nearest neighbour is first applied to find the nearest detection, $z_k$, to the predicted measurement, $H\hat{s}_k$. Then the Mahalanobis distance between $z_k$ and $H\hat{s}_k$ is calculated as follows:

$$D_k = \sqrt{(z_k - H\hat{s}_k)^T S_k^{-1} (z_k - H\hat{s}_k)} \tag{7}$$

where $S_k$ is the innovation covariance, which is defined within the Kalman filter.

A potential measurement is adopted if with in our experiment. If there is no available measurement, the update step will not be performed and the state uncertainty accumulates. It can be noticed that a large covariance leads to a large search window. Because the search window can be unreasonably large, we halt the tracking process when the trace of the covariance matrix exceeds a pre-determined value.
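Gated nearest-neighbour association can be sketched as follows. The function name, the gate value and all numbers in the example are hypothetical; the innovation covariance is computed in the standard way from the predicted covariance, the measurement matrix and the measurement noise.

```python
import numpy as np

def gated_nearest_neighbour(pred_state, pred_cov, detections, h, r, gate):
    """Return the detection nearest to the predicted measurement if its
    Mahalanobis distance is within the gate; otherwise return None."""
    detections = np.asarray(detections, dtype=float)
    if len(detections) == 0:
        return None
    z_pred = h @ pred_state
    s = h @ pred_cov @ h.T + r                      # innovation covariance
    nearest = detections[np.argmin(np.linalg.norm(detections - z_pred, axis=1))]
    residual = nearest - z_pred
    distance = np.sqrt(residual @ np.linalg.solve(s, residual))
    return nearest if distance <= gate else None

h = np.hstack([np.eye(2), np.zeros((2, 2))])        # observe position only
r = np.eye(2)                                       # illustrative measurement noise
pred_state = np.array([10.0, 10.0, 1.0, 0.0])
pred_cov = np.diag([4.0, 4.0, 1.0, 1.0])

match = gated_nearest_neighbour(pred_state, pred_cov, [[11, 12], [30, 30]], h, r, gate=2.0)
miss = gated_nearest_neighbour(pred_state, pred_cov, [[30, 30]], h, r, gate=2.0)
```

When `None` is returned the update step is skipped, so only the prediction step runs and the covariance (and hence the search window) keeps growing.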

IV Reliability Testing Framework

We consider the reliability of the vehicle tracking system introduced above when its perception units are subject to adversarial attacks. In general, we assume that the vehicle tracking system is running in an adversarial environment and the adversary is able to intercept the inputs to the perception unit in a limited way.

Figure 1(b) outlines the workflow of our testing framework, where DeepConcolic is deployed for reliability testing of the vehicle tracking system. The workflow of the original WAMI tracking system, as described in Section III, resides inside the dashed block. In order to test the reliability of this vehicle tracker, we interface it with DeepConcolic, which accepts as inputs the original WAMI images and the convolutional network. It then generates, via MC/DC testing, a distortion of the input image that leads the network to fail to detect vehicles.

IV-A Tracks

First of all, we formalise the concept of a track implemented in the WAMI tracker. At each step (frame), every detected vehicle can be identified with a location, represented as a tuple, $l = (x, y)$, where $x$ and $y$ are the location's horizontal and vertical coordinates.

Given two locations $l_1 = (x_1, y_1)$ and $l_2 = (x_2, y_2)$, we assume that there is a distance function $d(l_1, l_2)$ to quantify the distance between the two detected locations. For example, we can define the distance function as the Euclidean distance:

$$d(l_1, l_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

In reality, when tracking a car, it is reasonable to assume a threshold $\theta$ such that as long as the distance between two locations does not exceed $\theta$, i.e., $d(l_1, l_2) \le \theta$, the two locations $l_1$ and $l_2$ can be regarded as the same.

A detected track, denoted by $\rho$, consists of a sequence of locations such that $\rho = l_1 l_2 \ldots l_n$, where $n$ is the total number of steps for tracking. We refer to the number of steps $n$ as the length of the track. Let $\mathcal{T}$ be the set of finite tracks. Subsequently, given two tracks $\rho_1$ and $\rho_2$ of the same length $n$, there are multiple choices to measure the difference between the two, denoted by $D(\rho_1, \rho_2)$, including

$$D_{sum}(\rho_1, \rho_2) = \sum_{k=1}^{n} d(\rho_1(k), \rho_2(k)), \qquad D_{avg}(\rho_1, \rho_2) = \frac{1}{n} \sum_{k=1}^{n} d(\rho_1(k), \rho_2(k)),$$

$$D_{max}(\rho_1, \rho_2) = \max_{1 \le k \le n} d(\rho_1(k), \rho_2(k)), \qquad D_{cnt}(\rho_1, \rho_2) = |\{k : d(\rho_1(k), \rho_2(k)) > \theta\}|$$

where $\rho(k)$ represents the $k$-th location on the track $\rho$. Intuitively, $D_{sum}$ expresses the accumulated distance between locations of two tracks, $D_{avg}$ is the average distance between locations of two tracks, $D_{max}$ highlights the largest distance between locations of two tracks, and $D_{cnt}$ counts the number of steps on which the distance between locations of two tracks is greater than $\theta$.
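The four ways of comparing two tracks described above (accumulated, average, largest, and thresholded count) can be sketched directly. The Euclidean location distance, the function names and the dictionary keys are assumptions made for this illustration.

```python
import math

def location_distance(a, b):
    """Euclidean distance between two (x, y) locations (an assumed choice)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def track_differences(track_a, track_b, threshold):
    """Accumulated, average and largest per-step distance between two
    tracks of equal length, plus the number of steps above the threshold."""
    if len(track_a) != len(track_b):
        raise ValueError("tracks must have the same length")
    dists = [location_distance(a, b) for a, b in zip(track_a, track_b)]
    return {
        "sum": sum(dists),
        "avg": sum(dists) / len(dists),
        "max": max(dists),
        "count": sum(d > threshold for d in dists),
    }

original = [(0, 0), (1, 0), (2, 0)]
attacked = [(0, 0), (1, 3), (2, 4)]

diff = track_differences(original, attacked, threshold=3.5)
```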

IV-B Reliability Definition

We start from explaining how to obtain a track by the WAMI detector in Section III. Given a number , we let be the set of image sequences of length . For each image sequence , the WAMI tracking system maps it into a track as follows. Let be the -th image and the state of the Kalman filter after steps, where is the location, and is the covariance matrix representing the uncertainty of . From , an estimated location can be obtained. Moreover, from the image , by the WAMI detection based on both and in Section III, we can have a set of locations representing the detected moving objects. By selecting a location which is closest to , we determine the next location , and use this information to update into . In the left of Figure 2, the causality between the above entities is exhibited. For simplicity, we can write the resulting track as

(8)

Because of the existence of the adversary, when the detection components and lack robustness, from we may have a different set, , of detected moving objects. Therefore, there may be a different , which will in turn lead to different and . The causality of these entities is exhibited in the right of Figure 2. We write the resulting track as

(9)
Fig. 2: Illustration of the dependencies and workflows between data, for the cases of before robustness testing (Left) and after robustness testing (Right), respectively.

Therefore, given a track $\rho$, by attacking the learning components of the tracking system, another track $\rho'$ may be returned. Now, we can define the reliability of a tracking system. The reliability of a tracking system over an image sequence, $\mathbf{I}$, a track-difference function, $D$, and a threshold, $\xi$, is defined as the non-existence of an adversary, $\mathit{adv}$, such that

$$D(\rho, \rho') > \xi \tag{10}$$

when $\|\mathit{adv}\|_p \le \epsilon$, with $\epsilon$ being a small enough real number, where $\rho$ is the track obtained from $\mathbf{I}$ and $\rho'$ is the track obtained under $\mathit{adv}$. We use $\|\mathit{adv}\|_p \le \epsilon$ to denote that the maximum adversarial perturbation to be considered is no more than $\epsilon$, when measured with the $L^p$-norm. Under such a constraint, the reliability requirement is to ensure that no adversary can produce an attack that results in the attacked track significantly deviating from the original track, i.e., $D(\rho, \rho') > \xi$.
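In testing, this definition can only be checked against a finite set of attacked tracks, so a check of this kind can refute reliability by exhibiting a counterexample but cannot establish it. The sketch below assumes the attacked tracks were produced by perturbations that already respect the norm bound; the function names, the choice of the largest per-step deviation as the difference function, and the example tracks are all illustrative.

```python
import math

def max_deviation(track, attacked):
    """Largest per-step Euclidean distance between two equal-length tracks."""
    return max(math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(track, attacked))

def find_violation(original_track, attacked_tracks, difference, threshold):
    """Return the first attacked track whose difference from the original
    exceeds the threshold, or None if no tested attack violates reliability."""
    for attacked in attacked_tracks:
        if difference(original_track, attacked) > threshold:
            return attacked
    return None

original = [(0, 0), (1, 0), (2, 0), (3, 0)]
attacked_tracks = [
    [(0, 0), (1, 1), (2, 1), (3, 0)],   # deviates, then converges back
    [(0, 0), (1, 2), (2, 5), (3, 9)],   # drifts away from the vehicle
]

violation = find_violation(original, attacked_tracks, max_deviation, threshold=3.0)
```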

IV-C Reliability Validation

We use test cases generated by DeepConcolic for the detection components (the classification and regression CNNs) to validate the reliability of the tracking system. In theory, we may generate test cases for all steps or for a few randomly selected steps. In practice, we found that the tracking system in Figure 1(b) is more sensitive when there are consecutive missing detections of the vehicle.

Thus, we design a testing scheme in which, starting from the $k$-th frame, test cases are generated by applying DeepConcolic to $m$ successive frames of the input to the tracking system. A track comprises a finite sequence of vehicle detections, and the scheme is useful for assessing the reliability of different parts of the track. This helps reveal the potential vulnerabilities of the tracking system.
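The idea of attacking a run of consecutive frames and comparing the resulting track with the original can be sketched with stand-ins for the real components. Everything here is hypothetical: `attack` stands in for a DeepConcolic-generated distortion that makes the detector miss the vehicle, `hold_last_tracker` is a toy replacement for the Kalman-filter tracker, and frame indices are 0-based.

```python
def run_consecutive_attack(frames, start, count, attack, tracker):
    """Attack `count` successive frames beginning at index `start` and
    return the original track together with the attacked track."""
    attacked = list(frames)
    for idx in range(start, min(start + count, len(frames))):
        attacked[idx] = attack(frames[idx])
    return tracker(frames), tracker(attacked)

def hold_last_tracker(detections):
    """Toy tracker: keep the last known position while detections are missing."""
    track, last = [], None
    for d in detections:
        last = d if d is not None else last
        track.append(last)
    return track

def missed_detection(_frame):
    """Toy attack: the perturbed frame yields no detection at all."""
    return None

frames = [0, 1, 2, 3, 4]   # one detected position per frame
original_track, attacked_track = run_consecutive_attack(
    frames, start=2, count=2, attack=missed_detection, tracker=hold_last_tracker)
```

In this toy setting the attacked track stalls during the two attacked frames and rejoins the original once detections resume, which mirrors the kind of recovery a system-level test is meant to expose.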

V Experiments

We sample a number of tracks with maximum length of 30, and then apply the with a variety of configurations: , . Example tracks are as in Figure 3.

Fig. 3: (a)-(d) Original detected tracks from the tracking system

First, we confirm that, even by testing only the deep learning components of the tracking system, DeepConcolic is able to find test inputs that expose vulnerabilities in the tracking. Figure 4 shows the adversarial tracks found by the testing.

Fig. 4: (a)-(d) Distorted tracks found by DeepConcolic testing

Furthermore, different parts of the same track exhibit different levels of robustness. For example, as shown in Figure 5, the frames are less robust than , as the testing result, from the latter, results in less deviation from the original track; such information provides insight to evaluate and improve the tracking system.

Meanwhile, we also observe that attacking only the deep learning component may not be sufficient to mislead the overall tracking system. As shown in Figure 5, the adversarial track eventually converges to the original one. This is due to the compensation provided by other components in the system, and it answers research question Q1: the tracking system is, to some extent, resilient to the deficits of a learning component.

Fig. 5: (a), (b) Adversarial tracks (red) after testing different parts of the original track (green)

Finally, changing more frames does not necessarily result in larger deviation from the original track. This is demonstrated as in Figure 6, where a larger distance between the original and adversarial tracks is found when testing frames , instead of . This observation reflects the uncertainty from other components (other than the CNNs) in the tracking system, as specified in research question Q2.

Fig. 6: (a), (b) Adversarial tracks from testing different numbers of consecutive frames starting from

VI Related Works

Verification of DNNs from a software testing perspective has been a popular direction [28, 25, 7, 17, 37, 33, 34, 16, 22, 3, 5, 38]; see [10] for a survey covering other perspectives such as formal verification and interpretability. Autonomous driving has been the primary application domain for assessments of DNN testing techniques [30, 4, 31, 32]. The analysis in [30, 4] comprises an image generator that produces synthetic pictures for testing neural networks used in the classification of cars in autonomous vehicles. In [31], a 3D simulator is proposed to test the dynamics of the pedestrians and the agent vehicles (including simple dynamics for suspension, tyres, etc.) in the virtual environment of the system under test. In [32], a technique is presented to reason about the safety of a closed-loop, learning-enabled control system. Research has also been conducted on the verification of cognitive trust between humans and autonomous systems [11].

We note that there is limited work focussed on the verification of learning-enabled systems such as the one highlighted in this work.

VII Conclusions

In this paper, we show that testing the deep learning components of a vehicle tracking system in isolation is insufficient, since a deficit discovered in a learning component may be suppressed by the existence of other components, and new uncertainties may be introduced by the interaction between learning and non-learning components. Similar results are also observed for the connections between LSTM and CNN layers [9]. These results clearly indicate the necessity of developing a testing strategy that addresses learning components in isolation, as well as in combination with other components within a wider system-level testing framework for learning-enabled systems.


This document is an overview of UK MOD (part) sponsored research and is released for informational purposes only. The contents of this document should not be interpreted as representing the views of the UK MOD, nor should it be assumed that they reflect any current or future UK MOD policy. The information contained in this document cannot supersede any statutory or contractual requirements or liabilities and is offered without prejudice or commitment.

Content includes material subject to © Crown copyright (2018), Dstl. This material is licensed under the terms of the Open Government Licence except where otherwise stated. To view this licence, visit http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3 or write to the Information Policy Team, The National Archives, Kew, London TW9 4DU, or email: psi@nationalarchives.gsi.gov.uk.

References

  • [1] AFRL (2009) Wright-patterson air force base (wpafb) dataset. https://www.sdms.afrl.af.mil/index.php?collection=wpafb2009. External Links: Link Cited by: §III.
  • [2] H. Bay, T. Tuytelaars, and L. Van Gool (2006) SURF: speeded up robust features. In Computer Vision – ECCV, A. Leonardis, H. Bischof, and A. Pinz (Eds.), Berlin, Heidelberg, pp. 404–417. External Links: ISBN 978-3-540-33833-8 Cited by: §III-A.
  • [3] J. Ding, X. Kang, and X. Hu (2017) Validating a deep learning framework by metamorphic testing. In Metamorphic Testing (MET), 2nd International Workshop on, pp. 28–34. Cited by: §VI.
  • [4] T. Dreossi, A. Donzé, and S. A. Seshia (2017) Compositional falsification of cyber-physical systems with machine learning components. In NASA Formal Methods Symposium, pp. 357–372. Cited by: §VI.
  • [5] A. Dwarakanath, M. Ahuja, S. Sikand, R. M. Rao, R. Bose, N. Dubash, and S. Podder (2018)

    Identifying implementation bugs in machine learning based image classifiers using metamorphic testing

    .
    In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 118–128. Cited by: §VI.
  • [6] M. A. Fischler and R. C. Bolles (1981-06) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24 (6), pp. 381–395. External Links: ISSN 0001-0782, Link, Document Cited by: §III-A.
  • [7] D. Gopinath, K. Wang, M. Zhang, C. S. Pasareanu, and S. Khurshid (2018) Symbolic execution for deep neural networks. arXiv preprint arXiv:1807.10439. Cited by: §VI.
  • [8] K. Hayhurst, D. Veerhusen, J. Chilenski, and L. Rierson (2001) A practical tutorial on modified condition/decision coverage. Technical report NASA. Cited by: §I.
  • [9] W. Huang, Y. Sun, J. Sharp, and X. Huang (2019)

    Test metrics for recurrent neural networks

    .
    arXiv preprint arXiv:1911.01952. Cited by: §VII.
  • [10] X. Huang, D. Kroening, W. Ruan, Y. Sun, E. Thamo, M. Wu, and X. Yi (2018) A survey of safety and trustworthiness of deep neural networks. arXiv preprint arXiv:1812.08342. Cited by: §I, §VI.
  • [11] X. Huang, M. Kwiatkowska, and M. Olejnik (2019-07) Reasoning about cognitive trust in stochastic multiagent systems. ACM Trans. Comput. Logic 20 (4). External Links: ISSN 1529-3785, Link, Document Cited by: §VI.
  • [12] X. Huang, M. Kwiatkowska, S. Wang, and M. Wu (2017) Safety verification of deep neural networks. In International Conference on Computer Aided Verification, pp. 3–29. External Links: Document, Link Cited by: §II-A.
  • [13] R. LaLonde, D. Zhang, and M. Shah (2018) ClusterNet: detecting small objects in large scenes by exploiting spatio-temporal information. In CVPR, pp. 4003–4012. External Links: Document Cited by: §III-A, §III-C.
  • [14] D. G. Lowe (2004-11) Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60 (2), pp. 91–110. External Links: ISSN 0920-5691, Link, Document Cited by: §III-A.
  • [15] L. Ma, F. Juefei-Xu, J. Sun, C. Chen, T. Su, F. Zhang, M. Xue, B. Li, L. Li, Y. Liu, J. Zhao, and Y. Wang (2018) DeepGauge: comprehensive and multi-granularity testing criteria for gauging the robustness of deep learning systems. In Automated Software Engineering (ASE), pp. 120–131. Cited by: §II-A.
  • [16] L. Ma, F. Zhang, J. Sun, M. Xue, B. Li, F. Juefei-Xu, C. Xie, L. Li, Y. Liu, J. Zhao, et al. (2018) DeepMutation: mutation testing of deep learning systems. In Software Reliability Engineering, IEEE 29th International Symposium on, Cited by: §VI.
  • [17] A. Odena and I. Goodfellow (2018) TensorFuzz: debugging neural networks with coverage-guided fuzzing. arXiv preprint arXiv:1807.10875. Cited by: §VI.
  • [18] K. Pei, Y. Cao, J. Yang, and S. Jana (2017) DeepXplore: automated whitebox testing of deep learning systems. In Proceedings of the 26th Symposium on Operating Systems Principles, pp. 1–18. Cited by: §I, §II-A.
  • [19] W. Ruan, X. Huang, and M. Kwiatkowska (2018) Reachability analysis of deep neural networks with provable guarantees. IJCAI. Cited by: §II-A.
  • [20] W. Ruan, M. Wu, Y. Sun, X. Huang, D. Kroening, and M. Kwiatkowska (2019) Global robustness evaluation of deep neural networks with provable guarantees for the Hamming distance. IJCAI. Cited by: §II-A.
  • [21] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski (2011-11) ORB: an efficient alternative to SIFT or SURF. In International Conference on Computer Vision, pp. 2564–2571. ISSN 2380-7504. Cited by: §III-A.
  • [22] W. Shen, J. Wan, and Z. Chen (2018) MuNN: mutation analysis of neural networks. In International Conference on Software Quality, Reliability and Security Companion, QRS-C. Cited by: §VI.
  • [23] J. Sifakis (2018) Autonomous systems - an architectural characterization. CoRR abs/1811.10277. Cited by: §I.
  • [24] L. W. Sommer, M. Teutsch, T. Schuchert, and J. Beyerer (2016) A survey on moving object detection for wide area motion imagery. In IEEE Winter Conference on Applications of Computer Vision, WACV, pp. 1–9. Cited by: §III-A.
  • [25] Y. Sun, X. Huang, D. Kroening, J. Sharp, M. Hill, and R. Ashmore (2019) DeepConcolic: testing and debugging deep neural networks. In Proceedings of the 41st International Conference on Software Engineering. Cited by: §VI.
  • [26] Y. Sun, X. Huang, D. Kroening, J. Sharp, M. Hill, and R. Ashmore (2019) Structural test coverage criteria for deep neural networks. ACM Transactions on Embedded Computing Systems. Cited by: §I.
  • [27] Y. Sun, X. Huang, D. Kroening, J. Sharp, M. Hill, and R. Ashmore (2019) Structural test coverage criteria for deep neural networks. In Proceedings of the 41st International Conference on Software Engineering: Companion Proceedings, pp. 320–321. Cited by: §II-A.
  • [28] Y. Sun, M. Wu, W. Ruan, X. Huang, M. Kwiatkowska, and D. Kroening (2018) Concolic testing for deep neural networks. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE, pp. 109–119. Cited by: §I, §VI.
  • [29] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2014) Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR). Cited by: §II.
  • [30] C. E. Tuncali, G. Fainekos, H. Ito, and J. Kapinski (2018) Simulation-based adversarial test generation for autonomous vehicles with machine learning components. In Intelligent Vehicles Symposium (IV), pp. 1555–1562. Cited by: §VI.
  • [31] C. E. Tuncali, G. Fainekos, H. Ito, and J. Kapinski (2018) Simulation-based adversarial test generation for autonomous vehicles with machine learning components. arXiv preprint arXiv:1804.06760. Cited by: §VI.
  • [32] C. E. Tuncali, H. Ito, J. Kapinski, and J. V. Deshmukh (2018) Reasoning about safety of learning-enabled components in autonomous cyber-physical systems. In 55th Design Automation Conference (DAC), pp. 1–6. Cited by: §VI.
  • [33] J. Wang, G. Dong, J. Sun, X. Wang, and P. Zhang (2019) Adversarial sample detection for deep neural network through model mutation testing. In Proceedings of the 41st International Conference on Software Engineering. Cited by: §VI.
  • [34] J. Wang, J. Sun, P. Zhang, and X. Wang (2018) Detecting adversarial samples for deep neural networks through mutation testing. arXiv preprint arXiv:1805.05010. Cited by: §VI.
  • [35] M. Wicker, X. Huang, and M. Kwiatkowska (2018) Feature-guided black-box safety testing of deep neural networks. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems. arXiv preprint arXiv:1710.07859. Cited by: §II-A.
  • [36] M. Wu, M. Wicker, W. Ruan, X. Huang, and M. Kwiatkowska (2020) A game-based approximate verification of deep neural networks with provable guarantees. Theoretical Computer Science 807, pp. 298–329. Note: In memory of Maurice Nivat, a founding father of Theoretical Computer Science - Part II. ISSN 0304-3975. Cited by: §II-A.
  • [37] X. Xie, L. Ma, F. Juefei-Xu, H. Chen, M. Xue, B. Li, Y. Liu, J. Zhao, J. Yin, and S. See (2018) Coverage-guided fuzzing for deep neural networks. arXiv preprint arXiv:1809.01266. Cited by: §VI.
  • [38] M. Zhang, Y. Zhang, L. Zhang, C. Liu, and S. Khurshid (2018) DeepRoad: GAN-based metamorphic autonomous driving system testing. In Automated Software Engineering (ASE), 33rd IEEE/ACM International Conference on. Cited by: §VI.
  • [39] Y. Zhou and S. Maskell (2019) Detecting and tracking small moving objects in wide area motion imagery (WAMI) using convolutional neural networks (CNNs). In FUSION. Cited by: §I, §III.