Ensuring the reliability of cyber-physical systems (CPS), such as cars with adaptive cruise control, robotic systems or smart buildings, is of vital importance. Testing software for such systems is challenging, as developers need to take into account the interaction with the hardware as well as with the environment. Consider an autonomous vehicle that may drive in various conditions (fog, rain, snow) and must react to pedestrians as well as to the manoeuvres of other cars, or an IoT network in which, due to heavy traffic, control commands may arrive late or get lost.
The input search space of such systems is large; therefore, metaheuristic and random-search-based techniques are often used to generate the test cases. Furthermore, a system model is used to execute the test cases, as it is impractical to use the physical system, especially at the pre-deployment testing stage. It is therefore also important to obtain a system model that is accurate and not time-consuming to execute.
A number of tools have been developed to verify whether a model meets specific requirements and whether any inputs violate them, e.g., S-Taliro, Breach, falsify, ARIsTEO and other tools described in [5]. However, they require the requirements to be specified manually, which is tedious and does not guarantee that all possible requirements are considered.
Another direction of research is the automatic search-based generation of test suites for CPS. These approaches mostly focus on finding the test cases with the best requirements coverage and diversity, rather than on falsification [3, 8, 10]. They lack flexibility, as they often require external software to generate the initial test cases. They also use the Simulink API to execute the models, which can be computationally expensive.
We surmise that it is important to design test suites with high fault-revealing power, which indicate to developers the possible worst-case scenarios of system execution. Moreover, the test cases should consider possible combinations of environmental conditions during system execution. For example, a car can drive on a dry or an icy road, which changes the model describing the evolution of its trajectory.
Motivation. We explain the motivation for our work with a wirelessly controlled thermostat case study. This system is described in greater detail in [12]. The thermostat automatically controls the temperature in a closed room by switching between “on” and “off” modes. The temperature in the room is set by a user-defined schedule. A developer writing software for the thermostat defines a number of parameters, such as the sampling rate and the hysteresis value (a small threshold before or after reaching the temperature). In addition, the environmental conditions affect the system behaviour: the commands sent from the controller to the thermostat can be delayed or even lost due to network overload, and the speed at which the temperature decreases or increases can vary depending on the time of day, the room humidity, etc. Considering all these parameters, are there scenarios in which the system is not able to follow the schedule? What combinations of input parameters and environmental conditions drive the system to an unsafe state? Finding answers to such questions motivated us to develop an automatic search-based approach for CPS test case generation.
II Problem formulation
The behaviour of a hybrid system can be described with modes that have continuous output dynamics and with discrete switches between the modes. Each hybrid system has inputs, outputs and state variables. The expected system behaviour is specified over a time interval. A mode switch occurs when the expected output requirements cannot be met by the system in a particular mode. The dynamics of the system state, input and output variables in each mode are given by a corresponding mathematical model, which can be derived from system execution data using system identification or machine learning techniques.
Because they depend on the environmental conditions, the models can become very complex and take a substantial amount of time to execute. We therefore surmise that each mode can be represented by a series of surrogate, or simplified, models, each corresponding to certain environmental conditions. Each series thus contains several models, each addressed by a model series identifier and a model identifier. The test case (TC) generation problem can then be thought of as finding a combination of models and input values that maximizes the difference between the simulated and the expected system behaviour over the time interval, subject to a constraint on the system variables.
The quantity being maximized is our fitness function, which computes the deviation between the expected and the simulated behaviour. The search can proceed in one dimension: for fixed scenarios, find a combination of models that violates the user requirements. Alternatively, it can proceed in both: by varying the models and the system inputs, find the worst possible scenario.
To generate the initial test cases we suggest using hidden Markov chains. This idea is not new: in [6], for example, Markov chains are used to generate simulation scenarios for a wireless network. We decided to use a Markov chain for two main reasons. First, by running the chain a number of times, the developer can estimate the average performance of the system. Second, in our experiments, the initial population for the genetic algorithm (GA) generated with a Markov chain provided semantically better test cases than completely random initialization.
The parameters of the Markov chains, such as the states and the probabilities of state changes, can be estimated from data on typical system usage scenarios. In this case, the “states” correspond to the system modes. Each state has a set of possible output values the system can reach, a duration for which the system stays in the state, and a model that accounts for the system behaviour under particular environmental conditions. Therefore, for scenario generation, operation in each state can be represented by a triplet:
The triplet consists of the desired system output in a particular state, the duration of this state (the durations of all states should sum to the length of the test time interval) and the model used to describe the system behaviour. A test case is represented by a sequence of such states.
Finally, we suggest using evolutionary algorithms to find the combination of output, duration and model values that maximizes the fitness function. For this study we used a single-objective genetic algorithm.
III Case study
In our case study we consider an example of a wirelessly controlled thermostat, described in the Introduction.
For the thermostat, two modes of operation can be defined: “ON” and “OFF”. The behaviour of the system can be represented by a sequence of switches between these modes and the time spent in each mode. The input variable is the goal temperature (expected behaviour) at a given point in time. The output variable is the temperature controlled by the system. The system can also have state variables, such as the start temperature in a certain mode, the time spent in a particular mode, etc. The input variables are constrained to temperature values between 16 and 25 degrees Celsius. The time interval spent in each mode can range from 15 minutes to 6 hours.
III-A Model creation
To create the system model we used a system identification technique [7], in which a model of a dynamical system is built from data. The process requires the following steps:
Extract the data describing system behaviour in different modes.
Select a model structure.
Apply an estimation method to estimate values for the adjustable coefficients in the candidate model structure.
Evaluate the estimated model.
The wireless thermostat, which controls the temperature in a closed room, is part of our physical testbed, an IoT network of more than 30 devices based on the Z-Wave protocol. We therefore extracted the data for creating the model from experimental measurements. We selected the series of data points corresponding to the behaviour of the thermostat after “switch on” and “switch off” commands. One model includes two equations, describing the behaviour in the “on” and “off” modes. In total, we identified 15 models with different coefficients in the equations. Because of varying environmental conditions, e.g. an open door, higher or lower humidity, or heat transfer from outside, the coefficients in the selected model structure had to be adjusted to better fit the original data. Creating one complex model with a high number of inputs that accounts for the environmental conditions would make the execution of the model computationally expensive.
One of the challenges is to select the model structure. In our case, the heating and cooling of a closed space is governed by physical laws, such as Newton's law of cooling [11]. The law has an exponential nature; our experimentally selected model structure is therefore based on increasing and decreasing exponential functions.
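For reference, Newton's law of cooling and its closed-form solution can be written as follows. The symbols used here (ambient temperature T_env, initial temperature T_0, rate constant k) are notation chosen for this illustration and are not the coefficient names of the model structure proposed in this paper.

```latex
\frac{dT}{dt} = -k\,\bigl(T(t) - T_{\mathrm{env}}\bigr), \qquad k > 0
\quad\Longrightarrow\quad
T(t) = T_{\mathrm{env}} + \bigl(T_0 - T_{\mathrm{env}}\bigr)\,e^{-kt}
```

For T_0 above T_env the solution is a decreasing exponential, and for T_0 below T_env it is an increasing exponential that saturates at T_env, which is the shape that motivates the exponential model structure.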
We propose the following discrete-time model structure for the “on” mode:
and for the “off” mode:
Here the model coefficients are unique to a particular environment and define the model behaviour in it; the remaining quantities are the starting temperature and the discrete time step. We keep the coefficients in a table, such as Table I, which shows, as an example, the coefficients of three of the obtained models.
To obtain the coefficients, the data points must be fitted by a curve with minimal deviation. We used a curve-fitting routine of the Python SciPy library, which is based on the non-linear least squares method. The average root mean square error between the original and the approximated data did not exceed 0.5 degrees.
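The idea of fitting an exponential curve by least squares can be illustrated with a minimal, standard-library-only sketch. It is not the SciPy routine used in this work: it fits a single rate constant with a golden-section search, and the model form, the function names (`model`, `fit_rate`, `rmse_of_fit`) and the synthetic data are all assumptions made for illustration.

```python
import math


def model(t, k, t_start, t_limit):
    """Assumed structure: exponential approach from t_start towards t_limit."""
    return t_limit + (t_start - t_limit) * math.exp(-k * t)


def sse(k, times, temps, t_start, t_limit):
    """Sum of squared errors between measured and modelled temperatures."""
    return sum((model(t, k, t_start, t_limit) - y) ** 2
               for t, y in zip(times, temps))


def fit_rate(times, temps, t_start, t_limit, lo=1e-4, hi=1.0, iters=80):
    """Golden-section search for the rate constant k that minimises the SSE."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c, times, temps, t_start, t_limit) < sse(d, times, temps, t_start, t_limit):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2


def rmse_of_fit(k, times, temps, t_start, t_limit):
    """Root mean square error of the fitted curve."""
    return math.sqrt(sse(k, times, temps, t_start, t_limit) / len(times))


# Synthetic "measurements": heating from 18 towards 24 degrees, sampled every 5 minutes.
times = list(range(0, 120, 5))
temps = [model(t, 0.05, 18.0, 24.0) for t in times]
k_hat = fit_rate(times, temps, 18.0, 24.0)
fit_error = rmse_of_fit(k_hat, times, temps, 18.0, 24.0)
```

With real measurements the data are noisy and several coefficients are fitted at once, which is where a general non-linear least squares routine is the more practical choice.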
III-B Generating initial test cases
To generate the test cases automatically, we represent the thermostat system as a Markov chain with two states, “on” and “off”, as shown in Fig. 1. The probabilities of changing state were estimated empirically, so that most of the generated test cases are semantically correct. A change of state occurs with a probability of 0.9, and the state remains the same with a probability of 0.1.
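With the states ordered as (on, off), an ordering assumed here for illustration, these probabilities correspond to the following transition matrix.

```latex
P =
\begin{pmatrix}
P(\text{on}\to\text{on}) & P(\text{on}\to\text{off}) \\
P(\text{off}\to\text{on}) & P(\text{off}\to\text{off})
\end{pmatrix}
=
\begin{pmatrix}
0.1 & 0.9 \\
0.9 & 0.1
\end{pmatrix}
```

Each row sums to 1, and since the matrix is symmetric, the chain visits both states equally often in the long run.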
After reaching a particular state, we randomly choose the temperature value the system is expected to reach, the time interval to be spent in the state and the model coefficients to use, so that each state is represented by a triplet (temperature, duration, model), similar to (1). In this way, a test case represents a temperature schedule that a user might define.
For each execution we specify the expected duration of the test case as well as the number of states. We chose a duration of 24 hours, representing one day, with 5 to 12 states in each test case.
We implemented the algorithm in a Python script, which saves the generated test cases in JSON format.
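The generation procedure described above can be sketched as follows. This is a minimal illustration rather than the script used in this work: the dictionary field names, the integer temperatures, the durations in minutes, the truncation of the last state and the retry-on-violation strategy are all assumptions; only the numeric bounds and the switching probability come from the text.

```python
import json
import random

P_SWITCH = 0.9                 # probability of changing state (from the text)
TEMP_RANGE = (16, 25)          # goal temperature bounds, degrees Celsius
DURATION_RANGE = (15, 360)     # 15 minutes to 6 hours, in minutes
N_MODELS = 15                  # number of identified models
TOTAL_MINUTES = 24 * 60        # one day
STATE_COUNT_RANGE = (5, 12)    # allowed number of states per test case


def next_mode(mode, rng):
    """Two-state Markov chain step: switch with probability P_SWITCH."""
    if rng.random() < P_SWITCH:
        return "off" if mode == "on" else "on"
    return mode


def generate_test_case(rng):
    """Sample states until they fill 24 hours; retry if a constraint is violated."""
    while True:
        states, remaining = [], TOTAL_MINUTES
        mode = rng.choice(("on", "off"))
        while remaining > 0:
            # The last state is truncated so that the durations sum to one day.
            duration = min(rng.randint(*DURATION_RANGE), remaining)
            states.append({
                "mode": mode,                              # hypothetical field names
                "temperature": rng.randint(*TEMP_RANGE),
                "duration": duration,
                "model": rng.randrange(N_MODELS),
            })
            remaining -= duration
            mode = next_mode(mode, rng)
        count_ok = STATE_COUNT_RANGE[0] <= len(states) <= STATE_COUNT_RANGE[1]
        if count_ok and states[-1]["duration"] >= DURATION_RANGE[0]:
            return states


rng = random.Random(42)
test_cases = [generate_test_case(rng) for _ in range(3)]
serialized = json.dumps(test_cases, indent=2)
```

Passing an explicit `random.Random` instance keeps the generation reproducible, which is convenient when the same initial population has to be reused across runs.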
III-C Genetic algorithm description
To find the test cases that maximize the difference between the expected and the simulated behaviour, we implemented a genetic algorithm in Python with the Pymoo framework [4]. In our configuration the number of generations is 90, the mutation rate is 0.4, the crossover rate is 0.9 and the population size is 100. These values were established experimentally, following common practice.
III-C1 Solution representation
A solution is composed of at least one test case containing from 5 to 12 states. The chromosomes are the test cases, represented in the software implementation as a dictionary (see Fig. 2). They have a variable number of genes, where each gene corresponds to a system state.
We use the k-way tournament selection implemented in Pymoo to choose the parents.
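The principle of k-way tournament selection can be sketched in plain Python as below. This is not Pymoo's implementation; the function name, the toy population and the fitness values are assumptions, and lower fitness is treated as better, matching the minimisation convention of the framework.

```python
import random


def tournament_select(population, fitness, k, rng):
    """Draw k distinct individuals at random and return the one with the lowest fitness."""
    contestants = rng.sample(range(len(population)), k)
    winner = min(contestants, key=lambda i: fitness[i])
    return population[winner]


rng = random.Random(0)
population = ["tc_a", "tc_b", "tc_c", "tc_d"]   # placeholders for test cases
fitness = [-1.2, -7.0, -0.5, -3.3]              # negated deviations: lower is better
parents = [tournament_select(population, fitness, k=2, rng=rng) for _ in range(2)]
```

A larger k increases the selection pressure: when k equals the population size, the best individual always wins.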
III-C3 Crossover operators
We implemented a crossover operator that exchanges states between two different test cases, as shown in Fig. 3. We use one-point crossover.
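A minimal sketch of one-point crossover on variable-length state sequences is given below. Choosing a single cut index that is valid for both parents is an assumption made here; the operator shown in Fig. 3 may select the cut differently. The tuples stand in for state triplets.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two state sequences after a common cut index."""
    shortest = min(len(parent_a), len(parent_b))
    cut = rng.randint(1, shortest - 1)   # keep at least one state on each side
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


parent_a = [("a", i) for i in range(5)]   # placeholder states of the first parent
parent_b = [("b", i) for i in range(7)]   # placeholder states of the second parent
child_a, child_b = one_point_crossover(parent_a, parent_b, random.Random(1))
```

With this choice each child inherits the length of the parent that supplied its tail, so the number of states stays within the allowed range. The total duration of a child may no longer equal 24 hours, so a repair or rejection step, which this sketch omits, would be needed if that constraint has to hold.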
III-C4 Mutation operators
We define two mutation operators, similar to those in [3]:
exchange operator: two states of a chromosome are randomly selected and their positions are exchanged;
change of variable operator: a state in a chromosome is randomly selected, then the value of one of its variables (temperature, duration, model) is changed according to its type and its minimum and maximum values.
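The two operators can be sketched as follows. The field names and the uniform integer resampling are assumptions made for illustration; the bounds follow the case study (16 to 25 degrees, 15 to 360 minutes, 15 models indexed here from 0 to 14).

```python
import copy
import random

# Inclusive bounds per state variable; the model index range is an assumed encoding.
BOUNDS = {"temperature": (16, 25), "duration": (15, 360), "model": (0, 14)}


def exchange_mutation(test_case, rng):
    """Swap the positions of two randomly selected states."""
    mutated = copy.deepcopy(test_case)
    i, j = rng.sample(range(len(mutated)), 2)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def change_variable_mutation(test_case, rng):
    """Resample one variable of one randomly selected state within its bounds."""
    mutated = copy.deepcopy(test_case)
    state = rng.choice(mutated)
    variable = rng.choice(sorted(BOUNDS))
    low, high = BOUNDS[variable]
    state[variable] = rng.randint(low, high)
    return mutated


rng = random.Random(7)
test_case = [{"temperature": 16 + i, "duration": 60 * (i + 1), "model": i}
             for i in range(5)]
swapped = exchange_mutation(test_case, rng)
changed = change_variable_mutation(test_case, rng)
```

Both functions return a mutated copy and leave the parent untouched, which avoids accidentally modifying individuals that are still part of the current population.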
III-C5 Fitness function
In our study the fitness function evaluates the root mean square error between the simulated and the expected behaviour. The expected behaviour is specified in the test cases, which are given as input to the system simulation. The test case is executed using the specified models, and the values of the system behaviour are calculated. A fitness of 1 signifies that the temperature provided by the system differs from the schedule by 1 degree on average, which might be acceptable for a typical user. As the Pymoo framework minimizes the fitness function, in our implementation we multiply its actual value by -1.
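The computation can be sketched in a few lines of plain Python. The function names and the sample temperature series are hypothetical; in the actual implementation the simulated series comes from executing the test case with the selected models.

```python
import math


def rmse(expected, simulated):
    """Root mean square error between two equally long temperature series."""
    if not expected or len(expected) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((e - s) ** 2 for e, s in zip(expected, simulated))
    return math.sqrt(squared / len(expected))


def fitness(expected, simulated):
    """Negated RMSE, so that a minimiser searches for the largest deviation."""
    return -rmse(expected, simulated)


expected = [20.0, 20.0, 22.0, 22.0]    # scheduled temperatures (illustrative)
simulated = [19.0, 21.0, 21.0, 23.0]   # simulated temperatures (illustrative)
deviation = rmse(expected, simulated)  # every sample is off by 1 degree, so this is 1.0
score = fitness(expected, simulated)   # -1.0
```

Negating the error turns the maximisation of the deviation into a minimisation problem, which is why the fitness values reported for the search are negative.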
To evaluate the performance of our GA implementation, we ran it 50 times (each run comprises 9000 evaluations, i.e. 90 generations of 100 individuals). After each run we recorded the fittest individual. We compared its performance with random search (RS): we recorded the fittest individual after generating 9000 random individuals and repeated the process 50 times. The resulting boxplot is shown in Fig. 4. In the boxplot we also report the fitness values of all the randomly generated individuals. We can see that the GA always produces better results, with an average fitness of -7.2, while the average fitness of the best RS individuals is -2.8. Over all the randomly generated individuals, the average fitness is around 0.93 in magnitude. For one of the runs we also report the convergence of the GA in Fig. 5, which confirms its good performance.
From this evaluation we conclude that our thermostat system performs well on average (the mean deviation from the schedule is around 1 degree), as shown by all the randomly generated schedules. However, there are potential scenarios that can lead to completely wrong system behaviour (a deviation from the schedule of about 7 degrees on average). It is up to the developer to decide whether the test cases found are realistic. If they are not, we recommend adjusting the search parameters and constraints.
V Discussion and conclusion
In this paper we suggested an approach for generating fault-revealing test cases for hybrid CPS that takes into account the variability of system behaviour under changing environmental conditions. It includes the generation of models, the generation of initial test cases and a genetic algorithm implemented in the Pymoo framework. The results for the wireless thermostat case study indicate that our implementation is more effective than random search. With our approach we could evaluate the system performance as well as generate potentially dangerous scenarios. However, it is up to the developers to judge whether the test cases are pertinent and to take further action to prevent the failures.
The approach can be applied to a wide range of hybrid CPS, which we plan to demonstrate in future case studies. We also plan to implement our approach as a complete test case generation tool.
-  (2017) Model-based testing of cyber-physical systems. In Cyber-Physical Systems, pp. 287–304. Cited by: §I.
-  (2015) Principles of cyber-physical systems. MIT press. Cited by: §II.
-  (2017) Search-based test case generation for cyber-physical systems. In 2017 IEEE Congress on Evolutionary Computation (CEC), pp. 688–697. Cited by: §I, §III-C4.
-  (2020) Pymoo: multi-objective optimization in Python. IEEE Access 8, pp. 89497–89509. Cited by: §III-C.
-  (2020) ARCH-comp 2020 category report: falsification. EPiC Series in Computing. Cited by: §I.
-  (2019) Modeling and analysis of wireless cyberphysical systems using stochastic methods. Wireless Communications and Mobile Computing 2019. Cited by: §II.
-  (1994) Modeling of dynamic systems. Prentice-Hall. Cited by: §III-A.
-  (2016) Automated test suite generation for time-continuous Simulink models. In Proceedings of the 38th International Conference on Software Engineering, pp. 595–606. Cited by: §I.
-  (2020) Approximation-refinement testing of compute-intensive cyber-physical models: an approach based on system identification. In 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE), pp. 372–384. Cited by: §II.
-  (2018) Search based model in the loop testing for cyber physical systems. In 2018 IEEE 16th International Conference on Embedded and Ubiquitous Computing (EUC), pp. 22–28. Cited by: §I, §I.
-  (1999) Newton’s law of cooling. Contemporary Physics 40 (3), pp. 205–212. Cited by: §III-A.
-  (2020) Double cycle hybrid testing of hybrid distributed iot system. In Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops, pp. 529–532. Cited by: §I.