An online evolving framework is proposed to support safe Automated Vehicle (AV) control by enabling the controller to recognize unexpected situations and react appropriately by choosing a better action. Within the framework, we introduce the evolving Finite State Machine (e-FSM), an online model that can (1) determine states uniquely as needed, (2) recognize states, and (3) identify state-transitions.
In this study, the e-FSM’s capabilities are explained and illustrated by simulating a simple car-following scenario. The Intelligent Driver Model (IDM) is implemented as the vehicle controller, and different sets of IDM parameters are assigned to the following vehicle to simulate various situations (including a collision). While the car-following scenario is simulated, e-FSM recognizes and determines the states and identifies the transition matrices using the proposed methods.
To verify that e-FSM can recognize and determine states uniquely, we analyze whether the same state is recognized under identical situations. To validate the accuracy of the identified transition matrices, the difference between the probability distributions of the predicted and recognized states is measured by the Jensen-Shannon divergence (JSD). The results show that the Dead-End state, which carries a latent risk of collision, is uniquely determined and consistently recognized. Also, the probability distributions of the predicted states are close to those of the recognized states, indicating that the state-transitions are precisely identified.
Keywords: Decision Making Process, Markov Chain, Finite State Machine, Automated Vehicle, Latent Risk Detection
1. Introduction
An intelligent control system must accurately perceive the current state in order to react appropriately based on given criteria. For example, an automated vehicle (AV) controller needs to recognize traffic situations precisely to decide on maneuvers. However, designing a decision-making framework for AVs is challenging because traffic situations are complex and often unanticipated.
For many decades, rule-based, supervised-learning, and unsupervised-learning approaches have been proposed as decision-making methods to control the AV in different manners and at different levels. As a rule-based approach, the Hierarchical Finite State Machine (HFSM) has been implemented in the AV control framework. In the 2007 DARPA Urban Challenge, the Hybrid State System (HSS) was proposed to control the AV (OSU-ACT), as shown in [kurt2008hybrid, redmill2008ohio]. The HSS consists of a Discrete State System (DSS), a Continuous State System (CSS), and an interface layer. The DSS consists of an HFSM for high-level decision-making, while the CSS maneuvers the AV at a low level. In [liu2007human], an HFSM is implemented to create a decision-making module by analyzing human drivers’ behaviors in intersection driving. The module is embedded in the HSS to estimate the human driver’s decision in [gadepally2013framework]. For a set of driving decisions in merging into a convoy, [kurt2011probabilistic] proposes using transition probabilities in the FSM instead of state-transition conditions. For automated driving on the highway, a driving strategy decision model consisting of a two-level HFSM is proposed in [noh2017decision], and [zhang2017finite] suggests an FSM-based automated driving controller with a stochastic gradient optimization method.
As a supervised-learning approach, the deep neural network (DNN) has been proposed as the AV controller. In [al2017deep], GoogLeNet is implemented to obtain accurate affordance parameters that are used to determine the optimal control actions. End-to-end learning, which maps camera images to optimal controls via a convolutional neural network (CNN), is proposed in [bojarski2016end, hoel2018automated]. The Q-learning algorithm is implemented to learn the optimal policies for various driving behaviors in [you2018highway]. Also, [hejase2018identification] introduces how to analyze latent risks by using a Backtracking Process Algorithm (BPA).
Although previously proposed decision-making methodologies for AV control systems derive optimal controls in several scenarios, they have limitations. Rule-based and supervised-learning methods cannot recognize unexpected situations, so the AV controller cannot react appropriately under unknown circumstances. In other words, the performance of rule-based or supervised-learning-based decision-making is guaranteed only under initially anticipated situations. With the unsupervised-learning method, it is possible to learn the best action for newly encountered states, but the decision-making process depends on a pre-designed reward function and the current state. As a result, the action that is best for the current state may turn out to be worse for a future state.
To overcome these limitations, a combination of the evolving clustering method and the Markov Chain, suggested in [filev2013generalized], is implemented to derive an evolving module named the evolving Finite State Machine (e-FSM). The e-FSM is proposed for online state determination and recognition together with the identification of state-transitions. Also, an online evolving framework is introduced to show how e-FSM helps the AV controller choose a better action regardless of the controller’s type. The rest of the paper is organized as follows: Section 2 describes how e-FSM determines and recognizes states and identifies the state-transitions. In Section 3, e-FSM’s capabilities are validated through an analysis of experimental results. An online evolving framework for a safe AV control system is introduced in Section 4. Finally, the contributions of this study are summarized in Section 5.
2. Evolving Finite State Machine Model
The fundamental concept of e-FSM is inspired by the general finite state machine (FSM), which consists of states (nodes) and transition conditions (links). The framework of e-FSM is shown in Figure 1. e-FSM consists of states and conditional transition probabilities, and it can evolve its structure by creating new states through clustering observations over time. State-transitions are represented by multiple matrices, one for each action. The matrices are either expanded or identified as follows: when a new state is created, the dimension of all transition matrices is expanded; otherwise, the one transition matrix associated with the chosen action is identified. The specific properties and configurations of the states and transition matrices are discussed in the following sub-sections.
2.1. The properties of e-FSM
e-FSM has a set of states that varies over time, a fixed set of actions, and conditional transition probabilities in matrix form, with one transition matrix for each action; the controller chooses one action from the action set at every time step. Accordingly, the state set and the transition matrices can change over time, whereas the action set is fixed.
2.1.1. State
The state set is initially empty and accumulates states over time as they are determined. Because the number of states is not fixed, the state set either gains a state when a new one is determined or otherwise remains as it was at the previous time step. Each state is represented by the center of a cluster of observations and refers to a unique situation. An observation consists of several variables, which may be continuous and/or discrete. For instance, an observation can be defined to represent a driving situation by the distance between a vehicle and its preceding vehicle, the velocity of the vehicle, and the position of the vehicle. The recognized state at each time step is represented by a probability distribution over the existing states. The detailed steps for calculating the probability distribution are discussed in Section 2.2.
2.1.2. Action
The action set must consist of a fixed, finite number of discrete actions. A discrete action set can be obtained by encoding a continuous action set with an arbitrarily chosen interval. For example, a continuous longitudinal acceleration set can be divided into intervals, each of which is treated as one discrete action. The controller chooses an optimal action from the action set, and the chosen action is used to identify the transition matrices in e-FSM.
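The interval encoding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `encode_action`, the equal-width intervals, the clamping of out-of-range values, and the example range of -8 to 8 m/s^2 are all assumptions made for the example.

```python
def encode_action(accel, a_min, a_max, num_intervals):
    """Return the 0-based index of the equal-width interval containing accel.

    Assumption: intervals are equal-width and out-of-range values are clamped
    to the first or last interval.
    """
    if num_intervals < 1 or a_max <= a_min:
        raise ValueError("need a_max > a_min and at least one interval")
    width = (a_max - a_min) / num_intervals
    # Clamp so that out-of-range accelerations map to the boundary intervals.
    clamped = min(max(accel, a_min), a_max)
    index = int((clamped - a_min) / width)
    # The upper bound itself belongs to the last interval.
    return min(index, num_intervals - 1)


# Hypothetical range of [-8, 8] m/s^2 split into 16 intervals of width 1.
action_index = encode_action(-2.5, -8.0, 8.0, 16)
```

The returned index is what would select the transition matrix associated with the chosen action.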
2.1.3. State-Transitions
The transitions among the determined states are represented by conditional probabilities arranged as transition matrices. Each transition matrix is associated with one action; therefore, the total number of transition matrices equals the total number of actions, q. Because the action set is fixed, the number of transition matrices does not change, but their dimensions vary over time as new states are determined. Given a chosen action, each element of the corresponding matrix is the probability of a transition from one state to another. The details of the identification and expansion of the transition matrices are discussed in Section 2.3.
2.1.4. Example of e-FSM’s evolving sequence
An example is illustrated in Figure 2 to explain how e-FSM evolves over time. The figure shows the states, the action chosen by the controller at each time step, and the transition probability from one state to another given the chosen action. After the first state is determined, no additional state is determined for some time; therefore, only the transition probability from that state to itself is identified, based on the chosen actions. When a new state is determined, the dimension of all transition matrices is expanded. Then, four state-transitions are identified until another new state is created.
2.2. The principle of the online state determination and recognition
Once a set of variables (observations) is chosen to represent the state, e-FSM determines or recognizes the states. For the determination of the states, evolving Takagi-Sugeno (eTS) [angelov2004approach, filev2011real], which is one of the online clustering methods, is implemented. The reason is that eTS can decide whether the input-data (the observation defined in Equation 1) is grouped into one of the existing clusters or becomes the center of a new cluster. This feature is directly applicable to state determination when each cluster is regarded as a state.
2.2.1. Determination of the new states via eTS online clustering method
The eTS consists of three steps. First, it calculates the potential of the input-data (single or multiple variables) by Equation 2. Second, the potentials of all existing cluster centers are updated by Equation 6. Lastly, it decides whether the input-data should be classified into one of the existing clusters or become the center of a new cluster by considering the following two conditions:
Condition 1: the potential of the current input-data is greater than the potentials of all existing cluster centers.
Condition 2: the minimum Euclidean distance between the input-data and the closest cluster center is less than a threshold.
If Conditions 1 and 2 are both satisfied, the existing cluster center closest to the input-data is replaced by the input-data. If only Condition 1 is satisfied, a new cluster centered at the input-data is created. The first input-data is set as the center of a new cluster, and its potential is assigned directly instead of being calculated by Equation 2. The two eTS parameters are assigned arbitrarily, and they affect how frequently new clusters are created.
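The decision logic of the third step can be sketched as below. This is only an illustration of the two conditions as stated in the text: the potentials are assumed to have been computed already (Equations 2 and 6 are not reproduced here), and the names `decide_cluster` and `radius` (standing in for the distance threshold of Condition 2) are hypothetical.

```python
import math


def decide_cluster(z, centers, potential_z, center_potentials, radius):
    """Return ("create", None), ("replace", i), or ("assign", i).

    z                 : current input-data (sequence of floats)
    centers           : list of existing cluster centers
    potential_z       : potential of z, assumed precomputed
    center_potentials : updated potentials of the existing centers
    radius            : hypothetical distance threshold for Condition 2
    """
    if not centers:
        # The first input-data always becomes the first cluster center.
        return ("create", None)
    distances = [math.dist(z, c) for c in centers]
    closest = min(range(len(centers)), key=lambda i: distances[i])
    cond1 = potential_z > max(center_potentials)  # Condition 1
    cond2 = distances[closest] < radius           # Condition 2
    if cond1 and cond2:
        return ("replace", closest)  # z replaces the closest center
    if cond1:
        return ("create", None)      # z becomes the center of a new cluster
    return ("assign", closest)       # z belongs to an existing cluster
```

In e-FSM terms, a "create" outcome corresponds to determining a new state, while the other outcomes leave the number of states unchanged.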
2.2.2. Recognition of the current state
The e-FSM recognizes the current state based on the current observation and the existing states. The recognition of the current state is represented by a probability distribution over the existing states. When eTS does not create a new state, the similarity function defined in Equation 7 is called to calculate how similar the observation is to each existing state. Because each similarity is normalized, it is bounded between 0 and 1; therefore, the probability distribution of the state at each time step is defined in e-FSM as shown in Equations 8 and 9.
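The recognition step can be illustrated with a short sketch. The similarity function of Equation 7 is not reproduced in this text, so the Gaussian kernel used here, the `spread` parameter, and the function name `recognize_state` are assumptions chosen only to show how similarities are normalized into a probability distribution over the existing states.

```python
import math


def recognize_state(z, centers, spread=1.0):
    """Return a probability distribution over the existing states.

    Assumption: similarity is a Gaussian kernel of the Euclidean distance
    between the observation z and each state center.
    """
    sims = [
        math.exp(-math.dist(z, c) ** 2 / (2.0 * spread ** 2))
        for c in centers
    ]
    total = sum(sims)
    if total == 0.0:
        # Every similarity underflowed; fall back to a uniform distribution.
        return [1.0 / len(centers)] * len(centers)
    # Normalize so that the similarities sum to one.
    return [s / total for s in sims]


# Observation close to the first of two hypothetical state centers.
gamma = recognize_state([0.1, 0.0], [[0.0, 0.0], [2.0, 2.0]])
```

The state with the largest probability can then be reported as the recognized state index.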
2.3. The principle of the online state-transition identification
State-transition identification is critical for e-FSM to predict future states. Because one transition matrix is maintained for each action, it is possible to anticipate which state-transitions will appear for a chosen action. A stochastic method is proposed to identify the transition matrices, and a procedure is introduced to expand their dimension, because the number of states in e-FSM is not fixed and increases over time as needed.
2.3.1. Online identification method for a transition matrix in Markov Chain
In a Markov Chain, the transition probability from one state to another is defined by Equations 10 and 11 in terms of two indicators: one that is set when a state-transition from the first state to the second is observed at a time-step, and one that is set when a state-transition initiated from the first state is observed at that time-step. On top of this definition, the online state-transition identification method shown in Equations 12-14 is proposed by [filev2013generalized] to implement Markov models for real-time modeling of continuous systems. The equations involve a learning rate, the probability distributions of the states at two consecutive time-steps, and an N-dimensional ones-vector, where N is the total number of states. For the initialization of a transition matrix, the two quantities updated by the equations are set using a small non-negative constant, which avoids singularity, and a compatible-size matrix having unit elements.
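A minimal sketch of this kind of online identification is given below. Since Equations 12-14 are not reproduced in this text, the recursive form used here (an exponentially weighted outer-product accumulator `F`, its row sums `F0`, and the row-normalized matrix) is an assumption based on the description above, and all function and variable names are hypothetical.

```python
def init_counts(n, eps=1e-6):
    """Initialize the accumulators for n states with a small constant eps."""
    F = [[eps] * n for _ in range(n)]
    F0 = [n * eps] * n  # row sums of F
    return F, F0


def update_counts(F, F0, gamma_prev, gamma_now, rate):
    """In-place update from two consecutive state distributions."""
    n = len(F0)
    for i in range(n):
        for j in range(n):
            F[i][j] += rate * (gamma_prev[i] * gamma_now[j] - F[i][j])
        F0[i] += rate * (gamma_prev[i] - F0[i])


def transition_matrix(F, F0):
    """Row-normalize F by F0 to obtain transition probabilities."""
    n = len(F0)
    return [[F[i][j] / F0[i] for j in range(n)] for i in range(n)]


# Repeatedly observe a transition from state 0 to state 1.
F, F0 = init_counts(2)
for _ in range(200):
    update_counts(F, F0, [1.0, 0.0], [0.0, 1.0], rate=0.1)
P = transition_matrix(F, F0)
```

Because `F0` tracks the row sums of `F` whenever the state distributions sum to one, each row of the resulting matrix remains a valid probability distribution.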
2.3.2. The online identification and expansion methods for transition matrices in e-FSM
The online state-transition identification method proposed in [filev2013generalized] is implemented in e-FSM with some modifications. Because new states are determined from time to time in e-FSM, the dimension of the transition matrices must be expanded to represent the state-transitions between all existing states. The number of rows and the number of columns of every transition matrix should always equal the total number of states determined so far.
Because multiple transition matrices are implemented in e-FSM to represent state-transitions based on the chosen actions, Equations 12, 13, and 14 are redefined as Equations 15, 16, and 17, with one set of quantities for each action. The initialization is also modified, because the number of states in e-FSM is not fixed but varies over time. Only the transition matrix associated with the chosen action is identified by Equation 15 at a time, and the others are kept unchanged.
As shown in Figure 1, when a new state (cluster) is created, the transition matrices are expanded in two steps. The first step inserts a new row and a new column into the matrix of every action, so that its dimension grows by one, and initializes the elements in the new row and column. The second step updates the associated vector of every action by adding a row, so that its length also grows by one; the existing elements are incremented, and the new last element is initialized.
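The two expansion steps can be sketched as follows. The initialization values are assumptions: new matrix elements are set to a small constant `eps`, `eps` is added to each existing element of the vector, and the new last element is set to the sum of the new row, which keeps the vector equal to the row sums of the matrix. The names `expand_counts` and `expand_all` are hypothetical.

```python
def expand_counts(F, F0, eps=1e-6):
    """Return copies of F and F0 expanded by one state."""
    n = len(F0)
    # Step 1: add a new column to every row, then add a new row.
    new_F = [row + [eps] for row in F]
    new_F.append([eps] * (n + 1))
    # Step 2: account for the new column in each existing row sum,
    # then append the row sum of the new row.
    new_F0 = [value + eps for value in F0]
    new_F0.append((n + 1) * eps)
    return new_F, new_F0


def expand_all(models, eps=1e-6):
    """Expand the (F, F0) pair of every action when a new state is created."""
    return {action: expand_counts(F, F0, eps) for action, (F, F0) in models.items()}


# Two hypothetical actions, each with a 2-state accumulator.
models = {
    0: ([[0.2, 0.3], [0.1, 0.4]], [0.5, 0.5]),
    1: ([[0.4, 0.1], [0.3, 0.2]], [0.5, 0.5]),
}
expanded = expand_all(models)
```

Expanding every action's matrix at once mirrors the requirement that all transition matrices always share the same dimension.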
For instance, assuming the action set consists of two actions such that in the given example (Figure 2), two transition matrices at time , and , can be calculated by using Equation 15, 16 and 17, where
A new state is determined at time , therefore, the dimension of the two transition matrices needs to be expanded. First, the matrices, and , are expanded and initialized such that and . Second, the vectors, and , are updated such that , and .
To assist the controller’s decision making, e-FSM can inform the controller which state transitions will occur in the future for each possible action by calculating the probability distributions of the future state. Given the current state distribution, the identified transition matrices, and the candidate actions, the probability distributions of the state at future time-steps can be obtained by Equations 18 and 19.
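The prediction step can be sketched as a vector-matrix product. Equations 18 and 19 are not reproduced in this text, so the form below (the current distribution multiplied by the transition matrix of the chosen action, applied once per step) is an assumption, and the names `predict`, `P_a`, and `P_b` are hypothetical.

```python
def predict(gamma, P):
    """One-step prediction: next[j] = sum_i gamma[i] * P[i][j]."""
    n = len(gamma)
    return [sum(gamma[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical transition matrices for two actions over two states.
P_a = [[0.2, 0.8], [0.5, 0.5]]
P_b = [[0.9, 0.1], [0.6, 0.4]]
gamma_now = [1.0, 0.0]

# One step ahead under each candidate action.
next_a = predict(gamma_now, P_a)
next_b = predict(gamma_now, P_b)

# Two steps ahead when the first action is repeated.
two_step_a = predict(next_a, P_a)
```

Comparing the predicted distributions across actions is what would let a controller prefer the action that keeps probability mass away from an undesirable state.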
3. Experimental Setting and Results
Using the Simulation of Urban Mobility (SUMO), a car-following scenario is simulated to show whether the states are uniquely determined and recognized and how accurately the state-transitions are identified through e-FSM.
3.1. Experimental Settings
There are moments at which the driver has no eligible action to avoid an accident. For instance, if the following vehicle is driving with a short safe-distance and the preceding vehicle brakes fully, the following vehicle has no chance to avoid a collision. Calling such inevitable-collision states Dead-End (DE) states, a simple car-following scenario on a one-way road is designed so that the DE state occurs depending on the following vehicle’s speed control, as shown in Figure 3. The specific scenario settings are as follows: while the following and preceding vehicles are driving on the one-way road, each vehicle’s speed is controlled by its own controller. When the preceding vehicle’s speed reaches its max-speed, it brakes fully to a stop, as in an emergency stop. The car-following scenario is terminated either when a collision is observed or after 35 seconds of simulation; the unit-time is 0.01 seconds, so 3500 steps are simulated for a single scenario.
The Intelligent Driver Model (IDM), a microscopic car-following model, is implemented as the controller of both vehicles because the model is designed and validated to produce realistic longitudinal car-following motions, as discussed in [kesting2010enhanced, treiber2000congested]. The IDM (Equations 21 and 22) has several parameters: some are observations, such as the current headway of the vehicle, and the others are the driver’s preferences, namely the desired maximum acceleration, desired velocity, desired headway, desired time-headway, and desired maximum deceleration. By assigning different sets of IDM parameters, distinct driving styles are obtained, such as a more aggressive or a normal longitudinal speed controller, which choose different actions (longitudinal acceleration in this scenario) under an identical situation. Two different sets of IDM parameters are pre-determined and used to obtain the two types of following vehicle, and a separate set is assigned to the preceding vehicle. Therefore, the following vehicle encounters identical situations but reacts differently depending on its controller type.
While the car-following scenario is simulated, e-FSM determines and recognizes states and identifies the transition matrices. The two eTS parameters are set to 0.85 and 0.3, respectively, the initial speed of both vehicles is set to 0, and the continuous action set (longitudinal acceleration) is encoded over its range into a discrete action set of 17 intervals.
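For reference, the standard IDM acceleration law from the car-following literature can be sketched as follows. Equations 21 and 22 are not reproduced in this text, so this is the commonly published form rather than a confirmed copy of the authors' equations; the acceleration exponent of 4 and the two example parameter sets are assumptions chosen for illustration.

```python
import math


def idm_acceleration(v, gap, dv, a_max, v0, s0, T, b, delta=4.0):
    """Standard IDM longitudinal acceleration.

    v     : speed of the following vehicle (m/s)
    gap   : current headway to the preceding vehicle (m), must be > 0
    dv    : approach rate, v minus the preceding vehicle's speed (m/s)
    a_max : desired maximum acceleration (m/s^2)
    v0    : desired velocity (m/s)
    s0    : desired (minimum) headway (m)
    T     : desired time-headway (s)
    b     : desired maximum (comfortable) deceleration (m/s^2)
    delta : acceleration exponent, assumed to be 4
    """
    # Desired dynamic headway.
    s_star = s0 + v * T + v * dv / (2.0 * math.sqrt(a_max * b))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


# Hypothetical parameter sets for two driving styles in the same situation.
normal = dict(a_max=1.0, v0=30.0, s0=2.0, T=1.5, b=2.0)
aggressive = dict(a_max=2.0, v0=30.0, s0=1.0, T=0.5, b=4.0)
acc_normal = idm_acceleration(v=20.0, gap=25.0, dv=0.0, **normal)
acc_aggressive = idm_acceleration(v=20.0, gap=25.0, dv=0.0, **aggressive)
```

With these illustrative values, the shorter desired time-headway makes the aggressive controller keep accelerating in a situation where the normal controller already brakes, which is the kind of behavioral difference the scenario relies on.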
3.2. Experiment Process and Results
In the car-following scenario, four cases are simulated by assigning different controller types to the following vehicle: (case 1) the controller is the aggressive type; (case 2) the controller is the normal type; (case 3) the controller is initially the aggressive type and is then changed to the normal type in the middle of the simulation horizon; and (case 4) the controller is initially the normal type and is then changed to the aggressive type in the middle of the simulation horizon. While each case is simulated 20 times (80 simulations in total), e-FSM determines or recognizes states and expands or identifies the transition matrices.
The number of uniquely determined states increases from 0 to 7 over the first four simulations, and no further state is determined in the remainder of the 80 simulations. Figure 4 shows e-FSM’s state recognition results for the car-following scenario under the four settings of the following vehicle’s controller. The figure shows the headway, the speed, the acceleration (the continuous actions chosen by the IDM controller), the index of the state recognized by e-FSM, and the index of the interval-encoded chosen action.
3.2.1. The analysis of e-FSM’s state determination and recognition capabilities
The analysis focuses on whether the DE state is uniquely determined and consistently recognized by e-FSM in the four cases, rather than on which unique situation each state represents. The reason is that the DE state can be identified by observing a collision, so e-FSM’s capability can be validated by checking whether an identical state is recognized whenever a collision occurs. As shown in the results, a collision occurs at the end of the simulation horizon in cases 1 and 4, whereas the following vehicle stops and goes without a collision in cases 2 and 3. Also, e-FSM recognizes state #3 whenever a collision is observed, without exception across the 80 simulations. In addition, we study whether a collision occurs when the following vehicle’s state is recognized as the DE state (state #3) regardless of the preceding vehicle’s speed. Two additional cases are set in which the preceding vehicle’s full braking is initiated at moments when the following vehicle’s state is recognized as state #3 but the preceding vehicle has not yet reached its max-speed. These cases are identical to case 1 except for the moment at which the preceding vehicle’s emergency stop is initiated. A collision is observed in both cases. Therefore, the results indicate that the DE state is uniquely determined as state #3 and consistently recognized by e-FSM.
Changing the type of the following vehicle’s controller in the middle of the simulation horizon produces state-transitions, as shown in cases 3 and 4. In case 3, the recognized state changes from the DE state to other states after the controller’s type is changed from aggressive to normal, and vice versa in case 4. This shows that the collision can be prevented by choosing a better action in advance. In Section 4, the evolving framework is introduced to show how e-FSM can assist the AV controller’s decision-making by providing the recognized latent risks in advance.
3.2.2. The analysis of e-FSM’s state-transition identification capability
In e-FSM, the transition matrices are expanded when a new state is determined and identified when a state-transition is observed. To show how accurately the transition matrices are identified by the proposed methods, the probability distributions of the predicted state and the recognized state are compared. The recognized state distribution and the chosen action are known at every time step; therefore, the predicted distribution of the next state can be calculated via Equation 18 using the identified transition matrix associated with the chosen action. Only for the first prediction, the uniform distribution and the marginal transition matrix are used. To compare the two probability distributions, the Jensen-Shannon Divergence (JSD), which measures the difference between two probability distributions as described in [lin1991divergence], is implemented.
Under the same scenario and settings, the differences between the two probability distributions are quantified by the JSD as shown in Figure 5. In the results, the difference for the first prediction is larger than the others in all cases because the first prediction is calculated from the uniform distribution and the marginal transition matrix. Except for the first prediction, the JSD values are less than 0.15 in all cases. Considering that the JSD value is bounded between 0 and 1, e-FSM’s prediction of the future state is accurate, which indicates that the transition matrices are precisely identified by the proposed methodologies. In this study, the prediction of the one-step-ahead state is shown, but e-FSM can predict states further in the future by using Equations 18 and 19.
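The JSD comparison can be sketched in a few lines. The sketch assumes base-2 logarithms, which is the convention under which the divergence is bounded between 0 and 1; the function names and the example distributions are illustrative.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p[i] == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def jensen_shannon_divergence(p, q):
    """JSD between two discrete distributions over the same states."""
    # The mixture m is positive wherever p or q is positive, so the
    # KL terms above never divide by zero.
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical predicted and recognized state distributions.
predicted = [0.7, 0.2, 0.1]
recognized = [0.6, 0.3, 0.1]
difference = jensen_shannon_divergence(predicted, recognized)
```

A value near 0 means the predicted distribution matches the recognized one, while a value of 1 is reached only when the two distributions share no state with positive probability.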
4. The overview of an online evolving framework
An online evolving framework for safe AV control is proposed as shown in Figure 6. The framework is independent of the controller type and consists of two sub-frameworks: an AV control framework and an e-FSM framework. In this study, the specific steps in the module are not explained; instead, we describe how e-FSM assists the AV controller in choosing a better action. In the framework, possible actions are returned by the independent decision-making of the AV controller. The module either improves the returned actions or chooses the best action for safe AV control by using e-FSM’s capabilities. As validated in the previous sections, e-FSM determines states uniquely and recognizes them consistently, which enables the controller to detect initially unexpected dangerous situations. Also, e-FSM identifies state-transitions precisely so that future states can be predicted accurately, which helps the controller identify a safer action for the future.
5. Conclusion
In this paper, the specific properties and principles of e-FSM have been discussed, and its capabilities have been validated in a simple car-following scenario. As shown in the experimental results, e-FSM can evolve its structure through online state determination. The determined states represent unique situations, and the recognition of states is expressed by probability distributions. Through the proposed stochastic method, e-FSM identifies state-transitions precisely, so that accurate prediction of future states is possible. We claim that, in the online evolving framework, e-FSM can support the AV controller in determining unexpected situations, recognizing states, and predicting future states, which are required for better decision-making.
This study is funded by the National Science Foundation (NSF) Cyber-Physical Systems (CPS) project under contract #60046665. The authors are grateful for this support.
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: T. Han, D. Filev, U. Özgüner; data collection: T. Han; analysis and interpretation of results: T. Han, D. Filev, U. Özgüner; draft manuscript preparation: T. Han, U. Özgüner. All authors reviewed the results and approved the final version of the manuscript.