We propose and evaluate a new technique for learning hybrid automata automatically by observing the runtime behavior of a dynamical system. Working from a sequence of continuous state values and predicates about the environment, CHARDA recovers the distinct dynamic modes, learns a model for each mode from a given set of templates, and postulates causal guard conditions which trigger transitions between modes. Our main contribution is the use of information-theoretic measures (1) as a cost function for data segmentation and model selection to penalize over-fitting and (2) to determine the likely causes of each transition. CHARDA is easily extended with different classes of model templates, fitting methods, or predicates. In our experiments on a complex videogame character, CHARDA successfully discovers a reasonable over-approximation of the character's true behaviors. Our results also compare favorably against recent work in automatically learning probabilistic timed automata in an aircraft domain: CHARDA exactly learns the modes of these simpler automata.
Hybrid automata (HAs) combine discrete finite state machines with continuous variables [Alur et al.1993]. These continuous variables are updated at different rates in different states (also called modes) according to state-specific flow constraints. Transitions between states may be guarded on conditions involving (classically) the continuous variables or other predicates, and these transitions may update continuous variables to new values instantaneously. States may also have associated invariant conditions; if an invariant is violated, the state immediately exits along one of its available transitions.
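The definition above can be made concrete with a small data structure. The following is only an illustrative sketch: the class names `Mode` and `Transition`, the `step` function, the explicit Euler update, and the two-mode falling example are all assumptions made for illustration and are not taken from CHARDA itself. Invariants are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List

State = Dict[str, float]


def identity(state: State) -> State:
    """The empty update: leave all continuous variables unchanged."""
    return state


@dataclass
class Transition:
    target: str
    # A guard sees the continuous state and the set of active predicates.
    guard: Callable[[State, FrozenSet[str]], bool]
    # An update instantaneously resets continuous variables.
    update: Callable[[State], State] = identity


@dataclass
class Mode:
    name: str
    # A flow returns the rate of change of each continuous variable.
    flow: Callable[[State], State]
    transitions: List[Transition] = field(default_factory=list)


def step(modes, current, state, predicates, dt):
    """Take the first enabled transition, otherwise follow the mode's flow."""
    mode = modes[current]
    for t in mode.transitions:
        if t.guard(state, predicates):
            return t.target, t.update(dict(state))
    rates = mode.flow(state)
    # Explicit Euler integration (an assumption of this sketch).
    return current, {k: v + dt * rates.get(k, 0.0) for k, v in state.items()}


# Hypothetical two-mode example: falling under constant acceleration,
# then landing when a collision predicate becomes active.
modes = {
    "fall": Mode(
        "fall",
        flow=lambda s: {"y": s["vy"], "vy": -1.0},
        transitions=[
            Transition(
                "ground",
                guard=lambda s, p: "collide_bottom" in p,
                update=lambda s: {**s, "vy": 0.0},
            )
        ],
    ),
    "ground": Mode("ground", flow=lambda s: {"y": 0.0, "vy": 0.0}),
}
```

Calling `step(modes, "fall", {"y": 10.0, "vy": 0.0}, frozenset(), 1.0)` follows the flow of the falling mode, while passing `frozenset({"collide_bottom"})` takes the guarded transition and applies its velocity reset.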
Hybrid automata are a convenient notation for many different dynamical systems, and have at least semi-decision algorithms for a variety of interesting properties (e.g. satisfiability of LTL formulae, general reachability, and existence of optimal control policies) [Alur et al.1995, Henzinger et al.1995, Henzinger and Kopke1999]. Learning (or recovering) HAs from existing systems yields convenient abstractions for human analysis and high-level automated planning; moreover, these abstractions can be refined, possibly automatically (via new data or experimentation).
In this work we present CHARDA, Causal Hybrid Automata Recovery via Dynamic Analysis, a non-parametric framework that learns an HA from observations of a dynamical system. CHARDA has two phases: mode identification and causal guard learning. We identify modes via a dynamic programming approach that segments the trace and finds switchpoints where the dynamics of the system change. Then CHARDA learns causal guard conditions for mode-to-mode transitions using information-theoretic measures.
CHARDA’s segmentation requires no prior knowledge of the number of potential modes or the location of switchpoints, requiring only a set of potential model templates (e.g. constant velocity, or constant acceleration starting from a reset velocity value). Although the models can take any form (so long as a likelihood function is available), here we use general linear models (multivariate linear regressions). CHARDA performs model selection and segmentation via a principled penalty function. In this work, we tried both the Bayesian Information Criterion (BIC) and Minimum Description Length (MDL), but CHARDA is agnostic to the choice of penalty function.
We demonstrate CHARDA in a novel domain: videogames, specifically Super Mario Bros (SMB). Games offer a unique set of challenges including non-physical dynamics and potentially very frequent mode transitions on the order of fractions of a second. As a domain, games lie somewhere between synthetic data and a physical robot or other cyber-physical system. Furthermore, games are interesting objects of analysis in their own right. In games specifically, CHARDA has some exciting applications:
In the General VideoGame (GVG) playing domain, an AI could derive HA models for game entities and then plan in this abstracted space without relying on a forward model [Perez-Liebana et al.2016].
Model-checking and safety analysis of character automata without the overhead of manual modeling by human game designers [Smith et al.2009].
Automatic scraping of characters from existing games for a character behavior corpus, which could then be used for analysis or procedural generation, as corpora of game levels already are [Summerville et al.2016].
The rest of the paper is structured as follows. First, we discuss other approaches to learning dynamical system models and how CHARDA fits into the existing work here. We then briefly introduce the concrete domain of interest and explain CHARDA’s design and implementation. Finally, we evaluate CHARDA in two domains: internally on the SMB domain, and externally in an aircraft tracking domain for comparison with another recent automaton learning algorithm.
Hybrid automata are an attractive computational model for analysis, control synthesis, and estimation of real-world systems. The inclusion of discrete behavior makes them expressive enough to describe many dynamical systems of interest, and although many classes of hybrid automaton have strong undecidability results [Henzinger et al.1995], there are efficient semi-decision procedures to determine configuration reachability or equivalence between automata [Alur et al.1995]. Hybrid automata, suitably constrained, can also be directly implemented in software or hardware, with proofs about the model translating to the implemented system (given assumptions of e.g. component failure rates and latencies).
Despite the general undecidability of many HA properties, it is possible to constrain models or carefully choose semantics to obtain different analysis characteristics: discretizing time or variable values evades undecidability by approximating the true dynamics [Jha et al.2007]; keeping these continuous but constraining the allowed flow and guard conditions admits geometric analysis [Frehse2005]; and one can always merge states together to yield an over-approximation, producing smaller and simpler models. There are also composable variations of hybrid automata that admit compositional analysis [Alur et al.2003] as well as a logical axiomatization [Platzer2008], not to mention the body of tools and research that already exist for synthesizing control policies, ensuring safety, characterizing reachable areas, et cetera.
Given the desirable properties of this class of model, and the ready availability of tools for dealing with them, many researchers have explored automatically recovering these high-level models from real-world system behaviors. CHARDA shares motivations with HyBUTLA [Niggemann et al.2012], which also aimed to learn a complete automaton from observational data. HyBUTLA seems able to learn only acyclic hybrid automata, since it works by constructing a prefix acceptor tree of the modes for each observation episode and then merges compatible modes from the bottom up. Moreover, HyBUTLA assumes that the segmentation is given in advance and that all transitions happen due to individual discrete events, presumably from a relatively small set. The overall structure of both algorithms—split the observations into a number of intervals in which mode functions are fit, then merge redundant modes—is similar, but CHARDA learns a larger class of automata and does not require data to be pre-split into episodes or segments.
Santana et al. [2015] learned Probabilistic Hybrid Automata (PHA) from observation using Expectation-Maximization. At each stage of the EM algorithm a Support Vector Machine was trained to predict the probability of transitioning to a new mode. Unlike CHARDA, their work requires a priori knowledge about the number of modes.
The closest work to ours is that of Ly and Lipson [2012], which used Evolutionary Computation to perform clustered symbolic regression to find common modes, with the Akaike Information Criterion used to penalize model complexity. However, unlike CHARDA, their work assumes a priori knowledge about the number of modes. Moreover, since their work assigns individual datapoints, not intervals, to a mode, their approach can only model stationary processes.
Several approaches have sought to learn models that describe dynamical systems’ behavior. Hidden Markov Models [Baum and Petrie1966] learn probabilistic state transitions between a hidden state and the observed data. The Infinite HMM [Beal et al.2002] extends this to an unbounded number of states by assuming that a Chinese Restaurant Process governs the state space. These approaches do not characterize guard conditions, but instead learn the probability of taking state transitions at each instant.
Data segmentation has a natural connection to automaton learning, and CHARDA uses an approach based on least squares regression [Bellman and Roth1969]. Model-based recursive partitioning [Zeileis et al.2008] is an alternative family of techniques which fits a model to the entire dataset and then iteratively and greedily splits that model until reaching a threshold quality level or split count. Unfortunately, each split is only locally optimal, so there are no guarantees about global optimality. The Forget-Me-Not Process [Milan et al.2016] finds a partitioning of time segments that allows for models to be repeated across different partitioned segments; however, it only works for stationary processes, i.e. distributions that do not change over time.
In terms of finding abstract models specifically of Nintendo games, we were inspired by Murphy’s work [2016] in automatically determining physical properties of game characters. That project, like ours, examined runtime memory structures to determine where objects were; it further explored, through experimentation, causal linkages between arbitrary locations in RAM and the visual position of characters on the screen. These relations were used to drive other experiments, e.g. to discover whether game characters fell due to gravity or whether their movement was obstructed by particular types of game objects. In a sense, that work is an ad hoc property-based testing approach to learning which of a fixed set of properties holds. Our work requires less domain knowledge and captures the characters’ behavior more precisely.
In the future we look forward to combining our more general approach with such knowledge-rich techniques to capture more complicated interactions between multiple agents and their environment. A recent publication by Summerville et al. [2017] similarly used games as its domain, attempting to find causal interactions shared by different entities, and we build on this approach for the causal guard learning.
CHARDA learns hybrid discrete/continuous behaviors of videogame characters or other agents whose inputs and movement behavior are observable. We obtain these inputs from an example playthrough of a game (e.g. SMB), assuming these inputs are representative of the character in question. Replaying this input sequence once through a software emulator of the game’s hardware platform, we read out high-level features from the simulated graphics hardware and assemble those into distinct agents whose positions are tracked over time (we elide the details for space). Importantly, characters may pop in and out of existence, collide with fixed or moving obstacles of various types, or perform other arbitrary (often non-physical) behaviors. We can only observe characters’ positions at a resolution of 1 pixel (a character is generally 8–32 pixels high); even then, the game world and our sensing advance at a fixed discrete time interval of one frame. All our position readings are therefore inaccurate by up to one spatial unit, and these errors naturally propagate to velocity and other calculations.
The input to our automaton learning process for a single entity is: sequences of discrete variable values that are possible control inputs (e.g. button presses), continuous variable values, and sets of predicates describing facts in external theories such as collision (e.g., the character was touching an object of a given appearance on one side or another at a given time). The goal is to go from that input data, presumed to be representative of the entity’s “true” behaviors, to an abstraction suitable for planning or other purposes. This type of data is not hard to obtain for cyber-physical systems under the analyst’s control or in cases where the possible causes for behavior change can be observed at some precision (even a probability distribution for these causes would suffice).
In this work, we look at learning a constrained class of hybrid automata from a combination of controlled (or at least witnessed) inputs and observed outputs. Specifically, though the learned automata may have any structure in terms of the number of modes and transitions, the modes may only have flows from a given set of model templates. In this specific work (and without loss of generality), every mode’s flow condition is a specialization of a single general template; moreover, all transitions leading into a given mode are forced to have the same update function, either a velocity reset or the empty update. Finally, the set of guard conditions is currently assumed to be conjunctions of predicates from a given labeled set. Our causal learning component learns which of these predicates is most associated with the transitions, and prefers those predicates which are more strongly causal. There is no reason these guards could not also be learned as e.g. linear inequalities, since we know the set of modes and their active intervals at the time of cause assignment. Again, we focus here on learning reasonably small over-approximations of the true model: these can always be refined, but we do not want to exclude any witnessed behaviors.
We break down the hybrid automaton learning process into two parts: identifying modes and determining causes for transitions. Again, these algorithms operate over a sequence of continuous variable values and a sequence of sets of predicates describing the automaton’s environment at each instant. We roughly follow the classic dynamic programming solution to the segmented least squares problem [Bellman and Roth1969] with a number of distinctions:
Different model templates are considered for each segment, instead of a single least squares regression.
A principled penalty is used instead of a hand-chosen constant.
Segments are merged when doing so results in a better overall model.
The mode identification process first requires the construction of all possible models for all possible sub-intervals. Let $M$ be a table of model parameters with one entry for each interval $(i, j)$ and model template $k$. Then we define $M$’s entries as:

$$M_{i,j,k} = m_k(D_{i:j}), \quad 0 \le i < j \le n, \; k \in T$$

where $n$ is the number of potential switchpoints, $T$ is the set of model templates, $D$ is the dataset, and $m_k(D_{i:j})$ is the model of template $k$ trained on data from the interval of $i$ to $j$. For this work our set of models are all multivariate regressions, but our approach is general enough to work with any approach that supports a likelihood function $\mathcal{L}$.

The cost for a given model for sub-interval $i$ to $j$ is therefore:

$$\mathrm{cost}_{i,j,k} = \mathrm{crit}(M_{i,j,k}, D_{i:j})$$

given the penalty criterion $\mathrm{crit}$. For this work we considered two penalties for model complexity. We wanted a principled measure for model complexity for the selection of a given sub-model for an interval, for when a break should occur (due to the inclusion of a switch point increasing model complexity), and for when a merging of modes should occur (due to the inherent fact that two similar but distinct modes are more complex than one mode). To that end we considered both the Bayesian Information Criterion (BIC) [Schwarz and others1978] and the Minimum Description Length (MDL) [Stine2004]. Both combine the likelihood of the model with a penalty that grows with $p$, the number of parameters in model $m$, and $|D|$, the number of datapoints in dataset $D$.

The two measures are very similar, being asymptotically the same, but differ in the constants applied to the penalty term. BIC assumes a Bayesian standpoint and determines which model from a set of models is the true model. It operates asymptotically as $|D|$ tends to infinity, given a fixed loss for choosing the wrong model. MDL instead takes an information-theoretic standpoint and assumes a spike-and-slab prior distribution for each parameter. Given that prior, each parameter’s code length has two parts: 1 bit for whether the parameter is zero (the spike) or nonzero (the slab), plus the bits needed to encode its value when it is nonzero.
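As a concrete illustration of penalized model selection, the sketch below scores two one-dimensional templates with the textbook form of BIC, $-2 \ln \mathcal{L} + p \ln |D|$. Several things here are assumptions of the sketch rather than details from the paper: the two templates (constant and linear in time), the Gaussian noise model with a fixed, known standard deviation `sigma`, and all function names. MDL is not shown.

```python
import math


def fit_constant(ts, ys):
    """Least-squares constant model; returns (parameters, residual sum of squares)."""
    mean = sum(ys) / len(ys)
    rss = sum((y - mean) ** 2 for y in ys)
    return (mean,), rss


def fit_line(ts, ys):
    """Least-squares line y = intercept + slope * t; returns (parameters, RSS)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    stt = sum((t - t_mean) ** 2 for t in ts)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / stt
    intercept = y_mean - slope * t_mean
    rss = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(ts, ys))
    return (intercept, slope), rss


def gaussian_log_likelihood(rss, n, sigma):
    """Log-likelihood of n residuals under i.i.d. Gaussian noise with known sigma (assumed)."""
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - rss / (2 * sigma ** 2)


def bic(log_lik, p, n):
    """Standard BIC: lower is better."""
    return -2.0 * log_lik + p * math.log(n)


def select_template(ts, ys, sigma):
    """Return the name of the lowest-BIC template and all scores."""
    candidates = {"constant": fit_constant, "linear": fit_line}
    scores = {}
    for name, fitter in candidates.items():
        params, rss = fitter(ts, ys)
        log_lik = gaussian_log_likelihood(rss, len(ys), sigma)
        scores[name] = bic(log_lik, len(params), len(ys))
    return min(scores, key=scores.get), scores
```

On nearly flat data the extra slope parameter does not reduce the residual enough to pay for its penalty, so the constant template is preferred; on data with a clear trend the linear template wins.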
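For all segments that end at point $j$ we find the optimal model and segmentation that leads to that point. Writing $\mathrm{cost}_{i,j,k}$ for the penalized cost of template $k$ fit to the interval from $i$ to $j$, $T$ for the set of templates, and taking $\mathrm{OPT}_0 = 0$:

$$\mathrm{OPT}_j = \min_{0 \le i < j,\; k \in T} \left( \mathrm{OPT}_i + \mathrm{cost}_{i,j,k} \right)$$

$\mathrm{OPT}_j$ is the optimal cumulative cost of models across segments up to datapoint $j$.

We use dynamic programming to work backwards from the last switch point, following the minimizing choices of $(i, k)$ to find the optimal sequence of segments and the corresponding optimal set of models.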
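The dynamic program described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes one-dimensional data, two hypothetical templates (constant and linear in time), a BIC-style cost with Gaussian noise of known standard deviation `sigma`, and a minimum segment length of three points so that every fit is well defined.

```python
import math

# Hypothetical templates: name -> (polynomial degree, number of parameters p)
TEMPLATES = {"constant": (0, 1), "linear": (1, 2)}


def rss_of_fit(ts, ys, degree):
    """Residual sum of squares of a least-squares constant (0) or line (1) fit."""
    n = len(ys)
    y_mean = sum(ys) / n
    if degree == 0:
        return sum((y - y_mean) ** 2 for y in ys)
    t_mean = sum(ts) / n
    stt = sum((t - t_mean) ** 2 for t in ts)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / stt
    return sum((y - (y_mean + slope * (t - t_mean))) ** 2 for t, y in zip(ts, ys))


def segment_cost(ts, ys, degree, p, sigma):
    """BIC of one segment under Gaussian noise with known sigma (an assumption)."""
    n = len(ys)
    rss = rss_of_fit(ts, ys, degree)
    log_lik = -0.5 * n * math.log(2 * math.pi * sigma ** 2) - rss / (2 * sigma ** 2)
    return -2.0 * log_lik + p * math.log(n)


def segment(ts, ys, sigma, min_len=3):
    """Return [(start, end, template)] minimizing the summed segment costs.

    Assumes len(ys) >= min_len and distinct time stamps.
    """
    n = len(ys)
    opt = [math.inf] * (n + 1)   # opt[j]: best cumulative cost of ys[:j]
    back = [None] * (n + 1)      # back[j]: (i, template) of the last segment
    opt[0] = 0.0
    for j in range(min_len, n + 1):
        for i in range(0, j - min_len + 1):
            if math.isinf(opt[i]):
                continue  # no valid segmentation ends at i
            for name, (degree, p) in TEMPLATES.items():
                total = opt[i] + segment_cost(ts[i:j], ys[i:j], degree, p, sigma)
                if total < opt[j]:
                    opt[j], back[j] = total, (i, name)
    # Walk the back-pointers from the end of the trace to recover the segments.
    segments, j = [], n
    while j > 0:
        i, name = back[j]
        segments.append((i, j, name))
        j = i
    segments.reverse()
    return segments
```

Using a fixed noise level rather than a per-segment variance estimate is a deliberate choice in this sketch: with an estimated variance, very short segments that happen to fit perfectly receive an unboundedly high likelihood and the trace is over-segmented.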
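After segmentation, the segments’ models are merged if this will improve the overall attractiveness of the entire model, namely by reducing the number of parameters in the overall model by a large enough amount that the decrease in complexity is greater than the decrease in likelihood.

This is accomplished by constructing a new model $m_{a+b}$ of template $k$ from the data $D_a$ for one segment concatenated to the data $D_b$ for another:

$$m_{a+b} = m_k(D_a \,\Vert\, D_b)$$

The overall sequence of models is improved by the merging if the following inequality holds, where $m_a$ and $m_b$ are the models originally selected for the two segments and $\mathrm{crit}$ is the penalty criterion:

$$\mathrm{crit}(m_{a+b}, D_a \,\Vert\, D_b) < \mathrm{crit}(m_a, D_a) + \mathrm{crit}(m_b, D_b)$$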
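The merge test can be illustrated as a comparison of penalized costs. The sketch below is an assumption-laden example rather than the paper's procedure: it uses a single linear template, a BIC-style cost with Gaussian noise of known standard deviation `sigma`, and represents each segment as a list of hypothetical `(time since mode entry, value)` pairs.

```python
import math


def line_rss(points):
    """Residual sum of squares of a least-squares line through (t, y) points."""
    n = len(points)
    t_mean = sum(t for t, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    stt = sum((t - t_mean) ** 2 for t, _ in points)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in points) / stt
    return sum((y - (y_mean + slope * (t - t_mean))) ** 2 for t, y in points)


def cost(points, sigma, p=2):
    """BIC-style cost of one linear model: -2 log-likelihood plus p * ln(n)."""
    n = len(points)
    neg2_log_lik = n * math.log(2 * math.pi * sigma ** 2) + line_rss(points) / sigma ** 2
    return neg2_log_lik + p * math.log(n)


def should_merge(seg_a, seg_b, sigma):
    """Merge when one shared model is cheaper than two separate models."""
    return cost(seg_a + seg_b, sigma) < cost(seg_a, sigma) + cost(seg_b, sigma)
```

Two segments with the same dynamics share parameters at almost no loss in likelihood, so the single model wins on the penalty term; two segments with clearly different slopes lose far more likelihood than the saved parameters are worth.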
From these merged modes, causal guarded transitions between modes are learned by finding probabilistically likely conditions where the direction of causality is known. Our target domain comes with some advantages for ascribing causality, namely we have inputs supplied by a player and we can be sure of the direction of causality regarding them; however, any domain that allows for instrumentation of exogenous inputs can utilize our same methodology. Another potential source of causal transition guards in our domain is collisions between visible entities, of which, again, we can be sure of the direction of causality. We also look at endogenous variables as a last resort (and then mainly qualitatively), since causality is much harder to ascertain: for example, if we enter a mode with a constant-velocity flow, it could be that the velocity is saturating at a terminal velocity, or it might be for some other reason.
For the SMB domain we consider the following set of predicates for guard condition learning:
Control (Pressed; Held; Released): a change in the binary control input (exogenous)
Collision with entity $e$ from direction $d$: collision with another entity, $e$, from a given direction $d$ (exogenous)
Velocity Zero Crossing, in or out, by Sign: a zero crossing or touching in velocity and its characteristics, e.g. from negative to positive, or vice versa (endogenous)
Velocity Extremum: the velocity is roughly equal to the extremum for a given mode (endogenous)
Acceleration Sign: the acceleration has the sign -1, 0, or 1 (endogenous)
Velocity Sign: the velocity has the sign -1, 0, or 1 (endogenous)
The Control and Collision predicates are given priority as we can be sure of their direction of causality.
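As an illustration of how the Control predicates might be derived from a raw input trace, the sketch below labels each frame of a binary button signal as pressed, held, or released. The function name, the string labels, and the convention that the button is up before the first frame are assumptions of this example.

```python
def control_events(button_down):
    """Label each frame of a binary input trace.

    Returns "pressed" on a rising edge, "held" while the button stays down,
    "released" on a falling edge, and None while it stays up.
    Assumes the button was up before the first frame.
    """
    events = []
    previous = False
    for current in button_down:
        current = bool(current)
        if current and not previous:
            events.append("pressed")
        elif current and previous:
            events.append("held")
        elif previous and not current:
            events.append("released")
        else:
            events.append(None)
        previous = current
    return events
```

For example, the trace `[0, 1, 1, 0, 0]` yields `[None, "pressed", "held", "released", None]`.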
Summerville et al. [2017] used Normalized Pointwise Mutual Information (NPMI) to learn semantic information about game objects, which led us to believe that we could determine transition guards using a similar technique. We calculate the NPMI of each transition from a predecessor mode to a successor mode with each predicate active during the predecessor mode. NPMI is a scaling of pointwise mutual information defined as:

$$\mathrm{npmi}(x; y) = \frac{\mathrm{pmi}(x; y)}{-\log p(x, y)}, \qquad \mathrm{pmi}(x; y) = \log \frac{p(x, y)}{p(x)\,p(y)}$$

NPMI for two events is $-1$ when they never co-occur, $0$ when independent, and $1$ when they always co-occur. In this work we considered two different thresholds for NPMI: a higher one for universal events (present all, or nearly all, the times that transition is taken) and a lower one for relevant events. For example, to learn the cause for transitions from hypothetical mode A into mode B, we look at all time intervals where A is active, determine for each predicate how strongly correlated it is with the transition event A → B, and take all those passing the universal threshold to be causes. These correspond to conjuncts in the guard condition. Those correspondences which are high enough to be of interest but do not meet the universal threshold are called relevant and are possible disjuncts in the guard condition (assuming it is a conjunction of the universal causes with a disjunction of the relevant ones). If we have an exogenous explanation, we discard endogenous explanations.
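The NPMI computation can be sketched directly from per-frame observations. In the example below, the frame representation (a set of active predicate names plus a flag for whether the transition was taken on that frame), the function names, and in particular the threshold values `0.9` and `0.5` are hypothetical choices for illustration; the thresholds used in the experiments are not reproduced here.

```python
import math


def npmi(p_xy, p_x, p_y):
    """Normalized pointwise mutual information from probabilities."""
    if p_xy == 0.0:
        return -1.0  # the events never co-occur
    if p_xy == 1.0:
        return 1.0   # both events occur on every frame; avoids 0 / 0
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)


def transition_npmi(frames):
    """Score each predicate against the transition event.

    frames: non-empty list of (active_predicates, transition_taken) pairs,
    one per frame in which the predecessor mode is active.
    """
    n = len(frames)
    p_y = sum(1 for _, taken in frames if taken) / n
    names = set().union(*(preds for preds, _ in frames))
    scores = {}
    for name in names:
        p_x = sum(1 for preds, _ in frames if name in preds) / n
        p_xy = sum(1 for preds, taken in frames if taken and name in preds) / n
        scores[name] = npmi(p_xy, p_x, p_y)
    return scores


def classify(scores, universal=0.9, relevant=0.5):
    """Split predicates into universal causes (conjuncts) and relevant ones (disjuncts)."""
    conjuncts = sorted(k for k, v in scores.items() if v >= universal)
    disjuncts = sorted(k for k, v in scores.items() if relevant <= v < universal)
    return conjuncts, disjuncts
```

A predicate that is active on exactly the frames where the transition fires scores 1 and becomes a conjunct, while a predicate that is statistically independent of the transition scores 0 and is ignored.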
We may have cases where out-transitions of a mode are non-deterministic: they have identical causes, or one’s causes subsume another. In these situations the offending target modes are merged, one pair at a time, re-connecting edges as necessary until a fixpoint is reached. This merging greedily abstracts the true automaton, but in practice it seems to work well for domains like game characters whose discrete state changes are generally strongly tied to control inputs or collisions; future work will explore more sophisticated approaches to resolving non-determinism.
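One way to picture this merging step for the out-transitions of a single mode is sketched below. It is a deliberately narrow illustration under stated assumptions: guards are modeled as sets of conjuncts, two guards conflict when one set contains the other, the merged edge keeps the shared (weaker) guard, and re-connecting the rest of the automaton is left out. The function name and data layout are hypothetical.

```python
def merge_nondeterministic(out_edges):
    """Merge target modes whose guards are identical or subsume one another.

    out_edges: dict mapping a target mode name to a frozenset of guard conjuncts.
    Returns a dict mapping a frozenset of merged target names to the guard kept.
    """
    edges = {frozenset([target]): guard for target, guard in out_edges.items()}
    changed = True
    while changed:  # repeat, one pair at a time, until a fixpoint is reached
        changed = False
        keys = list(edges)
        for index, a in enumerate(keys):
            for b in keys[index + 1:]:
                guard_a, guard_b = edges[a], edges[b]
                if guard_a <= guard_b or guard_b <= guard_a:
                    del edges[a], edges[b]
                    # Keep the weaker guard, i.e. the conjuncts both share.
                    edges[a | b] = guard_a & guard_b
                    changed = True
                    break
            if changed:
                break
    return edges
```

For instance, if one edge is guarded by releasing a button and another by releasing that button while moving downward, the two targets are merged under the button-release guard, while an unrelated collision-guarded edge is left alone.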
To evaluate our work we considered two domains: Aircraft Dynamics Modeling and Mario’s Jump Dynamics from SMB.
We explore the use of CHARDA in aircraft modeling for a direct comparison with Santana et al. [2015]. Their approach used Expectation Maximization [Dempster et al.1977] to recover a hybrid automaton from observational data by iteratively refining an Interactive Multiple Model. Guard conditions were learned by applying support vector machines. As in Santana et al., we also include results for a Jump Markov Linear System (JMLS), which assumes Markovian transitions.
The aircraft model is given in two distinct scenarios: the first, “Lawnmower” (see Fig. 1), features an aircraft moving at a constant velocity for some period of time and then making a constant-rate turn to reverse heading, repeating this pattern for some number of iterations. In the second scenario, “Random,” the aircraft makes a given maneuver (either constant heading or constant turn) for 50 time steps and then changes to a random maneuver; this is repeated 17 times. We must note that this portion of our evaluation is based only on CHARDA’s segmentation algorithm and does not employ transition guard learning. As the observational data offers no causal information indicating why a mode transition might be made, we do not learn any causal transition guards (which would simply overfit the given observations).
As in Santana’s work we ran 32 trials and discarded the best and worst runs; the results are shown in Table 1. In the Lawnmower domain we outperform Santana et al., but both are close enough to the ground truth that the difference is negligible. In the Random domain we outperform the prior work dramatically because our segmentation is not based on learning linear guards; we instead find an optimal segmentation based on model accuracy and complexity. We must note again that there are no real causes for why the aircraft changes maneuvers, so it is impossible to learn true causal guards. Santana et al. learn correlative guards for a given training instance, but their learned guards are not applicable to unseen data because they are tuned to that specific training instance (for example, if the aircraft’s flight pattern were rotated or translated, all of their learned guards would be invalidated, since the guards are linear and fit to the training trajectory). As such, we feel that it is only relevant to compare the segmentation portion of CHARDA to the prior work. CHARDA would be better suited if the domain were framed as a control problem and the dataset contained features like operator controls and aircraft sensors.
For the Mario domain, we made no assumptions about the number of true modes and let the non-parametric nature of our approach attempt to recover the correct modes. This means that we are unable to compare to Santana et al., as their approach requires the number of modes a priori, so instead we compare our results to a manually-defined automaton based on human reverse-engineering of the game’s program code [jdaster64 2012] (see Fig. 2). We present the HAs learned by CHARDA in Figure 4. The Mario trace used for this work was 3772 frames in length. The learned HAs are over-approximations of the true HA. Whereas the true HA has 3 separate jump modes based on Mario’s horizontal velocity at the time of transition, the learned HAs have only one such jump whose parameters are averages of the parameters of the true modes. Following from learning just one jump, CHARDA learns only a single falling mode. MDL does learn that releasing the A button while ascending leads to a different set of dynamics, but it considers this a change in gravity as opposed to a reset in velocity.
MDL produces the more faithful model of the true behavior, but is overzealous in its merging of the distinct jump mode chains into a single jump mode chain. As such, it only recovers 7 of the 22 modes; however, abstracting away the differences between the jump chains, it learns 7 of 8 modes, only missing the distinction between hard bump and soft bump. A comparison of the modeled behaviors and the truth can be seen in Figure 3.
Figure 4: Learned Mario HAs. Parameters are given as 95% confidence intervals.
We have presented CHARDA, a novel combination of techniques (dynamic programming with a grounded penalty for data segmentation, causal relationship learning) that can recover hybrid automata from observations of a dynamical system. CHARDA outperforms an existing HA learning algorithm in data segmentation, and in a well-suited domain can find causal (not merely correlative) transition guards. We have also demonstrated CHARDA in a novel domain, videogames, that comes with an interesting set of challenges (short time durations, non-physical dynamics) and benefits (full access to all command inputs).
The use of a well-founded penalty criterion in conjunction with the dynamic programming approach is only one of many possible segmentation techniques, and it remains future work to test the general framework of Segmentation + Guarded Transition learning with other techniques. However, the biggest source of error in the learned HAs comes not from mistakes in segmentation, but rather from overzealous merging of modes. The learned parameters at segmentation in fact do describe modes in line with Jump1 and Jump3, but these modes are merged together since doing so improves the overall learned model according to the criterion. It remains for future work to determine if there is a different principled way to learn these similar but distinct modes. It is also future work to incorporate techniques from other approaches, such as mode assignment via a Chinese Restaurant Process or the Forget-Me-Not Process, to pool modes at segmentation time instead of a post-segmentation merge process.
Beyond improving segmentation, there are also possible improvements to learning guarded transitions. Assuming we had perfect segmentation and mode assignment, we would still not be able to fully capture the guarded transitions of Mario given that our transitions do not have knowledge of Mario’s horizontal velocity, nor are they able to learn transitions based on comparisons to arbitrary thresholds. In some domains, experimentation is possible: we might be able to control the dynamical system in question or to put it into situations where its behavior could be informative. We would like to explore this to improve the precision of our analysis, either by helping to split truly distinctive merged modes or by testing hypothesized guard conditions.
Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 1966.
Proceedings of the twenty-seventh annual ACM symposium on Theory of computing. ACM, 1995.
Journal of Machine Learning Research, 13(Dec):3585–3618, 2012.
Thirtieth AAAI Conference on Artificial Intelligence, 2016.
Journal of Automated Reasoning, 2008.