1 Introduction
Relating observed and modelled process behavior is the lion’s share of conformance checking [9]. Observed behavior is often recorded in the form of event logs, which store the footprints of process executions. Symmetrically, process models are representations of the underlying process, which can be automatically discovered or manually designed. To quantify this relation, conformance checking techniques consider four quality dimensions: fitness, precision, generalization and simplicity [24]. For the first three dimensions, the alignment between a process model and an event log is of paramount importance, since it allows relating modeled and observed behavior [1].
Given a process model and a trace in the event log, an alignment provides the run in the model which most closely resembles the observed trace. Once alignments are computed, the quality dimensions can be defined on top of them [1, 20]. In a way, alignments are optimistic: although observed behavior may deviate significantly from modeled behavior, it is always assumed that the fewest deviations are the best explanation (from the model’s perspective) for the observed behavior.
In this paper we present a somewhat symmetric notion to alignments, denoted as antialignments. Given a process model and a log, an antialignment is a run of the model that deviates as much as possible from all the traces observed in the log. The motivation for antialignments is precisely to compensate for the optimistic view provided by alignments, so that the model is queried to return highly deviating behavior that has not been seen in the log. In contexts where the process model should adhere to a certain behavior and not leave much room for exotic possibilities (e.g., banking, healthcare), the absence of highly deviating antialignments may be a desired property for a process model. Using antialignments one can not only catch deviating behavior, but also use it to improve some of the current quality metrics considered in conformance checking. In this paper we highlight the strong relation between antialignments and the precision metric: a highly deviating antialignment may be considered as a witness for a loss in precision. Current metrics for precision lack this ability to explore the model behavior beyond what is observed in the log, and are thus considered shortsighted [2].
We cast the problem of computing antialignments as the satisfiability of a Boolean formula, and provide high-level techniques which can, for instance, compute the most deviating antialignment for a certain run length, or the shortest antialignment for a given number of deviations.
Antialignments are related to the completeness of the log; a log is complete if it contains all the behavior of the underlying process [28]. For incomplete logs, the alternatives for computing antialignments grow, making it difficult to tell the difference between behavior not observed but meant to be part of the process, and behavior not observed which is not meant to be part of the process. Since there already exist some metrics to evaluate the completeness of an event log (e.g., [36]), we assume event logs have a high level of completeness before they are used for computing antialignments. Notice that in the presence of an incomplete event log, antialignments can be used to interactively complete it: an antialignment that is certified by the stakeholder as valid process behavior can be appended to the event log to make it more complete.
This work is an extension of recent publications related to antialignments: in [10] we established for the first time the notion of antialignments based on the Hamming distance, and proposed a simple metric to estimate precision. Then, the work in [30] elaborated the notion of antialignments, heuristically computing them for the Levenshtein distance by adapting the $A^*$ search technique, and proposed two new notions for trace-based and log-based precision, which can be combined to estimate the precision of process models. However, as claimed recently in a survey paper advocating properties that precision metrics should have [26], it was not known whether the aforementioned metrics satisfy these properties. The contributions of this paper with respect to our previous work are the following:

We show how antialignments can be computed in an optimal way for the Levenshtein distance, without increasing the complexity class of the problem. Moreover we relate the two available distance encodings (Hamming and Levenshtein), and show the implications of using each one for antialignment based precision.

We adapt the precision metrics from [30] so that they do not depend on a particular length defined a priori.

We prove the adherence of one of the new metrics proposed in this paper to most of the properties in [26].

A novel implementation is provided, with several improvements, which makes it able to deal with larger instances.

A new evaluation section is reported, which shows empirically the capabilities of the proposed technique for large and real-life instances.
The remainder of the paper is organized as follows: in the next section, a simple example is used to emphasize the importance of antialignments and to show their application to estimating precision. Then, Section 3 introduces the basic theory needed for the understanding of the paper. Section 4 provides the formal definition of antialignments, whilst Section 5 formalizes the encoding into SAT of the problem of computing antialignments. In Section 6, we define a new metric, based on antialignments, for estimating the precision of process models. Experiments are reported in Section 7, and related work in Section 8. Section 9 concludes the paper and gives some hints for future research directions.
2 A Motivating Example
Let us use the example shown in Figure 1 to illustrate the notion of antialignment. The example was originally presented in [31], and in this paper we present a very abstract version of it in Figure 1(a). The modeled process describes a realistic transaction process within a banking context. The process contains all sorts of monetary checks, authority notifications, and logging mechanisms. The process is initiated when a new transaction is requested, opening a new instance in the system and registering all the components involved. The second step is to run a check on the person (or entity) that originates the monetary transaction. Then, the actual payment is processed differently, depending on the payment modality chosen by the sender (cash, cheque and payment). Later, the receiver is checked and the money is transferred. Finally, the process ends by registering the information, notifying the required actors and authorities, and emitting the corresponding receipt.
Assume that a log covering all three possible variants (corresponding to the three possible payment methods) with respect to the model in Figure 1(a) is given. The three different variants for this log will be:
ort, cs, pcap, cr, tm, nct
ort, cs, pchp, cr, tm, nct
ort, cs, pep, cr, tm, nct
where we use the acronym for each one of the actions performed, e.g., ort stands for open and register transaction.
For this pair of model and log, most of the current metrics for precision (e.g., [2]) will rightly assess a very high precision. In fact, since no deviating antialignment can be obtained because every model run is in the log, the antialignment based precision metric from this paper will also assess a high (in our case, perfect) precision.
Now assume that we modify the model slightly, adding a loop around the alternative stages for the payment. Intuitively, this (malicious) modification in the process model may allow paying several times although only one transfer will be done. The modified high-level overview is shown in Figure 1(b). The aforementioned metric for precision will not consider this modification as a severe one: the precision of the model with respect to the log will be very similar to the one for the model in Figure 1(a).
Remarkably, this modification in the process model comes with a new highly deviating antialignment denoting a run of the model that contains more than one iteration of the payment:
ort, cs, pcap, pchp, pchp, pep, pcap, cr, tm, nct
Clearly, this model execution where five payments have been recorded is possible in the process of Figure 1(b). Correspondingly, the precision of this model in describing the log of only three variants will be significantly lowered in the metric proposed in this paper, since the antialignment produced is very different from any of the three variants recorded in the event log.
3 Preliminaries
Definition 1 ((Labeled) Petri net)
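A (labeled) Petri net [21] is a tuple $N = \langle P, T, F, m_0, m_f, \Sigma, \lambda \rangle$, where $P$ is the set of places, $T$ is the set of transitions (with $P \cap T = \emptyset$), $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation, $m_0$ is the initial marking, $m_f$ is the final marking, $\Sigma$ is an alphabet of actions and $\lambda : T \to \Sigma$ labels every transition by an action.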
A marking $m$ is an assignment of a nonnegative integer to each place. If $k$ is assigned to place $p$ by marking $m$ (denoted $m(p) = k$), we say that $p$ is marked with $k$ tokens. Given a node $x \in P \cup T$, its preset and postset are denoted by ${}^\bullet x$ and $x^\bullet$ respectively.
A transition $t$ is enabled in a marking $m$ when all places in ${}^\bullet t$ are marked. When a transition $t$ is enabled, it can fire by removing a token from each place in ${}^\bullet t$ and putting a token in each place in $t^\bullet$. A marking $m'$ is reachable from $m$ if there is a sequence of firings $t_1 \dots t_n$ that transforms $m$ into $m'$, denoted by $m[t_1 \dots t_n\rangle m'$. We define the language of $N$ as the set of full runs defined by $\mathcal{L}(N) := \{\lambda(t_1) \dots \lambda(t_n) \mid m_0[t_1 \dots t_n\rangle m_f\}$. A Petri net is $k$-bounded if no reachable marking assigns more than $k$ tokens to any place. A Petri net is bounded if there exists a $k$ for which it is $k$-bounded. A Petri net is safe if it is 1-bounded. A bounded Petri net has an executable loop if it has a reachable marking $m$ and sequences of transitions $u_1 \in T^*$, $u_2 \in T^+$, $u_3 \in T^*$ such that $m_0[u_1\rangle m[u_2\rangle m[u_3\rangle m_f$.
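The firing rule described above can be sketched in a few lines of Python. This is only an illustration and not the implementation used in the paper: the `Transition` class, the dictionary-based markings and the two-transition example net are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """Hypothetical container: a name, a label and the pre/post places."""
    name: str
    label: str
    preset: frozenset
    postset: frozenset


def enabled(marking, t):
    """A transition is enabled when every place of its preset is marked."""
    return all(marking.get(p, 0) >= 1 for p in t.preset)


def fire(marking, t):
    """Remove one token from each preset place, add one to each postset place."""
    if not enabled(marking, t):
        raise ValueError(f"transition {t.name} is not enabled")
    new = dict(marking)
    for p in t.preset:
        new[p] -= 1
    for p in t.postset:
        new[p] = new.get(p, 0) + 1
    # Drop empty places so that markings can be compared with ==.
    return {p: k for p, k in new.items() if k > 0}


def is_full_run(m0, mf, firing_sequence):
    """True iff the sequence is fireable from m0 and ends exactly in mf."""
    marking = {p: k for p, k in m0.items() if k > 0}
    for t in firing_sequence:
        if not enabled(marking, t):
            return False
        marking = fire(marking, t)
    return marking == {p: k for p, k in mf.items() if k > 0}


# Example net (an assumption for illustration): p0 --a--> p1 --b--> p2
T_A = Transition("t1", "a", frozenset({"p0"}), frozenset({"p1"}))
T_B = Transition("t2", "b", frozenset({"p1"}), frozenset({"p2"}))
M0 = {"p0": 1}
MF = {"p2": 1}
```

With this net, `is_full_run(M0, MF, [T_A, T_B])` holds, and the corresponding word of the language is obtained by reading the labels of the fired transitions.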
An event log is a collection of traces, where a trace may appear more than once. Formally:
Definition 2 (Event Log)
An event log $L \in \mathcal{B}(\Sigma^*)$ (over an alphabet of actions $\Sigma$) is a multiset of traces $\sigma \in \Sigma^*$.
Process mining techniques aim at extracting from a log $L$ a process model $N$ (e.g., a Petri net) with the goal to elicit the process underlying a system $S$. $S$ is considered a language for the sake of comparison. By relating the behaviors of $L$, $\mathcal{L}(N)$ and $S$, particular concepts can be defined [8]. A log is incomplete if $S \setminus L \neq \emptyset$. A model $N$ fits log $L$ if $L \subseteq \mathcal{L}(N)$. A model is precise in describing a log $L$ if $\mathcal{L}(N) \setminus L$ is small. A model $N$ represents a generalization of log $L$ with respect to system $S$ if some behavior in $S \setminus L$ exists in $\mathcal{L}(N)$. Finally, a model $N$ is simple when it has the minimal complexity in representing $\mathcal{L}(N)$, i.e., the well-known Occam’s razor principle.
4 AntiAlignments
The idea of antialignments is to seek in the language of a model the runs which differ considerably from all the observed traces. Hence, this is the opposite of the notion of alignments [1] which is central in process mining: for many tasks in conformance checking like process model repair or decision point analysis, one needs indeed to find the run which is the most similar to a given log trace [28]. In this paper, we focus on precision, and for this purpose, model runs which are not similar to any observed trace in the log serve as witnesses for bad precision. All these notions depend on a definition of distance between two traces (typically a model trace, i.e. a run of the model, and an observed log trace). We assume a given distance function $d : \Sigma^* \times \Sigma^* \to [0, 1]$ computable in polynomial time (actually, we do not require that $d$ satisfies the usual properties of distance functions like symmetry or triangle inequality) and such that:

for every $\gamma, \sigma \in \Sigma^*$, $d(\gamma, \sigma) = 0$ iff $\gamma = \sigma$,

for every $\sigma \in \Sigma^*$, $d(\gamma, \sigma)$ converges to $1$ when $|\gamma|$ diverges to $\infty$.
Definition 3 (Antialignment)
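For a distance threshold $m \in [0, 1]$, an $m$-antialignment of a model $N$ w.r.t. a log $L$ is a full run $\gamma \in \mathcal{L}(N)$ such that $d(\gamma, L) \geq m$, where $d(\gamma, L)$ is defined as $\min_{\sigma \in L} d(\gamma, \sigma)$. (Since the function $d$ takes its values in $[0, 1]$, we define by convention $d(\gamma, \emptyset) := 1$.)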
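Definition 3 can be read operationally as a simple check. The sketch below assumes that the distance of a run to a log is the minimum over the log traces and that the empty log yields distance 1 (the convention used later in the proof of Lemma 4). The function names and the placeholder `toy_distance` are hypothetical; any distance with values between 0 and 1 can be passed instead.

```python
def toy_distance(u, v):
    """Placeholder distance in [0, 1]: fraction of mismatching positions.

    This is NOT one of the distances of the paper; it only makes the
    example self-contained.
    """
    n = max(len(u), len(v))
    if n == 0:
        return 0.0
    mismatches = sum(
        1 for i in range(n)
        if i >= len(u) or i >= len(v) or u[i] != v[i]
    )
    return mismatches / n


def distance_to_log(run, log, d):
    """Smallest distance from the run to any log trace; 1.0 for an empty log."""
    return min((d(run, trace) for trace in log), default=1.0)


def is_antialignment(run, log, d, threshold):
    """True iff the run is at distance at least `threshold` from every trace.

    The caller is responsible for `run` being a full run of the model.
    """
    return distance_to_log(run, log, d) >= threshold


RUN = ("a", "b", "c")
LOG = [("a", "x", "c"), ("a", "b", "c", "d")]
```

Here the run is at toy distance 1/3 and 1/4 from the two traces, so its distance to the log is 1/4: it passes the check for a threshold of 0.2 but not for 0.3.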
For the following examples, we show antialignments w.r.t. two possible choices of distance : Levenshtein’s distance and Hamming distance.
Definition 4 (Levenshtein’s edit distance )
Levenshtein’s edit distance between two traces is based on the minimum number of deletions and insertions needed to transform to . In order to get a normalized distance between 0 and 1, we define Levenshtein’s edit distance .
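As an illustration, the number of insertions and deletions needed to transform one trace into another can be computed by dynamic programming. The sketch below returns only this raw edit count; the normalization to a value between 0 and 1 is deliberately left out. The function name is hypothetical, and the traces are those of the motivating example in Section 2.

```python
def edit_count(u, v):
    """Minimum number of insertions and deletions turning u into v.

    Substitutions are not allowed: replacing a letter costs one deletion
    plus one insertion.
    """
    prev = list(range(len(v) + 1))  # distances from the empty prefix of u
    for i, a in enumerate(u, start=1):
        cur = [i]  # deleting the first i letters of u
        for j, b in enumerate(v, start=1):
            if a == b:
                cur.append(prev[j - 1])  # matching letters cost nothing
            else:
                cur.append(1 + min(prev[j], cur[j - 1]))  # delete a or insert b
        prev = cur
    return prev[len(v)]


VARIANTS = [
    ["ort", "cs", "pcap", "cr", "tm", "nct"],
    ["ort", "cs", "pchp", "cr", "tm", "nct"],
    ["ort", "cs", "pep", "cr", "tm", "nct"],
]
LOOPING_RUN = ["ort", "cs", "pcap", "pchp", "pchp", "pep", "pcap",
               "cr", "tm", "nct"]

EDITS_PER_VARIANT = [edit_count(LOOPING_RUN, v) for v in VARIANTS]
```

For the looping run of Section 2, every variant of the log is four deletions away, since each variant is a subsequence of the run with four payment steps removed.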
Example 1
Consider the Petri net and log shown in Figure 2. With Levenshtein’s distance, the full run is at distance from the log trace (two deletions and one insertion). It is at larger distance from the other log traces. Therefore, it is a antialignment.
Another interesting choice is Hamming distance. It is in general less informative than Levenshtein’s distance for relating observed and modelled behavior, but it has the advantage of being very simple to compute. Variants of Hamming distance can also provide good compromises. In Sections 5 and 6, we will show how to efficiently compute antialignments for Hamming distance using SAT solvers.
Definition 5 (Hamming distance )
For two traces and , of same length , define . For longer than , we define , where is a special padding symbol ( for ‘wait’); we proceed symmetrically when is shorter than .
Lemma 1
Observe that, for every and (assuming one of them at least is nonempty), .
Proof
Assume w.l.o.g. . Let . We have, . We have also because one way to transform to is to replace by (one deletion and one insertion) at each position where they differ ( editions), and then to insert the letters ( editions). It remains to see that , which implies
∎
Example 2
Consider the Petri net and log shown in Figure 2. With Hamming distance, the full run is at distance from the log trace ( and do not match with and , and is shorter than , which counts for the third mismatch). It is at larger distance from the other log traces. Therefore, it is a antialignment.
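The mismatch counting used in Example 2 can be sketched as follows. The code pads the shorter trace and counts the positions that differ; it returns the raw number of mismatches and leaves out the normalization. The function name and the choice of `"w"` as padding symbol are assumptions made for the example.

```python
PAD = "w"  # assumed padding symbol; must not occur as an action in the traces


def hamming_mismatches(u, v, pad=PAD):
    """Number of positions where u and v differ after padding the shorter one."""
    n = max(len(u), len(v))
    padded_u = list(u) + [pad] * (n - len(u))
    padded_v = list(v) + [pad] * (n - len(v))
    return sum(1 for a, b in zip(padded_u, padded_v) if a != b)


# Two mismatching letters plus one padded position give three mismatches.
EXAMPLE_MISMATCHES = hamming_mismatches(["a", "b", "c"], ["a", "x", "y", "z"])
```

Every padded position counts as a mismatch, which is how a difference in length contributes to the distance.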
Lemma 2
For every log and finite model , we have:

if the model has finitely many full runs, then there exists (at least) one maximal antialignment of w.r.t. , i.e. that maximizes the distance ;

if the model has infinitely many full runs, then there exist antialignments with arbitrarily close to . Yet, there may not exist any antialignment, i.e. there is no guarantee that the limit is reached for any .
Proof
If the model has finitely many full runs, then one of them must be a maximal antialignment.
Otherwise, if $N$ is finite and has infinitely many full runs, then there must exist arbitrarily long full runs; more formally, there exists an infinite sequence $\gamma_1, \gamma_2, \dots$ of full runs of strictly increasing length. For every $\sigma \in L$, the sequence of $d(\gamma_i, \sigma)$ converges to $1$. Since there are finitely many $\sigma \in L$, $d(\gamma_i, L)$ converges to $1$ as well. ∎
Maximal antialignments will be used in Section 6 to define our precision metric. The case of models with executable loops will be discussed in Subsection 6.1.1.
Lemma 3
The problem of deciding, given a finite model and a log , whether there exists a antialignment of w.r.t. , has the same complexity as reachability for Petri nets.
Proof
This is equivalent to checking whether , i.e. whether is reachable from .∎
By definition, a antialignment of w.r.t. is a full run satisfying the trivial inequality . The same problem with a strict inequality is also of interest. We will need it in Section 6.1.1.
Lemma 4
The problem of deciding, given a finite model and a log , whether there exists a full run satisfying (or equivalently deciding if ), has the same complexity as reachability for Petri nets.
Proof
The reachability problem reduces to the existence of a full run satisfying : indeed for every .
Conversely, deciding if reduces to deciding reachability of in the synchronous product of with a deterministic Petri net which represents as a tree the log traces sharing their common prefixes, and, from the leaves, marks a sink place , as illustrated in Figure 3. Hence, every full run of , when synchronized with the Petri net representation of the log , leads to a marking of the form , and iff . ∎
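The tree representation of the log mentioned in this proof can be illustrated with a prefix tree. The sketch below does not build the synchronous product with the Petri net; it only shows how traces sharing a common prefix share a branch, and how membership of a run in the log is read off the tree. The dictionary layout and function names are assumptions for the example.

```python
def build_prefix_tree(log):
    """Prefix tree of the log: traces sharing a prefix share a branch."""
    root = {"children": {}, "is_trace_end": False}
    for trace in log:
        node = root
        for action in trace:
            node = node["children"].setdefault(
                action, {"children": {}, "is_trace_end": False}
            )
        node["is_trace_end"] = True  # a log trace ends exactly here
    return root


def in_log(tree, run):
    """True iff the run follows a branch of the tree up to a trace end."""
    node = tree
    for action in run:
        node = node["children"].get(action)
        if node is None:
            return False  # the run leaves the tree: it is not a log trace
    return node["is_trace_end"]


LOG = [("a", "b"), ("a", "c")]
TREE = build_prefix_tree(LOG)
```

A run that leaves the tree, or that stops at a node where no trace ends, is not in the log; this is the situation detected through the sink place in the construction of the proof.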
The problem of reachability in Petri nets is known to be decidable, but nonelementary [12].
Yet, the complexity drops to NP if a bound is given on the length of the antialignment.
Lemma 5
The problem of deciding, for a Petri net , a log , a rational distance threshold and a bound , if there exists a antialignment such that , is NP-complete. We assume that is encoded in unary. (Since has typically the same order of magnitude as the length of the longest traces in the log, encoding in unary does not significantly affect the size of the problem instances.)
Proof
The problem is clearly in NP: checking that a run is a antialignment of w.r.t. takes polynomial time (remember that we consider distance functions computable in polynomial time).
For NP-hardness, we propose a reduction from the problem of reachability of a marking in a 1-safe acyclic Petri net (a Petri net is acyclic if the transitive closure of its flow relation is irreflexive), known to be NP-complete [25, 11]. Notice that, since is acyclic, each transition can fire only once; hence, the length of the firing sequences of is bounded by the number of transitions . Finally, is reachable in iff there exists a of length less than or equal to which is a antialignment of (with as final marking) w.r.t. the empty log. ∎
5 SAT-encoding of AntiAlignments
In this section, we give hints on how SAT solvers can help to find antialignments. We detail the construction of a SAT formula , where is a Petri net, a log, and two integers. This formula will be used in the search of antialignments of w.r.t. for Hamming distance (see Section 5.3 for the encoding using the Levenshtein distance). The formula characterizes precisely the full runs of of length which differ in at least positions from every log trace in .
5.1 Coding Using Boolean Variables
The formula is coded using the following Boolean variables:

for , (recall that is the special symbol used to pad the log traces, see Definition 5) means that transition .

for , means that place is marked in marking (recall that we consider only safe nets, therefore the are Boolean variables).

for , , means that the ^{th} mismatch with the observed trace is at position .
The total number of variables is .
Let us decompose the formula .

The fact that is coded by the conjunction of the following formulas:

Initial marking:

Final marking:

One and only one for each :

The transitions are enabled when they fire:

Token game (for safe Petri nets):


Now, the constraint that deviates from the observed traces (for every , ) is coded as:
with the correctly affected w.r.t. and :
and that for , the ^{th} and ^{th} mismatch correspond to different ’s (i.e. a given mismatch cannot serve twice):
5.2 Size of the Formula
In the end, the first part of the formula () is coded by a Boolean formula of size , with .
The second part of the formula (for every , ) is coded by a Boolean formula of size .
The total size for the coding of the formula is
5.3 SAT-encoding of AntiAlignments for Levenshtein’s Edit Distance
Our SAT-encoding of antialignments for Levenshtein’s edit distance uses the same Boolean variables as the SAT-encoding of antialignments for Hamming distance of the previous section, completed with variables used to encode the edit distance.
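Our encoding is based on the same relations that are used by the classical dynamic programming recursive algorithm for computing the edit distance between two words $u$ and $v$ (with insertions and deletions only, $a$ and $b$ denoting letters and $\epsilon$ the empty word):
$$\begin{array}{rcll}
dist(\epsilon, \epsilon) & = & 0 & \\
dist(u \cdot a, \epsilon) & = & |u| + 1 & \\
dist(\epsilon, v \cdot b) & = & |v| + 1 & \\
dist(u \cdot a, v \cdot a) & = & dist(u, v) & \\
dist(u \cdot a, v \cdot b) & = & 1 + \min(dist(u, v \cdot b),\ dist(u \cdot a, v)) & \text{if } a \neq b
\end{array}$$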
We encode this computation in a SAT formula over variables , for , and . Formula will have exactly one solution, in which each variable is true iff and differ by at least editions.
In order to test equality between the and , we use variables and , for , and , and we set their value such that is true iff , and is true iff . Hence, the test becomes in our formulas: . For readability of the formulas, we refer to this coding by . We also write similarly .
In the following, we describe the different clauses of the formula of our SAT encoding of the edit distance.
(1)  
(2)  
(3)  
(4)  
(5) 
Example 3
At instants and of words and , the letters are the same; then, by (4), the distance is only greater than or equal to 0: .
However at instants and , the letters and are different. A step before, and are true because of the length of the subwords. Then, by (5), the distance at instants and is greater than or equal to 2: . This result is expected because transforming to costs the deletion of and the addition of .
In order to insert this encoding of Levenshtein’s edit distance into our formulas for antialignments, we need to compute the edit distance between the expected antialignment and every trace of the log, which requires using variables for , , , to represent the fact that and differ by at least editions.
5.4 Solving the Formula in Practice
In practice, the coding of the formula can be done using the Boolean variables , and .
Then we need to transform the formula into conjunctive normal form (CNF) in order to pass it to the SAT solver. We use Tseytin’s transformation [27] to get a CNF formula whose size is linear in the size of the original formula. The idea of this transformation is to replace recursively the disjunctions (where the are not atoms) by the following equivalent formula:
where are fresh variables.
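The following sketch illustrates the idea on a disjunction of conjunctions of literals. Since the replacement formula is not reproduced above, the code assumes a common one-sided variant: one fresh variable is introduced per disjunct, each fresh variable implies its disjunct, and at least one fresh variable must hold. Literals follow the DIMACS convention (a positive integer for a variable, a negative one for its negation); all function names are hypothetical.

```python
from itertools import product


def encode_or_of_ands(conjunctions, first_fresh):
    """CNF clauses for (c_1 or ... or c_n), each c_i a list of literals.

    Fresh variables first_fresh, first_fresh + 1, ... act as selectors.
    """
    clauses = []
    selectors = []
    for offset, conjunction in enumerate(conjunctions):
        x = first_fresh + offset
        selectors.append(x)
        for literal in conjunction:
            clauses.append([-x, literal])  # x implies every literal of c_i
    clauses.append(selectors)  # at least one selector is true
    return clauses


def extends_to_model(clauses, partial, fresh_vars):
    """Brute force: can the fresh variables be set so that all clauses hold?"""
    for bits in product([False, True], repeat=len(fresh_vars)):
        model = {**partial, **dict(zip(fresh_vars, bits))}
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False


# (v1 and v2) or (not v1 and v3), with fresh variables 4 and 5
CLAUSES = encode_or_of_ands([[1, 2], [-1, 3]], first_fresh=4)
```

The number of clauses grows linearly with the number of literals, whereas distributing the disjunction over the conjunctions could blow up exponentially. The brute-force helper is only meant for checking small instances by hand.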
In the end, the SAT solver tells us if there exists a run which differs by at least editions with every observed trace . If a solution is found, we extract the run using the values assigned by the SAT solver to the Boolean variables .
6 Using AntiAlignments to Estimate Precision
In this section we show how to use antialignments to estimate precision of process models. Remarkably, we show how to modify the definitions of [10, 30] so that the new metric does not depend on a predefined length. In Section 6.2 we dive into the adherence of the metric with respect to a recent proposal for properties of precision metrics [26].
6.1 Precision
Our precision metric is an adaptation of our previous versions presented in [10, 30]. It relies on antialignments to find the model run that is as distant as possible to the log traces. Like antialignments, the definition of precision is parameterized by a distance . In the examples, we will specify each time if we use Levenshtein’s edit distance (Definition 4) or Hamming distance (Definition 5).
Definition 6 (Precision)
Let be an event log and a model. We define precision as follows:
For instance, consider the model and log shown in Figure 2. With Levenshtein’s distance, the full run is a maximal antialignment. It is at distance to any of the log traces, and hence .
6.1.1 Handling Process Models with Loops
Notice that a model with arbitrarily long runs (i.e., a process model that contains loops) may cause the formula in Definition 6 to converge to 0. This is a natural artifact of comparing a finite language (the event log) with a possibly infinite language (the process model). Since process models in reality contain loops, an adaptation of the metric is presented in this section, so that it can also handle this type of model without severely penalizing the loops.
Definition 7 (Precision for Models with Loops)
Let be an event log and a model. We define precision as follows:
with some which is a parameter of this definition.
Informally, the formula computes the antialignment that provides maximal distance with any trace in the log, and at the same time tries to minimize its length. The penalization for the length is controlled by the parameter of Definition 7 (although, admittedly, this parameter should be decided a priori, in practice one can use a particular value for it across several instances without significantly impacting the insights obtained through this metric). Observe that is precisely the precision of Definition 6. By making Definition 7 not dependent on a predefined length, it deviates from the log-based precision metrics defined in previous work [10, 30].
Let us now consider the model of Figure 4, and the log . Assume that . A possible antialignment is which is at least at Levenshtein’s distance to any of the log traces. For the value of the formula is . Another possible antialignment is which is at least at distance to any of the log traces. For the value of the formula is . Hence, since the antialignment that maximizes the second term of the formula is , the precision computed is . If instead, is set to a lower value, e.g., , the corresponding value of the formula for the antialignment will be the maximal one, and therefore it will be selected as the antialignment resulting in .
6.1.2 Computing
By incorporating the parameter in the definition of precision, the metric can now deal with models containing loops without predefining the length of the antialignment. In this section we show that the proposed extension is well-defined and can be computed, and provide some complexity results for the algorithms involved.
Lemma 6
For every finite model , log and , the supremum in the definition of is reached, i.e. there exists a full run such that .
Proof
Two cases have to be distinguished: if , then the supremum equals , is obviously reached by any , and ; otherwise, let and let ; we show that the supremum in the definition of becomes now a maximum over a finite set of runs, bounded by a given length that depends on and :
with . Indeed, for every strictly longer than , we have , which also shows that . Hence is considered in our , and then . ∎
Lemma 6 gives us the key for an algorithm to compute .
Algorithm 1
Algorithm for computing :

if , then

select

let

explore the reachability graph of until depth and return ;


else return (the model has perfect precision).
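The exploration step of Algorithm 1 can be sketched as a bounded enumeration of the full runs of a safe net. Several assumptions are made, because the formulas are not reproduced above: the depth bound is passed explicitly as `max_len` instead of being derived as in the algorithm, the length penalty is taken to be a division by $(1+\epsilon)^{|\gamma|}$, and the distance is the insertion/deletion count normalized by the sum of the two lengths. All names are hypothetical.

```python
def edit_count(u, v):
    """Minimum number of insertions and deletions turning u into v."""
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        cur = [i]
        for j, b in enumerate(v, start=1):
            cur.append(prev[j - 1] if a == b else 1 + min(prev[j], cur[j - 1]))
        prev = cur
    return prev[len(v)]


def normalized_distance(u, v):
    """Assumed normalization: edit count divided by the sum of the lengths."""
    total = len(u) + len(v)
    return edit_count(u, v) / total if total else 0.0


def full_runs(transitions, m0, mf, max_len):
    """Enumerate the full runs of a safe net up to length max_len.

    Markings are frozensets of places; transitions are triples
    (label, preset, postset) of a label and two frozensets.
    """
    stack = [(m0, ())]
    while stack:
        marking, run = stack.pop()
        if marking == mf:
            yield run
        if len(run) == max_len:
            continue  # depth bound reached
        for label, pre, post in transitions:
            if pre <= marking:  # enabled
                stack.append(((marking - pre) | post, run + (label,)))


def estimate_precision(transitions, m0, mf, log, eps, max_len):
    """One minus the best penalized distance found within the depth bound."""
    best = 0.0
    for run in full_runs(transitions, m0, mf, max_len):
        d_log = min((normalized_distance(run, t) for t in log), default=1.0)
        best = max(best, d_log / (1 + eps) ** len(run))
    return 1.0 - best


# Example net (an assumption): a, then any number of b, then c.
P0, P1, P2 = frozenset({"p0"}), frozenset({"p1"}), frozenset({"p2"})
LOOP_NET = [("a", P0, P1), ("b", P1, P1), ("c", P1, P2)]
NO_LOOP_NET = [("a", P0, P1), ("c", P1, P2)]
LOG = [("a", "c")]
```

On this example, with `max_len=4` and `eps=0.0`, the most distant run found is `a b b c` and the estimate is 2/3; a positive `eps` penalizes that longer run and yields a higher estimate, while the loop-free net gets an estimate of 1.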
The correctness of this algorithm follows directly from Lemma 6. Its complexity resides essentially in the initial test, which corresponds to simply deciding if , whose complexity is given by the following lemma:
Lemma 7
The problem of deciding, for a finite model and a log , if , is equivalent to deciding reachability in Petri nets.
Proof
We simply observe that iff . Deciding this is equivalent to deciding reachability in Petri nets, as shown in Lemma 4. ∎
However, in practice, one would generally skip the first check and jump directly to the exploration until some depth , possibly computed from a given threshold , like the one given by the in Algorithm 1. Notice that the algorithm explores less deep (i.e. is smaller) when is large (close to 1), i.e. is close to the optimal antialignment. We can summarize this with the following variation of Algorithm 1:
Algorithm 2
Algorithm for estimating using a threshold as input:

explore the reachability graph of until depth

if the exploration finds a full run

then output “”

else output “”.
Lemma 8
For any fixed , the problem of deciding, for a finite model , a log and a rational constant , if , is NPcomplete.
Proof
The proof is similar to the one of Lemma 6; here, the bound is given directly, and we have the same equality
with . This means, in order to check that , it suffices to guess a full run of length , where depends linearly on the size of the representation of (number of bits in the numerator and denominator). Then one can check in polynomial time that .
For NP-hardness, we proceed as in Lemma 5: we reduce reachability of in a 1-safe acyclic Petri net to with and . ∎
6.2 Discussion about Reference Properties for Precision
Recently, an effort to consolidate a set of desired properties for precision metrics has been proposed [26]. Five axioms are described that establish different features of a precision metric . Summarizing, the axioms proposed in [26] are:

: A precision metric should be a function, i.e. it should be deterministic.

: If a process model allows for more behavior not seen in a log than another model does, then should have a lower precision than regarding :

: Let be a model that allows for the behavior seen in a log , and at the same time its behavior is properly included in a model whose language is (called a flower model; actually, [26] writes “”, with for powerset, but we believe this is a mistake). Then the precision of on should be strictly greater than the one for .

: The precision of a log on two language equivalent models should be equal:

: Adding fitting traces to a fitting log can only increase the precision of a given model with respect to the log:
In the aforementioned paper, it is shown that the previous version of our antialignment-based precision metric (from [30]) does not satisfy axiom (the satisfaction of the remaining axioms is declared as unknown in the paper). With the new version of the metric presented in this paper, we here provide proofs for these axioms, except for . But at the same time, we show that any precision metric can be adapted in order to satisfy .
Lemma 9
The metric (for any fixed ) satisfies .
Proof
Everything in our definitions is functional. ∎
Lemma 10
The metric satisfies .
Proof
Let . The definitions take the of an expression which does not depend on the model. Since , the ranges over a larger set than the . Therefore the result cannot be smaller, and we get . ∎
Our metrics may not satisfy the strict inequality required by : they satisfy only a weaker version of with nonstrict inequality, but, as observed in [26], this is then simply subsumed by . The authors of [26] precisely introduced after arguing that, in case of a flower model, a strict inequality should be required.
Anyway, we show in Lemma 11 that any precision metric can be modified so that it satisfies this requirement of strict inequality for the flower models.
Lemma 11
Let be any precision metric. It is possible to define a metric from such that satisfies : it suffices to set the precision of the flower models to a value smaller than all the other precision values (after possibly extending the target set of the function). This guarantees and preserves all the other axioms.
Proof
The new metric satisfies by construction. Moreover, if is deterministic (), then also is. For preservation of , and , it suffices to study the different cases (separate flower model and others) to show that the equality and nonstrict inequalities are preserved. ∎
We consider that satisfying is a very artificial issue. However, if the transformation defined in Lemma 11 really had to be implemented, it would imply that, in order to compute the precision, one would have to decide if the model is a flower model, i.e. if . This is known as the universality problem. This problem is, in theory, highly intractable: it is PSPACE-complete for nondeterministic finite state automata (NFSA) [17], and here the NFSA to consider would be the reachability graph of (for bounded), which is exponential in the size of ; hence, deciding universality for bounded labeled Petri nets is in EXPSPACE. But again, this is very artificial: in practice it suffices to explore at a very short finite horizon to detect many non-flower models.
Lemma 12
The metric satisfies .
Proof
This trivially holds since both metrics are behaviorally defined. Also, we copied this axiom from [26], but observe that it is a simple corollary of as soon as . ∎
Lemma 13
The metric (for any ) satisfies .
Proof
With , for every , we have , so the cannot be smaller for than for . The rest does not depend on the log. ∎
7 Tool Support and Experiments
In this section we present the new tool implementing the results of this paper, and both a qualitative and a quantitative evaluation on state-of-the-art benchmarks from the literature. To compare the results obtained with the different distances, we denote Levenshtein distance based antialignment precision by and Hamming distance based antialignment precision by .
7.1 da4py: A Python Library Supporting AntiAlignments
Several tools implement antialignments. Darksider, an OCaml command line software, has already been presented in [30]. It creates the SAT formulas and calls the solver Minisat+ [14] to get the result. The ProM software [33] also has an antialignment plugin, which computes antialignments in a brute force way. Recently, we have created a Python library in order to make our technique more accessible: da4py (https://github.com/BoltMaud/da4py), a Python version of Darksider. Thanks to the use of the SAT library PySAT [16], da4py allows one to run different state-of-the-art SAT solvers. Moreover, this SAT library uses an implementation of the RC2 algorithm [15] in order to get MaxSAT solutions, a variant that greatly improves the efficiency of computing antialignments. Finally, da4py is compatible with the library pm4py [7], and uses the same data objects.
Remarkably, in order to deal with large logs (such as the ones shown in the quantitative evaluation part), da4py has a variant that computes a prefix of antialignments, thus alleviating the complexity by not requiring a full run but only a prefix. Accordingly, the corresponding precision measure is then a variant that is normalized by the length of the antialignment prefix computed. Furthermore, for antialignments based on Levenshtein’s distance, another simplification is to add a threshold on the number of editions (max_d attribute) between the run and the traces, to compute a lower bound for the antialignment instead of the complete antialignment.
7.2 Qualitative Comparison

A set of examples is taken from page 64 of [23], and consists of the simple event log shown in Table 1 aligned with 10 different process models. The log consists of only five different traces, with various frequencies. The models in Figures 5 to 8 are four examples of models often used to show the differences between fitness, precision and generalization. The model in Figure 5 shows the “ideal” process discovery result, i.e. the model that is fitting, fairly precise and properly generalizing. The models in Figures 9 to 12 present the same set of activities with varying loop and/or parallel constructs. Two new process models that describe particularly different routing logic from the previous models are depicted in Figures 13 and 14.
