Matching Logic (hereafter abbreviated as ML) is a novel framework that is currently used for specifying programming language semantics [11, 12, 19, 8] and for reasoning about programs [23, 10, 9, 26, 27, 14, 5]. The logic is inspired by the domain of programming language semantics, and it aims to use the operational semantics of a programming language as a basis for both execution and verification of programs.
On the program verification side, ML has some advantages over existing program verification logics. The logic is parametric in the operational semantics of a language. One can execute the semantics against test suites and then use the same semantics for verification. Therefore, one can detect issues in the semantics at an early stage and fix them right away, thus providing additional trust in the semantics. The proof system of ML is proved sound and (relatively) complete for all languages, unlike existing Floyd-Hoare logics, where the soundness of the proof system needs to be proved separately for each language. Moreover, ML eliminates the need to prove consistency relations between the operational semantics (used for execution) and the axiomatic semantics (used for verification), as is often the case when using the traditional approaches.
The ML formulas, called patterns, are built using variables, symbols, connectives and quantifiers. A pattern evaluates to the set of values that match it. ML makes no distinction between function symbols and predicate symbols. Dropping this distinction increases the expressivity of the language: various notions (e.g., function, equality) can be specified using symbols that satisfy some axioms.
An example of such an ML formula is below: it matches over the set of lists that start at address and store the sequence which contains an even number on the third position:
Basically, the novelty in ML w.r.t. first-order logics is that structural components are formulas as well. In our example, is a conjunction of a structural component – that is, a list that starts at address which stores a sequence implemented as an array (encoded using the - axioms ), – and a constraint . In ML, the structural components are called term patterns, whereas the constraints are called predicate patterns.
The conjunction of two ML patterns may produce a new pattern with more than one structural component, as shown here:
Finding a set of elements that matches the conjunction is not necessarily an easy task mainly because both structural components and need to be matched simultaneously. In theory, this set is the intersection of the sets that match and independently.
In practice, dealing with multiple structural components in one formula is cumbersome, and reasoning with such formulas becomes a burden as they grow. Also, when mixing multiple structural components in one formula we lose the separation between structure and constraint. This separation is essential when implementing an ML prover, where the constraints can be handled separately using existing SMT solvers. In our examples above, the constraints of both and can be dealt with using existing SMT solvers like Z3 or CVC4, since they provide theories for handling arrays and quantifiers. A more convenient approach would be to work with formulas that have only one structural component.
In ML, the semantics of is the largest set of elements matching and . Thus, the conjunction of two patterns can be seen as a semantic unification of the two patterns. So, it makes sense to relate syntactic unification to this notion of semantic unification. Let us consider the particular case when , where is a term pattern and is a predicate pattern, . In this case (Footnote 1: For the sake of presentation, we assume here that all patterns have the same sort. Also, the last equality in the sequence holds because of a lemma which is presented in the technical section of the paper.):
The predicate patterns expressing the equality of two term patterns cannot be handled directly, e.g., by SMT solvers. Therefore, it would be more convenient to reduce such a pattern to a simpler equivalent predicate , which can be handled using external provers. In addition, it would be worthwhile to produce a formal proof of the equivalence between and .
At first sight, unification of terms seems to be useful here. If is the most general unifier of and , seen as first-order terms, then . Unifiers are substitutions, and substitutions can be transformed into ML formulas .
In our list example, and have as the most general unifier. Translating to a formula results in . For this particular case, the term pattern equality is equivalent to . Moreover, the semantic unifier is also equivalent to . This form is now convenient since it has only one structural component and a constraint manageable by an SMT solver.
We show that can be obtained using the most general unifier of and , whenever it exists. The proof of the equivalence between and is not trivial and, surprisingly, it depends on the algorithm used to compute the most general unifier. Our proof uses the syntactic unification algorithm proposed by Martelli and Montanari . Since the equivalence is proved only for the case when the most general unifier exists, we say that this algorithm is sound for semantic unification in ML.
Unfortunately, this algorithm is not complete for semantic unification: if the terms and are not syntactically unifiable, then there are no guarantees that is a “contradiction” in ML. We present a detailed analysis of this aspect and we provide a counterexample.
Finally, a provableness property of the Martelli-Montanari unification algorithm is shown: we provide a sound strategy to generate a proof certificate of the equivalence between and with the most general unifier of and . This proof uses the rules of the ML proof system, and the main idea is to transform the steps of the unification algorithm into sequences of proof steps. The proposed approach is validated by a Coq encoding, which mechanically checks the correctness of the applied strategy.
All these contributions explicitly establish the relationship between syntactic unification and semantic unification in ML, as summarised by the next table:
1.0.1 Paper organisation.
In Section 2.1 we recall the main notions and notations from the unification theory that we use in this paper. Section 2.2 includes a concise presentation of Matching Logic based on . In Section 3 we show how to find the convenient representation of our semantic unifiers using the syntactic unification algorithm. We prove that the unification algorithm is sound for semantic unification and we discuss why this algorithm is not complete for semantic unification. In Section 4 we describe sound strategies for generating proofs that can be further used to generate proof certificates.
2.1 Syntactic Unification
Let be a set of sorts. We consider a (countably) infinite S-indexed set of variables and a signature, i.e., a (finite or countably infinite) S-indexed set of function symbols, . By we denote the algebra of ground terms and by the corresponding term algebra generated by . To keep the presentation simple (as in ) we do not explicitly show the sorts of the terms unless they cannot be inferred from context. This does not restrict in any way the generality and will be handled properly when transferring all these to Matching Logic.
We use the typical conventions and notations. Letters denote variables and denote symbols. Terms are either variables or compound terms of the form ; means that has arity , that is, for each , the subterm is of sort and the sort of is . If then is a constant and the term is simply denoted by . By we denote the set of variables occurring in a term . Substitutions are denoted by symbols or directly as a set of bindings . We use to denote the identity substitution. The application of a substitution to a term is denoted . (Footnote 2: Although substitutions are defined only over a set of variables, it is well known that they can be extended to terms. Also, if a substitution is not defined for a variable, say , then we consider .) The composition of substitutions and is denoted as . If and then . Two substitutions and are equal, written , if they are extensionally equal: for every variable . A substitution is more general than a substitution , written as , if there is a substitution such that .
Let us consider a sort and a signature that includes the symbols , where and . Then is a ground term, is a term with variables and . A substitution applied to produces . If then , because there is such that .
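The notions of substitution application and composition recalled above can be illustrated with a minimal Python sketch. The names `Var`, `App`, `apply`, and `compose`, as well as the symbols `zero`, `succ`, and `plus`, are hypothetical and chosen only for this illustration; sorts are ignored. The order of composition used here (first `sigma`, then `eta`) is an assumption and may differ from the convention of the paper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    symbol: str                       # function symbol; a constant has no arguments
    args: Tuple["Term", ...] = ()


Term = Union[Var, App]
Subst = Dict[str, Term]               # finite set of bindings: variable name -> term


def apply(sigma: Subst, t: Term) -> Term:
    """Extend a substitution from variables to terms."""
    if isinstance(t, Var):
        # Variables outside the domain of sigma are left unchanged.
        return sigma.get(t.name, t)
    return App(t.symbol, tuple(apply(sigma, a) for a in t.args))


def compose(sigma: Subst, eta: Subst) -> Subst:
    """Substitution that behaves like applying sigma first and eta second."""
    result = {x: apply(eta, t) for x, t in sigma.items()}
    for x, t in eta.items():
        result.setdefault(x, t)       # bindings of eta not shadowed by sigma
    return result


# Hypothetical example term: plus(x, succ(y))
x, y = Var("x"), Var("y")
zero = App("zero")
t = App("plus", (x, App("succ", (y,))))

sigma = {"x": zero}                   # a more general substitution
eta = {"y": zero}                     # the "difference" to a less general one
print(apply(sigma, t))
print(apply(compose(sigma, eta), t))
```

In this sketch `sigma` is more general than `compose(sigma, eta)`, since the latter is obtained from the former by composing with `eta`.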
Definition 1 (Unifier, Most General Unifier)
A substitution is a unifier of two terms and if . A unifier is the most general unifier (hereafter shorthanded as mgu) if for every unifier of and we have .
If and are terms then is a unifier of and : .
Whenever there exists a unifier for two given terms we say that the terms are unifiable. It is not always the case that, given two terms, we can find unifiers for them. For example, recall from Example 2 and consider . Then and are not unifiable because it is impossible to find a substitution such that . In the particular context of syntactic unification, for every two unifiable terms there exists a most general unifier.
Definition 2 (Unification problem, Solution, Solved form)
A unification problem is either a set of pairs of terms or a special symbol . A substitution is a solution of a unification problem if is a unifier of and , for every . A unification problem is in solved form if or with for all .
Let denote the set of solutions of . If then . Each unification problem in solved form defines a substitution .
Among the well-known algorithms for finding the most general unifier we encounter the unification by recursive descent , and a rule-based approach for finding the mgu [13, 16]. The latter is presented in Figure 1 and it consists of a set of transformation rules of the form applied over unification problems and .
[Figure 1: the transformation rules of the rule-based unification algorithm, including the Occurs check rule.]
We recall from the main properties of the unification algorithm in Figure 1. (Footnote 3: It is not the purpose of this paper to prove these results. The interested reader is referred to for complete proofs and details.) If is a unification problem then:
Progress: If is not in solved form, then there exists such that .
Solution preservation: If then .
Termination: There is no infinite sequence .
Most general unifier: If is a solution for P, then for any maximal sequence of transformations either is or . If there is no solution for then is .
The properties listed in Remark 1 essentially say that the algorithm in Figure 1 produces the most general unifier when it exists. Note that this algorithm does not impose any strategy to apply the rules.
Recall and from Example 2. Consider the unification problem . Using the unification algorithm we obtain:
The obtained unification problem is in solved form; the corresponding substitution is the most general unifier of and .
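A rule-based unification procedure of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions: the rule names in the comments (delete, decompose, symbol clash, orient, occurs check, elimination) follow the usual textbook presentation and are not taken from Figure 1; the names `Var`, `App`, `unify`, and the symbols `zero`, `succ`, `plus` are hypothetical; sorts are ignored; and `None` stands for the special failure symbol. The sketch fixes one particular strategy (a work list), although the algorithm itself does not impose any.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    symbol: str
    args: Tuple["Term", ...] = ()


Term = Union[Var, App]


def variables(t: Term) -> Set[str]:
    """Set of variable names occurring in a term."""
    if isinstance(t, Var):
        return {t.name}
    result: Set[str] = set()
    for a in t.args:
        result |= variables(a)
    return result


def substitute(x: str, s: Term, t: Term) -> Term:
    """Replace every occurrence of variable x in t by s."""
    if isinstance(t, Var):
        return s if t.name == x else t
    return App(t.symbol, tuple(substitute(x, s, a) for a in t.args))


def unify(problem: List[Tuple[Term, Term]]) -> Optional[Dict[str, Term]]:
    """Return a solved form as a dict, or None when no unifier exists."""
    todo = list(problem)
    solved: Dict[str, Term] = {}
    while todo:
        left, right = todo.pop()
        if left == right:                                   # delete
            continue
        if isinstance(left, App) and isinstance(right, App):
            if left.symbol != right.symbol or len(left.args) != len(right.args):
                return None                                 # symbol clash
            todo.extend(zip(left.args, right.args))         # decompose
            continue
        if isinstance(left, App):                           # orient
            left, right = right, left
        if left.name in variables(right):                   # occurs check
            return None
        x = left.name                                       # elimination
        todo = [(substitute(x, right, a), substitute(x, right, b)) for a, b in todo]
        solved = {y: substitute(x, right, t) for y, t in solved.items()}
        solved[x] = right
    return solved


# Hypothetical terms: plus(x, succ(y)) and plus(succ(z), succ(zero))
zero = App("zero")
t1 = App("plus", (Var("x"), App("succ", (Var("y"),))))
t2 = App("plus", (App("succ", (Var("z"),)), App("succ", (zero,))))
print(unify([(t1, t2)]))
print(unify([(Var("x"), App("succ", (Var("x"),)))]))
```

The first call yields bindings for `x` and `y`; the second one fails on the occurs check because `x` occurs in `succ(x)`.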
When it exists, the most general unifier is not unique. By composition with renaming substitutions we can generate an infinite set of mgus. In general, we say that mgus are unique up to a composition with a renaming substitution.
2.2 Matching Logic
Matching Logic [22, 24] started as a logic over a particular case of constrained terms [23, 26, 9, 25, 27, 5, 14], but it has since been developed into a solid program logic framework. Here we recall from the particular definitions and notions of ML that we use in this paper. This subsection is longer than a usual preliminaries section. Since Matching Logic is a quite recent research contribution that includes new, atypical concepts and results, we decided to present it with more details and examples. This makes the paper self-contained.
ML formulas are defined over a many-sorted signature , where is a -indexed set of symbols. The formulas in ML are patterns:
Definition 3 (ML Formula)
A pattern -pattern of sort is defined by:
where ranges over the variables of sort (), ranges over , and ranges over the set of variables (of any sort).
The derived patterns are defined as expected: ( of sort ), (Footnote 4: Note that is different from the (bold) symbol used in Section 3.), , , .
Let be a sort and a signature which includes symbols and . Then, , , , , , are all ML patterns.
When sorts are not relevant or can be inferred from the context we drop the sort subscript ( becomes ).
Definition 4 (ML model)
An ML model -model consists of:
S-sorted sets for each , where is the carrier of sort of M;
a function (note the use of the powerset as the co-domain) for each symbol .
Recall the signature from Example 4. A possible -model includes a set , a constant function which evaluates to the singleton set , and a function which returns a singleton set containing the successor of the given natural number. Here, the interpretation functions have only singleton sets as results. This is not always the case. Let us enrich with a new symbol . We can choose the following interpretation function for the symbol: , such that if is less or equal than , and otherwise.
The meaning of patterns is given by using valuations as in first-order logic, but the result of the interpretation is a set of elements that the pattern “matches”, similar to the worlds in modal logic.
Definition 5 (M-valuations)
If is a variable valuation and a pattern, then the extension of to patterns is inductively defined as follows:
, where the sort of is ;
, where and have the same sort;
, where and is the valuation s.t. for all , and .
When a functional symbol is a constant (case in Def. 5) we let . Additional constructs can be handled similarly (e.g. ).
An interesting pattern is since it matches over the entire set . Indeed, if we consider any valuation , then .
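The extension of a valuation to patterns can be sketched as a small Python evaluator over a finite model. Everything here is an assumption made for illustration: the tuple encoding of patterns, the function name `evaluate`, the single-sorted setting, and the finite carrier `{0, 1, 2, 3}` with hypothetical symbols `zero` and `succ` (a truncated version of the natural numbers). The five cases follow the standard pattern-matching semantics, where variables yield singletons, symbols are extended pointwise to sets, and the connectives act as set complement, intersection, and union.

```python
from itertools import product


def evaluate(pattern, model, rho):
    """Return the set of carrier elements matched by the pattern under valuation rho."""
    carrier, interp = model
    kind = pattern[0]
    if kind == "var":                         # a variable matches exactly its value
        return {rho[pattern[1]]}
    if kind == "sym":                         # pointwise extension of the symbol to sets
        _, name, args = pattern
        arg_sets = [evaluate(a, model, rho) for a in args]
        result = set()
        for values in product(*arg_sets):     # one empty tuple when the symbol is a constant
            result |= interp[name](*values)
        return result
    if kind == "not":                         # complement w.r.t. the carrier
        return carrier - evaluate(pattern[1], model, rho)
    if kind == "and":                         # intersection
        return evaluate(pattern[1], model, rho) & evaluate(pattern[2], model, rho)
    if kind == "exists":                      # union over all values of the bound variable
        _, x, body = pattern
        result = set()
        for a in carrier:
            result |= evaluate(body, model, {**rho, x: a})
        return result
    raise ValueError(f"unknown pattern: {pattern!r}")


# Hypothetical finite model: interpretations return sets, not single values.
carrier = {0, 1, 2, 3}
interp = {
    "zero": lambda: {0},
    "succ": lambda n: {n + 1} if n + 1 in carrier else set(),
}
model = (carrier, interp)

zero = ("sym", "zero", ())
print(evaluate(("sym", "succ", (zero,)), model, {}))
print(evaluate(("exists", "x", ("var", "x")), model, {}))
print(evaluate(("not", zero), model, {}))
```

In this sketch the pattern `exists x. x` matches the whole carrier, regardless of the valuation, because the union ranges over every possible value of `x`.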
A particular type of patterns are M-predicates. These are meant to capture the usual meaning of predicates, i.e., patterns that can be either true or false.
Definition 6 (M-predicates)
The pattern is an M-predicate iff for any valuation , is either or . Also, is called a predicate iff it is a -predicate in all models M.
The pattern (from Example 6) is an -predicate because for all we have .
The pattern is also an -predicate because .
Definition 7 (Satisfaction relation, validity)
A model satisfies , written , if for each variable valuation . A pattern is valid (written ) iff for all models M.
Recall the model from Example 5. since, for all we have .
Proposition 1 (Proposition 2.6 in )
Let and be two ML formulas and M an ML model. Then:
iff for all .
iff for all .
Definition 8 (ML specifications)
A matching logic specification is a triple , where contains -patterns. The -patterns in are axiom patterns. We say that is a semantical consequence of , written , iff implies , for each -model .
An important ingredient of ML is the definedness symbol , with the following intuitive meaning: if is matched by some values of sort then is , otherwise it is . This interpretation is enforced by including the axiom pattern in the set of axioms . This symbol and its associated pattern are used to define:
conjunction of patterns with different sorts: for instance, if the symbol , then the pattern is not syntactically correct, because has sort whereas has sort . Using definedness we can now write a syntactically correct formula ;
membership pattern: with , where is another pattern that evaluates to a single value;
equality pattern: .
In ML there is no distinction between function and predicate symbols. However, there is a way to specify that certain symbols are interpreted as functions. These symbols are called functional symbols.
Definition 9 (Functional patterns)
A pattern is functional in a model M iff for any valuation . The pattern is functional in F iff it is functional in all models M such that .
The following technical result was proved in  and establishes the link between equivalence and equality of functional patterns:
Proposition 2 (Proposition 5.9 in )
If , are patterns of sort then:
iff , for any .
iff , for any .
iff , for any model M.
It is worth noting that Proposition 2 holds only for functional patterns. When functional patterns have the same sort, the proposition below holds:
Proposition 3 (Proposition 5.24 in )
If and are two functional patterns of the same sort then .
Definition 10 (Term patterns)
If is a symbol such that contains the pattern then is a functional symbol. Term patterns are formulas containing only functional symbols.
If are symbols in and and are variables in , then is a term pattern if , , and are semantical consequences of the axioms .
Sometimes we need to use substitution over ML patterns directly. We use to denote the formula obtained by substituting for variable in (we assume and have the same sort):
; when .
, if ; otherwise, a renaming is required.
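Substitution over patterns, including the renaming step needed to avoid variable capture under a quantifier, can be sketched in Python as below. The tuple encoding of patterns, the function names `free_vars`, `fresh_name`, and `subst`, and the naming scheme `v0, v1, ...` for fresh variables are all assumptions of this sketch; sorts are ignored.

```python
from itertools import count


def free_vars(p):
    """Free variables of a pattern encoded as nested tuples."""
    kind = p[0]
    if kind == "var":
        return {p[1]}
    if kind == "sym":
        return set().union(*(free_vars(a) for a in p[2]))
    if kind == "not":
        return free_vars(p[1])
    if kind == "and":
        return free_vars(p[1]) | free_vars(p[2])
    if kind == "exists":
        return free_vars(p[2]) - {p[1]}
    raise ValueError(f"unknown pattern: {p!r}")


def fresh_name(avoid):
    """First name of the form v0, v1, ... that is not in the set avoid."""
    for i in count():
        name = f"v{i}"
        if name not in avoid:
            return name


def subst(p, x, t):
    """Pattern obtained by substituting t for the free occurrences of x in p."""
    kind = p[0]
    if kind == "var":
        return t if p[1] == x else p
    if kind == "sym":
        return ("sym", p[1], tuple(subst(a, x, t) for a in p[2]))
    if kind == "not":
        return ("not", subst(p[1], x, t))
    if kind == "and":
        return ("and", subst(p[1], x, t), subst(p[2], x, t))
    if kind == "exists":
        y, body = p[1], p[2]
        if y == x:                            # x is bound here: nothing to replace
            return p
        if y in free_vars(t):                 # capture would occur: rename y first
            new = fresh_name(free_vars(t) | free_vars(body) | {x})
            body = subst(body, y, ("var", new))
            y = new
        return ("exists", y, subst(body, x, t))
    raise ValueError(f"unknown pattern: {p!r}")


# Hypothetical pattern: exists y. plus(x, y); substituting y for x forces a renaming.
p = ("exists", "y", ("sym", "plus", (("var", "x"), ("var", "y"))))
print(subst(p, "x", ("var", "y")))
```

Without the renaming, the free `y` being substituted would be captured by the quantifier, which would change the meaning of the pattern.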
Our main result uses the following technical lemma. For the particular case when the equivalence and the equality are the same, it is a consequence of Proposition 5.10 from . We include its proof here as an example of Matching Logic reasoning.
If is a pattern, is a term pattern, and is a variable such that , then .
By induction on , we show: for all and , :
, which (by Proposition 2) holds since ;
which, by Definition 3 is . Here, we use the inductive hypothesis which says that for all , and we obtain ;
using from the inductive hypothesis;
by the inductive hypothesis: , ;
, with and - the inductive hypothesis.
Since for all and , (by Prop. 2) we have . ∎
2.2.2 The proof system of Matching Logic.
Matching Logic provides a proof system that is sound and complete (Figure 3). The notation denotes the pattern obtained from by replacing all free occurrences of with . Note that the propositional calculus reasoning is subsumed by rules 3-3 of the proof system. According to , 3 is in fact a set of rules that includes a version of the implicational propositional calculus (proposed by Łukasiewicz ) shown in Fig. 2.
[Figures 2 and 3: the implicational propositional calculus and the proof system of Matching Logic. Recoverable rule names: 2. Modus ponens; 4. Universal generalization.]
2.2.3 Unification in Matching Logic.
In , unification has a semantical definition. More precisely, it is defined in terms of conjunctions of patterns. In order to explain this better, let us consider two ML patterns: and . Both patterns can be matched by (possibly infinite) sets of elements, say and , given some variable valuation . In this context, finding a unifier is the same as finding a pattern that matches over a set of elements included in both and , that is,