1 Introduction
A data flow framework is an abstract representation of a program, used in program analysis and compiler optimizations [1]. As detection of semantic equivalence of expressions at each point in a program is unsolvable [6], all known algorithms try to detect a weaker, syntactic notion of expression equivalence over the set of all possible expressions, called Herbrand equivalence. Stated informally, Herbrand equivalence treats operators as uninterpreted functions, and two expressions are considered equivalent if they are obtained by applying the same operator on equivalent operands [4, 8, 11, 12].
The pioneering work of Kildall [7], which essentially is an abstract interpretation [3] of terms, showed that at each program point, Herbrand equivalence of expressions that occur in a program could be computed using an iterative refinement algorithm. The algorithm models each iteration as the application of a monotone function over a meet semilattice, and terminates at a fixpoint of the function [5, 6]. Subsequently, several problems in program analysis have been shown to be solvable using iterative fixpoint computation on lattice frameworks (see [1] for examples). Several algorithms for computing Herbrand equivalence of program expressions have also been proposed in the literature [2, 4, 9, 10, 11, 12].
Although algorithmic computation to detect Herbrand equivalence among expressions that appear in a program proceeds via iterative fixpoint computation on an abstract lattice framework, the classical mathematical definition of Herbrand equivalence uses a meet-over-all-paths formulation over the (infinite) set of all possible expressions (see [12, p. 393]). The main difficulty in constructing a fixpoint based definition for Herbrand equivalence of expressions at each program point is that it requires consideration of all program paths and equivalence of all expressions, including expressions not appearing in the program. Consequently, such a characterization of Herbrand equivalence cannot be achieved without resorting to set theoretic machinery.
It may be noted that, while the algorithm presented by Steffen et al. [12] uses an iterative fixpoint computation method, their definition of Herbrand equivalences was essentially a meet over all paths (MOP) formulation. Though the MOP based definition of Herbrand equivalences given by Steffen et al. [12] has been known since 1990, proving the completeness of iterative fixpoint based algorithms using this definition is nontrivial. For instance, the algorithm proposed by the same authors [11] was proven to be incomplete [4] after several years, though it was initially accepted to be complete. In comparison with an MOP based definition of Herbrand equivalences, a fixpoint characterization will render completeness proofs of iterative fixpoint algorithms for computing the equivalence of program expressions simpler. The completeness proofs would now essentially involve establishing an equivalence preserving continuous homomorphism from the infinite concrete lattice of all Herbrand congruences to the finite abstract lattice of congruences of expressions that are relevant to the program, and proceed via induction.
In this paper, we develop a lattice theoretic fixpoint characterization of Herbrand equivalences at each program point in a data flow framework. We define the notion of a congruence relation on the set of all expressions, and show that the set of all congruences forms a complete lattice. Given a data flow framework with $n$ program points, we show how to define a continuous composite transfer function over the $n$-fold product of the above lattice, such that the maximum fixpoint of the function yields the set of Herbrand equivalence classes at various program points. This characterization is then shown to be equivalent to a meet over all paths formulation of expression equivalence over the same lattice framework.
Section 2 introduces the basic notation. Sections 3 to 7 develop the basic theory of congruences and transfer functions, including nondeterministic assignment functions. Section 8 and Section 9 deal with the application of the formalism of congruences to derive a fixpoint characterization of Herbrand equivalence at each program point. Section 10 describes a meet-over-all-paths formulation for expression equivalence and establishes the equivalence between the fixpoint characterization and the meet-over-all-paths formulation.
2 Terminology
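Let $C$ be a countable set of constants and $X$ be a countable set of variables. For simplicity, we assume that the set of operators is $Op = \{+\}$. (More operators can be added without any difficulty.) The set of all terms over $C \cup X$, denoted by $T$, is defined by $t ::= c \mid x \mid (t + t)$, with $c \in C$ and $x \in X$. (Parentheses are omitted when there is no confusion.) Let $P$ be a partition of $T$. Let $[t]_P$ (or simply $[t]$ when there is no confusion) denote the equivalence class containing the term $t \in T$. If $t' \in [t]_P$, we write $t \cong_P t'$ (or simply $t \cong t'$). Note that $\cong_P$ is reflexive, symmetric and transitive. For any $A \subseteq T$, $A(x)$ denotes the set of all terms in $A$ in which the variable $x$ appears and $\overline{A}(x)$ denotes the set of all terms in $A$ in which $x$ does not appear. In particular, for any $x \in X$, $T(x)$ is the set of all terms containing the variable $x$ and $\overline{T}(x)$ denotes the set of terms in which $x$ does not appear. [Substitution] For $\alpha, \beta \in T$ and $x \in X$, substitution of $x$ with $\beta$ in $\alpha$, denoted by $\alpha[x \leftarrow \beta]$, is defined as follows:

If $\alpha = x$, then $\alpha[x \leftarrow \beta] = \beta$.

If $\alpha \in (C \cup X) \setminus \{x\}$, then $\alpha[x \leftarrow \beta] = \alpha$.

If $\alpha = \alpha_1 + \alpha_2$, then $\alpha[x \leftarrow \beta] = \alpha_1[x \leftarrow \beta] + \alpha_2[x \leftarrow \beta]$.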
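The three cases of the substitution definition translate directly into a recursive function. The following is a minimal sketch, not part of the original development: the class names `Const`, `Var`, `Plus` and the function `substitute` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Plus:
    left: "Term"
    right: "Term"


# A term is a constant, a variable, or the sum of two terms.
Term = Union[Const, Var, Plus]


def substitute(t: Term, x: str, beta: Term) -> Term:
    """Replace every occurrence of the variable named x in t by beta."""
    if isinstance(t, Var) and t.name == x:
        return beta                      # case 1: t is the variable x itself
    if isinstance(t, (Const, Var)):
        return t                         # case 2: a constant or another variable
    # case 3: t is a sum, so substitute in both operands
    return Plus(substitute(t.left, x, beta), substitute(t.right, x, beta))
```

For instance, substituting `y + 2` for `x` in `x + 1` yields `(y + 2) + 1`, while a term that does not mention `x` is returned unchanged.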
In the rest of the paper, complete proofs for statements that are left unproven in the main text are provided in the appendix. Proofs for some elementary properties of lattices, which are used in the main text, are also given in the appendix, to make the presentation self contained.
3 Congruences of Terms
We now define the notion of congruence (of terms), which will be used later to model equivalence of terms at various program points in a data flow framework. [Congruence of Terms] Let $P$ be a partition of $T$. $P$ is a Congruence (of terms) if the following conditions hold:

For each $c, c' \in C$, if $c \cong_P c'$ then $c = c'$. (No two distinct constants are congruent).

For $t_1, t_2, t_1', t_2' \in T$, $t_1' \cong_P t_1$ and $t_2' \cong_P t_2$ if and only if $t_1' + t_2' \cong_P t_1 + t_2$. (Congruences respect operators).

For any $c \in C$, $t \in T$, if $t \cong_P c$ then either $t = c$ or $t \in X$. (The only nonconstant terms that are allowed to be congruent to a constant are variables).
The motivation for the definition of congruence is the following. Given the representation of a program in a data flow framework (to be defined later), we will associate a congruence to each program point at each iteration. Each iteration refines the present congruence at each program point. We will see later that this process of refinement leads to a well defined “fix point congruence” at each program point, and that this fix point congruence captures Herbrand equivalence at that program point. The set of all congruences over $T$ is denoted by $\mathcal{G}(T)$. We first note the substitution property of congruences.
Observation (Substitution Property).
Let be a congruence over . Then, for each , if and only if for all and , .
Proof.
One direction is easy. If for all and , , then setting we get . Conversely, suppose . Let and be chosen arbitrarily. To prove , we use induction. If , then . If , then . Otherwise, if , then . ∎
The following observation is a direct consequence of condition (3) of the definition of congruence.
Observation .
Let , and let , with . Then .
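We define a binary confluence operation on the set of congruences, $\mathcal{G}(T)$. A confluence operation transforms a pair of congruences into a congruence. [Confluence] Let $P_1 = \{A_i\}_{i \in I}$ and $P_2 = \{B_j\}_{j \in J}$ be two congruences. For all $i \in I$ and $j \in J$, define $C_{i,j} = A_i \cap B_j$. The confluence of $P_1$ and $P_2$ is defined by: $P_1 \wedge P_2 = \{C_{i,j} : i \in I, j \in J, C_{i,j} \neq \emptyset\}$.
If $P_1$ and $P_2$ are congruences, then $P_1 \wedge P_2$ is a congruence.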
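To make the confluence operation concrete, the sketch below computes it for partitions of a small finite set. It assumes that the confluence of two partitions consists of the nonempty pairwise intersections of their classes, which is how the definition above reads. Actual congruences partition the infinite set of all terms, so this finite version is only an illustration; the function name `confluence` and the string encoding of terms are hypothetical.

```python
def confluence(p1, p2):
    """Nonempty pairwise intersections of the classes of p1 and p2.

    Each partition is a set of frozensets (its equivalence classes).
    """
    return {a & b for a in p1 for b in p2 if a & b}


# Two partitions of the same four terms, written as strings for brevity.
p1 = {frozenset({"x", "y", "z"}), frozenset({"w"})}
p2 = {frozenset({"x", "y"}), frozenset({"z", "w"})}

# Only x and y are equivalent in both partitions, so only they stay together.
result = confluence(p1, p2)
```

Two terms end up in the same class of the result exactly when they share a class in both inputs, which is the sense in which confluence keeps only the equivalences common to both incoming congruences.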
4 Structure of Congruences
In this section we will define an ordering on the set $\mathcal{G}(T)$ and then extend it to a complete lattice. [Refinement of a Congruence] Let $P$, $P'$ be congruences over $T$. We say $P \preceq P'$ (read $P$ is a refinement of $P'$) if for each equivalence class $A \in P$, there exists an equivalence class $A' \in P'$ such that $A \subseteq A'$. The partition in which each term in $T$ belongs to a different class is defined as: $\bot = \{\{t\} : t \in T\}$. The following observation is a direct consequence of the definition of $\bot$.
Observation .
$\bot$ is a congruence. Moreover for any $P \in \mathcal{G}(T)$, $\bot \preceq P$.
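The refinement ordering can be checked mechanically on finite partitions. The sketch below is illustrative only: it works over a small finite set of terms rather than the infinite set of all terms, and the names `refines` and `bottom` are hypothetical.

```python
def refines(p, q):
    """True if every class of partition p lies inside some class of q."""
    return all(any(a <= b for b in q) for a in p)


def bottom(terms):
    """The partition that places each term in its own singleton class."""
    return {frozenset({t}) for t in terms}


terms = {"x", "y", "x+1", "y+1"}
p = {frozenset({"x", "y"}), frozenset({"x+1", "y+1"})}

# The all-singletons partition refines p, but not the other way round.
bottom_refines_p = refines(bottom(terms), p)
p_refines_bottom = refines(p, bottom(terms))
```

Under this ordering, a smaller partition records fewer equivalences, so the all-singletons partition sits at the bottom.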
A partially ordered set $(L, \le)$ is a meet semilattice if every pair of elements $a, b \in L$ has a greatest lower bound (called the meet of $a$ and $b$). $(\mathcal{G}(T), \preceq)$ is a meet semilattice with meet operation $\wedge$ and bottom element $\bot$. The following lemma extends the meet operation to arbitrary nonempty collections of congruences. The proof relies on the axiom of choice. Every nonempty subset of $\mathcal{G}(T)$ has a greatest lower bound. Next, we extend the meet semilattice by artificially adding a top element $\top$, so that the greatest lower bound of the empty set is also well defined. The lattice $(\overline{\mathcal{G}(T)}, \preceq, \bot, \top)$ is defined over the set $\overline{\mathcal{G}(T)} = \mathcal{G}(T) \cup \{\top\}$ with $P \preceq \top$ for each $P \in \overline{\mathcal{G}(T)}$. In particular, $P$ is the greatest lower bound of $P$ and $\top$ for every $P \in \overline{\mathcal{G}(T)}$. Hereafter, we will refer to the element $\top$ as a congruence as well. It follows from Lemma 4 and the above definition that every subset of $\overline{\mathcal{G}(T)}$ has a greatest lower bound. Since a meet semilattice in which every subset has a greatest lower bound is a complete lattice (Theorem B), we have: $(\overline{\mathcal{G}(T)}, \preceq, \bot, \top)$ is a complete lattice. [Infimum] Let $S = \{P_i\}_{i \in I}$ be an arbitrary collection of congruences in $\overline{\mathcal{G}(T)}$ ($S$ may be empty or may contain $\top$). The infimum of the set $S$, denoted by $\bigwedge S$ or $\bigwedge_{i \in I} P_i$, is defined as the greatest lower bound of the set $S$.
5 Transfer Functions
We now define a class of unary operators on called transfer functions. A transfer function specifies how the assignment of a term to a variable transforms the congruence before the assignment into a new one. [Transfer Functions] Let and . (Note that does not appear in ). Let be an arbitrary congruence. The transfer function transforms to a congruence given by the following:

For each , let : .

: , .
It follows from the above definition that . That is, will contain all terms in in which does not appear. See Figure 1 for an example. In the following, we write instead of to avoid cumbersome notation. The following is a direct consequence of Definition 5.
Observation .
For any , if and only if .
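The observation above suggests a direct way to compute with transfer functions: two terms are equivalent after the assignment exactly when the terms obtained by substituting the assigned expression for the assigned variable were equivalent before it. The sketch below assumes that reading. The tuple encoding of terms and the names `substitute`, `transfer` and `syntactic` are hypothetical, and the starting congruence is taken to be plain syntactic equality (each term in its own class).

```python
def substitute(t, y, beta):
    """Replace the variable y by beta in t.

    Terms are variable names (str), constants (int), or ("+", left, right).
    """
    if t == y:
        return beta
    if isinstance(t, tuple):
        _, left, right = t
        return ("+", substitute(left, y, beta), substitute(right, y, beta))
    return t


def transfer(equiv, y, beta):
    """Equivalence test holding after the assignment y := beta.

    equiv(t1, t2) decides equivalence before the assignment; beta is
    assumed not to mention y.
    """
    def after(t1, t2):
        return equiv(substitute(t1, y, beta), substitute(t2, y, beta))
    return after


def syntactic(t1, t2):
    """The finest congruence: a term is equivalent only to itself."""
    return t1 == t2


# Effect of the assignment y := a + b on the finest congruence.
after = transfer(syntactic, "y", ("+", "a", "b"))
```

After the assignment, `y` and `a + b` are equivalent, and so are `y + 1` and `(a + b) + 1`, while `y` and `a` remain inequivalent.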
To make Definition 5 well founded, we need to establish the following: If is a congruence, then for any , , is a congruence. Next, we extend the definition of transfer functions to . Let and . Let . The extended transfer function transforms to given by the following:

If , .

.
To simplify the notation, we often write (or even simply ) instead of , and refer to extended transfer functions as simply transfer functions, when the underlying assignment operation is clear from the context.
6 Properties of Transfer Functions
In this section we show that transfer functions are continuous over the complete lattice .
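Consider the (extended) transfer function $f = \overline{f}_{y=\beta}$, where $y \in X$, $\beta \in \overline{T}(y)$. Let $P_1$ and $P_2$ be congruences in $\overline{\mathcal{G}(T)}$, not necessarily distinct. [Distributivity] $f(P_1 \wedge P_2) = f(P_1) \wedge f(P_2)$. Since distributive functions are monotone, we have: [Monotonicity] If $P_1 \preceq P_2$, then $f(P_1) \preceq f(P_2)$. We next show that distributivity extends to arbitrary collections of congruences. Let $(L_1, \le_1)$ and $(L_2, \le_2)$ be complete lattices. A function $f : L_1 \to L_2$ is continuous if for each $\emptyset \neq S \subseteq L_1$, $f(\bigwedge_1 S) = \bigwedge_2 f(S)$, where $\bigwedge_1$ and $\bigwedge_2$ denote the infimum operations in the lattices $L_1$ and $L_2$ respectively. The definition of continuity given above is more stringent than the standard definition in the literature, which requires continuity only for subsets that are chains. Moreover, the definition above does not require the continuity condition to hold for the empty set, because otherwise even constant maps would fail to be continuous. The proof of the next theorem uses the axiom of choice. Let $f = \overline{f}_{y=\beta}$, where $y \in X$, $\beta \in \overline{T}(y)$. For arbitrary collections of congruences $S \subseteq \overline{\mathcal{G}(T)}$, the notation $f(S)$ denotes the set $\{f(P) : P \in S\}$. [Continuity] For any $\emptyset \neq S \subseteq \overline{\mathcal{G}(T)}$, $f(\bigwedge S) = \bigwedge f(S)$.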
7 Nondeterministic assignment
Next we define a special kind of transfer function corresponding to input statements in the program. Transfer functions of this kind are called nondeterministic assignments. Let $P \in \mathcal{G}(T)$ and let $y \in X$. The transfer function $f_{y=*}$ transforms $P$ to a congruence $f_{y=*}(P)$, given by: for every $t, t' \in T$, $t \cong_{f_{y=*}(P)} t'$ if and only if both the following conditions are satisfied:

$t \cong_P t'$.

For every $\beta \in \overline{T}(y)$, $t[y \leftarrow \beta] \cong_P t'[y \leftarrow \beta]$.
Since for every pair of terms $t, t' \in T$ the above definition precisely decides whether $t \cong_{f_{y=*}(P)} t'$ or not, $f_{y=*}(P)$ is the unique relation containing exactly those pairs satisfying both the conditions in Definition 7. The definition asserts that two terms that were equivalent before a nondeterministic assignment will remain equivalent after the nondeterministic assignment to $y$ if and only if the equivalence between the two terms is preserved under all possible substitutions to $y$.
To make the above definition well founded, we need to prove that is a congruence.
If is a congruence, then for any , is a congruence.
We write
to denote the set . The next theorem shows that each nondeterministic assignment
may be expressed as a confluence of (an infinite collection of) transfer function operations.
If is a congruence, then for any ,
.
Next, we extend the definition of nondeterministic assignment transfer functions to the complete lattice .
Let and . The extended transfer function
transforms to given by the following:

If , .

.
The following theorem involves use of the axiom of choice. We will write instead of to simplify notation.
[Continuity]
For any , , where
.
In the following, we derive a characterization for nondeterministic assignment that does not depend on the axiom of choice. Condition (3) of the definition of congruence
(Definition 3) is necessary to derive this characterization. We first note a lemma which states that if the equivalence
between two terms is preserved under substitution of with any two distinct constants chosen arbitrarily,
then the equivalence between the two terms will be preserved under substitution of with any other term in which does not appear.
Let . Let and with . Then
and if and only if
for all .
The above observation leads to a characterization of nondeterministic assignment that does not involve the axiom of choice.
Let and let , . Then, for any ,
It follows from the above theorem that nondeterministic assignments can be characterized in terms of just three
congruences (instead of dealing with infinitely many as in Theorem 7).
8 Data Flow Analysis Frameworks
We next formalize the notion of a data flow framework and apply the formalism developed above to characterize Herbrand equivalence at each point in a program. A control flow graph $G(V, E)$ is a directed graph over the vertex set $V = \{1, 2, \ldots, n\}$ for some $n \ge 1$ satisfying the following properties:

$1 \in V$ is called the entry point and has no predecessors.

Every vertex $k \in V$, $k \neq 1$, is reachable from $1$ and has at least one predecessor and at most two predecessors.

Vertices with two predecessors are called confluence points.

Vertices with a single predecessor are called (transfer) function points.
A data flow framework over is a pair , where is a control flow graph on the vertex set and is a collection of transfer functions over such that for each function point , there is an associated transfer function , and . Data flow frameworks can be used to represent programs. An example is given in Figure 2. In the sections that follow, for any , we will simply write to actually denote the extended transfer function (see Definition 5 and Definition 7) without further explanation.
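The structural conditions on a control flow graph are easy to validate programmatically. The following sketch assumes that vertices are numbered from 1 to n and that vertex 1 is the entry point; the function name `classify_vertices` and the edge-list representation are hypothetical.

```python
def classify_vertices(n, edges, entry=1):
    """Check the control flow graph conditions and label each vertex."""
    preds = {v: set() for v in range(1, n + 1)}
    succs = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        preds[v].add(u)
        succs[u].add(v)

    # Depth-first search from the entry point to test reachability.
    seen, stack = {entry}, [entry]
    while stack:
        for w in succs[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)

    kinds = {}
    for v in range(1, n + 1):
        if v == entry:
            if preds[v]:
                raise ValueError("entry point must have no predecessors")
            kinds[v] = "entry"
        elif v not in seen:
            raise ValueError(f"vertex {v} is not reachable from the entry")
        elif len(preds[v]) == 1:
            kinds[v] = "function"
        elif len(preds[v]) == 2:
            kinds[v] = "confluence"
        else:
            raise ValueError(f"vertex {v} must have one or two predecessors")
    return kinds


# A simple loop: 1 -> 2 -> 3 -> 4 -> 2, so vertex 2 merges two incoming edges.
kinds = classify_vertices(4, [(1, 2), (2, 3), (3, 4), (4, 2)])
```

In this example vertex 2 is a confluence point, since it is entered both from the entry point and from the back edge of the loop, and vertices 3 and 4 are function points.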
9 Herbrand Equivalence
Let be a data flow framework over . In the following, we will define the Herbrand Congruence function .
For each vertex , the congruence will be called
the Herbrand Congruence associated with the vertex of the data flow framework . The function will be defined as the maximum fixpoint of a continuous
function . The function will be called the composite transfer function associated with the data flow
framework .
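[Product Lattice]
Let $n$ be a positive integer. The product lattice $(\overline{\mathcal{G}(T)}^n, \preceq_n, \bot_n, \top_n)$ is defined as follows: for $\mathcal{P} = (P_1, \ldots, P_n)$, $\mathcal{Q} = (Q_1, \ldots, Q_n) \in \overline{\mathcal{G}(T)}^n$, $\mathcal{P} \preceq_n \mathcal{Q}$ if $P_k \preceq Q_k$ for each $1 \le k \le n$, $\bot_n = (\bot, \ldots, \bot)$ and $\top_n = (\top, \ldots, \top)$.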
For , the notation will be used to denote the least upper bound of in the product lattice.
By Theorem 4, Theorem B and Corollary B, we have: The product lattice satisfies the following properties:

is a complete lattice.

If is nonempty, with , where for . Then .
As preparation for defining the composite transfer function, we introduce the following functions: [Projection Maps] Let be a positive integer. For each ,

The projection map to the coordinate, is defined by
for any . 
The confluence map is defined by
for any .
In addition to the above functions, we will also use the constant map which maps each element in to . The following observation is a consequence of Theorem B.
Observation .
Constant maps, projection maps and confluence maps are continuous.
For each , denotes the set of predecessors of the vertex in the control flow graph . [Composite Transfer Function] Let be a data flow framework over . For each , define the component map as follows:

If , the entry point, then . ( is the constant function that always returns the value ).

If is a function point with , then , where is the (extended) transfer function corresponding to the function point , and the projection map to the coordinate as defined in Definition 9.

If is a confluence point with , then , where is the confluence map as defined in Definition 9.
The composite transfer function of $D$ is defined to be the unique function $f_D : \overline{\mathcal{G}(T)}^n \to \overline{\mathcal{G}(T)}^n$ (Observation B) satisfying $\pi_k \circ f_D = f_k$ for each $k \in V$. The purpose of defining $f_D$ is the following. Suppose we have associated a congruence with each program point in a data flow framework. Then $f_D$ specifies how a simultaneous and synchronous application of all the transfer functions/confluence operations at the respective program points modifies the congruences at each program point. The definition of $f_D$ conservatively sets the congruence at the entry point to $\bot$, treating terms in $T$ to be inequivalent to each other at the entry point. See Figure 2 for an example. The following observation is a direct consequence of the above definition.
Observation .
The composite transfer function (Definition 9) satisfies the following properties:

If , the entry point, then .

If is a function point with , then , where is the (extended) transfer function corresponding to the function point .

If is a confluence point with , then .
The following lemma is a consequence of Observation 9. Let be a data flow framework over and . Let , where is the composite transfer function of .

If , the entry point, then for all , hence .

If is a function point with , then for all ,

If is a confluence point with , then for all ,
By Theorem B, Observation B and Corollary B, we have: The following properties hold for the composite transfer function (Definition 9):

is monotone, distributive and continuous.

The component maps are continuous for all .

has a maximum fixpoint.

If , then is the maximum fixpoint of .
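The maximum fixpoint can be approached by repeatedly applying the composite transfer function, starting from the top element of the product lattice. The sketch below illustrates this on a deliberately small finite stand-in: each program point carries either a set of known equalities (ordered by inclusion, with intersection as meet) or an artificial top value. This toy lattice, the four-point loop and all names (`TOP`, `meet`, `composite`, `max_fixpoint`) are assumptions made for illustration, not the infinite lattice of congruences used in the text.

```python
TOP = None  # stands in for the artificially added top element


def meet(a, b):
    """Meet of two lattice values; TOP is the identity for meet."""
    if a is TOP:
        return b
    if b is TOP:
        return a
    return a & b


def lift(s, generated):
    """Function point that adds facts; TOP is mapped to TOP."""
    return TOP if s is TOP else s | generated


def composite(state):
    """One synchronous step over the loop 1 -> 2 -> 3 -> 4 -> 2."""
    s1, s2, s3, s4 = state
    return (
        frozenset(),                     # entry point: nothing is known
        meet(s1, s4),                    # confluence of vertices 1 and 4
        lift(s2, frozenset({"a=b"})),    # function point after vertex 2
        lift(s3, frozenset({"c=d"})),    # function point after vertex 3
    )


def max_fixpoint(f, top):
    """Iterate f from the top element until the value stops changing."""
    current = top
    while True:
        following = f(current)
        if following == current:
            return current
        current = following


solution = max_fixpoint(composite, (TOP, TOP, TOP, TOP))
```

On a finite lattice with a monotone function this loop always terminates. In the setting of the text the lattice is infinite, so the iteration should be read as a description of the limit rather than as a terminating algorithm.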
We can now define the Herbrand Congruence as the maximum fixpoint of the composite transfer function. [Herbrand Congruence] Given a data flow framework $D = (G, F)$ over $T$, the Herbrand Congruence function $H_D$ is defined as the maximum fixpoint of the composite transfer function $f_D$. For each $k \in V$, the value $H_D(k)$ is referred to as the Herbrand Congruence at program point $k$. The following is a consequence of Theorem 9 and the definition of Herbrand Congruence.
Observation .
For each , .
Proof.
∎
The definition of Herbrand congruence must be shown to be consistent with the traditional meet-over-all-paths description of Herbrand equivalence of terms in a data flow framework. The next section addresses this issue.
10 MOP characterization
In this section, we give a meet over all paths characterization for the Herbrand Congruence at each program point. This is essentially a lattice theoretic reformulation of the characterization presented by Steffen et al. [12, p. 393]. In the following, assume that we are given a data flow framework $D = (G, F)$ over $T$, with $G = G(V, E)$. [Path] For any nonnegative integer $l$, a program path (or simply a path) of length $l$ to a vertex $k \in V$ is a sequence $\alpha = (v_0, v_1, \ldots, v_l)$ satisfying $v_0 = 1$, $v_l = k$ and $(v_{i-1}, v_i) \in E$ for each $1 \le i \le l$. For each $0 \le i \le l$, $\alpha_i$ denotes the initial segment of $\alpha$ up to the $i$-th vertex, given by $(v_0, v_1, \ldots, v_i)$. Note that the vertices in a path need not be distinct under this definition. The next definition associates a congruence in $\overline{\mathcal{G}(T)}$ with each path in $G$. The path function captures the effect of application of transfer functions along the path on the initial congruence $\bot$, in the order in which the transfer functions appear along the path. [Path Congruence] Let $\alpha = (v_0, v_1, \ldots, v_l)$ be a path of length $l$ to vertex $k$ for some $k \in V$. We define:

When , .

If and , where is a function point, then , where is the extended transfer function associated with the function point .

If and is a confluence point, then .

.
The congruence is defined as the path congruence associated with the path . For and , let denote the set of all paths of length less than from the entry point to the vertex . In particular, , for all . The following observation is a consequence of the definition of .
Observation .
If and ,

If is the entry point, then , the set containing only the path of length zero, starting and ending at vertex .

If is a function point with , then
. 
If is a confluence point with , then
.
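The recursive description of bounded path sets in the observation above can be sketched as follows. The sketch assumes that a path of length l is a sequence of l + 1 vertices starting at the entry point (vertex 1), and that every path to a non-entry vertex extends a shorter path to one of its predecessors. The function name `paths_shorter_than` and the predecessor-map representation are hypothetical.

```python
def paths_shorter_than(l, k, preds, entry=1):
    """All paths of length less than l from the entry point to vertex k.

    preds maps each vertex to the list of its predecessors.
    """
    if l <= 0:
        return []                        # no path has negative length
    if k == entry:
        return [(entry,)]                # only the path of length zero
    # Extend each shorter path to a predecessor j by the edge (j, k).
    return [
        p + (k,)
        for j in preds[k]
        for p in paths_shorter_than(l - 1, j, preds, entry)
    ]


# Predecessor map for the loop 1 -> 2 -> 3 -> 4 -> 2.
preds = {1: [], 2: [1, 4], 3: [2], 4: [3]}
found = sorted(paths_shorter_than(5, 2, preds))
```

With the bound set to 5, vertex 2 is reached by the direct path and by one trip around the loop. Raising the bound admits further trips, which is why the meet over all paths ranges over an infinite set whenever the graph contains a cycle.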
For , we define the congruence to be the meet of all path congruences associated with paths of length less than from the entry point to vertex in . Stated formally,
Observation .
If , and hence , for all . Further, and , for . In general, if there are no paths of length less than from to in .
We define to be the set of all paths from vertex to vertex in , i.e., and . (The second equality follows from Lemma B and Observation 10.) The congruence is the meet of all path congruences associated with paths in .
Our objective is to prove for each so that captures the meet over all paths information about equivalence of expressions in . We begin with the following observations. For each and

If , the entry point, then .

If is a function point with , then , where is the (extended) transfer function corresponding to the function point .

If is a confluence point with , then .
Proof.
For each and , .
Proof.
Let be chosen arbitrarily. We prove the lemma by induction on .
When , , by Observation 10, as required. Otherwise,

If is a function point with and is the (extended) transfer function corresponding to the function point , then

If is a confluence point with , then
∎
Finally, we show that the iterative fixpoint characterization of Herbrand equivalence and the meet over all paths characterization coincide. Let be a data flow framework. Then, for each ,