Enhanced Regular Corecursion for Data Streams

07/31/2021 ∙ by Davide Ancona, et al.

We propose a simple calculus for processing data streams (infinite flows of data series), represented by finite sets of equations built on stream operators. Furthermore, functions defining streams are regularly corecursive, that is, cyclic calls are detected, thus avoiding the non-termination that ordinary recursion would cause under the call-by-value evaluation strategy. As we illustrate by several examples, the combination of these two mechanisms provides a good compromise between expressive power and decidability. Notably, we provide an algorithm to check that the stream returned by a function call is represented by a well-formed set of equations which actually admits a unique solution; hence, access to an arbitrary element of the returned stream will never diverge.

1 Introduction

Applications often deal with data structures that are conceptually infinite, and data streams (infinite flows of data series) are a mainstream example. As we venture deeper into the Internet of Things (IoT) era, stream processing is becoming increasingly important. Indeed, all main IoT platforms provide embedded and integrated engines for real-time analysis of potentially infinite flowing data series. For efficiency, this analysis takes place before the data is stored. As often happens in Computer Science, there is a trade-off between the expressive power of the language, the efficiency of its implementation, and the decidability of properties that are important to guarantee reliability and tractability.

Another important related problem is data stream generation, which is essential to test complex distributed IoT systems. The deterministic simulation of sensor data streams through a suitable language offers a practical solution to IoT testing. It also favors early detection of some kinds of bugs, which can be fixed more easily before the whole system is deployed.

A well-established solution to data stream generation and processing is lazy evaluation, as supported, e.g., in Haskell and in most stream libraries offered by mainstream languages, such as java.util.stream. In this approach, conceptually infinite data streams are the result of a function or method call, which is evaluated according to the call-by-need strategy. For instance, in Haskell we can define one_two = 1:2:one_two, or even represent the list of natural numbers as from 0, where from n = n:from(n+1).

However, such great expressive power comes at a cost. Consider, for instance, the definition bad_stream = 0:tail bad_stream. The Haskell compiler does not complain about this definition, and no problem arises at runtime as long as the manipulation of bad_stream requires access only to its first element. However, any operation that needs to inspect bad_stream more deeply is doomed to diverge. Unfortunately, it is not decidable to check, even at runtime, whether the stream returned by a Haskell function is well-defined, that is, whether all of its elements can be computed (this is also known as a productive corecursive definition [5]). Indeed, the full expressive power of Haskell can be used to define streams by means of recursive functions. For similar reasons, it is not decidable to check at runtime whether the streams returned by two Haskell functions are equal.

More recently, a complementary approach has been considered in different programming paradigms — functional [10], logic [16, 4, 8], and object-oriented [1] — based on the following two ideas:

  • Infinite streams can be finitely represented by finite sets of equations involving only the stream constructor, e.g., $x = 1:2:x$. Such a representation corresponds to what Courcelle, in his seminal paper [6], called a regular, a.k.a. rational, tree, that is, a tree with possibly infinite depth but a finite set of subtrees.

  • Functions are regularly corecursive, that is, execution keeps track of pending function calls. When the same call is considered for the second time, this is detected, thus avoiding the non-termination that ordinary recursion would cause under the call-by-value evaluation strategy.

In this way, the Haskell stream one_two can be equivalently obtained by the call one_two(), with the function one_two defined by one_two() = 1:2:one_two(). Unlike Haskell, for simplicity functions in our calculus are uncurried, hence they take as arguments possibly empty tuples, delimited by parentheses. Indeed, with regular corecursion the result of this call is the value corresponding to the unique solution of the equation $x = 1:2:x$.

On the other hand, since the expressive power is limited to regular streams, it is not possible to define a corecursive function whose call returns the stream of natural numbers, as happens for the from 0 Haskell example. However, there exist procedures, even tractable algorithms, for checking whether streams are well-defined and whether they are equal.

In this paper, we propose a simple calculus of numeric streams which supports regular corecursion. It goes beyond regular streams by extending equations with other typical stream operators besides the stream constructor: tail and pointwise operators can occur in stream equations, that is, they are not evaluated but kept symbolically.

In this way, we are able to achieve a good compromise between expressive power and decidability. Notably:

  • the extended shape of equations allows the definition of functions which return non-regular streams; for instance, it is possible to obtain the stream of natural numbers as from(0), by defining from(n)=n:(from(n)[+]repeat(1)), with [+] the pointwise addition on numeric streams and repeat the function defined by repeat(n)=n:repeat(n);

  • there exists a decidable procedure to dynamically check whether the stream returned by a corecursive function is well-defined;

  • however, it is not possible to express all streams computable with the lazy evaluation approach, but only those which have a specific structure (that is, can be expressed as the unique solution of a set of equations built with the above mentioned operators).

In Sect. 2 we formally define the calculus, and in Sect. 3 we show many interesting examples. In Sect. 4 we provide an operational characterization of well-defined streams, proved to be a necessary and sufficient condition for access to an arbitrary index never to diverge. In Sect. 5 we discuss related and further work. The Appendix contains more examples of derivations.

2 Stream calculus

Fig. 1 shows the syntax of the calculus.

Figure 1: Stream calculus: syntax

A program is a sequence of (mutually recursive) function declarations, for simplicity assumed to return only streams. Stream expressions are variables, conditional expressions, expressions built by stream operators, and function calls. We consider the following stream operators: constructor (prepending a numeric element), tail, and pointwise arithmetic operations. Numeric expressions include the access to the i-th element of a stream. For simplicity, here indexing and numeric expressions coincide, even though indexes are expected to be natural numbers, while values in streams can range over a larger numeric domain. We use an overline, as in $\overline{fd}$, to denote a sequence of function declarations, and analogously for other sequences.

The operational semantics, given in Fig. 2, is based on two key ideas:

  1. (some) infinite streams are represented in a finite way

  2. evaluation keeps trace of already considered function calls


 

Figure 2: Stream calculus: operational semantics

To obtain (1), our approach is inspired by capsules [11], which are essentially expressions supporting cyclic references. That is, the result of the evaluation of a stream expression is a pair $(s, \rho)$. Here s is an (open) stream value, built on top of stream variables, numeric values, the stream constructor, the tail destructor and the pointwise arithmetic operators. $\rho$ is an environment mapping a finite set of variables into stream values. In this way, cyclic streams can be obtained: for instance, $(x, \{x \mapsto n:x\})$ denotes the stream constantly equal to n.

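We denote by $\mathit{vars}(\rho)$ the set of variables occurring in $\rho$, and by $\mathit{fv}(\rho)$ the set of its free variables, that is, $\mathit{vars}(\rho) \setminus \mathit{dom}(\rho)$. We say that $\rho$ is closed if $\mathit{fv}(\rho) = \emptyset$, open otherwise, and analogously for a result $(s, \rho)$.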

To obtain point (2) above, evaluation has an additional parameter which is a call trace, a map from function calls where arguments are values (dubbed calls for short in the following) into variables.

Altogether, the semantic judgment has shape $e, \rho, \tau \Downarrow (s, \rho')$. Here e is the expression to be evaluated, $\rho$ the current environment defining possibly cyclic stream values that can occur in e, $\tau$ the call trace, and $(s, \rho')$ the result. The semantic judgments should be indexed by an underlying (fixed) program, omitted for the sake of simplicity. Rules use the following auxiliary definitions:

  • $\rho \sqcup \rho'$ is the union of two environments, which is well-defined if they have disjoint domains. $\rho[x \mapsto s]$ is the environment which gives s on x and coincides with $\rho$ elsewhere. We use analogous notations for call traces.

  • $e[v_1/x_1, \ldots, v_n/x_n]$ is obtained by parallel substitution of the variables $x_1, \ldots, x_n$ with the values $v_1, \ldots, v_n$.

  • $\mathit{fbody}(f)$ returns the pair of the parameters and the body of the declaration of f, if any, in the assumed program.

Moreover, the rules are parametric in the following other judgments, for which different definitions will be discussed in Sect. 4:

  • A well-definedness judgment on $\rho$, x and s. It holds if adding the association $x \mapsto s$ to the (well-defined) environment $\rho$ still yields a well-defined environment.

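  • An equivalence judgment, stating that two values are equivalent in the given environment; this equivalence is assumed to be the identity on numeric and boolean values. The call trace $\tau$ is then extended up to equivalence in $\rho$: a call $f(v_1, \ldots, v_n)$ is mapped into x iff there exist $v'_1, \ldots, v'_n$ such that $\tau(f(v'_1, \ldots, v'_n)) = x$ and $v_i$ is equivalent to $v'_i$ in $\rho$ for all $i$.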

Intuitively, a closed result is well-defined if it denotes a unique stream (infinite sequence of numeric values). A closed environment $\rho$ is well-defined if, for each $x \in \mathit{dom}(\rho)$, $(x, \rho)$ is well-defined. In other words, the corresponding set of equations admits a unique solution. For instance, the environment $\{x \mapsto x\}$ is not well-defined, since it is undetermined: any stream satisfies the equation $x = x$. An environment is not well-defined either when it is undefined, that is, when its equations admit no solution. Finally, consider two stream values s and s' such that the results $(s, \rho)$ and $(s', \rho)$ are closed and well-defined. They are equivalent if they denote the same stream.

These notions can be generalized to open results and environments, assuming that free variables denote unique streams, as will be formalized in Sect. 4.

Rules for values and conditional are straightforward. In rules (cons), (tail) and (pw), arguments are evaluated, while the stream operator is applied without any further evaluation. Treating the tail and pointwise operators like the stream constructor is crucial to get results which denote non-regular streams, as shown in Sect. 3. However, when non-constructors are allowed to occur in values, ensuring well-defined results becomes more challenging. The usual simple syntactic constraints that can be safely used for constructors [5] no longer work (see more details in Sect. 4 and 5).

The rules for function call are based on a mechanism of cycle detection, similar to that in [1]. They are given in a modular way. That is, evaluation of arguments is handled by a separate rule (args), whereas the following two rules handle (evaluated) calls.

Rule (invk) is applied when a call is considered for the first time, as expressed by the first side condition. The body is retrieved by using the auxiliary function fbody, and evaluated in a call trace where the call has been mapped into a fresh variable. Then, it is checked that adding the association from such variable to the result of the evaluation of the body keeps the environment well-defined. If the check succeeds, then the final result consists of the variable associated with the call and the updated environment. For simplicity, here execution is stuck if the check fails; an implementation should raise a runtime error instead.

Rule (corec) is applied when a call is considered for the second time, as expressed by the first side condition (note that cycle detection takes place up to equivalence in the environment). The variable x is returned as result. However, there is no associated value in the environment yet; in other words, the result is open at this point. This means that x is undefined until the environment is updated with the corresponding value in rule (invk). However, x can be safely used as long as the evaluation does not require x to be inspected; for instance, x can be safely passed as an argument to a function call.

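For instance, consider the program consisting of the declarations f()=g() and g()=1:f(). Then the judgment $\mathtt{f()}, \emptyset, \emptyset \Downarrow (x, \rho)$, with $\rho = \{x \mapsto y,\ y \mapsto 1:x\}$, is derivable. While the final result is closed, the derivation also contains judgments with open results. This happens for the evaluation of g(), whose result is $(y, \{y \mapsto 1:x\})$, and of its body 1:f(), whose result is $(1:x, \emptyset)$. For the full derivation, see Fig. 5 in the appendix.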

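As another example, consider the program f()=g(2:f()) and g(s)=1:s. The judgment to derive is $\mathtt{f()}, \emptyset, \emptyset \Downarrow (x, \rho)$, with $\rho = \{x \mapsto y,\ y \mapsto 1:2:x\}$. Its derivation is built on top of the derivation of $\mathtt{g(2:x)}, \emptyset, \{\mathtt{f()} \mapsto x\} \Downarrow (y, \{y \mapsto 1:2:x\})$, corresponding to the evaluation of g(2:x). Here x, the result of the cyclic call f(), is an operand of the stream constructor. The resulting value 2:x is passed as argument to the call to g, even though x is not defined yet. For the full derivation, see Fig. 6 in the appendix.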
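The following Python sketch is one possible way to make rules (args), (invk) and (corec) concrete; it is not the authors' implementation. Several simplifications are assumed:

- Expressions and stream values are encoded as tuples (a hypothetical encoding).
- Parameters are bound in a dictionary rather than by syntactic substitution.
- The well-definedness side condition of (invk) is skipped.
- Cycles are detected by syntactic equality of argument values instead of equivalence in the environment.

Running it on the two programs above yields the results discussed in the text, with x0 and x1 playing the roles of x and y.

```python
import itertools

# Hypothetical tuple encoding, chosen for this sketch only:
#   expressions:   ('num', n) | ('arg', name) | ('cons', e1, e2) | ('tail', e)
#                  | ('pw', op, e1, e2) | ('call', fname, [arg_exprs])
#   stream values: ('var', x) | ('cons', n, s) | ('tail', s) | ('pw', op, s1, s2)
# A program maps each function name to (parameter names, body expression).


def evaluate(program, expr, args, env, trace, fresh):
    """Return (value, env'); args binds parameters, trace maps pending calls to variables."""
    kind = expr[0]
    if kind == 'num':
        return expr[1], env
    if kind == 'arg':
        return args[expr[1]], env
    if kind == 'cons':  # (cons): evaluate the operands, keep the constructor
        head, env = evaluate(program, expr[1], args, env, trace, fresh)
        tail, env = evaluate(program, expr[2], args, env, trace, fresh)
        return ('cons', head, tail), env
    if kind == 'tail':  # (tail): the operator is kept, not applied
        s, env = evaluate(program, expr[1], args, env, trace, fresh)
        return ('tail', s), env
    if kind == 'pw':  # (pw): likewise for pointwise operators
        left, env = evaluate(program, expr[2], args, env, trace, fresh)
        right, env = evaluate(program, expr[3], args, env, trace, fresh)
        return ('pw', expr[1], left, right), env
    if kind == 'call':
        values = []
        for arg_expr in expr[2]:  # (args): call-by-value
            value, env = evaluate(program, arg_expr, args, env, trace, fresh)
            values.append(value)
        call = (expr[1], tuple(values))
        if call in trace:  # (corec): cyclic call, return its (still undefined) variable
            return ('var', trace[call]), env
        x = f"x{next(fresh)}"  # (invk): first occurrence, bind the call to a fresh variable
        params, body = program[expr[1]]
        body_value, env = evaluate(program, body, dict(zip(params, values)),
                                   env, {**trace, call: x}, fresh)
        # The well-definedness side condition of (invk) is not checked here.
        return ('var', x), {**env, x: body_value}
    raise ValueError(f"unknown expression: {expr!r}")


def run(program, expr):
    """Evaluate a closed expression with empty environment and call trace."""
    return evaluate(program, expr, {}, {}, {}, itertools.count())


# f() = g()   g() = 1:f()
prog1 = {'f': ([], ('call', 'g', [])),
         'g': ([], ('cons', ('num', 1), ('call', 'f', [])))}
# f() = g(2:f())   g(s) = 1:s
prog2 = {'f': ([], ('call', 'g', [('cons', ('num', 2), ('call', 'f', []))])),
         'g': (['s'], ('cons', ('num', 1), ('arg', 's')))}
print(run(prog1, ('call', 'f', [])))
print(run(prog2, ('call', 'f', [])))
```

The call trace is passed down but never returned. As a result, a call is detected as cyclic only while it is still pending, and a completed call that is evaluated again gets a fresh variable.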

Finally, rule (at) computes the i-th element of a stream expression. After evaluation of the arguments, the numeric result is obtained by an auxiliary judgment, inductively defined in the bottom part of the figure. The rules for this judgment are as follows.

If the stream value is a variable ((at-var)), the evaluation is propagated to the associated stream value in the environment, if any. If, instead, the variable is free in the environment, execution is stuck; again, an implementation should raise a runtime error instead. Fig. 7 in the appendix shows an example of a stuck derivation.

If the stream value is built by the constructor, the result is the first element of the stream if the index is 0 ((at-cons-0)). Otherwise, the evaluation is recursively propagated to its tail with the predecessor index ((at-cons-n)). Conversely, if the stream is built by the tail operator ((at-tail)), the evaluation is recursively propagated to the stream argument with the successor index. Finally, if the stream is built by a pointwise operation ((at-pw)), the evaluation is recursively propagated to the operands with the same index, and then the corresponding arithmetic operation is computed on the results.
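Below is a minimal Python sketch of the access rules (at-var), (at-cons-0), (at-cons-n), (at-tail) and (at-pw), using a hypothetical tuple encoding of stream values. Raising LookupError on a free variable stands in for the runtime error mentioned above. The demonstration environment is the one obtained by evaluating nat() (see Sect. 3).

```python
import operator

# Hypothetical tuple encoding of stream values:
#   ('var', x) | ('cons', n, s) | ('tail', s) | ('pw', op, s1, s2)
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}


def at(s, i, env):
    """Return the i-th element of the stream value s in environment env."""
    kind = s[0]
    if kind == 'var':  # (at-var): continue with the associated value, if any
        if s[1] not in env:
            raise LookupError(f"free variable {s[1]}")
        return at(env[s[1]], i, env)
    if kind == 'cons':
        if i == 0:  # (at-cons-0): the head
            return s[1]
        return at(s[2], i - 1, env)  # (at-cons-n): predecessor index on the tail
    if kind == 'tail':  # (at-tail): successor index on the argument
        return at(s[1], i + 1, env)
    if kind == 'pw':  # (at-pw): same index on both operands, then the operation
        return OPS[s[1]](at(s[2], i, env), at(s[3], i, env))
    raise ValueError(f"not a stream value: {s!r}")


# Result of nat(): (x, {x -> 0:(x[+]y), y -> 1:y})
nat_env = {'x': ('cons', 0, ('pw', '+', ('var', 'x'), ('var', 'y'))),
           'y': ('cons', 1, ('var', 'y'))}
print([at(('var', 'x'), i, nat_env) for i in range(5)])  # [0, 1, 2, 3, 4]
```

On an environment that is not well-defined, such as the one corresponding to bad_stream, this recursion need not terminate; in Python it eventually fails with RecursionError.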

3 Examples

First we show some simple examples, to explain how regular corecursion works. Then we provide some more significant examples.

Consider the following function declarations:

repeat(n) = n:repeat(n)
one_two() = 1:two_one()
two_one() = 2:one_two()

With the standard semantics of recursion, calls such as repeat(0) and one_two() lead to non-termination. Thanks to regular corecursion, instead, these calls terminate, producing as results $(x, \{x \mapsto 0:x\})$ and $(x, \{x \mapsto 1:y,\ y \mapsto 2:x\})$, respectively. Indeed, when initially invoked, the call repeat(0) is added to the call trace with an associated fresh variable, say x. In this way, when evaluating the body of the function, the recursive call is detected as cyclic, and the variable x is returned as its result. Finally, the stream value 0:x is associated in the environment with the result x of the initial call. The evaluation of one_two() is analogous, except that another fresh variable y is generated for the intermediate call two_one(). The formal derivations are given below.

For space reasons, we did not report the application of rule (value). In both derivations, note that rule (corec) is applied, without evaluating the body once more, when the cyclic call is detected.

The following examples show function definitions whose calls return non-regular streams, notably, the natural numbers, the natural numbers raised to the power of a number, the factorials, the powers of a number, the Fibonacci numbers, and the stream obtained by pointwise increment by one.

nat() = 0:(nat()[+]repeat(1))
nat_to_pow(n) =                  //nat_to_pow(n)(i)=i^n
  if n <= 0 then repeat(1) else nat_to_pow(n-1)[*]nat()
fact() = 1:((nat()[+]repeat(1))[*]fact())
pow(n) = 1:(repeat(n)[*]pow(n)) //pow(n)(i)=n^i
fib() = 0:1:(fib()[+]fib()^)
incr(s) = s[+]repeat(1)

The definition of nat uses regular corecursion, since the recursive call nat() is cyclic. Hence the call nat() returns $(x, \{x \mapsto 0:(x\,[+]\,y),\ y \mapsto 1:y\})$. The definition of nat_to_pow is a standard inductive one, where the argument strictly decreases in the recursive call. Hence, the call nat_to_pow(2), for example, returns a result denoting the stream 0, 1, 4, 9, ... of the squares of the natural numbers.

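The definitions of fact, pow, and fib are regularly corecursive. For instance, the call fact() returns $(x, \{x \mapsto 1:((y\,[+]\,w)\,[*]\,x),\ y \mapsto 0:(y\,[+]\,z),\ z \mapsto 1:z,\ w \mapsto 1:w\})$. The definition of incr is non-recursive, hence its calls always converge. The call incr(s) returns a fresh variable associated with $s\,[+]\,x$, where $x \mapsto 1:x$. The following alternative definition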

incr_reg(s) = (s(0)+1):incr_reg(s^)

relies, instead, on regular corecursion. Note the difference: the latter version ensures termination only for regular streams, as in incr_reg(one_two()). There, the expression s^ in the recursive call eventually turns out to denote the initial stream. For non-regular streams, instead, the computation does not terminate, as in incr_reg(nat()), whereas incr(nat()) converges.
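To check that the regularly corecursive definitions of fact, pow and fib above denote the intended streams, one can read off a prefix of their results with the access rules. The sketch below does this in Python with a hypothetical tuple encoding of stream values. The environments are hand-written results of fact(), pow(2) and fib(), with arbitrary variable names, obtained by applying the evaluation rules by hand.

```python
import operator

# Hypothetical tuple encoding of stream values:
#   ('var', x) | ('cons', n, s) | ('tail', s) | ('pw', op, s1, s2)
OPS = {'+': operator.add, '*': operator.mul}


def var(x):
    return ('var', x)


def at(s, i, env):
    """i-th element of the stream value s, following rules (at-var) to (at-pw)."""
    kind = s[0]
    if kind == 'var':
        return at(env[s[1]], i, env)
    if kind == 'cons':
        return s[1] if i == 0 else at(s[2], i - 1, env)
    if kind == 'tail':
        return at(s[1], i + 1, env)
    return OPS[s[1]](at(s[2], i, env), at(s[3], i, env))  # kind == 'pw'


def prefix(x, env, n):
    """First n elements of the stream denoted by (x, env)."""
    return [at(var(x), i, env) for i in range(n)]


# fact(): x -> 1:((y[+]w)[*]x); y from nat(); z and w from the two calls repeat(1)
fact_env = {'x': ('cons', 1, ('pw', '*', ('pw', '+', var('y'), var('w')), var('x'))),
            'y': ('cons', 0, ('pw', '+', var('y'), var('z'))),
            'z': ('cons', 1, var('z')),
            'w': ('cons', 1, var('w'))}
# pow(2): p -> 1:(r[*]p), with r -> 2:r from repeat(2)
pow_env = {'p': ('cons', 1, ('pw', '*', var('r'), var('p'))),
           'r': ('cons', 2, var('r'))}
# fib(): f -> 0:1:(f[+]f^)
fib_env = {'f': ('cons', 0, ('cons', 1, ('pw', '+', var('f'), ('tail', var('f')))))}

print(prefix('x', fact_env, 6))  # [1, 1, 2, 6, 24, 120]
print(prefix('p', pow_env, 6))   # [1, 2, 4, 8, 16, 32]
print(prefix('f', fib_env, 8))   # [0, 1, 1, 2, 3, 5, 8, 13]
```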

The following function computes the stream of partial sums of a stream s, that is, $\mathtt{sum(s)}(i) = \sum_{k=0}^{i} s(k)$:

sum(s) = s(0):(s^[+]sum(s))

Such a function is useful for computing streams whose elements approximate a series with increasing precision. For instance, the following function returns the stream whose i-th element is the sum of the first i+1 terms of the Taylor series of the exponential function $e^n$:

sum_expn(n) = sum(pow(n)[/]fact())

Function sum_expn calls sum with the argument pow(n)[/]fact(), corresponding to the stream of all terms of the Taylor series of the exponential function. Hence, by accessing the i-th element of the stream, we get the following approximation of the series:

$\mathtt{sum\_expn(n)}(i) = \sum_{k=0}^{i} \frac{n^k}{k!} \approx e^n$
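As a numeric check of this approximation, the sketch below hand-encodes a possible result of sum_expn(n) and accesses its elements with the access rules, using exact fractions. The variable names and tuple encoding are hypothetical. The head of the sum stream is already the number 1, because sum evaluates s(0) = n^0/0! as a numeric expression.

```python
import math
import operator
from fractions import Fraction

# Hypothetical tuple encoding of stream values:
#   ('var', x) | ('cons', n, s) | ('tail', s) | ('pw', op, s1, s2)
OPS = {'+': operator.add, '*': operator.mul, '/': operator.truediv}


def var(x):
    return ('var', x)


def at(s, i, env):
    """i-th element of the stream value s, following rules (at-var) to (at-pw)."""
    kind = s[0]
    if kind == 'var':
        return at(env[s[1]], i, env)
    if kind == 'cons':
        return s[1] if i == 0 else at(s[2], i - 1, env)
    if kind == 'tail':
        return at(s[1], i + 1, env)
    return OPS[s[1]](at(s[2], i, env), at(s[3], i, env))  # kind == 'pw'


def sum_expn_env(n):
    """Hand-written environment for a possible result (s, env) of sum_expn(n)."""
    one, zero = Fraction(1), Fraction(0)
    return {
        'r': ('cons', Fraction(n), var('r')),                         # repeat(n)
        'p': ('cons', one, ('pw', '*', var('r'), var('p'))),          # pow(n)
        'z': ('cons', one, var('z')),                                 # repeat(1) in nat()
        'y': ('cons', zero, ('pw', '+', var('y'), var('z'))),         # nat()
        'w': ('cons', one, var('w')),                                 # repeat(1) in fact()
        'f': ('cons', one, ('pw', '*', ('pw', '+', var('y'), var('w')), var('f'))),  # fact()
        # sum(p[/]f): the head (p[/]f)(0) = 1 has already been computed
        's': ('cons', one, ('pw', '+', ('tail', ('pw', '/', var('p'), var('f'))), var('s'))),
    }


approx = at(var('s'), 10, sum_expn_env(1))  # exact sum of 1/k! for k = 0..10
print(approx, float(approx), math.e)
```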

Lastly, we present a couple of examples showing how it is possible to define primitive operations provided by IoT platforms for real time analysis of data streams; we start with aggr(n,s), which allows aggregation (by addition) of contiguous data in the stream s w.r.t. a frame of length n:

aggr(n,s) = if n<=0 then repeat(0) else s[+]aggr(n-1,s^)

For instance, aggr(3,s) returns a stream s' such that $s'(i) = s(i) + s(i+1) + s(i+2)$. On top of aggr, we can easily define avg(n,s) to compute the stream of average values of s in the frame of length n:

avg(n,s) = aggr(n,s)[/]repeat(n)
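The small sketch below checks the formula for aggr(3,s) and the behavior of avg on the stream of natural numbers. aggr_value is a hypothetical helper that builds the stream value an evaluation of aggr(n,s) would produce, with the variables of the intermediate calls inlined for brevity.

```python
import operator

# Hypothetical tuple encoding of stream values:
#   ('var', x) | ('cons', n, s) | ('tail', s) | ('pw', op, s1, s2)
OPS = {'+': operator.add, '/': operator.truediv}


def var(x):
    return ('var', x)


def at(s, i, env):
    """i-th element of the stream value s, following rules (at-var) to (at-pw)."""
    kind = s[0]
    if kind == 'var':
        return at(env[s[1]], i, env)
    if kind == 'cons':
        return s[1] if i == 0 else at(s[2], i - 1, env)
    if kind == 'tail':
        return at(s[1], i + 1, env)
    return OPS[s[1]](at(s[2], i, env), at(s[3], i, env))  # kind == 'pw'


def aggr_value(n, s):
    """Hypothetical helper: stream value for aggr(n, s), call variables inlined."""
    if n <= 0:
        return var('zeros')                                # repeat(0)
    return ('pw', '+', s, aggr_value(n - 1, ('tail', s)))  # s[+]aggr(n-1, s^)


env = {'nat': ('cons', 0, ('pw', '+', var('nat'), var('ones'))),  # nat()
       'ones': ('cons', 1, var('ones')),                          # repeat(1)
       'zeros': ('cons', 0, var('zeros')),                        # repeat(0)
       'threes': ('cons', 3, var('threes'))}                      # repeat(3)
aggr3 = aggr_value(3, var('nat'))
avg3 = ('pw', '/', aggr3, var('threes'))                           # avg(3, nat())
print([at(aggr3, i, env) for i in range(5)])  # [3, 6, 9, 12, 15]
print([at(avg3, i, env) for i in range(5)])   # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Since aggr recurses on a strictly decreasing n rather than on a cyclic call, it needs no regular corecursion at all.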

4 Well-defined environments and equivalent streams

In the semantic rules, we have left two notions unspecified: well-defined environments and equivalent streams. We now provide a formal definition in abstract terms, and then an operational definition of well-defined environments.

Semantically, a stream is an infinite sequence of numeric values, that is, a function $\sigma$ which returns, for each index $i \in \mathbb{N}$, the i-th element $\sigma(i)$. Given a result $(s, \rho)$, we get a stream by instantiating variables in s with streams, in a way consistent with $\rho$, and evaluating operators. To make this formal, we need some preliminary definitions.

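A substitution $\theta$ is a function from a finite set of variables to streams. We denote by $[\![s]\!]\theta$ the stream obtained by applying $\theta$ to s and evaluating operators, as formally defined below. In the definition, op ranges over the pointwise arithmetic operators and the corresponding operations on numbers.

$$[\![x]\!]\theta = \theta(x) \qquad [\![n:s]\!]\theta(0) = n \qquad [\![n:s]\!]\theta(i+1) = [\![s]\!]\theta(i)$$
$$[\![s^{\wedge}]\!]\theta(i) = [\![s]\!]\theta(i+1) \qquad [\![s_1\,[op]\,s_2]\!]\theta(i) = [\![s_1]\!]\theta(i)\ op\ [\![s_2]\!]\theta(i)$$

Consider an environment $\rho$ and a substitution $\theta$ whose domain is the set $\mathit{vars}(\rho)$ of variables occurring in $\rho$. The substitution $\hat{\rho}(\theta)$, with the same domain, is defined by:

$$\hat{\rho}(\theta)(x) = [\![\rho(x)]\!]\theta \ \text{ if } x \in \mathit{dom}(\rho) \qquad \hat{\rho}(\theta)(x) = \theta(x) \ \text{ otherwise}$$

Then, a solution of $\rho$ is a substitution $\theta$ with domain $\mathit{vars}(\rho)$ such that $\hat{\rho}(\theta) = \theta$.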

A closed environment $\rho$ is well-defined if it has exactly one solution. For instance, $\{x \mapsto 1:x\}$ is well-defined, since its unique solution maps x to the infinite stream of ones. So is $\{x \mapsto 1:x,\ y \mapsto 0:(y\,[+]\,x)\}$, whose unique solution also maps y to the stream of natural numbers. Instead, an environment can be undefined, when its equations admit no solution. It can also be undetermined: for instance, a substitution mapping x into an arbitrary stream is a solution of $\{x \mapsto x\}$.

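An open environment $\rho$ is well-defined if, for each substitution $\theta'$ with domain $\mathit{fv}(\rho)$, it has exactly one solution $\theta$ that coincides with $\theta'$ on $\mathit{fv}(\rho)$. For instance, the open environment $\{x \mapsto 1:y\}$, where y is free, is well-defined.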

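Given a closed result $(s, \rho)$, with $\rho$ well-defined, we define its semantics by $[\![(s, \rho)]\!] = [\![s]\!]\theta$, where $\theta$ is the unique solution of $\rho$. Then, two stream values s and s' are semantically equivalent in $\rho$ if $[\![(s, \rho)]\!] = [\![(s', \rho)]\!]$.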

We now consider the non-trivial problem of ensuring that a closed environment is well-defined. If environments were allowed to contain only the stream constructor, it would suffice to require all non-free variables to be guarded by the stream constructor [5]. For instance, the environment $\{x \mapsto 1:x\}$ satisfies this syntactic condition and is well-defined. In contrast, in the non-well-defined environment $\{x \mapsto x\}$, variable x is not guarded by the constructor.

However, when non-constructors such as the tail and pointwise operators come into play, the fact that variables are guarded by the stream constructor no longer ensures that the environment is well-defined. Consider for instance $\{x \mapsto 0:x^{\wedge}\}$, corresponding to the definition of bad_stream shown in Sect. 1. This environment is not well-defined, since it admits infinitely many solutions (all streams starting with 0), although variable x is guarded by the stream constructor.
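The definition of solution can be tested on a finite prefix, though not proved that way. The sketch below uses a hypothetical encoding, with candidate streams given as Python functions from indices to numbers. It checks that two different streams both satisfy the equation of bad_stream on the first 50 indices. It also checks that the result of nat() is satisfied by the natural numbers together with the constant stream of ones, but not by a shifted candidate.

```python
import operator

# Hypothetical tuple encoding of stream values:
#   ('var', x) | ('cons', n, s) | ('tail', s) | ('pw', op, s1, s2)
OPS = {'+': operator.add, '*': operator.mul}


def sem(s, theta, i):
    """i-th element of [[s]]theta: variables are read from theta, operators evaluated."""
    kind = s[0]
    if kind == 'var':
        return theta[s[1]](i)
    if kind == 'cons':
        return s[1] if i == 0 else sem(s[2], theta, i - 1)
    if kind == 'tail':
        return sem(s[1], theta, i + 1)
    return OPS[s[1]](sem(s[2], theta, i), sem(s[3], theta, i))  # kind == 'pw'


def solves_prefix(env, theta, n):
    """Check that theta satisfies every equation x = env[x] on indices 0..n-1."""
    return all(sem(env[x], theta, i) == theta[x](i) for x in env for i in range(n))


bad_env = {'x': ('cons', 0, ('tail', ('var', 'x')))}      # bad_stream
print(solves_prefix(bad_env, {'x': lambda i: 0}, 50))      # True
print(solves_prefix(bad_env, {'x': lambda i: 7 * i}, 50))  # True: a different solution
nat_env = {'x': ('cons', 0, ('pw', '+', ('var', 'x'), ('var', 'y'))),
           'y': ('cons', 1, ('var', 'y'))}
print(solves_prefix(nat_env, {'x': lambda i: i, 'y': lambda i: 1}, 50))      # True
print(solves_prefix(nat_env, {'x': lambda i: i + 1, 'y': lambda i: 1}, 50))  # False
```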

To ensure well-defined environments a more complex check is needed: in Fig. 3 we provide an operational characterization of well-defined environments.


 

Figure 3: Operational definition of well-defined environments

The well-definedness judgment used in the side condition of rule (invk) holds if the result $(x, \rho[x \mapsto s])$ is well-defined according to the judgment of Fig. 3. The latter states that a result $(s, \rho)$ is well-defined. That is, restricting the domain of $\rho$ to the variables reachable from s yields a well-defined environment; the reachable variables are those occurring in s or, transitively, in values associated with reachable variables. Thus, the side condition of (invk) holds if adding the association of s with x preserves the well-definedness of $\rho$.

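The additional argument m in the judgment is a map from variables to integers. We write $m+1$ and $m-1$ for the maps $\{x \mapsto m(x)+1 \mid x \in \mathit{dom}(m)\}$ and $\{x \mapsto m(x)-1 \mid x \in \mathit{dom}(m)\}$, respectively.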

In rule (main), this map is initially empty. In rule (wf-var), a variable x defined in the environment is added to the map, with initial value 0, the first time it is found. In rule (wf-corec), when x is found for the second time, the rule checks that more constructors than tail operators have been traversed, that is, that the value associated with x is positive. In rule (wf-fv), a free variable is considered well-defined, since non-well-definedness can only be detected on closed results. In rules (wf-cons) and (wf-tail), the values in the map are incremented and decremented by one, respectively, each time a constructor or a tail operator is traversed. In rule (wf-pw), the check is propagated to both operands with the same map.
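Here is a minimal Python sketch of the well-definedness check. It follows the prose description of the rules of Fig. 3 given above, with stream values encoded as hypothetical tuples. The helper add_is_ok, also hypothetical, mirrors the side condition of rule (invk).

```python
# Hypothetical tuple encoding of stream values:
#   ('var', x) | ('cons', n, s) | ('tail', s) | ('pw', op, s1, s2)


def wf(s, env, m):
    """Well-definedness of (s, env); m maps visited variables to counters."""
    kind = s[0]
    if kind == 'var':
        x = s[1]
        if x in m:            # (wf-corec): second visit along this path
            return m[x] > 0   # more constructors than tails since the first visit
        if x not in env:      # (wf-fv): free variables are considered well-defined
            return True
        return wf(env[x], env, {**m, x: 0})  # (wf-var): first visit, counter 0
    if kind == 'cons':        # (wf-cons): the head is numeric, check the tail
        return wf(s[2], env, {y: k + 1 for y, k in m.items()})
    if kind == 'tail':        # (wf-tail)
        return wf(s[1], env, {y: k - 1 for y, k in m.items()})
    if kind == 'pw':          # (wf-pw): both operands, same counters
        return wf(s[2], env, m) and wf(s[3], env, m)
    raise ValueError(f"not a stream value: {s!r}")


def well_defined(x, env):
    """(main): check the result (x, env) starting from the empty map."""
    return wf(('var', x), env, {})


def add_is_ok(env, x, s):
    """Side condition of (invk): does adding x -> s keep the result well-defined?"""
    return well_defined(x, {**env, x: s})


nat_env = {'x': ('cons', 0, ('pw', '+', ('var', 'x'), ('var', 'y'))),
           'y': ('cons', 1, ('var', 'y'))}
print(well_defined('x', nat_env))                                     # True
print(well_defined('x', {'x': ('cons', 0, ('tail', ('var', 'x')))}))  # False (bad_stream)
print(well_defined('x', {'x': ('var', 'x')}))                         # False (undetermined)
```

For bad_stream the check fails because, between the two visits of x, exactly one constructor and one tail operator are traversed, so the counter of x is back to 0.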

As an example of derivation of well-definedness and access to the i-th element, in Fig. 4 we consider the result $(x, \{x \mapsto 0:(x\,[+]\,y),\ y \mapsto 1:y\})$, obtained by evaluating the call nat() with nat defined as in Sect. 3.


Figure 4: Derivations for $(x, \{x \mapsto 0:(x\,[+]\,y),\ y \mapsto 1:y\})$.

In Fig. 8 in the Appendix we consider a trickier example of a result, together with the stream it denotes.

We show now that well-definedness of a result is a necessary and sufficient condition for termination of access to an arbitrary index. To formally express and prove this statement, we introduce some definitions and notations.

First of all, since the numeric value obtained as result is not relevant for the following technical treatment, for simplicity we will omit it from the judgment for access to stream elements. We call a derivation a proof tree, either finite or infinite.

We write $J \to J'$ to mean that $J'$ is a premise of a (meta-)rule where $J$ is the consequence, and $\to^*$ for the reflexive and transitive closure of this relation. Moreover, $J \to^*_X J'$, with X a set of variables, means that in the path from $J$ to $J'$ there can be nodes concerning a variable x only for $x \in X$, and no such variable is repeated. We use analogous notations for the well-definedness judgment.

Lemma 1
  1. If , then , for each .

  2. A judgment has no derivation iff the following condition holds:
    (wf-stuck) for some , , and s.t. .

  3. The derivation of is infinite iff the following condition holds:
    (at-) for some , , and .

Proof
  1. Immediate by induction on the rules.

  2. For each there is exactly one applicable rule, except in the case with . Since is a finite set, the derivation cannot be infinite. Hence, there is no derivation for iff there is a finite path from of judgments on variables in , and a (first) repeated variable, that is, of the shape below, where and .

    That is, condition (wf-stuck) holds, with , and .

  3. For each there is exactly one applicable rule, except in the case with . Moreover, since has finite domain, the derivation is infinite iff there is an infinite path from of judgments on variables in , and a (first) repeated variable with a greater or equal index, hence, thanks to Lemma 1-(1), of the shape below, where :

    That is, condition (at-) holds, with , and .

Lemma 2

For , the following conditions are equivalent:

  1. for some

  2. for some such that .

Proof
1 ⇒ 2

The proof is by induction on the length of the path in .

Base

The length of the path is 0, hence we have . We also have , and , as requested.

Inductive step

By cases on the rule applied to derive .

(at-var)

We have , with since the length of the path is , and . Moreover, we can derive by rule (wf-var), and by inductive hypothesis we also have , and , hence we get the thesis.

(at-cons-0)

Empty case, since the derivation for does not contain a node .

(at-cons-n)

We have , and . Moreover, we can derive by rule (wf-cons), and by inductive hypothesis we also have , with , hence we get the thesis.

(at-tail)

This case is symmetric to the previous one.

(at-pw)

We have , and either , or . Assume the first case holds, the other is analogous. Moreover, we can derive by rule (wf-pw), and by inductive hypothesis we also have , with , hence we get the thesis.

2 ⇒ 1

The proof is by induction on the length of the path in .

Base

The length of the path is 0, hence we have . We also have, for an arbitrary , , and , as requested.

Inductive step

By cases on the rule applied to derive .

(wf-var)

We have , with , since , and . By inductive hypothesis we have for some such that . Moreover, since , by rule (at-var), hence we get .

(wf-corec)

Empty case, since the derivation for would not contain a node .

(wf-fv)

Empty case, since the derivation for would not contain a node .

(wf-cons)

We have , and . By inductive hypothesis we have for some such that . Moreover, by rule (at-cons-n), hence we get with , as requested.

(wf-tail)

We have , and . By inductive hypothesis we have for some such that . We can assume thanks to Lemma 1-(1). Hence, by rule (at-tail), hence we get with , as requested.

(wf-pw)

We have , and either , or . Assume the first case holds, the other is analogous. By inductive hypothesis we have , for some such that . Moreover, we can derive by rule (at-pw), hence we get the thesis.

Lemma 3

For , the following conditions are equivalent:

  1. for some

  2. for some such that .

Proof

Easy variant of the proof of Lemma 2.

Theorem 4.1

is derivable iff, for all , either has no derivation or a finite derivation.

Proof

We prove that has an infinite derivation for some iff has no derivation.

By Lemma 1-(3), we have that the following condition holds:

(at-)
for some , , and .

Then, starting from the right, by Lemma 3 we have for some such that ; by rule (wf-var) we have , and finally by Lemma 2 we have:
(wf-stuck) for some , , and s.t. .
hence we get the thesis.

By Lemma 1-(2), we have that the condition (wf-stuck) above holds. Then, starting from the left, by Lemma 2 we have for some such that ; by rule (at-var) we have , and by Lemma 3 we have for some . If , , then by Lemma 1-(1) we have

If , , then by Lemma 1-(1) we have

.

In both cases, the derivation of