A Euclidean Algorithm for Binary Cycles with Minimal Variance

04/04/2018
by   Luca Ghezzi, et al.
University of Bologna
ABB

We consider the problem of arranging symbols around a cycle in such a way that the distances between different instances of the same symbol are as uniformly distributed as possible. A sequence of moments, including mean and variance, is defined for cycles, in analogy with the well-known practice in statistics. The mean is seen to be invariant under permutations of the cycle. In the case of a binary alphabet of symbols, a fast, constructive sequencing algorithm is introduced, strongly resembling the celebrated Euclidean method for greatest common divisor computation, and the cycle returned is characterized in terms of symbol distances. A minimal variance condition is proved, and the proposed Euclidean algorithm is proved to satisfy it, thus being optimal. Applications to productive systems and information processing are briefly discussed.


1. Introduction

A cycle, or a cyclic order [1, 2], is a very intuitive structure describing the way a given collection of objects may be arranged in order around a circle. Said objects can be represented by a suitable alphabet of symbols, describing the prototypes of classes of similar objects. The number of instances of each symbol, collected in the multiplicity vector, denotes the number of elements in each class. Since two paths are always available to connect two points around a circle, an order relation is uniquely determined only after specifying the positive direction around the cycle. Once this is done, for each symbol instance in the cycle it is possible to define which other symbol instance follows next. Moreover, it is also possible to count the distance, in steps, from each symbol instance to the next instance of the same symbol. This naturally induces the mean and variance of the cycle. One may now seek cycles which are variance minimizers.

The problem, which we generally term Cyclic Sequencing Problem, pertains to constrained combinatorial optimization and may be addressed in the general case by means of Mixed Integer Quadratic Programming (MIQP) techniques [3], at the price of a computational burden that grows more than algebraically. Nonetheless, the case of a binary alphabet, that is, when only two symbols are concerned, though with arbitrary multiplicities, allows for a direct, constructive, algorithmic solution, with linear overhead. Specifically, a sequencing algorithm is proposed here, whose structure essentially coincides with the celebrated Euclidean method to compute the greatest common divisor (gcd) of two natural numbers, and which is additionally enriched with instructions to compile an admissible cycle. Such a cycle may be completely characterized in terms of symbol distances. A famous analysis by Lamé [4] proves that the Euclidean algorithm requires at most a number of steps which is five times the number of base-10 digits of the smaller number in the pair whose gcd is sought [5, 6]. Consequently, the number of iterations of the proposed Euclidean Sequencing Algorithm obeys the same bound in the worst case.
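As a quick numerical illustration of Lamé's bound, the following Python sketch counts the division steps of the classical Euclidean algorithm and compares them with five times the number of base-10 digits of the smaller argument. The function name `euclid_steps` is introduced here for illustration only and does not come from the paper.

```python
def euclid_steps(a, b):
    """Count the division steps of Euclid's algorithm for gcd(a, b), with a >= b >= 1."""
    steps = 0
    while b != 0:
        a, b = b, a % b  # one integer division per step
        steps += 1
    return steps


# Consecutive Fibonacci numbers are the classical worst case for Euclid's algorithm.
for a, b in [(18, 14), (89, 55), (144, 89)]:
    bound = 5 * len(str(b))  # five times the base-10 digits of the smaller number
    print(a, b, euclid_steps(a, b), bound)
```

The pair (18, 14) is the one used later in Example 4; the other two pairs are consecutive Fibonacci numbers, for which the bound is nearly or exactly attained.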

The analysis of the mathematical programs involved allows us to deduce necessary and sufficient conditions for variance minimality. It is seen that the proposed Euclidean sequencing algorithm satisfies these conditions and is therefore optimal. The reason behind the success of the algorithm is readily found in the strong penalization operated by squares (or, more generally, by higher than linear powers) over deviations of distances from their average value. Additionally, a remarkably simple argument, ultimately connected to the invariance of the number of steps required to complete a round trip around the cycle, shows that the mean is invariant under cycle permutations. These two facts, mean invariance and square penalization, make it easy to prove the theory developed here.

Euclid’s original algorithm was introduced around 300 B.C. in the celebrated Elements [7], Book VII, Proposition 2, to find the greatest common measure of two given numbers not relatively prime (according to the geometrically inspired terminology of the times). Also due to the geometrical identification of numbers and segment lengths, the original presentation of the algorithm resorts to repeated subtractions in place of divisions, and it thus formally differs from, but is equivalent to, the form currently adopted.

A few applications to the even and cyclically repeated distribution of symbols are found in the literature. Toussaint [8] connects the problem with music and shows that traditional musical rhythms are generated by the Euclidean algorithm (with a binary alphabet, if expressed in the terminology of the present work) and are therefore dubbed Euclidean rhythms. The approach is algorithmic: neither a mathematical formalism expressing an evenness metric nor proofs of optimality are found there.

The same problem has been analyzed by Bjorklund [9, 10] in connection with spallation neutron source (SNS) accelerators as used in nuclear physics. Translated into the terminology of the present work, the problem there is to evenly sequence symbols from a binary alphabet, where one symbol is interpreted as 1 and the other as 0. Despite the natural approach with Boolean symbols, the nullity of 0 seems to have induced the author to consider only the 1’s, neglecting the 0’s, when defining suitable evenness metrics. On the other hand, variance as a metric for evenness, as defined in the present work, is computed with reference to all symbols in the alphabet, as opposed to Bjorklund’s approach, which leads to an unsatisfactory metric. It is easily seen that the former approach (but not the latter, as correctly pointed out in [9]) manages to fulfill Bjorklund’s very natural requirements for a good evenness metric (viz., invariance under rotation, efficient computation, null value for perfectly even distributions), with the additional advantage of simplicity; see Section 7 for examples.

A mathematical analysis of the problem at hand is found in Demaine et al. [11], with remarkable theoretical results applying to the product of the Euclidean sequencing algorithm. Also in this case, evenness metrics are computed with reference to one sole symbol 1 in a binary alphabet. We argue that this is a bias induced by the underlying applicative contexts, where 1 stands for a physical event (a pulse, or a musical note), while 0 stands for nothingness, just waiting for an event to happen. By contrast, the applicative problem inspiring the present work resides in industrial manufacturing systems, where symbols stand for different product types, that is, real objects, which cannot be associated with nothingness. Moreover, the formal symmetry in considering all symbols in the alphabet, when defining metrics, also serves mathematical beauty and extends immediately to more than binary alphabets. Demaine et al. approach the problem of evenness in terms of maximization of a metric constituted by the sum of chordal distances between 1 symbols.

Relationships with so-called Euclidean strings (not relevant to the present work) are also discussed in the references above; see, e.g., Ellis et al. [12].

1.1. Contributions of this paper

In this paper, we introduce a new problem, the Cyclic Sequencing Problem (CSP), motivated by a real application from industrial manufacturing systems. Our distinct contributions in this paper are as follows:

  • We propose a novel mathematical programming formulation for the CSP and a relaxation that is used to derive valid lower bounds.

  • For the special case of binary cycles, we propose an algorithm, the Euclidean Sequencing Algorithm (ESA), that is similar to the algorithm proposed by Demaine et al. [11]. In contrast, the present analysis approaches the problem of evenness in terms of minimization of a suitably defined variance in the distribution of all symbols in the alphabet. One may immediately appreciate the difference between the former approach, which is metrical and, as such, set in Euclidean spaces (as a special case of Hilbert spaces), and the latter approach, which is essentially combinatorial. In other terms, the metrical properties of a circle, including chordal distances, are not essential to the present analysis. Since the Euclidean algorithm in the case of a binary alphabet returns a solution which is optimal for the metrics defined in [11] as well as the one here defined, a further connection between all of them is thus encountered.

  • Also for the special case of binary cycles, we prove a minimal variance condition and we show that the proposed ESA satisfies it, thus being optimal.

The outline of the paper is as follows. After introducing the standard notation adopted, including the concept of a cycle, Section 2 defines (raw) moments for cycles, along with the problem of sequencing around a cycle a set of symbols from a given alphabet and with prescribed multiplicities. The mean is then introduced as the first moment, and its invariance under permutation is proved, in Section 3. Central moments, and particularly variance, are defined in Section 4, where the variance minimization problem is also introduced and shown to be equivalent to the minimization of the second (raw) moment. The Euclidean Sequencing Algorithm (ESA) is introduced and exemplified in Section 5, where the cycle returned is completely characterized in terms of the distances for the more abundant and the less abundant symbol in the binary alphabet. The optimality condition is introduced and proved in Section 6, together with the crucial result that the ESA satisfies said condition and is optimal for the problem at hand. Finally, applications are discussed in Section 7.

2. Notation and definitions

Following the standard notation, $\mathbb{N}$ is the set of natural numbers (0 is excluded), $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$, $\mathbb{Z}$ is the ring of integers, $\mathbb{Q}$ is the field of rationals, and $\mathbb{R}_+$ is the half-line of non-negative reals. Sets of symbols (order does not matter, repetitions are not accounted for) are represented by curly brackets (e.g., $\{a, b\}$), while sequences (order matters, repetitions are accounted for) are represented as vectors, i.e., by square brackets (e.g., $[a, b, a]$). Let $S_n$ denote the symmetric group on a finite set of $n$ symbols, i.e., the set of all permutations of $n$ distinct symbols, forming a (generally non-Abelian) group with reference to function composition $\circ$. For $p, q \in \mathbb{Z}$, the classic algebraic notation $p \mid q$ stands for $p$ divides $q$, i.e., $q = k p$ for some $k \in \mathbb{Z}$, and $p \nmid q$ stands for $p$ does not divide $q$, i.e., $q \neq k p$ for every $k \in \mathbb{Z}$.

Let $\Sigma = \{\sigma_1, \dots, \sigma_s\}$ be an alphabet of $s$ distinct symbols and $m = [m_1, \dots, m_s]$ a vector of positive integers representing prescribed multiplicities of said symbols, in such a way that

$$m_i \in \mathbb{N}, \qquad i = 1, \dots, s \tag{1}$$

represents the repetitions of symbol $\sigma_i$. The couple $(\Sigma, m)$ shall be termed a cyclic sequencing problem. Let $n = \sum_{i=1}^{s} m_i$ be the total number of all symbols in the cyclic sequencing problem, accounting for repetitions. It follows from the definitions that all symbols in the alphabet are used at least once: the case of some null multiplicity may always be reduced to a smaller alphabet comprising only those symbols with positive multiplicity.

Let be the quotient group with reference to addition modulo , i.e., the set of cosets , for . Clearly, is isomorphic to the group , which is the same as . Let be a mapping from the group of rest classes modulo to the alphabet, with for brevity. Any such mapping is termed a cyclic order, i.e., informally, a way of sequencing a set of symbols around a circle. A set with a cyclic order is termed a cycle and, with a little abuse, we shall treat the latter two terms as synonyms. Cycles are characterized by a cyclic, asymmetric, transitive, total ternary order relation , indicating that lies after and before . Among the two possible verses, we shall conventionally refer to the one induced by increasing in , with no loss of generality. In the very interesting case, we shall refer to a binary alphabet and binary cycles.

Let be the counter image of the th symbol in the alphabet. Let be the set of admissible cycles, i.e., those agreeing with all prescribed multiplicities. Said set is not empty, for it contains at least the base cycle

(2)

where stands for concatenation. The adjective base is because any other cycle may be expressed as the action on of some permutation, i.e., for some , according to the following diagram, which is commutative by construction.

Notice that, in (2), we have used the same symbol for both the application and the result produced by the application, with a little abuse. Owing to symbol multiplicity, the number of distinct admissible cycles does not exceed $n!$ and, precisely, it equals $n! / (m_1! \, m_2! \cdots m_s!)$, where $n$ is the total number of items and $m_i$ is the multiplicity of the $i$th of the $s$ symbols. The tally may be further lowered if the theory has to be developed up to the addition of some rest class (i.e., modulo the symmetry induced by rotations of the circle) and/or up to the verse (i.e., modulo the symmetry induced by the two possible orderings over the circle).

Definition 1.

Let a step be a unit move around the cycle $c$, that is, from $c_j$ to $c_{j+1}$ for some $j \in \mathbb{Z}_n$. Let $d_j$ be the distance, in steps, from the $j$th entry in the cycle (i.e., $c_j = \sigma_i$, for some $i$) to the next instance in the cycle of the same symbol $\sigma_i$. In case $m_i = 1$, i.e., if some symbol appears only once, the definition applies with reference to a full round trip from and to the unique instance, with $d_j = n$.
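Definition 1 can be made concrete with a few lines of Python. The sketch below represents a cycle as a string read in the positive direction and wrapping around at the end; the function name `distances` and the string encoding are choices made here for illustration.

```python
def distances(cycle):
    """Return d[j]: number of steps from position j to the next instance
    of the same symbol, moving forward and wrapping around the cycle."""
    n = len(cycle)
    d = [0] * n
    for j in range(n):
        step = 1
        # Walk forward until the same symbol is met again; a symbol that
        # appears only once is reached after a full round trip of n steps.
        while cycle[(j + step) % n] != cycle[j]:
            step += 1
        d[j] = step
    return d


print(distances("aabaabaabaab"))
```

This direct scan takes quadratic time in the worst case, which is acceptable for small illustrative cycles.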

In analogy with other well-known contexts, like the theory of probability density functions in statistics, let us introduce the following definitions.

Definition 2 (Sub-moments of a cycle).

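For a given cycle $c$ with distances $d_j$ and for $p \in \mathbb{N}$, let

$$M_p^{(i)}(c) = \frac{1}{n} \sum_{j \,:\, c_j = \sigma_i} d_j^{\,p} \tag{3}$$

be its $p$th sub-moment with reference to the $i$th symbol.

Definition 3 (Raw moment of a cycle).

For a given cycle $c$, let

$$M_p(c) = \frac{1}{n} \sum_{j=1}^{n} d_j^{\,p} = \sum_{i=1}^{s} M_p^{(i)}(c) \tag{4}$$

be its $p$th (raw) moment.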

Remark 1.

The additive decomposition in terms of the $p$th sub-moments follows from linearity of summation. Although such a decomposition is straightforward, it will prove remarkably helpful in developing the following results.
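The decomposition of a raw moment into sub-moments can be checked numerically. The sketch below assumes that both quantities are normalized by the total number of items n, which is an assumption made here for consistency with the mean being equal to the number of distinct symbols; the function names are illustrative.

```python
from fractions import Fraction


def distances(cycle):
    """Steps from each position to the next instance of the same symbol."""
    n = len(cycle)
    d = []
    for j in range(n):
        step = 1
        while cycle[(j + step) % n] != cycle[j]:
            step += 1
        d.append(step)
    return d


def sub_moment(cycle, symbol, p):
    """p-th sub-moment: contribution of one symbol (assumed 1/n normalization)."""
    d = distances(cycle)
    total = sum(dj ** p for cj, dj in zip(cycle, d) if cj == symbol)
    return Fraction(total, len(cycle))


def raw_moment(cycle, p):
    """p-th raw moment, obtained as the sum of the sub-moments of all symbols."""
    return sum(sub_moment(cycle, symbol, p) for symbol in set(cycle))


cycle = "aabaabaabaab"
print(raw_moment(cycle, 1), raw_moment(cycle, 2))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that identities between moments can be tested without rounding issues.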

3. Mean of cycle and the round trip lemma

Definition 4 (Mean of a cycle).

For a given cycle $c$, with $d_j$ the distances of Definition 1 and $n$ the total number of items, let

$$\mu(c) = M_1(c) = \frac{1}{n} \sum_{j=1}^{n} d_j \tag{5}$$

be its mean.

The first moment is termed mean of the cycle because it measures the mean distance in steps between symbols of the same kind. Contrary to other contexts, the mean is an invariant for the cyclic sequencing problem, meaning that it only depends on the (cardinality of the) alphabet $\Sigma$, but not on how symbols are sequenced. Moreover, the mean is immediately known a priori and, maybe surprisingly, it is always integer, even though its definition includes a ratio. The reason for this result, as simple as it is striking, lies in the following

Lemma 1 (Round trip lemma).

Let $(\Sigma, m)$ be a cyclic sequencing problem. Then, for any admissible cycle $c$ and any symbol $\sigma_i$,

$$\sum_{j \,:\, c_j = \sigma_i} d_j = n. \tag{6}$$

Moreover, for any admissible cycle the mean is invariant under symbol permutations, integer valued and equal to the number $s$ of distinct symbols, that is,

$$\mu(c) = s. \tag{7}$$
Proof.

Result (6) follows immediately because an exactly complete round trip over the cycle is obtained by stepping from one symbol instance to the next instance of the same symbol, and then forward to the next instance and so on, visiting exactly once any instance of the same symbol until the starting instance is reached again, with exactly $n$ steps.

Next, let $c$ be a generic admissible cycle, with $n$ total entries. Let $d_j$ be the distance in steps from the $j$th entry to the next occurrence of the same symbol. After sub-moment decomposition and applying (6) to any symbol $\sigma_i$, one gets

$$\mu(c) = \frac{1}{n} \sum_{i=1}^{s} \sum_{j \,:\, c_j = \sigma_i} d_j = \frac{1}{n} \sum_{i=1}^{s} n = s. \qquad ∎$$

It follows that one may write $\mu$ with reference to the problem instead of to a specific admissible cycle $c$, or simply $\mu$ whenever clear from the context. Moreover, consistently with Lemma 1, we shall use the value $s$ for $\mu$.

Example 1.

Let , meaning 8 instances of symbol and 4 instances of symbol , for a total items. Considering, e.g., the three admissible cycles

where the distances have been reported on top of symbols, one may easily verify that , or also , so that in all cases.
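The round trip lemma lends itself to a direct numerical check. The following sketch shuffles the 8 + 4 items of Example 1 a few times and verifies that the distances of each symbol always add up to the total number of items, so that the mean equals the number of distinct symbols. The symbol names `a` and `b`, the random seed and the helper names are choices made here for illustration.

```python
import random
from fractions import Fraction


def distances(cycle):
    """Steps from each position to the next instance of the same symbol."""
    n = len(cycle)
    d = []
    for j in range(n):
        step = 1
        while cycle[(j + step) % n] != cycle[j]:
            step += 1
        d.append(step)
    return d


def round_trip_sums(cycle):
    """Sum of the distances of each symbol; the lemma predicts len(cycle) for all."""
    d = distances(cycle)
    return {s: sum(dj for cj, dj in zip(cycle, d) if cj == s) for s in set(cycle)}


def mean(cycle):
    """Mean distance over all entries of the cycle."""
    return Fraction(sum(distances(cycle)), len(cycle))


rng = random.Random(0)          # fixed seed, for reproducibility only
items = list("a" * 8 + "b" * 4)
for _ in range(5):
    rng.shuffle(items)
    cycle = "".join(items)
    print(cycle, round_trip_sums(cycle), mean(cycle))
```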

4. Variance and the minimality problem

It is now natural to introduce the following definitions.

Definition 5 (Central moment of a cycle).

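For a given cycle $c$ with distances $d_j$ and mean $\mu$, let

$$\bar{M}_p(c) = \frac{1}{n} \sum_{j=1}^{n} (d_j - \mu)^p \tag{8}$$

be its central moment of order $p$.

Definition 6 (Variance of a cycle).

For a given cycle $c$, let

$$\sigma^2(c) = \bar{M}_2(c) = \frac{1}{n} \sum_{j=1}^{n} (d_j - \mu)^2 \tag{9}$$

be its variance.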

The second central moment is termed variance because it measures the dispersion around the mean of the distribution of the distances, in steps, between instances of the same symbols. Following a standard argument, central moments may be connected to (raw) moments of the kind (4), for which decomposition in sub-moments holds. In the case of variance, the following simple result holds.

Lemma 2.

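For any cycle $c$ over $s$ distinct symbols, with second (raw) moment $M_2(c) = \frac{1}{n} \sum_{j=1}^{n} d_j^2$,

$$\sigma^2(c) = M_2(c) - s^2. \tag{10}$$
Proof.

Expanding the square in the r.h.s. of (9),

$$\sigma^2(c) = \frac{1}{n} \sum_{j=1}^{n} d_j^2 - 2 \mu \, \frac{1}{n} \sum_{j=1}^{n} d_j + \mu^2 = M_2(c) - \mu^2,$$

from which the result follows thanks to the definition (5) of mean, the latter being equal to $s$ by the round trip Lemma 1. ∎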

We now appreciate a fundamental consequence of the round trip Lemma 1: due to the invariance of the mean $\mu$, variance and the second moment only differ by an additive constant, which is immaterial in optimization problems. This implies the following

Corollary 1.

Let $(\Sigma, m)$ be a cyclic sequencing problem. The search space being the set $C$ of admissible cycles, minimization of variance is equivalent to minimization of the second moment, or

$$\arg\min_{c \in C} \sigma^2(c) = \arg\min_{c \in C} M_2(c), \tag{11}$$

subject to any common (and possibly empty) set of constraints applied to both problems.
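The relation between variance and second moment can be verified on a few admissible cycles with 8 and 4 instances of two symbols. The sketch assumes the 1/n normalization of moments and takes the mean equal to the number of distinct symbols, as stated by the round trip lemma; function names are illustrative.

```python
from fractions import Fraction


def distances(cycle):
    """Steps from each position to the next instance of the same symbol."""
    n = len(cycle)
    d = []
    for j in range(n):
        step = 1
        while cycle[(j + step) % n] != cycle[j]:
            step += 1
        d.append(step)
    return d


def second_moment(cycle):
    """Second raw moment of the distances (assumed 1/n normalization)."""
    return Fraction(sum(dj * dj for dj in distances(cycle)), len(cycle))


def variance(cycle):
    """Second central moment about the mean s = number of distinct symbols."""
    s = len(set(cycle))
    return Fraction(sum((dj - s) ** 2 for dj in distances(cycle)), len(cycle))


for cycle in ("aabaabaabaab", "aaaaaaaabbbb", "abababaaaaab"):
    s = len(set(cycle))
    # Variance and second moment differ by the constant s**2 only.
    print(cycle, variance(cycle), second_moment(cycle) - s ** 2)
```

Since the two columns printed coincide for every cycle, ranking cycles by variance or by second moment gives the same order.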

We are interested in cycles of minimal variance, that is, in solving

$$\min_{c \in C} \sigma^2(c), \tag{12}$$

where $C$ is the set of admissible cycles.

Problem (12) can be explicitly stated as a mixed integer quadratic program (MIQP) as follows. Let us first consider the ordered sequence , where stands for , for brevity. The base cycle maps to (2). Given the repetitions of symbols of according to multiplicities , it is convenient to partition so to put in evidence those indexes mapped by to a same , . To this end, let , , so that . Then, let and . It is apparent that and mark, respectively, the initial and final position in of all indexes mapped by to symbol . Consequently, , and . Clearly, by construction.

Let , for suitable , be the permuted image of through . Without loss of generality, to reduce the number of equivalent permutations up to indistinguishable reshuffles of instances of a same symbol, we assume that (i.e., is chosen as origin in the cycle) and that for any pair in a same , precedes in . Let , ,

, be a (0-1) binary variable equal to 1 if

immediately follows in , 0 otherwise. In addition, let , , be an arbitrary real number, such that if item is sequenced th after item 1 in , then , with . The term suggests the usual angular coordinate to span the circle.

The mathematical formulation of (12) is as follows:

(13)
(14)
(15)
(16)
(17)
(18)
(19)
(20)
(21)
(22)

The plurality of nonnegative variables in program (13)-(22) represent the distance in between item and the next item mapped to the same symbol and therefore correspond exactly with the distance as in Definition 1, with . The objective function (13) states to minimize the second moment (corresponding to total variance minimization, for Corollary 1). Constraints (14),(15) together with constraints (16) impose a cycle in solution. Constraints (16) also define the values of variables . Constraints (18),(19) define the values of distance variables , .

Remark 2.

Formulating the problem in the domain of the cycle mapping, instead of in its image, brings the benefit that all the summations and references involved are in terms of index sets known a priori, before solving the problem.
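For very small instances, the minimal variance problem can also be solved by plain enumeration, which is useful as a reference when testing a solver. The sketch below is not the MIQP formulation above: it simply enumerates all distinct arrangements of a multiset of symbols and keeps those of minimal variance. The function name `brute_force_minimizers` and the dictionary input format are assumptions made here.

```python
from fractions import Fraction
from itertools import permutations


def distances(cycle):
    """Steps from each position to the next instance of the same symbol."""
    n = len(cycle)
    d = []
    for j in range(n):
        step = 1
        while cycle[(j + step) % n] != cycle[j]:
            step += 1
        d.append(step)
    return d


def variance(cycle):
    """Variance about the mean s = number of distinct symbols."""
    s = len(set(cycle))
    return Fraction(sum((dj - s) ** 2 for dj in distances(cycle)), len(cycle))


def brute_force_minimizers(multiplicities):
    """Return (minimal variance, sorted list of minimizing cycles).

    multiplicities: dict mapping each symbol to its number of instances.
    Only practical for a handful of items, since n! orderings are generated.
    """
    items = "".join(symbol * count for symbol, count in multiplicities.items())
    candidates = {"".join(p) for p in permutations(items)}
    best = min(variance(c) for c in candidates)
    return best, sorted(c for c in candidates if variance(c) == best)


best, cycles = brute_force_minimizers({"a": 4, "b": 2})
print(best, cycles)
```

For 4 and 2 instances, the minimizers found are the three rotations of the same pattern, with variance 2/3.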

5. The Euclidean Sequencing Algorithm (ESA)

Let us consider the case of binary cycles of minimal variance. Despite the presence of integrality constraints, this special case is surprisingly simple, to the extent that it may be solved directly by an algorithm resembling the celebrated procedure for the gcd of two integers attributed to Euclid. Therefore the algorithm, which holds for a cyclic sequencing problem with $s = 2$ symbols, is termed the Euclidean Sequencing Algorithm (ESA). Before presenting the computational machinery, we discuss the underlying rationale by means of examples. The proof of minimal variance follows in the next section. For the sake of brevity, we shall write concatenation as a formal product and introduce formal powers of the kind $a^p = a a \cdots a$ ($p$ times). Clearly, this formal product is not commutative.

Example 2.

The cyclic sequencing problem with a single symbol $a$ and multiplicity $m_1$ admits only the cycle $a^{m_1}$, which is trivially of minimal variance.

Example 3.

Let us start with an intuitive case. Let $m = [8, 4]$, meaning 8 instances of the first symbol and 4 instances of the second symbol, for a total of $n = 12$ items. Here and in the sequel, let $a$ (possibly with a subscript) denote the symbol with the greatest number of repetitions (here, the first) and $b$ (possibly with a subscript) the other (here, the second). Intuitively, minimal variance means the regular repetition of a fixed scheme. Since 4 is a perfect divisor of 8 and $8 / 4 = 2$, this is achieved by, e.g., $(a^2 b)^4$, or by other cycles obtained by applying a rotation, like, e.g., $(a b a)^4$.

Figure 1. The ESA applied to $m = [18, 14]$.

Example 4.

We now face a case where perfect divisibility is not met; see Table 1. Let $m = [18, 14]$, meaning 18 instances of the first symbol (denoted by $a_1$) and 14 instances of the second symbol (denoted by $b_1$), for a total of $n_1 = 32$ items. We notice that $18 = 1 \cdot 14 + 4$, where $q_1 = 1$ is the quotient of the integer division $18 / 14$ and $r_1 = 4$ is the rest of that division (it is an elementary result that this decomposition is unique and that the rest is always less than the divisor). We can now think of addressing the smaller cyclic sequencing problem with multiplicities $[14, 4]$, with 14 repetitions of the new symbol $a_2 = a_1 b_1$, using all instances of $b_1$ but only 14 out of 18 instances of $a_1$, and the remaining 4 repetitions of symbol $b_2 = a_1$. By reasoning similarly, since $14 = 3 \cdot 4 + 2$, one gets the smaller cyclic sequencing problem with multiplicities $[4, 2]$, with 4 instances of $a_3 = a_2^3 b_2$ and 2 instances of $b_3 = a_2$. We have met the perfect divisibility condition, since $4 = 2 \cdot 2$. Then one either closes as in Example 3, or a further iteration is worked out with the single symbol $a_4 = a_3^2 b_3$, of multiplicity 2, so as to close as in Example 2. In both cases one finds the cycle $(a_3^2 b_3)^2$, that is, $a_4^2$, which has minimal variance in the respective reduced problem.

Backtracking is straightforwardly accomplished by backward substitution of the symbols progressively defined, and we claim that variance minimality is preserved at every iteration. A possible way to keep a log during algorithm execution is reported in Table 1. In the case at hand,

$$(a_3^2 b_3)^2 = \big( (a_2^3 b_2)^2 a_2 \big)^2 = \Big( \big( (a_1 b_1)^3 a_1 \big)^2 a_1 b_1 \Big)^2 .$$

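iteration | total items | multiplicity of a | multiplicity of b | quotient | rest
1         | 32          | 18                | 14                | 1        | 4
2         | 18          | 14                | 4                 | 3        | 2
3         | 6           | 4                 | 2                 | 2        | 0
Table 1. Application of the Euclidean Sequencing Algorithm (ESA).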
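The log of Table 1 can be reproduced with a short Python sketch that runs the division steps only, without building the cycle. The function name `esa_log` and the row layout (iteration, total items, larger multiplicity, smaller multiplicity, quotient, rest) are choices made here to mirror the table.

```python
def esa_log(m_a, m_b):
    """Return one row per iteration:
    (iteration, total items, larger multiplicity, smaller multiplicity, quotient, rest).
    Both multiplicities must be positive integers."""
    if m_a < 1 or m_b < 1:
        raise ValueError("multiplicities must be positive")
    larger, smaller = max(m_a, m_b), min(m_a, m_b)
    rows, k = [], 1
    while True:
        q, r = divmod(larger, smaller)
        rows.append((k, larger + smaller, larger, smaller, q, r))
        if r == 0:  # null rest: stopping criterion
            return rows
        # The old divisor becomes the larger multiplicity, the rest the smaller one.
        larger, smaller, k = smaller, r, k + 1


for row in esa_log(18, 14):
    print(*row)
```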

A graphical representation of the ESA applied to $m = [18, 14]$ is shown in Figure 1, where the stages of the iterative algorithm are shown in progressively smaller and inner annuli. In the rationale behind this simple machinery resides the essence of the theory being developed in the sequel. Let $a_1$, $b_1$ be the original symbols and $b_2 = a_1$, $a_3 = (a_1 b_1)^3 a_1$ the composite symbols of the later stages. One may notice that: as for the more abundant symbol, here $a_1$, there are 14 instances with $d = 2$ (i.e., they directly precede instances of $b_1$ followed by one other instance of $a_1$) and 4 instances with $d = 1$ (i.e., they directly precede some other instance of $a_1$); as for the less abundant symbol, here $b_1$, there are 4 instances with $d = 3$ and 10 instances with $d = 2$, the former having exactly one distance step more than the latter because of the presence of exactly 4 instances of symbol $b_2$ (see Figure 1, second annulus from the outside), which increase exactly by one the distance between two consecutive instances of symbol $b_1$ (see Figure 1, first annulus from the outside); moreover, said 4 instances of symbol $b_2$ are due to the repetitions of symbol $b_2$ in $a_3$, which are in the number of 1, multiplied by the repetitions of symbol $a_3$ in the cycle, which are in the number of 4.

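Data: a binary cyclic sequencing problem with multiplicities $m_1$, $m_2$
Result: a cycle $c$ of minimal variance; see Section §6
// initialization ($\alpha_k$, $\beta_k$ are the multiplicities of $a_k$, $b_k$)
$k \leftarrow 1$; $a_1 \leftarrow$ more abundant symbol; $b_1 \leftarrow$ other symbol;
$\alpha_1 \leftarrow \max(m_1, m_2)$; $\beta_1 \leftarrow \min(m_1, m_2)$; $n_1 \leftarrow \alpha_1 + \beta_1$;
$q_1 \leftarrow \lfloor \alpha_1 / \beta_1 \rfloor$; $r_1 \leftarrow \alpha_1 - q_1 \beta_1$;
// looping is iterated until a null rest is found
while $r_k \neq 0$ do
       $a_{k+1} \leftarrow a_k^{q_k} b_k$;
       $b_{k+1} \leftarrow a_k$;
       $\alpha_{k+1} \leftarrow \beta_k$;
       $\beta_{k+1} \leftarrow r_k$;
       $n_{k+1} \leftarrow \alpha_{k+1} + \beta_{k+1}$;
       $q_{k+1} \leftarrow \lfloor \alpha_{k+1} / \beta_{k+1} \rfloor$;
       $r_{k+1} \leftarrow \alpha_{k+1} - q_{k+1} \beta_{k+1}$;
       $k \leftarrow k + 1$;
end while
// finalization
$c \leftarrow (a_k^{q_k} b_k)^{\beta_k}$;
Algorithm 1 The Euclidean Sequencing Algorithm (ESA).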

In general terms, the ESA may be described as follows; see Algorithm 1, where a pseudo-code implementation is sketched. As for the initialization, $a_1$ denotes the symbol with the highest multiplicity, and $b_1$ denotes the other symbol. (If the two multiplicities are equal, $a_1$ and $b_1$ may be assigned in either of the two possible ways, without prejudice to the algorithm.) Let $\alpha_1$ (resp., $\beta_1$) be the multiplicity of the symbol denoted by $a_1$ (resp., $b_1$).

The algorithm then proceeds iteratively. At the $k$th iteration, a cyclic sequencing problem is addressed, where $\{a_k, b_k\}$ is the alphabet and $[\alpha_k, \beta_k]$ is the multiplicity vector. Let $n_k = \alpha_k + \beta_k$ be the total number of items. The quotient and the rest of the integer division $\alpha_k / \beta_k$ are denoted by $q_k$ and $r_k$, respectively. The decomposition $\alpha_k = q_k \beta_k + r_k$, with $0 \le r_k < \beta_k$, always exists and is unique. If $r_k \neq 0$, the new symbols $a_{k+1} = a_k^{q_k} b_k$ and $b_{k+1} = a_k$ are defined, with multiplicities $\alpha_{k+1} = \beta_k$ and $\beta_{k+1} = r_k$, respectively, and a new iteration is run.

Otherwise, i.e., when $r_k = 0$, the stopping criterion is reached and the final cycle is obtained as $c = (a_k^{q_k} b_k)^{\beta_k}$, where the outer exponent $\beta_k$ (at the last iteration only) is deduced by comparison with the Euclidean algorithm for the gcd of two integers.
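The iteration just described can be sketched in a few lines of Python, with composite symbols represented as strings. This is an illustrative reading of the procedure under stated assumptions: the more abundant symbol is written `a`, the other `b`, each new composite symbol is built as q copies of the current abundant symbol followed by one copy of the other, and the function name `esa` is introduced here.

```python
def esa(m_a, m_b):
    """Build a binary cycle with m_a copies of 'a' and m_b copies of 'b'.

    Assumes m_a >= m_b >= 1, i.e. 'a' plays the role of the more abundant symbol.
    """
    if not (m_a >= m_b >= 1):
        raise ValueError("expected m_a >= m_b >= 1")
    sym_a, sym_b = "a", "b"      # current (possibly composite) symbols
    larger, smaller = m_a, m_b   # their current multiplicities
    q, r = divmod(larger, smaller)
    while r != 0:
        # New abundant symbol: q copies of the old one followed by the other;
        # new less abundant symbol: the old abundant one (r copies are left over).
        sym_a, sym_b = sym_a * q + sym_b, sym_a
        larger, smaller = smaller, r
        q, r = divmod(larger, smaller)
    # Null rest: repeat the closing pattern as many times as the last divisor,
    # which equals gcd(m_a, m_b).
    return (sym_a * q + sym_b) * smaller


print(esa(8, 4))
print(esa(18, 14))
```

For multiplicities 8 and 4 the loop is never entered and a single pattern is repeated four times; for 18 and 14 two iterations are run before closing, in agreement with the three rows of Table 1.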

Remark 3.

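Let $\alpha_k \ge \beta_k$ be the multiplicities of $a_k$ and $b_k$, and let $q_k$, $r_k$ be the quotient and the rest of the integer division $\alpha_k / \beta_k$. Since $r_k < \beta_k$, at any iteration the algorithm always attributes to $a_{k+1}$ the symbol with more instances, because

$$\alpha_{k+1} = \beta_k > r_k = \beta_{k+1}. \tag{23}$$
Remark 4.

As is intuitive, the cyclic sequencing problem size decreases from one iteration to the next (stagnation is not possible) and the algorithm terminates in a finite number of steps. In fact, (23) and $q_k \ge 1$ yield the strict inequality

$$n_{k+1} = \beta_k + r_k < \beta_k + \alpha_k = n_k. \tag{24}$$
Remark 5.

It is well-known from the theory of the classical Euclidean algorithm [7], and easily proved, that $r_K = 0$ and $\beta_K = \gcd(m_1, m_2)$, where $K$ is the number of steps required to reach the end.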

Definition 7.

Some instances of a same symbol are said to be unclustered in a cycle $c$ if they are well separated in $c$, i.e., if $d_j \ge 2$ for all such instances.

Remark 6.

Clearly, if some symbol is unclustered in the cycles , then is unclustered also in any product cycle , for , , including the case of .

Lemma 3.

The Euclidean Sequencing Algorithm sets unclustered all instances of the less abundant symbol.

Proof.

Induction on the number of steps required by the algorithm to end. As for the base of the induction (i.e., , standing for a case with perfect divisibility),

so that the less abundant symbol, , is always unclustered in . To prove the induction step, we consider a case requiring iterations and notice that, similarly,

We now want to build a first auxiliary cyclic sequencing problem characterized in that: i) steps are required to end, i.e., ; ii) For , the very same (and then the very same and ) are produced as in ; iii) , so that the cycle is produced by the ESA. As noticed in Remark 5, the latter requirement implies that , where and are the components of . Let us then set , so that and . Consequently, the last iteration (i.e., ) is fully determined. Backtracking, one then computes , , , for from back to 2, respecting the required constraints and thus producing the sought for auxiliary problem . All of the instances of are unclustered in by the inductive hypothesis.

Similarly, we now want to build a second auxiliary cyclic sequencing problem characterized in that: i) steps are required to end, i.e., ; ii) For , the very same (and then the very same and ) are produced as in ; iii) , so that the cycle is produced by the ESA. As noticed in Remark 5, the latter requirement implies that , where and are the components of . Let us then set , so that and . Consequently, the last iteration (i.e., ) is fully determined. Backtracking, one then computes , , , for from back to 2, respecting the required constraints and thus producing the sought for auxiliary problem . All of the instances of are unclustered in by the inductive hypothesis.

As noticed in Remark 6, all of the instances of are unclustered in . ∎

Corollary 2.

Let a cyclic sequencing problem with $s = 2$ symbols be given, where the more abundant symbol $a$ has multiplicity $m_a$ and the less abundant symbol $b$ has multiplicity $m_b \le m_a$. Then the Euclidean Sequencing Algorithm arranges the more abundant symbol in such a way that $m_b$ instances have $d = 2$, while the remaining $m_a - m_b$ have $d = 1$.

Proof.

Let $b$ be the less abundant symbol and $a$ the more abundant. Owing to Lemma 3, the instances of $b$ are unclustered. Therefore, there is the same number $m_b$ of $a$ symbol instances which are exactly before an instance of the $b$ symbol. Such $a$ symbol instances have $d = 2$. All other $a$ symbol instances are clustered, and thus they have $d = 1$, and there are $m_a - m_b$ such instances. ∎
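The statement of Corollary 2 can be checked numerically by building cycles with a small implementation of the sequencing procedure and tallying the distances of the more abundant symbol. As before, this is a sketch under assumptions: symbols are written `a` (more abundant) and `b`, composite symbols are strings, and all function names are illustrative.

```python
from collections import Counter


def distances(cycle):
    """Steps from each position to the next instance of the same symbol."""
    n = len(cycle)
    d = []
    for j in range(n):
        step = 1
        while cycle[(j + step) % n] != cycle[j]:
            step += 1
        d.append(step)
    return d


def esa(m_a, m_b):
    """Binary cycle with m_a copies of 'a' and m_b copies of 'b' (m_a >= m_b >= 1)."""
    sym_a, sym_b, larger, smaller = "a", "b", m_a, m_b
    q, r = divmod(larger, smaller)
    while r != 0:
        sym_a, sym_b = sym_a * q + sym_b, sym_a
        larger, smaller = smaller, r
        q, r = divmod(larger, smaller)
    return (sym_a * q + sym_b) * smaller


def distance_histogram(cycle, symbol):
    """Map each distance value to the number of instances of `symbol` having it."""
    return Counter(dj for cj, dj in zip(cycle, distances(cycle)) if cj == symbol)


cycle = esa(18, 14)
print(dict(distance_histogram(cycle, "a")))  # distances of the more abundant symbol
print(dict(distance_histogram(cycle, "b")))  # distances of the less abundant symbol
```

With multiplicities 18 and 14, the tallies agree with the counts discussed after Table 1: 14 and 4 instances for the more abundant symbol, 10 and 4 for the other.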

Definition 8.

Let be a cyclic sequencing problem. If , let

(25)

be the floor and ceiling, resp., of the ratio .

Lemma 4.

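Let $c$ be an admissible cycle for a cyclic sequencing problem with $n$ items in total, and let $\sigma_i$ be a symbol whose multiplicity $m_i$ is such that $m_i \nmid n$. If $c$ is such that, for symbol $\sigma_i$, the distance may only take the two values $\lfloor n / m_i \rfloor$ and $\lceil n / m_i \rceil$, then

$$\underline{k}_i = m_i \lceil n / m_i \rceil - n, \qquad \overline{k}_i = n - m_i \lfloor n / m_i \rfloor \tag{26}$$

are the number of symbol instances with $d = \lfloor n / m_i \rfloor$ and with $d = \lceil n / m_i \rceil$, respectively.

Proof.

Notice first that, under the assumption, $\lceil n / m_i \rceil = \lfloor n / m_i \rfloor + 1$. In order to grant feasibility, the linear system

$$\underline{k}_i + \overline{k}_i = m_i, \qquad \lfloor n / m_i \rfloor \, \underline{k}_i + \lceil n / m_i \rceil \, \overline{k}_i = n$$

must hold, where the first equation counts the total instances of symbol $\sigma_i$, while the second equation expresses the round trip Lemma 1. The unique solution to the system reads $\underline{k}_i = m_i \lceil n / m_i \rceil - n$ and $\overline{k}_i = n - m_i \lfloor n / m_i \rfloor$, as prospected. From their definition, since $n, m_i \in \mathbb{N}$ and since $\mathbb{Z}$ is a ring, then $\underline{k}_i \in \mathbb{Z}$ and $\overline{k}_i \in \mathbb{Z}$. Then, from the definitions of floor and ceiling, $\lfloor n / m_i \rfloor < n / m_i < \lceil n / m_i \rceil$, so that $\underline{k}_i > 0$ and $\overline{k}_i > 0$, and therefore the latter are admissible as quantities of symbol instances. ∎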
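The two counts of Lemma 4 are easy to compute and to compare with an actual cycle. The sketch below evaluates them for the cycle with 18 and 14 instances discussed in Example 4, written out explicitly as a string under the assumption that `a` is the more abundant symbol; the function name `two_value_counts` is introduced here for illustration.

```python
from collections import Counter


def distances(cycle):
    """Steps from each position to the next instance of the same symbol."""
    n = len(cycle)
    d = []
    for j in range(n):
        step = 1
        while cycle[(j + step) % n] != cycle[j]:
            step += 1
        d.append(step)
    return d


def two_value_counts(n, m_i):
    """Map floor(n/m_i) and ceil(n/m_i) to the number of instances at that distance,
    for a symbol of multiplicity m_i that does not divide the cycle length n."""
    if n % m_i == 0:
        raise ValueError("the lemma assumes that m_i does not divide n")
    low = n // m_i
    high = low + 1  # floor and ceiling differ by one when m_i does not divide n
    return {low: m_i * high - n, high: n - m_i * low}


cycle = "abababaabababaab" * 2  # 18 instances of 'a', 14 instances of 'b'
n = len(cycle)
for symbol in "ab":
    observed = Counter(dj for cj, dj in zip(cycle, distances(cycle)) if cj == symbol)
    print(symbol, dict(observed), two_value_counts(n, cycle.count(symbol)))
```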

Remark 7.

Notice that a solution in is found to a system of linear equations with coefficients in , w