# Optimal Metastability-Containing Sorting Networks

When setup/hold times of bistable elements are violated, they may become metastable, i.e., enter a transient state that is neither digital 0 nor 1. In general, metastability cannot be avoided, a problem that manifests whenever taking discrete measurements of analog values. Metastability of the output then reflects uncertainty as to whether a measurement should be rounded up or down to the next possible measurement outcome. Surprisingly, Lenzen and Medina (ASYNC 2016) showed that metastability can be contained, i.e., measurement values can be correctly sorted without resolving metastability first. However, both their work and the state of the art by Bund et al. (DATE 2017) leave open whether such a solution can be as small and fast as standard sorting networks. We show that this is indeed possible, by providing a circuit that sorts Gray code inputs (possibly containing a metastable bit) and has asymptotically optimal depth and size. Concretely, for 10-channel sorting networks and 16-bit wide inputs, we improve on Bund et al. by 48.46% in delay and by 71.58% in area. Straightforward transistor-level optimization is likely to result in performance on par with standard (non-containing) solutions.


## 1 Introduction

Metastability is one of the basic obstacles when crossing clock domains, potentially resulting in soft errors with critical consequences [8]. As it has been shown that there is no deterministic way of avoiding metastability [13], synchronizers [9] are employed to reduce the error probability to tolerable levels. Besides energy and chip area, this approach costs time: the more time is allocated for metastability resolution, the smaller is the probability of a (possibly devastating) metastability-induced fault.

Recently, a different approach has been proposed, coined metastability-containing (MC) circuits [6]. The idea is to accept (a limited amount of) metastability in the input to a digital circuit and guarantee limited metastability of its output, such that the result is still useful. The authors of [2, 12] apply this approach to a fundamental primitive: sorting. However, the state-of-the-art circuits [2] are larger than non-containing solutions by a factor of $\Theta(\log B)$, where $B$ is the bit width of the inputs. Accordingly, the authors pose the following question:

“What is the optimum cost of the $\text{2-sort}$ primitive?”

We argue that answering this question is critical, as the performance penalty imposed by current MC sorting primitives is not outweighed by the avoidance of synchronizers.

#### Our Contribution

We answer the above question by providing a $B$-bit MC $\text{2-sort}$ circuit of depth $O(\log B)$ and $O(B)$ gates. Trivially, any such building block with gates of constant fan-in must have this asymptotic depth and gate count, and it improves by a factor of $\Theta(\log B)$ on the gate complexity of [2]. Furthermore, we provide optimized building blocks that significantly improve the leading constants of these complexity bounds. See Figure 1 for our improvements in area and delay over prior work.

Plugging our circuit into (optimal depth or size) sorting networks [3, 4, 10], we obtain efficient combinational metastability-containing sorting circuits, cf. Table 8. In general, plugging our $\text{2-sort}(B)$ circuit into an $n$-channel sorting network of depth $O(\log n)$ with $O(n \log n)$ $\text{2-sort}$ elements [1], we obtain an asymptotically optimal MC sorting network of depth $O(\log B \log n)$ and $O(B n \log n)$ gates.

#### Further Related Work

Ladner and Fischer [11] studied the problem of computing all the prefixes of applications of an associative operator on an input string of length $n$. They designed and analyzed a recursive construction which computes all these prefixes in parallel. The resulting parallel prefix computation (PPC) circuit has depth $O(\log n)$ and gate count $O(n)$ (assuming that the implementation of the associative operator has constant size and constant depth). We make use of their construction as part of ours.

## 2 Model and Problem

In this section, we discuss how to model metastability in a worst-case fashion and formally specify the input/output behavior of our circuits.

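We use the following basic notation. For $N \in \mathbb{N}$, we set $[N] := \{0, \ldots, N-1\}$. For a binary $B$-bit string $g$, denote by $g_i$ its $i$-th bit, i.e., $g = g_1 g_2 \ldots g_B$. We use the shorthand $g_{i,j} := g_i g_{i+1} \ldots g_j$. Let $\operatorname{par}(g)$ denote the parity of $g$, i.e., $\operatorname{par}(g) = \sum_{i=1}^{B} g_i \bmod 2$.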

#### Reflected Binary Gray Code

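Due to possible metastability of inputs, we use Gray code. Denote by $\langle \cdot \rangle$ the decoding function of a Gray code string, i.e., for $x \in [2^B]$, $\langle \operatorname{rg}_B(x) \rangle = x$. As each $B$-bit string is a codeword, the code is a bijection and the decoding function also defines the encoding function $\operatorname{rg}_B \colon [2^B] \to \{0,1\}^B$. We define $B$-bit binary reflected Gray code recursively, where a $1$-bit code is given by $\operatorname{rg}_1(0) = 0$ and $\operatorname{rg}_1(1) = 1$. For $B > 1$, we start with the first bit fixed to $0$ and counting with $\operatorname{rg}_{B-1}(\cdot)$ (for the first $2^{B-1}$ codewords), then toggle the first bit to $1$, and finally “count down” $\operatorname{rg}_{B-1}(\cdot)$ while fixing the first bit again, cf. Table 1. Formally, this yields

$$
\operatorname{rg}_B(x) :=
\begin{cases}
0\,\operatorname{rg}_{B-1}(x) & \text{if } x \in [2^{B-1}] \\
1\,\operatorname{rg}_{B-1}(2^B - 1 - x) & \text{if } x \in [2^B] \setminus [2^{B-1}].
\end{cases}
$$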
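The recursive definition can be transcribed almost literally. The following Python sketch is for illustration only; the function names `rg` and `decode` and the representation of codewords as strings over `'0'` and `'1'` are assumptions of this sketch.

```python
def rg(x: int, B: int) -> str:
    """Encode x in [2^B] as a B-bit binary reflected Gray code string."""
    if B == 1:
        return str(x)                      # rg_1(0) = "0", rg_1(1) = "1"
    if x < (1 << (B - 1)):                 # first half: prefix 0, count up
        return "0" + rg(x, B - 1)
    return "1" + rg((1 << B) - 1 - x, B - 1)   # second half: prefix 1, count down


def decode(g: str) -> int:
    """Inverse of rg: the value <g> encoded by a stable Gray code string g."""
    B = len(g)
    if B == 1:
        return int(g)
    rest = decode(g[1:])
    return rest if g[0] == "0" else (1 << B) - 1 - rest


print([rg(x, 3) for x in range(8)])
# ['000', '001', '011', '010', '110', '111', '101', '100']
```

Consecutive codewords differ in exactly one bit, which is the property that keeps a measurement "between" two values down to a single uncertain bit.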

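We define the maximum and minimum of two binary reflected Gray code strings, $\max^{\mathrm{rg}}$ and $\min^{\mathrm{rg}}$ respectively, in the usual way, as follows. For two binary reflected Gray code strings $g, h \in \{0,1\}^B$, $\max^{\mathrm{rg}}\{g,h\}$ and $\min^{\mathrm{rg}}\{g,h\}$ are defined as

$$
\left(\max{}^{\mathrm{rg}}\{g,h\},\ \min{}^{\mathrm{rg}}\{g,h\}\right) :=
\begin{cases}
(g,h) & \text{if } \langle g \rangle \ge \langle h \rangle \\
(h,g) & \text{if } \langle g \rangle \le \langle h \rangle.
\end{cases}
$$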

#### Valid Strings

In [12], the authors represent metastable “bits” by M. The inputs to the sorting circuit may have some metastable bits, which means that the respective signals behave out-of-spec from the perspective of Boolean logic. Such inputs, referred to as valid strings, are introduced with the help of the following operator.

###### Definition 2.1 (The ∗ operator [12]).

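For $B \in \mathbb{N}$, define the operator $\ast \colon \{0,1,\mathrm{M}\}^B \times \{0,1,\mathrm{M}\}^B \to \{0,1,\mathrm{M}\}^B$ by

$$
\forall i \in \{1, \ldots, B\} \colon \quad (x \ast y)_i :=
\begin{cases}
x_i & \text{if } x_i = y_i \\
\mathrm{M} & \text{else.}
\end{cases}
$$

###### Observation 2.2.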

The operator $\ast$ is associative and commutative. Hence, for a set $S = \{x^{(1)}, \ldots, x^{(k)}\}$ of $B$-bit strings, we can use the shorthand

$$
\mathop{\ast} S := \mathop{\ast}_{x \in S} x := x^{(1)} \ast x^{(2)} \ast \ldots \ast x^{(k)}.
$$

We call $\mathop{\ast} S$ the superposition of the strings in $S$.

Valid strings have at most one metastable bit. If this bit resolves to either $0$ or $1$, the resulting string encodes either $x$ or $x+1$ for some $x$, cf. Table 2.

###### Definition 2.3 (Valid Strings [12]).

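Let $B \in \mathbb{N}$ and $N = 2^B$. Then, the set of valid strings of length $B$ is

$$
S^B_{\mathrm{rg}} := \operatorname{rg}_B([N]) \cup \bigcup_{x \in [N-1]} \{\operatorname{rg}_B(x) \ast \operatorname{rg}_B(x+1)\},
$$

where for a set $A$ we abbreviate $f(A) := \{f(y) \mid y \in A\}$.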
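To make Definitions 2.1 and 2.3 concrete, the sketch below enumerates the valid strings for a given width in the order of the values they represent. It is an illustration under stated assumptions: strings are Python `str` objects over `'0'`, `'1'`, `'M'`, the helper names are hypothetical, and `rg` uses the well-known closed form `x ^ (x >> 1)` of the binary reflected Gray code instead of the recursion.

```python
from functools import reduce


def rg(x, B):
    """B-bit binary reflected Gray code of x (closed form of the recursion)."""
    return format(x ^ (x >> 1), f"0{B}b")


def star(x, y):
    """The * operator: keep bits that agree, mark disagreeing bits as 'M'."""
    return "".join(a if a == b else "M" for a, b in zip(x, y))


def sup(strings):
    """Superposition *S of a non-empty collection of equal-length strings."""
    return reduce(star, strings)


def valid_strings(B):
    """All valid strings of length B, ordered as rg(0), rg(0)*rg(1), rg(1), ..."""
    N = 1 << B
    result = []
    for x in range(N):
        result.append(rg(x, B))
        if x + 1 < N:
            result.append(star(rg(x, B), rg(x + 1, B)))
    return result


print(valid_strings(2))
# ['00', '0M', '01', 'M1', '11', '1M', '10']
```

There are $2N - 1$ valid strings of length $B$, and each contains at most one `'M'` because neighbouring codewords differ in a single bit.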

As pointed out in [2], inputs that are valid strings may, e.g., arise from using suitable time-to-digital converters for measuring time differences [7].

###### Observation 2.4.

For any $1 \le i \le j \le B$ and $g \in S^B_{\mathrm{rg}}$, $g_{i,j} \in S^{j-i+1}_{\mathrm{rg}}$, i.e., $g_{i,j}$ is a valid string, too.

###### Proof.

Follows immediately from Observation 3.1. ∎

#### Resolution and Closure

To extend the specification of $\max^{\mathrm{rg}}$ and $\min^{\mathrm{rg}}$ to valid strings, we make use of the metastable closure [6], which in turn makes use of the resolution.

###### Definition 2.5 (Resolution [6]).

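For $x \in \{0,1,\mathrm{M}\}^B$,

$$
\operatorname{res}(x) := \{\, y \in \{0,1\}^B \mid \forall i \in \{1, \ldots, B\} \colon x_i \neq \mathrm{M} \Rightarrow y_i = x_i \,\}.
$$

Thus, $\operatorname{res}(x)$ is the set of all strings obtained by replacing all Ms in $x$ by either $0$ or $1$: M acts as a “wild card.”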
We note the following for later use.

###### Observation 2.6.

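For any $x \in \{0,1,\mathrm{M}\}^B$, $\mathop{\ast} \operatorname{res}(x) = x$. For any $S \subseteq \{0,1\}^B$, $S \subseteq \operatorname{res}(\mathop{\ast} S)$.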

The metastable closure of an operator on binary inputs extends it to inputs that may contain metastable bits. This is done by considering all resolutions of the inputs, applying the operator, and taking the superposition of the results.

###### Definition 2.7 (The M Closure [6]).

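Given an operator $f \colon \{0,1\}^n \to \{0,1\}^m$, its metastable closure $f_{\mathrm{M}} \colon \{0,1,\mathrm{M}\}^n \to \{0,1,\mathrm{M}\}^m$ is defined by

$$
f_{\mathrm{M}}(x) := \mathop{\ast} f(\operatorname{res}(x)).
$$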
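Resolution and closure compose mechanically, which a few lines of code make visible. This is a minimal sketch, assuming strings over `'0'`, `'1'`, `'M'`; the names `res`, `sup`, `closure` and the example operator `xor2` are hypothetical and chosen only for illustration.

```python
from functools import reduce
from itertools import product


def res(x):
    """All stable strings obtained by replacing each 'M' in x by '0' or '1'."""
    choices = [("0", "1") if c == "M" else (c,) for c in x]
    return {"".join(p) for p in product(*choices)}


def star(x, y):
    return "".join(a if a == b else "M" for a, b in zip(x, y))


def sup(strings):
    """Superposition of a non-empty collection of equal-length strings."""
    return reduce(star, strings)


def closure(f):
    """Metastable closure: apply f to every resolution, superpose the results."""
    def f_M(x):
        return sup(f(y) for y in res(x))
    return f_M


def xor2(y):
    """Example operator on stable inputs: XOR of a 2-bit string."""
    return str(int(y[0]) ^ int(y[1]))


xor2_M = closure(xor2)
print(xor2_M("01"), xor2_M("0M"))   # 1 M
```

The brute-force closure enumerates up to $2^n$ resolutions, so it serves as a specification to test against rather than as an implementation strategy.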

#### Output Specification

We want to construct a circuit that outputs the maximum and minimum of two valid strings, which will enable us to build sorting networks for valid strings. First, however, we need to answer the question what it means to ask for the maximum or minimum of valid strings. To this end, suppose a valid string is $\operatorname{rg}_B(x) \ast \operatorname{rg}_B(x+1)$ for some $x \in [N-1]$, i.e., the string contains a metastable bit that makes it uncertain whether the represented value is $x$ or $x+1$. This means that the measurement the string represents was taken of a value somewhere between $x$ and $x+1$. Moreover, if we wait for metastability to resolve, the string will stabilize to either $\operatorname{rg}_B(x)$ or $\operatorname{rg}_B(x+1)$. Accordingly, it makes sense to consider $\operatorname{rg}_B(x) \ast \operatorname{rg}_B(x+1)$ “in between” $\operatorname{rg}_B(x)$ and $\operatorname{rg}_B(x+1)$, resulting in the total order on valid strings given by Table 2.

The above intuition can be formalized by extending $\max^{\mathrm{rg}}$ and $\min^{\mathrm{rg}}$ to valid strings using the metastable closure.

###### Definition 2.8 ([2, 12]).

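For $B \in \mathbb{N}$, a $\text{2-sort}(B)$ circuit is specified as follows.

• Input: $g, h \in S^B_{\mathrm{rg}}$,

• Output: $g', h' \in S^B_{\mathrm{rg}}$,

• Functionality: $g' = \max^{\mathrm{rg}}_{\mathrm{M}}\{g,h\}$, $h' = \min^{\mathrm{rg}}_{\mathrm{M}}\{g,h\}$.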

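As shown in [2], this definition indeed coincides with the one given in [12], and for valid strings $g$ and $h$, $\max^{\mathrm{rg}}_{\mathrm{M}}\{g,h\}$ and $\min^{\mathrm{rg}}_{\mathrm{M}}\{g,h\}$ are valid strings, too. More specifically, $\max^{\mathrm{rg}}_{\mathrm{M}}$ and $\min^{\mathrm{rg}}_{\mathrm{M}}$ are the $\max$ and $\min$ operators w.r.t. the total order on valid strings shown in Table 2, e.g.,

• $\max^{\mathrm{rg}}_{\mathrm{M}}\{1001, 1000\} = \operatorname{rg}_4(15) = 1000$,

• $\max^{\mathrm{rg}}_{\mathrm{M}}\{0\mathrm{M}10, 0010\} = \operatorname{rg}_4(3) \ast \operatorname{rg}_4(4) = 0\mathrm{M}10$,

• $\max^{\mathrm{rg}}_{\mathrm{M}}\{0\mathrm{M}10, 0110\} = \operatorname{rg}_4(4) = 0110$.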
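The specification in Definition 2.8 can be evaluated by brute force: resolve both inputs in all possible ways, sort each stable pair by decoded value, and superpose the results. The sketch below does exactly that; it is a reference model under the assumption that strings are Python `str` objects over `'0'`, `'1'`, `'M'`, and all function names are hypothetical.

```python
from functools import reduce
from itertools import product


def decode(g):
    """Value encoded by a stable binary reflected Gray code string."""
    n, x = int(g, 2), 0
    while n:                 # x = n ^ (n >> 1) ^ (n >> 2) ^ ...
        x ^= n
        n >>= 1
    return x


def res(x):
    choices = [("0", "1") if c == "M" else (c,) for c in x]
    return {"".join(p) for p in product(*choices)}


def sup(strings):
    star = lambda x, y: "".join(a if a == b else "M" for a, b in zip(x, y))
    return reduce(star, strings)


def max_min_rg(g, h):
    """(max, min) of two stable Gray code strings by decoded value."""
    return (g, h) if decode(g) >= decode(h) else (h, g)


def two_sort_spec(g, h):
    """Closure of (max, min): superpose the results over all resolutions."""
    pairs = [max_min_rg(g2, h2) for g2 in res(g) for h2 in res(h)]
    return sup(p[0] for p in pairs), sup(p[1] for p in pairs)


print(two_sort_spec("0M10", "0010"))   # ('0M10', '0010')
print(two_sort_spec("0M10", "0110"))   # ('0110', '0M10')
```

In the first call the uncertain input lies between 3 and 4 while the other input is exactly 3, so the maximum inherits the uncertainty and the minimum is stable.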

#### Computational Model

We seek to use standard components and combinational logic only. We use the model of [6], which specifies the behavior of basic gates on metastable inputs via the metastable closure of their behavior on binary inputs. For standard implementations of AND and OR gates, this assumption is valid: if M represents an arbitrary, possibly time-dependent voltage between logical $0$ and $1$, an AND gate will still output logical $0$ if the respective other input is logical $0$. Similarly, an OR gate with one input being logical $1$ suppresses metastability at the other input, cf. Table 3.
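The masking behavior described above falls out of the closure definition directly. The following sketch derives the gate behavior on metastable inputs from the Boolean functions; the names `and_M` and `or_M` are hypothetical, and the printed rows are computed from the definition, not copied from Table 3.

```python
from functools import reduce
from itertools import product


def res(x):
    choices = [("0", "1") if c == "M" else (c,) for c in x]
    return {"".join(p) for p in product(*choices)}


def sup(strings):
    star = lambda x, y: "".join(a if a == b else "M" for a, b in zip(x, y))
    return reduce(star, strings)


def closure(f):
    return lambda x: sup(f(y) for y in res(x))


# Closures of the two-input Boolean gates, taking a 2-character input string.
and_M = closure(lambda y: str(int(y[0]) & int(y[1])))
or_M = closure(lambda y: str(int(y[0]) | int(y[1])))

for a, b in product("01M", repeat=2):
    print(a, b, "AND:", and_M(a + b), "OR:", or_M(a + b))
```

A stable `0` at an AND gate or a stable `1` at an OR gate yields a stable output even when the other input is `'M'`; in all other cases involving `'M'`, the output is `'M'`.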

As pointed out in [2], any additional reduction of metastability in the output necessitates the use of non-combinational masking components (e.g., masking registers), analog components, and/or synchronizers, all of which are outside of our computational model. Moreover, other than the usage of analog components, these alternatives require spending additional time, which we avoid in this paper.

## 3 Preliminaries on Stable Inputs

We note the following observation for later use. Informally, it states that removing prefixes and suffixes from the code results in (repetitions of) binary reflected Gray codes.

###### Observation 3.1.

For $B$-bit binary reflected Gray code, fix $1 \le i \le j \le B$, and consider the sequence of strings obtained by (i) listing all codewords in ascending order of encoded values, (ii) replacing each codeword $g$ by $g_{i,j}$, and (iii) deleting all immediate repetitions (i.e., if two consecutive strings are identical, keep only one of them). Then the resulting list repeatedly counts “up” and “down” through the codewords of $(j-i+1)$-bit binary reflected Gray code.

###### Proof.

When removing the first bit of $B$-bit binary reflected Gray code, the claim follows directly from the definition. By induction, we can confirm that the last bit of $B$-bit code toggles on every second up-count, and that $\operatorname{rg}_B(x)_{1,B-1} = \operatorname{rg}_{B-1}(\lfloor x/2 \rfloor)$. Thus, the claim holds if we either remove the first or last bit. As the same arguments apply when we have a list counting “up” and “down” repeatedly, we can inductively remove the first $i-1$ bits and the last $B-j$ bits to prove the general claim. ∎

#### Comparing Stable Gray Code Strings via an FSM

The following basic structural lemma leads to a straightforward way of comparing binary reflected Gray code strings.

###### Lemma 3.2.

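Let $g, h \in \{0,1\}^B$ such that $\langle g \rangle > \langle h \rangle$. Denote by $i \in \{1, \ldots, B\}$ the first index such that $g_i \neq h_i$. Then $g_i > h_i$ (i.e., $g_i = 1$, $h_i = 0$) if $\operatorname{par}(g_{1,i-1}) = 0$ and $g_i < h_i$ (i.e., $g_i = 0$, $h_i = 1$) if $\operatorname{par}(g_{1,i-1}) = 1$.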

###### Proof.

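We prove the claim by induction on $B$, where the base case $B = 1$ is trivial. Now consider $B$-bit strings for some $B > 1$ and assume that the claim holds for $B-1$ bits. If $i = 1$, again the claim trivially follows from the definition. If $i > 1$, we have that $g_1 = h_1$. Denote $x := \langle g \rangle$ and $y := \langle h \rangle$. If $g_1 = h_1 = 0$, then $\langle g_{2,B} \rangle = x$ and $\langle h_{2,B} \rangle = y$. Thus, as $x > y$ by assumption, the claim follows from the induction hypothesis. If $g_1 = h_1 = 1$, $\langle g_{2,B} \rangle = 2^B - 1 - x$ and $\langle h_{2,B} \rangle = 2^B - 1 - y$. Note that $h_{2,B}$ and $g_{2,B}$ satisfy that $\langle h_{2,B} \rangle > \langle g_{2,B} \rangle$ and that their first differing bit is $i - 1$. By the induction hypothesis, we have that $h_i > g_i$ if $\operatorname{par}(h_{2,i-1}) = 0$ and, accordingly, $h_i < g_i$ if $\operatorname{par}(h_{2,i-1}) = 1$. As $g_1 = 1$ and $g_{2,i-1} = h_{2,i-1}$, $\operatorname{par}(g_{1,i-1}) = 1 - \operatorname{par}(h_{2,i-1})$, and the claim follows. ∎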
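Lemma 3.2 yields a comparison rule that never decodes the strings: scan from the first bit, track the parity of the common prefix, and decide at the first differing position. A minimal sketch, assuming stable inputs of equal length given as strings of `'0'` and `'1'`; the function name `gray_greater` is hypothetical.

```python
def gray_greater(g, h):
    """Return True iff <g> > <h> for distinct stable Gray code strings g, h."""
    parity = 0                             # parity of the common prefix so far
    for a, b in zip(g, h):
        if a != b:
            # Even prefix parity: the string with the 1 is larger.
            # Odd prefix parity: the string with the 0 is larger.
            return (a == "1") == (parity == 0)
        parity ^= int(a)
    raise ValueError("inputs encode the same value")


print(gray_greater("0110", "0010"))   # True:  4 > 3
print(gray_greater("110", "101"))     # False: 4 < 6
```

The lemma is stated for $\langle g \rangle > \langle h \rangle$; the converse direction used by the function follows by applying it with the roles of $g$ and $h$ exchanged.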

Lemma 3.2 gives rise to a sequential representation of $\text{2-sort}(B)$ as a finite state machine (FSM), for input strings in $\{0,1\}^B \times \{0,1\}^B$. Consider the state machine given in Figure 2. Its four states keep track of whether $g_{1,i} = h_{1,i}$ with parity $0$ (state encoding: $00$) or $1$ (state encoding: $11$), respectively, $\langle g \rangle < \langle h \rangle$ (state encoding: $01$), or $\langle g \rangle > \langle h \rangle$ (state encoding: $10$). Denoting by $s^{(i)}$ its state after $i$ steps (where $s^{(0)} := 00$ is the initial state), Lemma 3.2 shows that the output given in Table 4 is correct: up to the first differing bits $g_i \neq h_i$, the (identical) input bits are reproduced both for $\max^{\mathrm{rg}}$ and $\min^{\mathrm{rg}}$, and in the $i$-th step the state machine transitions to the correct absorbing state.

#### The ⋄ Operator and Optimal Sorting of Stable Inputs

We can express the transition function of the state machine as an operator $\diamond$ taking the current state and input $g_i h_i$ as argument and returning the new state. Then $s^{(i)} = s^{(i-1)} \diamond g_i h_i$, where $\diamond$ is given in Table 5.
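The state machine can be simulated bit by bit. Tables 4 and 5 are not reproduced in this text, so the transition and output rules in the sketch below are reconstructed from Lemma 3.2 and the state encoding (`00`/`11` for equal prefixes of even/odd parity, `01` for $\langle g \rangle < \langle h \rangle$, `10` for $\langle g \rangle > \langle h \rangle$); treat them as an assumption. All function names are hypothetical.

```python
def flip(b):
    """Bitwise complement of a stable bit string."""
    return "".join("1" if c == "0" else "0" for c in b)


def diamond(s, b):
    """Transition: new state from state s and input pair b = g_i h_i."""
    if s == "00":              # equal so far, even parity: state becomes b
        return b
    if s == "11":              # equal so far, odd parity: roles of 0 and 1 swap
        return flip(b)
    return s                   # "01" and "10" are absorbing


def out(s, b):
    """(max bit, min bit) from the state before step i and the pair g_i h_i."""
    g, h = b
    if s == "00":
        return max(g, h) + min(g, h)
    if s == "11":
        return min(g, h) + max(g, h)
    return g + h if s == "10" else h + g


def two_sort_stable(g, h):
    """Sequentially compute (max, min) of two stable Gray code strings."""
    s, hi, lo = "00", "", ""
    for gi, hj in zip(g, h):
        o = out(s, gi + hj)
        hi, lo = hi + o[0], lo + o[1]
        s = diamond(s, gi + hj)
    return hi, lo


print(two_sort_stable("110", "101"))   # ('101', '110'), i.e. values 6 and 4
```

The loop is inherently sequential in the state; the point of the following discussion is that the states can instead be computed by a shallow circuit.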

###### Observation 3.3.

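$\diamond$ is associative, that is,

$$
\forall a, b, c \in \{0,1\}^2 \colon \quad (a \diamond b) \diamond c = a \diamond (b \diamond c).
$$

We thus have that

$$
s^{(i)} = \mathop{\diamond}_{j=1}^{i} g_j h_j := g_1 h_1 \diamond g_2 h_2 \diamond \ldots \diamond g_i h_i,
$$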

regardless of the order in which the operations are applied.

###### Proof.

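First, we observe the following for every $x \in \{0,1\}^2$: (1) $00 \diamond x = x$, (2) $01 \diamond x = 01$, (3) $10 \diamond x = 10$, and (4) $11 \diamond x = \bar{x}$. We prove that $\diamond$ is associative by considering these four cases for the first operand $a$. If $a \in \{01, 10\}$, associativity follows from the “absorbing” property of cases (2) and (3). If $a = 00$, then $(a \diamond b) \diamond c = b \diamond c = a \diamond (b \diamond c)$. We are left with the case that $a = 11$. Then the LHS equals $\bar{b} \diamond c$, while the RHS equals $\overline{b \diamond c}$. Checking Table 5, one can directly verify that $\bar{b} \diamond c = \overline{b \diamond c}$ in all cases. ∎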

An immediate consequence is that we can apply the results by [11] on parallel prefix computation to derive an $O(B)$-gate circuit of depth $O(\log B)$ computing all $s^{(i)}$, $i \in [B]$, in parallel. Our goal in the following sections is to extend this well-known approach to potentially metastable inputs.
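Associativity is what allows the states to be computed by a balanced, recursive scheme instead of a left-to-right scan. The sketch below checks associativity of $\diamond$ exhaustively and computes all prefixes with a simple pair-and-recurse scheme in the spirit of [11]; it uses a linear number of operator applications and logarithmic recursion depth, but it is not the optimized construction of the paper. The rule for `diamond` is reconstructed from Lemma 3.2 (Table 5 is not reproduced here), and all names are hypothetical.

```python
from itertools import accumulate, product

STATES = ["".join(p) for p in product("01", repeat=2)]


def diamond(s, b):
    """State transition; '01' and '10' absorb, '11' complements the input."""
    if s == "00":
        return b
    if s == "11":
        return "".join("1" if c == "0" else "0" for c in b)
    return s


def prefixes(op, xs):
    """All prefixes xs[0], xs[0] op xs[1], ... for an associative op."""
    if len(xs) == 1:
        return list(xs)
    # Combine neighbouring pairs, then recurse on the half-length sequence.
    pairs = [op(xs[k], xs[k + 1]) for k in range(0, len(xs) - 1, 2)]
    sub = prefixes(op, pairs)          # sub[k] covers xs[0 .. 2k+1]
    result = [xs[0]]
    for k in range(1, len(xs)):
        if k % 2 == 1:
            result.append(sub[k // 2])
        else:
            result.append(op(sub[k // 2 - 1], xs[k]))
    return result


assert all(diamond(diamond(a, b), c) == diamond(a, diamond(b, c))
           for a, b, c in product(STATES, repeat=3))

g, h = "0110", "0010"
inputs = [a + b for a, b in zip(g, h)]
print(prefixes(diamond, inputs))       # ['00', '10', '10', '10']
```

The printed list contains $s^{(1)}, \ldots, s^{(4)}$ for the example inputs; the state becomes `10` at the first differing bit and stays there.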

## 4 Dealing with Metastable Inputs

Our strategy is the same as outlined in Section 3 for stable inputs, with all involved operators replaced by their metastable closure: (i) compute $s^{(i)}_{\mathrm{M}}$ for $i \in [B]$, (ii) determine $\max^{\mathrm{rg}}_{\mathrm{M}}\{g,h\}_i$ and $\min^{\mathrm{rg}}_{\mathrm{M}}\{g,h\}_i$ according to Table 4 for $i \in \{1, \ldots, B\}$, and (iii) exploit associativity of the operator computing the $s^{(i)}_{\mathrm{M}}$ to determine all of them concurrently with depth $O(\log B)$ and $O(B)$ gates (using [11]). Thus, for inputs that are valid strings, we only need to implement $\diamond_{\mathrm{M}}$ and the closure of the operator given in Table 4 (both of constant size) and immediately obtain an efficient circuit using the PPC framework [11].

Unfortunately, it is not obvious that this approach yields correct outputs. There are three hurdles to take:

1. Show that first computing $s^{(i)}_{\mathrm{M}}$ and then the output from this and the input yields correct output for all valid strings.

2. Show that $\diamond_{\mathrm{M}}$ behaves like an associative operator on the given inputs (so we can use the PPC framework).

3. Show that repeated application of $\diamond_{\mathrm{M}}$ actually computes $s^{(i)}_{\mathrm{M}}$.

Killing two birds with one stone, we first show the second and third point in a single inductive argument. We then proceed to prove the first point.

### 4.1 Determining s(i){M}

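Note that for any strings $x$ and $y$ over $\{0,1,\mathrm{M}\}$, we have that $\operatorname{res}(xy) = \operatorname{res}(x) \times \operatorname{res}(y)$. Hence, for valid strings $g, h \in S^B_{\mathrm{rg}}$ and $i \in \{1, \ldots, B\}$, we have that

$$
s^{(i)}_{\mathrm{M}} = \mathop{\ast} \mathop{\diamond}_{j=1}^{i} \operatorname{res}(g_j h_j),
$$

and for convenience set $s^{(0)}_{\mathrm{M}} := 00$. Here, $\diamond$ applied to sets denotes the set of all results obtained by picking one element from each set. Moreover, recalling Definition 2.7,

$$
x \diamond_{\mathrm{M}} y = \mathop{\ast}_{x'y' \in \operatorname{res}(xy)} \{x' \diamond y'\} = \mathop{\ast} \left(\operatorname{res}(x) \diamond \operatorname{res}(y)\right). \tag{1}
$$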

The following theorem shows that the desired decomposition is feasible.

###### Theorem 4.1.

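Let $g, h \in S^B_{\mathrm{rg}}$ and $1 \le i \le j \le B$. Then

$$
g_i h_i \diamond_{\mathrm{M}} g_{i+1} h_{i+1} \diamond_{\mathrm{M}} \ldots \diamond_{\mathrm{M}} g_j h_j = \mathop{\ast} \mathop{\diamond}_{k=i}^{j} \operatorname{res}(g_k h_k), \tag{2}
$$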

regardless of the order in which the operators are applied.
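For small bit widths, the statement of Theorem 4.1 can be checked exhaustively: evaluate the left hand side of (2) under every possible parenthesization and compare with the right hand side. The sketch below does this for all pairs of valid strings; it is a sanity check under the assumption that `diamond` (reconstructed from Lemma 3.2, since Table 5 is not reproduced here) matches the paper's operator, and all helper names are hypothetical.

```python
from functools import reduce
from itertools import product


def rg(x, B):
    return format(x ^ (x >> 1), f"0{B}b")


def star(x, y):
    return "".join(a if a == b else "M" for a, b in zip(x, y))


def sup(strings):
    return reduce(star, strings)


def res(x):
    return {"".join(p) for p in product(*[("0", "1") if c == "M" else (c,) for c in x])}


def valid_strings(B):
    codes = [rg(x, B) for x in range(1 << B)]
    return codes + [star(a, b) for a, b in zip(codes, codes[1:])]


def diamond(s, b):
    if s == "00":
        return b
    if s == "11":
        return "".join("1" if c == "0" else "0" for c in b)
    return s


def diamond_M(x, y):
    """Metastable closure of diamond, cf. Equation (1)."""
    return sup(diamond(a, b) for a in res(x) for b in res(y))


def all_orders(items):
    """Results of chaining diamond_M over items under every parenthesization."""
    if len(items) == 1:
        return {items[0]}
    return {diamond_M(a, b)
            for cut in range(1, len(items))
            for a in all_orders(items[:cut])
            for b in all_orders(items[cut:])}


def rhs(items):
    """Right hand side of (2): superposition over all resolutions."""
    return sup(reduce(diamond, choice) for choice in product(*(res(it) for it in items)))


def theorem_4_1_holds(B):
    for g, h in product(valid_strings(B), repeat=2):
        pairs = [g[k] + h[k] for k in range(B)]
        for i in range(B):
            for j in range(i, B):
                if all_orders(pairs[i:j + 1]) != {rhs(pairs[i:j + 1])}:
                    return False
    return True


print(theorem_4_1_holds(3))
```

Such a check does not replace the proof, but it catches transcription errors in the operator table before they propagate into a circuit description.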

###### Observation 4.2.

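Let $g, h \in S^B_{\mathrm{rg}}$ and $1 \le i \le j \le B$. If

$$
\mathop{\ast} \mathop{\diamond}_{k=i}^{j} \operatorname{res}(g_k h_k) = \mathrm{MM},
$$

there is an index $m \in \{i, \ldots, j\}$ such that $g_{i,m-1} = h_{i,m-1}$ and $g_m = h_m = \mathrm{M}$. Conversely, if there is no such index, then $\mathop{\ast} \mathop{\diamond}_{k=i}^{j} \operatorname{res}(g_k h_k) \neq \mathrm{MM}$.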

###### Proof.

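Abbreviate $s := \mathop{\ast} \mathop{\diamond}_{k=i}^{j} \operatorname{res}(g_k h_k)$. By Observation 2.4, w.l.o.g. $i = 1$ and $j = B$. Recall that, for any resolutions $g' \in \operatorname{res}(g)$ and $h' \in \operatorname{res}(h)$, $\mathop{\diamond}_{k=1}^{B} g'_k h'_k$ indicates whether $\langle g' \rangle > \langle h' \rangle$ ($10$), $\langle g' \rangle < \langle h' \rangle$ ($01$), $g' = h'$ with $\operatorname{par}(g') = 0$ ($00$), or $g' = h'$ with $\operatorname{par}(g') = 1$ ($11$). For $s = \mathrm{MM}$, we must have that there are two pairs of resolutions $g', h'$ and $g'', h''$ that result in (i) outputs $00$ and $11$, respectively, or (ii) in outputs $01$ and $10$, respectively. It is straightforward to see that this entails the claim (cf. Table 2). ∎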

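We now prove the claim of the theorem by induction on $j - i + 1$, i.e., the length of the strings we feed to the operators. For $j = i$, we trivially have $g_i h_i = \mathop{\ast} \operatorname{res}(g_i h_i)$.

For the induction step, suppose $j - i + 1 \ge 2$ and the claim holds for all shorter valid strings. As, by Observation 2.4, $g_{i,j}$ and $h_{i,j}$ are valid strings, w.l.o.g. $i = 1$ and $j = B$. Consider the $\diamond_{\mathrm{M}}$ operator (at the position between index $\ell$ and $\ell + 1$) on the left hand side that is evaluated last; we indicate this by parentheses and compute

$$
\begin{aligned}
&(g_1 h_1 \diamond_{\mathrm{M}} \ldots \diamond_{\mathrm{M}} g_\ell h_\ell) \diamond_{\mathrm{M}} (g_{\ell+1} h_{\ell+1} \diamond_{\mathrm{M}} \ldots \diamond_{\mathrm{M}} g_B h_B) \\
&\quad = \Big(\mathop{\ast} \mathop{\diamond}_{k=1}^{\ell} \operatorname{res}(g_k h_k)\Big) \diamond_{\mathrm{M}} \Big(\mathop{\ast} \mathop{\diamond}_{k=\ell+1}^{B} \operatorname{res}(g_k h_k)\Big) \\
&\quad = \mathop{\ast} \Big(\operatorname{res}\Big(\mathop{\ast} \mathop{\diamond}_{k=1}^{\ell} \operatorname{res}(g_k h_k)\Big) \diamond \operatorname{res}\Big(\mathop{\ast} \mathop{\diamond}_{k=\ell+1}^{B} \operatorname{res}(g_k h_k)\Big)\Big) \\
&\quad = \mathop{\ast} \left(\operatorname{res}(a) \diamond \operatorname{res}(b)\right) =: x,
\end{aligned}
$$

where $a := \mathop{\ast} \mathop{\diamond}_{k=1}^{\ell} \operatorname{res}(g_k h_k)$ and $b := \mathop{\ast} \mathop{\diamond}_{k=\ell+1}^{B} \operatorname{res}(g_k h_k)$. The first equality uses the induction hypothesis and the second uses (1).

By the induction hypothesis, $a$ and $b$ do not depend on the order of evaluation of the $\diamond_{\mathrm{M}}$ operators. Thus, it suffices to show that $x$ equals the right hand side of Equality (2).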

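We distinguish three cases. The first is that the right hand side of (2) evaluates to $\mathrm{MM}$. Then, by Observation 4.2, there is a (unique) index $m$ so that $g_{1,m-1} = h_{1,m-1}$ and $g_m = h_m = \mathrm{M}$. If $m \le \ell$, we have (again by Observation 4.2) that $a = \mathrm{MM}$, i.e., $\operatorname{res}(a) = \{0,1\}^2$. Checking Table 5, we see that each column contains both $01$ and $10$. Hence, regardless of $b$, $x = \mathrm{MM}$. On the other hand, if $m > \ell$, then $a \in \{00, 11\}$ and $b = \mathrm{MM}$. Checking the $00$ and $11$ rows of Table 5, both of them contain $01$ and $10$, implying that $x = \mathrm{MM}$.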

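The second case is that the right hand side of (2) does not evaluate to $\mathrm{MM}$, but $b = \mathrm{MM}$. Then, by Observation 4.2 and the fact that $g$ and $h$ are valid strings, $g_{1,\ell}, h_{1,\ell} \in \{0,1\}^\ell$ and $g_{1,\ell} \neq h_{1,\ell}$. W.l.o.g., assume $\langle g_{1,\ell} \rangle < \langle h_{1,\ell} \rangle$. Then $a = 01$ and the state machine given in Figure 2 determines output $01$ for inputs $g'$ and $h'$, for all resolutions $g' \in \operatorname{res}(g)$ and $h' \in \operatorname{res}(h)$. As the FSM outputs $01$, we conclude that

$$
\mathop{\ast} \mathop{\diamond}_{k=1}^{B} \operatorname{res}(g_k h_k) = \mathop{\ast}_{g'h' \in \operatorname{res}(gh)} \Big\{\mathop{\diamond}_{k=1}^{B} g'_k h'_k\Big\} = \mathop{\ast}_{g'h' \in \operatorname{res}(gh)} \{01\} = 01
$$

as well. Checking the $01$ row of Table 5, we see that $x = 01$, too, regardless of $b$.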

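The third case is that the right hand side of (2) does not evaluate to $\mathrm{MM}$ and $b \neq \mathrm{MM}$. By Observation 4.2, also $a \neq \mathrm{MM}$. Accordingly, $a, b \in \{0,1,\mathrm{M}\}^2 \setminus \{\mathrm{MM}\}$. We claim that this implies that

$$
\operatorname{res}(a) = \mathop{\diamond}_{k=1}^{\ell} \operatorname{res}(g_k h_k), \qquad \operatorname{res}(b) = \mathop{\diamond}_{k=\ell+1}^{B} \operatorname{res}(g_k h_k).
$$

This can be seen by noting that, for any non-empty set $S \subseteq \{0,1\}^2$, (i) $S \subseteq \operatorname{res}(\mathop{\ast} S)$ and (ii) $S \neq \operatorname{res}(\mathop{\ast} S)$ necessitates that $\mathop{\ast} S = \mathrm{MM}$, as otherwise $|\operatorname{res}(\mathop{\ast} S)| \le 2$ and thus $S = \operatorname{res}(\mathop{\ast} S)$. We conclude that

$$
\begin{aligned}
x &= \mathop{\ast} \left(\operatorname{res}(a) \diamond \operatorname{res}(b)\right) \\
&= \mathop{\ast} \Big(\Big(\mathop{\diamond}_{k=1}^{\ell} \operatorname{res}(g_k h_k)\Big) \diamond \Big(\mathop{\diamond}_{k=\ell+1}^{B} \operatorname{res}(g_k h_k)\Big)\Big) \\
&= \mathop{\ast}_{\substack{g' \in \operatorname{res}(g) \\ h' \in \operatorname{res}(h)}} \Big(\Big(\mathop{\diamond}_{k=1}^{\ell} g'_k h'_k\Big) \diamond \Big(\mathop{\diamond}_{k=\ell+1}^{B} g'_k h'_k\Big)\Big) \\
&= \mathop{\ast}_{\substack{g' \in \operatorname{res}(g) \\ h' \in \operatorname{res}(h)}} \Big(\mathop{\diamond}_{k=1}^{B} g'_k h'_k\Big) = \mathop{\ast} \Big(\mathop{\diamond}_{k=1}^{B} \operatorname{res}(g_k h_k)\Big),
\end{aligned}
$$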

as desired. ∎

We remark that we did not prove that $\diamond_{\mathrm{M}}$ is an associative operator, just that it behaves associatively when applied to input sequences given by valid strings. Moreover, in general the closure of an associative operator need not be associative. A counter-example is given by binary addition modulo $4$:

$$
(0\mathrm{M} +_{\mathrm{M}} 01) +_{\mathrm{M}} 01 = \mathrm{MM} \neq 1\mathrm{M} = 0\mathrm{M} +_{\mathrm{M}} (01 +_{\mathrm{M}} 01).
$$
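The counter-example can be reproduced by evaluating the closure of 2-bit addition by brute force. This is a small illustration; the helper names are hypothetical and operands are 2-character strings over `'0'`, `'1'`, `'M'`, most significant bit first.

```python
from functools import reduce
from itertools import product


def res(x):
    return {"".join(p) for p in product(*[("0", "1") if c == "M" else (c,) for c in x])}


def sup(strings):
    star = lambda x, y: "".join(a if a == b else "M" for a, b in zip(x, y))
    return reduce(star, strings)


def add_mod4(x, y):
    """Stable 2-bit addition modulo 4."""
    return format((int(x, 2) + int(y, 2)) % 4, "02b")


def add_M(x, y):
    """Metastable closure of add_mod4."""
    return sup(add_mod4(a, b) for a in res(x) for b in res(y))


left = add_M(add_M("0M", "01"), "01")
right = add_M("0M", add_M("01", "01"))
print(left, right)   # MM 1M
```

Evaluating left to right loses information early: the intermediate result `MM` no longer records that only the values 1 and 2 were possible, so the final superposition is coarser than necessary.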

Since $\diamond_{\mathrm{M}}$ behaves associatively when applied to input sequences given by valid strings, we can apply the results by [11] on parallel prefix computation to any implementation of $\diamond_{\mathrm{M}}$.

### 4.2 Obtaining the Outputs from s(i){M}

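Denote by $\operatorname{out} \colon \{0,1\}^2 \times \{0,1\}^2 \to \{0,1\}^2$ the operator given in Table 4 computing $\max^{\mathrm{rg}}\{g,h\}_i \min^{\mathrm{rg}}\{g,h\}_i$ out of $s^{(i-1)}$ and $g_i h_i$. The following theorem shows that, for valid inputs, it suffices to implement $\operatorname{out}_{\mathrm{M}}$ to determine $\max^{\mathrm{rg}}_{\mathrm{M}}\{g,h\}_i$ and $\min^{\mathrm{rg}}_{\mathrm{M}}\{g,h\}_i$ from $s^{(i-1)}_{\mathrm{M}}$, $g_i$, and $h_i$.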

###### Theorem 4.3.

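Given valid inputs $g, h \in S^B_{\mathrm{rg}}$ and $i \in \{1, \ldots, B\}$, it holds that

$$
\operatorname{out}_{\mathrm{M}}\big(s^{(i-1)}_{\mathrm{M}}, g_i h_i\big) = \max{}^{\mathrm{rg}}_{\mathrm{M}}\{g,h\}_i \, \min{}^{\mathrm{rg}}_{\mathrm{M}}\{g,h\}_i.
$$
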
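Combined with Theorem 4.1, this statement says that the closures $\diamond_{\mathrm{M}}$ and $\operatorname{out}_{\mathrm{M}}$ suffice to sort valid strings. The sketch below compares that pipeline against the brute-force specification for all pairs of valid strings up to a small width. Tables 4 and 5 are not reproduced in this text, so `diamond` and `out` are reconstructed from Lemma 3.2 and the state encoding and should be read as assumptions; all names are hypothetical.

```python
from functools import reduce
from itertools import product


def rg(x, B):
    return format(x ^ (x >> 1), f"0{B}b")


def star(x, y):
    return "".join(a if a == b else "M" for a, b in zip(x, y))


def sup(strings):
    return reduce(star, strings)


def res(x):
    return {"".join(p) for p in product(*[("0", "1") if c == "M" else (c,) for c in x])}


def valid_strings(B):
    codes = [rg(x, B) for x in range(1 << B)]
    return codes + [star(a, b) for a, b in zip(codes, codes[1:])]


def diamond(s, b):
    if s == "00":
        return b
    if s == "11":
        return "".join("1" if c == "0" else "0" for c in b)
    return s


def out(s, b):
    g, h = b
    if s == "00":
        return max(g, h) + min(g, h)
    if s == "11":
        return min(g, h) + max(g, h)
    return g + h if s == "10" else h + g


def lift(op):
    """Metastable closure of a two-argument operator on 2-bit strings."""
    return lambda x, y: sup(op(a, b) for a in res(x) for b in res(y))


diamond_M, out_M = lift(diamond), lift(out)


def mc_two_sort(g, h):
    """Sort two valid strings using only the closures diamond_M and out_M."""
    s, hi, lo = "00", "", ""
    for gi, hj in zip(g, h):
        o = out_M(s, gi + hj)
        hi, lo = hi + o[0], lo + o[1]
        s = diamond_M(s, gi + hj)
    return hi, lo


def spec(g, h):
    """Definition 2.8 by brute force over all resolutions."""
    value = {rg(x, len(g)): x for x in range(1 << len(g))}
    pairs = [(a, b) if value[a] >= value[b] else (b, a)
             for a in res(g) for b in res(h)]
    return sup(p[0] for p in pairs), sup(p[1] for p in pairs)


print(mc_two_sort("0M10", "0010"))   # ('0M10', '0010')
print(all(mc_two_sort(g, h) == spec(g, h)
          for B in range(1, 5)
          for g, h in product(valid_strings(B), repeat=2)))
```

The states are computed sequentially here for brevity; by Theorem 4.1 any evaluation order of $\diamond_{\mathrm{M}}$, including a parallel prefix scheme, yields the same $s^{(i)}_{\mathrm{M}}$ on valid inputs.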
###### Proof.

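By definition of $s^{(i-1)}_{\mathrm{M}}$, $\operatorname{out}_{\mathrm{M}}(s^{(i-1)}_{\mathrm{M}}, g_i h_i)$ does not depend on bits $i+1, \ldots, B$. As by Observation 2.4 $g_{1,i}, h_{1,i} \in S^{i}_{\mathrm{rg}}$, we may thus w.l.o.g. assume that $B = i$. For symmetry reasons, it suffices to show the claim for the first output bit only; the other cases are analogous.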

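Recall that for $g, h \in \{0,1\}^B$, $s^{(B-1)}$ is the state of the state machine given in Figure 2 before processing the last bit. Hence,

$$
\operatorname{out}_{\mathrm{M}}\big(s^{(B-1)}_{\mathrm{M}}, g_B h_B\big)_1 = \operatorname{out}\big(s^{(B-1)}, g_B h_B\big)_1 = \max{}^{\mathrm{rg}}\{g,h\}_B = \max{}^{\mathrm{rg}}_{\mathrm{M}}\{g,h\}_B.
$$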

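Our task is to prove this equality also for the case where $g$ or $h$ contain a metastable bit.

Let $m$ be the minimum index such that $g_m = \mathrm{M}$ or $h_m = \mathrm{M}$. Again, for symmetry reasons, we may assume w.l.o.g. that $g_m = \mathrm{M}$; the case $h_m = \mathrm{M}$ is symmetric. If $g_{1,m-1} \neq h_{1,m-1}$, suppose w.l.o.g. (the other case is symmetric) that $\langle g_{1,m-1} \rangle > \langle h_{1,m-1} \rangle$. Then $s^{(m-1)} = 10$ and the state machine is in absorbing state. Thus, regardless of further inputs, we get that $s^{(B-1)}_{\mathrm{M}} = 10$ and

$$
\operatorname{out}_{\mathrm{M}}\big(s^{(B-1)}_{\mathrm{M}}, g_B h_B\big)_1 = \operatorname{out}_{\mathrm{M}}(10, g_B h_B)_1 = g_B.
$$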

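Hence, suppose that $g_{1,m-1} = h_{1,m-1}$; we consider the case that $s^{(m-1)} = 00$ first, i.e., $\operatorname{par}(g_{1,m-1}) = 0$. By Observation 2.4, $g_{m,B}, h_{m,B} \in S^{B-m+1}_{\mathrm{rg}}$ and thus w.l.o.g. $m = 1$. If $B = 1$,

$$
\operatorname{out}_{\mathrm{M}}\big(s^{(B-1)}_{\mathrm{M}}, g_B h_B\big)_1 = \operatorname{out}_{\mathrm{M}}(00, \mathrm{M} h_B)_1 =
\begin{cases}
1 & \text{if } h_1 = 1 \\
\mathrm{M} & \text{otherwise,}
\end{cases}
$$

which equals $\max^{\mathrm{rg}}_{\mathrm{M}}\{g,h\}_1$ (we simply have a $1$-bit code). If $B > 1$, the above implies that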