I Introduction
Analog-to-digital converters (ADCs) are an essential component in any device that manipulates analog signals in a digital manner. While digital systems have benefited tremendously from scaling, their analog counterparts have become increasingly challenging. Consequently, it is often the case that the ADC constitutes the main bottleneck in a system, both in terms of power consumption and real estate, and in terms of the quality of the system’s output. Developing more efficient ADCs is therefore of great interest [1, 2].
The quality of an ADC is measured via the tradeoff between various parameters such as power consumption, size, cost of manufacturing, and the distortion between the input signal and its digitally-based representation. For the sake of a unified, technology-independent discussion, it is convenient to restrict the characterization of ADC quality to three basic parameters: 1) the number of analog samples per second, $f_s$; 2) the number of “raw” output bits the ADC produces per sample, $R$ (before any subsequent compression); 3) the mean squared error (MSE) distortion $D$ between the input signal and a reconstruction that is based on the output of the ADC.
While different applications may require different tradeoffs between $f_s$, $R$, and $D$, it is always desirable to design the ADC such that all three parameters are as small as possible. The focus of this work is on the quantization rate $R$. For a given sampling frequency $f_s$ and a given target distortion $D$, our goal is to design ADCs that use the smallest possible number of raw output bits per sample.
The problem of analog-to-digital conversion can be seen as an instance of the lossy source coding/lossy compression problem [3, 4, 5], as the output of an ADC is a binary sequence that represents the analog source. A key feature unique to the analog-to-digital conversion problem is that the encoding of the source is carried out in the analog domain, while the decoding procedure is purely digital. Given the limitations of analog processing, it is therefore generally only practical to exploit the source structure at the decoder. Hence, the source coding schemes that are suitable for data conversion are those that approach fundamental limits without requiring knowledge of the source structure at the encoder. In addition, latency and complexity constraints in data conversion typically preclude the use of schemes other than those based on scalar quantization.
The input signal to an ADC is often known to have structure that could be exploited to reduce the overall bit rate of its representation. In our analysis, it will be convenient to express this structure using a stochastic model for the input. Consequently, throughout the paper, we will model the input to the ADC as a stationary stochastic Gaussian process $X(t)$, whose power spectral density (PSD) encapsulates the assumed structure. More generally, we will sometimes also consider the problem of analog-to-digital conversion of a vector of $K$ jointly stationary stochastic Gaussian processes via $K$ parallel ADCs, where the input to each ADC is one of the processes.
Under such stochastic modeling, rate-distortion theory [3] provides a fundamental lower bound: any ADC (and corresponding decoder) that takes $f_s$ samples per second, produces $R$ bits per sample, and achieves distortion $D$ must satisfy $f_s R \geq R_X(D)$, where $R_X(D)$ is the rate-distortion function of the process in bits per second. In general, achieving the rate-distortion function of a source requires using sophisticated high-dimensional quantizers, whereas analog-to-digital conversion is invariably done via scalar uniform quantizers. Thus, achieving this lower bound with ADCs seems overly optimistic. Nevertheless, as we shall see, approaching the rate-distortion bound, up to some inevitable loss due to the one-dimensional nature of the quantization, is sometimes possible by a simple modification of the scalar uniform quantizer, namely, a modulo ADC, followed by a digital decoder that efficiently exploits the source structure.
Instead of sampling and quantizing the process $X(t)$, a modulo ADC samples and quantizes the process $[X(t)] \bmod \Delta$, where the modulo size $\Delta$ is a design parameter. See Figure 1. Equivalently, a modulo ADC can be thought of as a standard uniform scalar ADC with step size $\delta$ and an arbitrarily large dynamic range/support, but one that outputs only the $R$ least significant bits in the description of each sample, where $R = \log_2(\Delta/\delta)$. The benefit of applying the modulo operation on $X(t)$ is in reducing its dynamic range/support, which in turn enables a reduction of the number of bits per sample produced by the ADC, without increasing the quantizer’s step size. This operation, which corresponds to disregarding coarse information about $X(t)$, would on its own substantially degrade the source reconstruction. However, by properly accounting for the modulo operation and appropriately choosing its parameter $\Delta$, we can unwrap the modulo operation with high probability using previous samples of $X(t)$ and exploiting the (redundant) structure in the signal.
Following standard system design methodology, in the performance analysis of a modulo ADC we distinguish between two events: 1) the no-overload event, where the decoder correctly unwraps the modulo operation. We require the MSE distortion, conditioned on this event, to be at most $D$; 2) the overload event, where the decoder fails to unwrap the modulo operation. We require the probability of this event to be small, but do not concern ourselves with the MSE distortion conditioned on the occurrence of this event.
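The "least significant bits" view of a modulo ADC described above can be illustrated with a small sketch. It assumes, purely for illustration, that the wide-range quantizer outputs the integer floor(alpha * x) for some scaling alpha (unit step size after scaling); the helper names `wide_adc`, `modulo_adc`, and `keep_lsbs` are hypothetical and do not come from the paper.

```python
import math

def wide_adc(x: float, alpha: float) -> int:
    """Uniform scalar quantizer with unbounded range: floor(alpha * x)."""
    return math.floor(alpha * x)

def modulo_adc(x: float, alpha: float, rate_bits: int) -> int:
    """Quantize, then reduce modulo 2**rate_bits (result in 0 .. 2**rate_bits - 1)."""
    return wide_adc(x, alpha) % (1 << rate_bits)

def keep_lsbs(code: int, rate_bits: int) -> int:
    """Keep the rate_bits least significant bits of a two's-complement integer."""
    return code & ((1 << rate_bits) - 1)

# Illustrative inputs and parameters (assumptions, not values from the paper).
samples = [-3.7, -0.2, 0.0, 1.25, 9.9, 123.456]
alpha, rate_bits = 4.0, 3

folded = [modulo_adc(x, alpha, rate_bits) for x in samples]
lsbs = [keep_lsbs(wide_adc(x, alpha), rate_bits) for x in samples]
print(folded, lsbs)
```

Both lists coincide: discarding the most significant bits of a wide-range quantizer output is the same as reducing it modulo the number of retained levels, which is why the step size is unaffected by the modulo reduction.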
I-A Our Contributions
This work further develops the modulo ADC framework in three complementary directions, as specified below.
I-A1 Oversampled Modulo ADC
We show that a modulo ADC can be used as an alternative to $\Sigma\Delta$ converters. A $\Sigma\Delta$ converter is based on oversampling the input process, i.e., sampling above the Nyquist rate, in conjunction with noise shaping, which pushes much of the energy of the quantization noise to high frequencies, where there is no signal content. See Figure 2. The noise-shaping operation requires incorporating an elaborate mixed-signal feedback circuit. In particular, the circuit first generates the quantization noise, which necessitates using not only an ADC but also an accurately matched digital-to-analog converter (DAC), and then applies an analog filter. The analog nature of the signal processing makes it challenging to use filters of high order, which in turn limits performance.
We develop an alternative architecture (Section III) that shifts much of the complexity to the decoder, whereas the “encoder” is simply a modulo ADC. See Figure 3. The parameter of the modulo ADC, as well as the coefficients of the prediction filter in Figure 3, depend only on the bandwidth $B$ of the input process and on its variance $\sigma^2$, and not on the other details of its PSD. Similarly, the MSE distortion between the input process and its reconstruction depends only on $B$ and $\sigma^2$. Thus, the developed architecture is as agnostic as $\Sigma\Delta$ converters to the statistics of the input process. Furthermore, for a flat-spectrum process, the distortion is within a small gap, due to the one-dimensionality of the encoder, from the information-theoretic limit.
I-A2 A Phase-Domain Implementation of Modulo ADC via Ring Oscillators
We develop a modulo ADC implementation that performs the modulo reduction inherently as part of the analog signal acquisition process. As the phase of a periodic waveform is always measured modulo $2\pi$, a natural class of candidates are ADCs that first convert the input voltage into phase, and then quantize that phase. A notable representative within this class, which has been extensively studied in the literature [6, 7], is the ring oscillator ADC.
Consider a closed-loop cascade of $N$ inverters, where $N$ is an odd number, all controlled with the same voltage, see Figure 4. This circuit, which will be described in detail in Section IV, oscillates between $2N$ states, corresponding to the values (‘low’ or ‘high’, represented by ‘0’ or ‘1’) of each of the $N$ inverters. See Figure 5. The oscillation frequency is controlled by the common control voltage. Due to the oscillating nature of the circuit, if we sample its state every $T_s$ seconds, we cannot tell how many “state changes” occurred between two consecutive samples, but we are able to determine this number modulo $2N$. Thus, by setting the control voltage to $f(X(t))$, where $X(t)$ is the analog signal to be converted to a digital one and $f(\cdot)$ is a function to be specified, we obtain a modulo ADC. The input-output relation of this modulo ADC is characterized in Section IV, and depends on the response time of the inverters to a change in their input, as a function of the control voltage.
In practice, the modulo operation realized in this way deviates from the ideal characteristic of Figure 1 in a variety of ways. Accordingly, we perform several numerical experiments to evaluate and optimize the performance of an oversampled ring oscillator modulo ADC, and compare it to the performance of an ideal modulo ADC as well as to a $\Sigma\Delta$ converter. The results demonstrate that, despite the non-idealities in the ring oscillator implementation, in some regimes this architecture holds substantial potential for improvement over existing ADCs.
I-A3 Modulo ADCs for Jointly Stationary Processes
In many applications the number of sensors/antennas observing a particular process is greater than the number of degrees of freedom (per time unit) governing its behavior. Thus, there is a redundancy at the receiver that can be exploited. However, as this redundancy can be spread across time and space, traditional ADC architectures, as well as the modulo ADC architectures described in Sections II-A and II-B, are insufficient. In this part of the paper, we show how to address this problem via a natural extension of the modulo ADC framework.
As an example we will consider the problem of wireless communication. It is by now well established that using receivers, as well as transmitters, with multiple antennas dramatically increases the achievable communication rates over wireless channels [8, 9]. However, adding antennas comes at the price of requiring multiple expensive and power-hungry RF chains. For traditional ADC architectures, power and cost scale linearly with the number of receive antennas, which motivates an alternative solution.
It is often the case that the signals observed by the different receive antennas are highly correlated, in time and in space. As an illustrative example, consider the case where the transmitter has one antenna, whereas the receiver has $K$ antennas. We can model the signal observed at each of the antennas, after sampling, as
$$X_k[n] = \sum_{m} h_k[m]\, S[n-m] + Z_k[n], \quad k = 1, \ldots, K, \tag{1}$$
where $\{S[n]\}$ is the process emitted by the transmitter, $\{h_k[n]\}$ is the $k$th channel impulse response, and $\{Z_k[n]\}$, $k = 1, \ldots, K$, are independent additive white Gaussian noise (AWGN) processes.
Since all output processes in (1) are noisy and filtered versions of the same input process, they will typically be highly correlated. However, this correlation may be spread in time (the $n$ axis) and in space (the $k$ axis). As an extreme example, assume $\{S[n]\}$ is an i.i.d. process, and the filters simply incur different delays, i.e., $h_k[n] = \delta[n-k]$ for $k = 1, \ldots, K$. While each individual process $\{X_k[n]\}$ is white, and each vector $\mathbf{X}[n] = (X_1[n], \ldots, X_K[n])^T$ has a scaled identity covariance matrix, the vector process $\{\mathbf{X}[n]\}$ is highly correlated. One must therefore jointly process the time and the spatial dimensions in order to exploit this correlation.
This phenomenon, where the signals observed by the different ADCs are highly correlated, is not unique to the wireless communication setup, and appears in many other applications, e.g., multiarray radar. It is, however, taken to the extreme in massive MIMO [10], where the number of antennas at the base station is of the order of tens or even hundreds, while the number of users it supports may be substantially fewer.
In Section VI we develop an architecture that uses modulo ADCs, one for each receive antenna, in order to exploit the space-time correlation of the processes. We develop a low-complexity decoding algorithm for unwrapping the modulo operations. This algorithm combines the idea of performing prediction in time, of the quantized vector process from its past, with that of integer-forcing source decoding [11], which is used for exploiting spatial correlations in the prediction error vector. See Figure 6. In the limit of small distortion, the loss of the developed analog-to-digital conversion scheme with respect to the information-theoretic lower bound on the quantization rate is shown to reduce to that of the integer-forcing source decoder.
I-B Related Work
The idea of using modulo ADCs/quantizers for exploiting temporal correlations within the input process towards reducing the quantization rate dates back, at least, to [12], where a quantization scheme, called modulo-PCM, was introduced. A decoding scheme for unwrapping the modulo operation, based on maximum-likelihood sequence detection [13], was further proposed in [12], and a heuristic analysis was performed, based on prediction of the current sample from its past, which shows that modulo-PCM can approach the Shannon lower bound under high-resolution assumptions. In Section II-A, we develop a more complete analysis of modulo quantization, the details of which are required for the application we discuss in Section III.
The architecture from Figure 3 is based on using a prediction filter at the decoder, as a part of the modulo unwrapping process, as was hinted at in [12] (see also [14]). In agreement with the literature on differential pulse-code modulation (DPCM) of the late 1970s (see, e.g., [15]), the authors in [12] proposed to design the prediction filter as the optimal one-step predictor of the unquantized process from its past. As shown in [16], this design criterion is suboptimal, and the “correct” design criterion is to take this filter as the one-step predictor of the quantized process from its past. The difference between the two design criteria is significant for oversampled processes, which are the focus of Section III, whose PSD is zero at high frequencies, as in those frequencies the signal-to-distortion ratio is zero, no matter how small the quantization noise is. Our analysis in Section III reveals that designing the modulo size and the prediction filter with respect to a quantized flat-spectrum input process results in a universal system. This means that the system attains the same distortion for all input processes that share the same support for the PSD and the same variance.
The use of modulo ADCs/quantizers was also studied by Boufounos in the context of quantization of oversampled signals [17] (see also [18]). In particular, it is shown in [17] that by randomly embedding a measurement vector in onto an dimensional subspace, and using a modulo ADC for quantizing each of the coordinates of the result, one can attain a distortion that decreases exponentially with the oversampling ratio, with high probability. In Section III we consider a similar setup, where an oversampled analog signal, with oversampling ratio , i.e. is times greater than the Nyquist frequency, is digitized by a modulo ADC. In the language of [17], this corresponds to embedding to an
dimensional space by zeropadding followed by interpolation, which is indeed a linear operation. We show that for this particular “embedding” not only is the decay of MSE distortion exponential in the oversampling ratio, but the attained distortion is informationtheoretically optimal, up to a constant loss, which is explicitly characterized, due to the scalar nature of the quantizer. Moreover, under this “embedding”, a simple lowcomplexity decoding algorithm exists, whereas for the random projection case studied in
[17], no computationally efficient decoding algorithm was given. One advantage, on the other hand, of the approach from [17], is that it is applicable to bit modulo ADCs, whereas the performance of the scheme from Section III typically becomes attractive starting from bits per sample.
Very recently, Bhandari et al. have addressed the question of the minimal sampling rate that allows for exact recovery of a bandlimited finite-energy signal from its modulo-reduced sampled version [19] (see also [20]). They have found that a sufficient condition for correct reconstruction is sampling above the Nyquist rate by a factor of , regardless of the size of the modulo interval. The analysis in [19] did not take quantization noise into account, which corresponds to and in our setup.
The merits of a modulo ADC for distributed analog-to-digital conversion of signals correlated in space, but not in time, were demonstrated in [11]. A low-complexity decoding algorithm for unwrapping the modulo operation was proposed and its performance was analyzed. It was demonstrated via numerical experiments that the performance is usually quite close to the information-theoretic lower bounds (see also [21]). In Section II-B, we summarize the decoding scheme from [11] and the corresponding performance analysis, as those will be needed in Section VI, where we develop a modulo ADC architecture for analog-to-digital conversion of jointly stationary processes. The decoding algorithm for this setup, as well as its performance analysis, is inspired by the ideas and techniques from Sections II-A and II-B.
In a broader sense, modulo quantization is closely related to the Wyner-Ziv source coding with side information setup and to its channel coding dual, the Gel’fand-Pinsker setup [22]. In the latter context, we further note that modulo quantization is widely used for communication over intersymbol interference channels [23, 24]. Recently, Hong and Caire [25] considered modulo ADCs as potential candidates for the front end of receivers in a cloud radio access network (C-RAN), employing compute-and-forward [26] based protocols.
Note that although the concept of a modulo ADC is reminiscent of folding ADCs [27], an important difference is that, unlike the latter, the former does not keep track of the number of folds that occurred and, moreover, its functionality does not depend on this number, i.e., it does not saturate for large inputs. In unwrapping the modulo operation at the decoder, the missing information about the number of folds is recovered, and we are able to attain the same distortion with a smaller rate.
Finally, another related line of work is that of compressed sampling, see, e.g., [28, 29, 30], where the goal is to design universal and efficient ADCs with a small sampling frequency, under the assumption that the input signal occupies only a small portion of its total bandwidth, but the exact support is unknown.
I-C Organization
The rest of the paper is organized as follows. In Section II we formally define the modulo ADC and study its performance for stationary scalar input processes, and for random vectors (spatial correlation). Section III develops the use of oversampled modulo ADCs as a substitute for $\Sigma\Delta$ converters, and analyzes the tradeoffs this architecture achieves. In Section IV we introduce an implementation of modulo ADCs via ring oscillators and establish the corresponding input-output mathematical model. Numerical experiments for evaluating the performance of ring-oscillator-based oversampled modulo ADCs are performed in Section V. Section VI proposes to use parallel modulo ADCs for digitizing jointly stationary processes. The paper concludes in Section VII.
II Preliminaries on Ideal Modulo ADC
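Let $\Delta$ be a positive number, and define the $[\cdot] \bmod \Delta$ operation as
$$[x] \bmod \Delta \triangleq x - \Delta \cdot \left\lfloor \frac{x}{\Delta} \right\rfloor \in [0, \Delta),$$
where the floor operation $\lfloor x \rfloor$ returns the largest integer smaller than or equal to $x$. By definition, we have that for any $x, y \in \mathbb{R}$
$$\left[[x] \bmod \Delta + y\right] \bmod \Delta = [x + y] \bmod \Delta. \tag{2}$$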
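The modulo operation and its distributive law (2) can be checked numerically. The sketch below assumes the standard floor-based definition, x minus delta times floor(x/delta), which matches the verbal description of the floor operation above; exact rational arithmetic is used so that equality tests are not affected by rounding. The function name `mod_delta` and the test values are illustrative assumptions.

```python
import math
from fractions import Fraction

def mod_delta(x: Fraction, delta: Fraction) -> Fraction:
    """[x] mod delta := x - delta * floor(x / delta), a value in [0, delta)."""
    return x - delta * math.floor(x / delta)

# Illustrative modulo size and inputs (assumptions, not values from the paper).
delta = Fraction(5, 2)
values = [Fraction(-100, 7), Fraction(-7, 3), Fraction(0), Fraction(11, 4), Fraction(25, 2)]

# Distributive law: reducing x before adding y does not change the final residue.
law_holds = all(
    mod_delta(mod_delta(x, delta) + y, delta) == mod_delta(x + y, delta)
    for x in values
    for y in values
)
print(law_holds)
```

This property is what later allows a decoder to subtract a prediction from the folded sample and still obtain the folded prediction error.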
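An $R$-bit modulo ADC with resolution parameter $\alpha$, or $(R,\alpha)$ mod-ADC, is defined by
$$[x]_{R,\alpha} \triangleq \left[\lfloor \alpha x \rfloor\right] \bmod 2^R \in \{0, 1, \ldots, 2^R - 1\},$$
where we have assumed that $2^R$ is an integer. In case $R$ itself is an integer, each sample of $[x]_{R,\alpha}$ can be represented by $R$ bits. Otherwise, we can buffer $n$ consecutive samples and represent them by $\lceil nR \rceil$ bits, such that the average number of bits per sample is at most $R + 1/n$. The role of $\alpha$ here is to scale the input prior to quantization. We can write $[x]_{R,\alpha}$ as
$$[x]_{R,\alpha} = \left[\alpha x + \left(\lfloor \alpha x \rfloor - \alpha x\right)\right] \bmod 2^R. \tag{3}$$
The error term $\lfloor \alpha x \rfloor - \alpha x$ in (3) is clearly a deterministic function of $x$. Nevertheless, throughout this paper we will model this error term as additive uniform noise statistically independent of $x$, such that the mod-ADC will be treated as a stochastic channel with input $X$ and output $Y$, related as
$$Y = [\alpha X + Z] \bmod 2^R, \quad Z \sim \mathrm{Uniform}\left((-1, 0]\right). \tag{4}$$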
The approximation of the mod-ADC by the additive modulo channel (4) can be made exact via the use of subtractive dithers. Specifically, we can use a random variable $U \sim \mathrm{Uniform}([0,1))$, statistically independent of $X$, which we refer to as a dither, and feed $x + U/\alpha$ to the mod-ADC instead of feeding $x$. The output of the modulo ADC in this case will be
$$[x + U/\alpha]_{R,\alpha} = \left[\lfloor \alpha x + U \rfloor\right] \bmod 2^R.$$
Subtracting $U$ from this output and reducing the result modulo $2^R$, we obtain
$$\left[\left[\lfloor \alpha x + U \rfloor\right] \bmod 2^R - U\right] \bmod 2^R = \left[\alpha x + \left(\lfloor \alpha x + U \rfloor - (\alpha x + U)\right)\right] \bmod 2^R,$$
where the equality follows from the distributive law of modulo (2). Note that for every $x$, the random variable $\lfloor \alpha x + U \rfloor - (\alpha x + U)$ is uniformly distributed over $(-1, 0]$, and is therefore independent of $x$ [31, Lemma 1]. Thus, with subtractive dithers, the additive noise model (4) is exact. We note that even when dithering is not used, under suitable conditions this approximation is quite accurate [32].
Although the modulo operation entails loss of information in general, in many situations it is possible to unwrap it, i.e., reconstruct $x$ from $[x] \bmod \Delta$, with high probability. (Footnote 1: Here, the term “high probability” is used to state that this probability can be made as high as desired by increasing $\Delta$. We explicitly quantify the relation between $\Delta$ and the desired “no-overload” probability.) In particular, let
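$$\tilde{x} \triangleq \left[[x] \bmod \Delta + \tfrac{\Delta}{2}\right] \bmod \Delta - \tfrac{\Delta}{2}, \tag{5}$$
and note that conditioned on the no-overload event
$$\bar{E}_{\mathrm{OL}} \triangleq \left\{ x \in \left[-\tfrac{\Delta}{2}, \tfrac{\Delta}{2}\right) \right\}$$
we have that $\tilde{x} = x$. Thus, if $\Pr(\bar{E}_{\mathrm{OL}})$ is close to $1$, the modulo operation has no effect with high probability. Note that $\Pr(E_{\mathrm{OL}})$ is identical to the probability that a standard uniform quantizer with dynamic range (support) $\Delta$ is in overload. Thus, when thinking of $x$ as a single observation, it is unclear what the advantages of a modulo ADC are with respect to a traditional uniform ADC. However, as we illustrate below, the modulo ADC allows exploitation of the statistical structure of the acquired signal in a much more efficient manner than the standard ADC.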
The following lemma is proved using Chernoff’s bound, and will be useful in the sequel for bounding in various scenarios.
Lemma 1 ([33, Lemma 4],[34, Theorem 7])
Consider the random variable where are iid Gaussian random variables with zero mean and some variance and are iid random variables, statistically independent of , uniformly distributed over the interval for some . Let . Then for any
II-A Modulo ADCs for Scalar Stationary Processes
Let $\{X_n\}$ be a zero-mean discrete-time stationary Gaussian stochastic process, obtained by sampling a stationary Gaussian process $X(t)$ every $T_s$ seconds. Let
$$Y_n = [\alpha X_n + Z_n] \bmod 2^R$$
be the process obtained by applying an $(R,\alpha)$ mod-ADC on the process $\{X_n\}$, where $\{Z_n\}$ is an i.i.d. noise process, and let
$$V_n = \alpha X_n + Z_n$$
be its non-folded version. Our goal is to design a decoder that recovers $V_n$ from the outputs of the modulo ADC, $Y_n, Y_{n-1}, \ldots$, with high probability. To that end, we assume the decoder has access to the $p$ previous non-folded samples $V_{n-1}, \ldots, V_{n-p}$, an assumption that will be justified in the sequel, and that it knows the autocovariance function of $\{X_n\}$. We apply the following algorithm (see also Figure 3 for a schematic illustration):
Inputs: ,, , , .
Algorithm:

Compute the optimal linear MMSE predictor for from its last samples
(6) where is a tap prediction filter, computed based on and , and the shift by compensates for .

Compute

Output , and .
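The prediction-based unwrapping idea of the algorithm above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's decoder: the source is assumed to be a unit-variance Gaussian AR(1) process, the predictor is a single tap equal to the AR coefficient (rather than the optimal multi-tap MMSE predictor of the quantized process), the first non-folded sample is assumed known (initialization), and all names and parameter values are hypothetical.

```python
import math
import random

def centered_mod(x: float, m: int) -> float:
    """Reduce x modulo m onto the interval [-m/2, m/2)."""
    return (x + m / 2) % m - m / 2

def simulate(num_samples=2000, rho=0.99, alpha=16.0, rate_bits=5, seed=0):
    """Generate non-folded quantized samples and their modulo-ADC outputs."""
    rng = random.Random(seed)
    m = 1 << rate_bits
    x = rng.gauss(0.0, 1.0)  # assumed source: unit-variance Gaussian AR(1)
    unfolded, folded = [], []
    for _ in range(num_samples):
        v = math.floor(alpha * x)   # non-folded quantized sample
        unfolded.append(v)
        folded.append(v % m)        # what the modulo ADC actually outputs
        x = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
    return unfolded, folded, m

def unwrap(folded, first_unfolded, rho, m):
    """Sample-by-sample unwrapping with a one-tap predictor (an assumption)."""
    recovered = [first_unfolded]    # initialization: first sample assumed known
    for y in folded[1:]:
        # The 1/2 shifts compensate for the -1/2 mean of the floor error.
        pred = rho * (recovered[-1] + 0.5) - 0.5
        # Folded prediction error; equals v - pred whenever |v - pred| < m/2.
        err = centered_mod(y - pred, m)
        recovered.append(round(pred + err))
    return recovered

unfolded, folded, m = simulate()
recovered = unwrap(folded, unfolded[0], rho=0.99, m=m)
num_errors = sum(r != v for r, v in zip(recovered, unfolded))
num_folded = sum(v != y for v, y in zip(unfolded, folded))
print(num_errors, num_folded)
```

With these illustrative parameters the scaled input has a standard deviation of about 16 while the modulo size is 32, so many samples are folded, yet the prediction error is several standard deviations inside the modulo interval and the decoder tracks the non-folded sequence.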
Remark 1
Note that is the tap prediction filter for the quantized process from its past, rather than for from its past. While the loss for using the latter, instead of the former, becomes insignificant when highresolution assumptions apply, it can be arbitrarily large for oversampled processes, for which highresolution assumptions never hold [16, 35]. The filter coefficients need only be computed once, and can then be used for all times.
The following proposition characterizes the performance of the algorithm above. All logarithms in this paper are taken to base $2$, unless stated otherwise.
Proposition 1
Let , and be as defined in the algorithm above, and let . We have that
(7) 
and
(8) 
where the event is the complement of the event .
Proof. Let be the th order prediction error of the process , and note that its variance is invariant to due to stationarity. We have that
(9)  
where equation (9) follows from the modulo distributive law (2), and constitutes the key advantage of the modulo operation for exploiting temporal correlations. Note that is a cyclicly shifted version of , as in (5). Therefore, conditioned on the event
we have that .
Note that is a zeromean linear combination of statistically independent Gaussian and uniform random variables, such that Lemma 1 applies, and we have that
(10) 
Whenever occurs, we have that , and consequently
and
(11) 
Proposition 1 shows that we can make as small as by choosing
(12) 
For example, taking bits, results in an overload probability smaller than . In particular, unless we take a very small , we have that , and consequently, by Proposition 1, we will have . Thus, to simplify expressions in the analysis that follows, we assume . We note the tradeoff in choosing : on the one hand, increasing decreases the MSE distortion , but on the other hand the prediction error variance of the process increases with such that the required rate for avoiding overload errors increases. Thus, the tradeoff between and the required quantization rate is controlled through the parameter . We now turn to characterize the tradeoff the developed scheme achieves.
Let denote the differential entropy of the random variable , and the conditional differential entropy of given the random variable [5]. Recall that for a stationary Gaussian process with PSD we have that [36]
(13) 
and in particular if and only if over a measurable subset of . Shannon’s lower bound [3], states that the number of bits per sample produced by any quantizer that attains an MSE distortion must satisfy
It is wellknown that for Gaussian processes with finite , Shannon’s lower bound is asymptotically tight, i.e., , [3].
Proposition 2
If , then
Proof. We can write
(14) 
where is the th order prediction error of the process , where iid.
For a Gaussian process , the condition is equivalent to
(15) 
As a consequence of (15), we have that
(16) 
By PaleyWiener’s theorem [37], we have that
(17) 
Combining (16) and (17), we obtain that
for processes with finite entropy rate . The result now follows by rearranging terms.
For the practically important case where the discrete-time process is obtained by oversampling a continuous-time process, which is studied in Section III, the assumption of Proposition 2 does not hold. Nevertheless, we will show that the modulo ADC achieves performance that is close to the information-theoretic limits.
Above, we have assumed that the decoder has access to the past non-folded samples. To justify this assumption, an initialization step is needed, where the decoder acquires the first consecutive non-folded samples, or estimates of these samples. Once those are obtained, we can apply the algorithm described above, sample by sample, and assume the estimate produced by the algorithm at each time instant is correct, and can be used as an input for the algorithm in the next steps. All samples will be recovered correctly, as long as no overload error occurred within the decoding steps. Thus, by the union bound, we see that the first $N$ samples are recovered correctly with probability at least one minus $N$ times the per-sample overload probability. (Footnote 2: Note that conditioning on the event that no overload error occurred until a given time changes the statistics of the process. Thus, applying the union bound correctly here requires some more care. See [35] for more details.)
One conceptually simple way of performing the initialization, i.e., obtaining the first non-folded samples, is by using a standard scalar quantizer with high rate for the first samples. Although the high power consumption of such a quantizer will have a negligible effect on the total power consumption, due to the fact that it is used only for a small fraction of the time, this approach has the disadvantage of having to include two ADCs, a high-rate standard ADC and a modulo ADC, within the system. Alternatively, one can perform the initialization using only the modulo ADC in one of the two following ways:

1) Increase the resolution parameter $\alpha$ gradually until it reaches its final value. For the first sample, $\alpha$ will be chosen such that the scaled sample is w.h.p. within the modulo interval, so that no prediction is needed. Next, we can use the first recovered sample in order to predict the second, which allows the use of a larger $\alpha$ such that the prediction error is still within the modulo interval. Continuing this way, we can keep increasing $\alpha$ until convergence.
2) We can collect a long vector of outputs from the modulo ADC and unwrap the modulo operation via the integer-forcing source coding scheme described in the next subsection. The amount of computation per sample required in this method is greater than that of the “steady state”, i.e., after initialization is complete, but since initialization is rarely performed, the effect on the total complexity is negligible.
II-B Modulo ADCs for Random Vectors
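Let $\mathbf{X}$ be a $K$-dimensional Gaussian random vector with zero mean and covariance matrix $\mathbf{\Sigma}$. Let
$$\mathbf{Y} = [\alpha \mathbf{X} + \mathbf{Z}] \bmod 2^R,$$
with the modulo reduction applied coordinate-wise, be obtained by applying $K$ identical $(R,\alpha)$ mod-ADCs, each applied to a different coordinate of the vector $\mathbf{X}$, where the quantization noises $Z_1, \ldots, Z_K$ are i.i.d., and let
$$\mathbf{V} = \alpha \mathbf{X} + \mathbf{Z}$$
be its non-folded version. Our goal is to recover $\mathbf{V}$ from the outputs $\mathbf{Y}$ of the modulo ADCs with high probability.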
By definition of the modulo operation, we have that . Consequently, the optimal decoder for from the measurement , in terms of minimizing , is
(18) 
where
is the probability density function (PDF) of the random vector
. Although can be expressed as the convolution of the dimensional Gaussian PDF of and the cubic PDF of , no simpler closed-form expression is known for it. However, as increases (high-resolution quantization regime), approaches the PDF of a random vector, where is a dimensional vector with all entries equal to and is the identity matrix. Consequently, one can use the suboptimal (in terms of minimizing ) decoder
The matrix is positive definite and therefore admits a Cholesky decomposition where is a lower triangular matrix with strictly positive diagonal entries. Setting , we can write
(19) 
Thus, the problem of finding is equivalent to that of finding the closest point to in the lattice generated by the basis . Solving this problem, in general, is known to require running time exponential in [38], unless P=NP. Thus, for large , finding is computationally prohibitive. One therefore needs to seek an alternative, lowcomplexity, decoder for from . Next, we review such a decoder, proposed in [11], dubbed the integerforcing (IF) source decoder, see Figure 7. The decoding algorithm works as follows.
Inputs: , , , .
Output: Estimates , and , for and , respectively.
Algorithm:

Solve
(20) where denotes the absolute value of .

For , compute
(21) and set .

Output , and .
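The mechanics of the integer-forcing decoding steps can be illustrated on a toy two-dimensional example. The sketch below makes several assumptions that are not taken from the paper: the source is a hypothetical correlated pair (s, 8s + n), the integer matrix `A` is hand-picked for this source instead of being obtained from the optimization in step 1, and the half-sample shifts used for noise-mean compensation are omitted by working directly with the integer-valued non-folded samples.

```python
import math
import random

def centered_mod(x: int, m: int) -> int:
    """Reduce the integer x modulo m onto {-m/2, ..., m/2 - 1} (m even)."""
    return (x + m // 2) % m - m // 2

# Hand-picked unimodular integer matrix (an assumption for this toy source,
# not the solution of the optimization in step 1) and its integer inverse.
A = [[-8, 1],
     [1, 0]]
A_INV = [[0, 1],
         [1, 8]]

def if_decode(y, m):
    """Recover the non-folded integer pair from its coordinate-wise folded version."""
    # Integer combinations of folded samples agree with the same combinations
    # of the non-folded samples modulo m, because A has integer entries.
    g = [centered_mod(row[0] * y[0] + row[1] * y[1], m) for row in A]
    # If every combination satisfies -m/2 <= a_k^T v < m/2, then g = A v exactly.
    return [A_INV[0][0] * g[0] + A_INV[0][1] * g[1],
            A_INV[1][0] * g[0] + A_INV[1][1] * g[1]]

rng = random.Random(1)
alpha, rate_bits = 8.0, 7            # illustrative parameters (assumptions)
m = 1 << rate_bits
num_errors, num_folded = 0, 0
for _ in range(1000):
    s = rng.gauss(0.0, 1.0)
    x = [s, 8.0 * s + rng.gauss(0.0, 0.5)]     # assumed correlated source
    v = [math.floor(alpha * xi) for xi in x]   # non-folded quantized samples
    y = [vi % m for vi in v]                   # outputs of the two modulo ADCs
    num_folded += (v != y)
    num_errors += (if_decode(y, m) != v)
print(num_errors, num_folded)
```

In this example the second coordinate has a spread far larger than the modulo size, so it is folded almost always, yet both integer combinations stay well inside the modulo interval and inverting `A` returns the non-folded pair.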
Remark 2
The optimization problem (20) requires a computational complexity exponential in the dimension, in general (unless P=NP). However, the problem of finding the optimal integer matrix need only be solved once for each covariance matrix and each value of the resolution parameter. Thus, even if the solution to this problem is computationally expensive, its cost is amortized over the number of times this solution is used. In practice, one can apply the LLL algorithm [39] in order to obtain a suboptimal integer matrix with complexity polynomial in the dimension.
The next proposition, adapted from [11, Theorem 2] characterizes the performance of modulo ADCs with the decoder above.
Proposition 3
Let be the matrix found in step 1 of the algorithm above, and define
(22) 
We have that
and
for all , where the event is the complement of the event .
The main idea behind the decoder above is the simple observation that for any vector and any vector we have that
(23) 
Proof. By the identity (23), we have that the quantities , computed in step 2 of the algorithm, satisfy
where
Furthermore, is merely a cyclicly shifted version of . Thus, if and only if . Consequently, if and only if the event
occurs. Thus, by the union bound,
(24) 
The random variable has zero mean, variance , and satisfies the conditions of Lemma 1. We therefore have that
Substituting this into (24) and recalling the definition of , gives
(25) 
Conditioned on the event , i.e., the event that did not occur, we have that for all
where the last inequality follows similarly to (11).
As in the previous subsection, we set
(26) 
such that