Adapted Decimation on Finite Frames for Arbitrary Orders of Sigma-Delta Quantization

02/15/2019 ∙ by Kung-Ching Lin, et al. ∙ University of Maryland

In analog-to-digital (A/D) conversion, signal decimation has been proven to greatly improve the efficiency of data storage while maintaining high accuracy. When one couples signal decimation with the ΣΔ quantization scheme, the reconstruction error decays exponentially with respect to the bit-rate. We build on our previous result, which extended signal decimation to finite frames, albeit only up to the second order. In this study, we introduce a new scheme, called adapted decimation, which yields a polynomial reconstruction error decay rate of arbitrary order with respect to the oversampling rate, and an exponential decay rate with respect to the bit-rate.







1. Introduction

1.1. Background and Motivation

Analog-to-digital (A/D) conversion is a process where bandlimited signals, e.g., audio signals, are digitized for storage and transmission, which is feasible thanks to the classical sampling theorem. In particular, the theorem indicates that discrete sampling is sufficient to capture all features of a given bandlimited signal, provided that the sampling rate is higher than the Nyquist rate.

Given a function $f \in L^1(\mathbb{R})$, its Fourier transform $\hat{f}$ is defined as
$$\hat{f}(\xi) = \int_{\mathbb{R}} f(t) e^{-2\pi i t \xi}\, dt.$$
The Fourier transform can also be uniquely extended to $L^2(\mathbb{R})$ as a unitary transformation.

Definition 1.1.

Given $f \in L^2(\mathbb{R})$, $f$ is said to be bandlimited if its Fourier transform $\hat{f}$ is supported in $[-\Omega, \Omega]$ for some $\Omega > 0$.

An important component of A/D conversion is the following theorem:

Theorem 1.2 (Classical Sampling Theorem).

Given a bandlimited $f \in L^2(\mathbb{R})$ with $\operatorname{supp}\hat{f} \subset [-\Omega, \Omega]$, for any $g \in L^2(\mathbb{R})$ satisfying

  • $\hat{g} = 1$ on $[-\Omega, \Omega]$,

  • $\hat{g}(\xi) = 0$ for $|\xi| \ge \tfrac{1}{T} - \Omega$,

with sampling period $T \le \tfrac{1}{2\Omega}$ and oversampling rate $\lambda = \tfrac{1}{2\Omega T} \ge 1$, one has
$$f(t) = T \sum_{n \in \mathbb{Z}} f(nT)\, g(t - nT),$$

where the convergence is both uniform on compact sets of $\mathbb{R}$ and in $L^2(\mathbb{R})$.

In particular, for $T = \tfrac{1}{2\Omega}$ and $g(t) = 2\Omega \operatorname{sinc}(2\Omega t)$, the following identity holds in $L^2(\mathbb{R})$:
$$f(t) = \sum_{n \in \mathbb{Z}} f\!\left(\tfrac{n}{2\Omega}\right) \operatorname{sinc}(2\Omega t - n).$$

However, the discrete nature of digital data storage makes it impossible to store the samples $\{f(nT)\}_{n\in\mathbb{Z}}$ exactly. Instead, the quantized samples $\{q_n\}_{n\in\mathbb{Z}}$, chosen from a pre-determined finite alphabet $\mathcal{A}$, are stored. This results in the following reconstructed signal:
$$\tilde{f}(t) = T \sum_{n \in \mathbb{Z}} q_n\, g(t - nT).$$

As for the choice of the quantized samples $q_n$, we shall discuss the following two schemes:

  • Pulse Code Modulation (PCM):

    Quantized samples are taken as the direct round-off of the current sample, i.e., $q_n = Q(f(nT))$, where $Q$ rounds to the nearest element of the alphabet $\mathcal{A}$.

  • ΣΔ Quantization:

    A sequence of auxiliary variables $(u_n)_{n\in\mathbb{Z}}$ is introduced for this scheme. $q_n$ is defined recursively as
    $$q_n = Q(u_{n-1} + f(nT)), \qquad u_n = u_{n-1} + f(nT) - q_n.$$
ΣΔ quantization was introduced in 1963 [15] and is still widely used, due to its advantages over PCM. Specifically, ΣΔ quantization is robust against hardware imperfection [9], a decisive weakness of PCM. For ΣΔ quantization, and the more general noise shaping schemes to be explained below, the boundedness of the auxiliary variables $(u_n)_{n\in\mathbb{Z}}$ turns out to be essential, as most analyses of quantization problems rely on it for error estimation. Schemes with bounded auxiliary variables are said to be stable.

Despite its merits over PCM, ΣΔ quantization merely produces linear error decay with respect to the number of bits used, as opposed to the exponential error decay produced by its counterpart PCM. Thus, it is desirable to generalize ΣΔ quantization for higher order error decay.
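The first order recursion above can be sketched numerically as follows. This is an illustrative sketch: the step size, alphabet length, and test signal are our assumptions, not choices made in the paper.

```python
import numpy as np

def midrise_quantizer(w, delta=0.5, levels=8):
    # Nearest element of {(-levels + 1/2)*delta, ..., (levels - 1/2)*delta},
    # saturating at the ends of the alphabet.
    k = np.clip(np.round(w / delta - 0.5), -levels, levels - 1)
    return (k + 0.5) * delta

def sigma_delta_order1(samples, delta=0.5, levels=8):
    # q_n = Q(u_{n-1} + x_n),  u_n = u_{n-1} + x_n - q_n
    q = np.zeros(len(samples))
    u = np.zeros(len(samples))
    state = 0.0
    for n, x in enumerate(samples):
        w = state + x
        q[n] = midrise_quantizer(w, delta, levels)
        state = w - q[n]
        u[n] = state
    return q, u

# Oversampled samples of a smooth signal
t = np.arange(0, 1, 1 / 256)
x = 0.6 * np.sin(2 * np.pi * 3 * t)
q, u = sigma_delta_order1(x)
```

Because the quantizer never saturates for this input, the scheme is stable, with $\|u\|_\infty \le \delta/2$.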

Given $r \in \mathbb{N}$, one can consider an $r$-th order ΣΔ quantization scheme as investigated by Daubechies and DeVore:

Theorem 1.3 (Higher Order ΣΔ Quantization, [8]).

Consider the following stable $r$-th order ΣΔ quantization scheme
$$(\Delta^r u)_n = f(nT) - q_n,$$
where $(q_n)$ and $(u_n)$ are the quantized samples and auxiliary variables, respectively. Then, for all bandlimited $f$,
$$\|f - \tilde{f}\|_{L^\infty(\mathbb{R})} \le C_r \|u\|_{\infty} T^r.$$

The existence of such a scheme is also proven in the same paper. This improves the error decay rate from linear to arbitrary polynomial degree while preserving the merits of a first order ΣΔ quantization scheme.
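A minimal numerical sketch of an $r$-th order scheme is the greedy rule below. This is an illustration only, not the stable scheme constructed in [8]: its stability here relies on an assumed alphabet large enough that the quantizer never saturates.

```python
import numpy as np
from math import comb

def midrise(w, delta=0.5, levels=16):
    k = np.clip(np.round(w / delta - 0.5), -levels, levels - 1)
    return (k + 0.5) * delta

def sigma_delta_order_r(samples, r, delta=0.5, levels=16):
    # Greedy rule: w_n = x_n + sum_{j=1}^r (-1)^{j+1} C(r, j) u_{n-j},
    # q_n = Q(w_n), u_n = w_n - q_n, so that (Delta^r u)_n = x_n - q_n.
    u = np.zeros(len(samples) + r)       # u[0:r] hold zero initial conditions
    q = np.zeros(len(samples))
    for n, x in enumerate(samples):
        w = x + sum((-1) ** (j + 1) * comb(r, j) * u[n + r - j]
                    for j in range(1, r + 1))
        q[n] = midrise(w, delta, levels)
        u[n + r] = w - q[n]
    return q, u[r:]

t = np.arange(0, 1, 1 / 512)
x = 0.6 * np.sin(2 * np.pi * 3 * t)
q2, u2 = sigma_delta_order_r(x, r=2)
```

With enough levels, an induction shows $\|u\|_\infty \le \delta/2$, and the defining relation $(\Delta^r u)_n = x_n - q_n$ holds exactly by construction.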

From here, a natural question arises: is it possible to generalize the ΣΔ quantization scheme further so that the reconstruction error decay matches the exponential decay of PCM? Two solutions have been proposed for this question. The first is to create new quantization schemes, known as noise shaping quantization schemes; a brief summary of their development is provided in Section 1.2.

The other possibility is to drastically enhance data storage efficiency while maintaining the same level of reconstruction accuracy, and signal decimation belongs in this category. The process is as follows: given an $r$-th order ΣΔ quantization scheme, there exists a bounded sequence $(u_n)$ such that
$$(\Delta^r u)_n = f(nT) - q_n,$$
where $(q_n)$ are the quantized samples. Then, consider a down-sampled sequence of suitable local averages of $(q_n)$, with down-sampling factor $\rho$.

The process where we convert the quantized samples $(q_n)$ to this down-sampled sequence is called signal decimation.

Decimation has long been known in the engineering community [5], and it was observed that decimation results in exponential error decay with respect to the bit-rate, although the observation remained a conjecture until 2015 [10], when Daubechies and Saab proved the following theorem:

Theorem 1.4 (Signal Decimation for Bandlimited Functions, [10]).

Given , , and , there exists a function such that


Moreover, the number of bits needed for each Nyquist interval is



From (3) and (4), we can see that the reconstruction error after decimation still decays polynomially with respect to the sampling frequency, while the number of bits needed per Nyquist interval drops from $\mathcal{O}(\lambda)$ to $\mathcal{O}(\log_2 \lambda)$. Thus, the reconstruction error decays exponentially with respect to the bits used.
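The first order case gives a quick numerical sketch of why decimation preserves accuracy: the ΣΔ recursion telescopes over each block, so block averages of the quantized samples track block averages of the signal to within $2\|u\|_\infty/\rho$. The quantizer parameters and test signal below are illustrative assumptions.

```python
import numpy as np

def midrise(w, delta=0.5, levels=8):
    k = np.clip(np.round(w / delta - 0.5), -levels, levels - 1)
    return (k + 0.5) * delta

def sigma_delta_order1(samples, delta=0.5, levels=8):
    q, u, state = np.zeros(len(samples)), np.zeros(len(samples)), 0.0
    for n, xn in enumerate(samples):
        w = state + xn
        q[n] = midrise(w, delta, levels)
        state = w - q[n]
        u[n] = state
    return q, u

rho = 16
t = np.arange(0, 1, 1 / 1024)
x = 0.6 * np.sin(2 * np.pi * 2 * t)
q, u = sigma_delta_order1(x)

# First order decimation: average each block of rho quantized samples,
# then keep only the block averages (down-sampling by rho).
q_dec = q.reshape(-1, rho).mean(axis=1)
x_avg = x.reshape(-1, rho).mean(axis=1)

# Telescoping u_n = u_{n-1} + x_n - q_n over one block shows the block
# averages differ by at most 2 * max|u| / rho.
err = np.max(np.abs(q_dec - x_avg))
```

The decimated sequence is $\rho$ times shorter, yet its blockwise deviation from the signal shrinks like $1/\rho$, which is the storage-versus-accuracy trade behind Theorem 1.4.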

Figure 1. Illustration of the first order decimation scheme: taking averages of nearby samples before down-sampling. The effect on the reconstruction is illustrated in parentheses.

In [16], the author extended decimation to signals in finite dimensional spaces. Such signals are sampled by finite frames; a brief introduction to finite frames is given in Appendix A. Using the alternative decimation operator introduced in the same paper, it was proven that, up to second order ΣΔ quantization, results similar to Theorem 1.4 can be achieved:

Definition 1.5 (Alternative Decimation).

Given fixed , the -alternative decimation operator is defined to be , where

  • is the integration operator satisfying

    Here, the cyclic convention is adopted: For any , .

  • is the sub-sampling operator satisfying

    where .
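A sketch of the two operators, under simplifying assumptions: we take the integration operator to be the (non-cyclic) running sum and the sub-sampling operator to keep every $\rho$-th coordinate. The cyclic wrap-around in the definition is ignored here for simplicity.

```python
import numpy as np

def integration_op(m):
    # Running-sum matrix: (Sigma y)_i = y_0 + ... + y_i.
    return np.tril(np.ones((m, m)))

def subsampling_op(m, rho):
    # Keep every rho-th coordinate (indices rho-1, 2*rho-1, ..., m-1).
    S = np.zeros((m // rho, m))
    for i in range(m // rho):
        S[i, rho * (i + 1) - 1] = 1.0
    return S

m, rho = 12, 3
D = np.eye(m) - np.diag(np.ones(m - 1), -1)   # backward difference
op = subsampling_op(m, rho) @ integration_op(m)
y = np.arange(1.0, m + 1)
```

Since the running sum inverts the backward difference in this non-cyclic setting, applying `op` to $Dy$ simply sub-samples $y$; this is the mechanism that lets decimation undo one order of noise shaping.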

Definition 1.6 (Unitarily Generated Frames (UGF)).

Given a base vector $\varphi_0 \in \mathbb{C}^k$ and a Hermitian matrix $\Omega \in \mathbb{C}^{k \times k}$, the unitarily generated frame $\Phi \in \mathbb{C}^{m \times k}$ has its rows as
$$\varphi_j = e^{2\pi i \Omega j/m} \varphi_0, \quad j = 0, \dots, m-1.$$
The eigenvalues and eigenvectors of $\Omega$ are represented as $\{\lambda_l\}_{l=1}^{k}$ and $\{v_l\}_{l=1}^{k}$, respectively.
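A UGF can be built numerically from the eigendecomposition of the generator. The row convention $\varphi_j = e^{2\pi i \Omega j/m}\varphi_0$ and the specific generator below are our assumptions for illustration.

```python
import numpy as np

def ugf(Omega, phi0, m):
    # Rows phi_j = exp(2*pi*1j*Omega*j/m) @ phi0, computed through the
    # eigendecomposition Omega = V diag(lam) V* of the Hermitian generator.
    lam, V = np.linalg.eigh(Omega)
    return np.array([V @ (np.exp(2j * np.pi * lam * j / m) * (V.conj().T @ phi0))
                     for j in range(m)])

k, m = 3, 12
Omega = np.diag(np.arange(k)).astype(complex)   # a simple Hermitian generator
phi0 = np.ones(k) / np.sqrt(k)
Phi = ugf(Omega, phi0, m)
G = Phi.conj().T @ Phi   # frame operator; positive definite iff Phi is a frame
```

For this choice the frame operator comes out as $(m/k)\,I$, i.e., the UGF is a tight frame with redundancy $m/k$.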

Theorem 1.7 (Alternative Decimation for Finite Frames, [16]).

Given the generator, base vector, eigenvalues, eigenvectors, and the corresponding UGF as in Definition 1.6. Suppose

  • ,

  • , and

  • ,

then the dual frame combined with the $r$-th order ΣΔ quantization ($r \le 2$) has a polynomial reconstruction error decay rate of degree $r$ with respect to the oversampling rate $\lambda = m/k$:

Moreover, the total bits used to record the quantized samples are bits, where the constant depends on . Suppose is fixed as , then as a function of bits used at each entry, satisfies

The constant is independent of the oversampling rate .

1.2. Prior Works

1.2.1. ΣΔ Quantization for Bandlimited Functions

Despite its simple form and robustness, ΣΔ quantization only results in linear error decay with respect to the sampling period. It was later proven in [8] that a generalization of ΣΔ quantization, namely the $r$-th order ΣΔ quantization, exists for arbitrary $r \in \mathbb{N}$, and for such schemes the error decay is of polynomial order $r$. Leveraging the different constants for this family of quantization schemes, sub-exponential decay can also be achieved. A different family of quantization schemes was shown [14] to have exponential error decay with a small exponent. In [11], the exponent was improved.

1.2.2. Finite Frames

ΣΔ quantization can also be applied to finite frames. It was proven in [3] that, for any family of frames with bounded frame variation, the reconstruction error decays linearly with respect to the oversampling rate. With different choices of dual frames, [4] proposed that the so-called Sobolev dual achieves the minimum induced matrix 2-norm for reconstruction. By carefully matching the dual frame to the quantization scheme, [7] proved that using a suitably matched dual for random frames results in exponential decay with near-optimal exponent and high probability.

1.2.3. Decimation

In [5], using the assumption that the noise in ΣΔ quantization is random, along with numerical experiments, it was asserted that decimation greatly reduces the number of bits needed while maintaining the reconstruction accuracy. In [10], a rigorous proof was given to show that this assertion is indeed valid, and the reduction of bits used turns the linear decay into exponential decay with respect to the bit-rate.

As for the adaptation to finite dimensional signals, the author proved in [16] that there exists a similar operator, called the alternative decimation operator, that behaves like decimation for bandlimited signals. In particular, for the first and second order ΣΔ quantization, it is possible to achieve exponential reconstruction error decay with respect to the bit-rate as well. However, similar to the caveat of decimation, it merely improves the storage efficiency while maintaining the same level of reconstruction error. Thus, the error decay rate with respect to the oversampling rate remains the same (quadratic for the second order), which is still inferior to other noise shaping schemes.

2. Main Results

We have seen in Theorem 1.7 that alternative decimation is only applicable up to the second order. Thus, we aim to extend our results to arbitrary orders; the solution we present here is called adapted decimation.

Definition 2.1 (Adapted Decimation).

Given , the -adapted decimation operator is defined to be

where is the usual backward difference matrix, satisfies , and has .

It will be shown that, for unitarily generated frames satisfying the conditions specified in Theorem 2.2, an $r$-th order ΣΔ quantization coupled with the corresponding adapted decimation has an $r$-th order polynomial reconstruction error decay rate with respect to the oversampling ratio. As for data storage, decimation allows for highly efficient storage, making the error decay exponentially with respect to the bits used.

For the rest of the paper, we shall also assume that our ΣΔ quantization scheme is stable, i.e., the auxiliary variables $u$ remain bounded as the dimension $m \to \infty$.

We also choose the quantization alphabet $\mathcal{A}$ to be uniformly spaced and symmetric around the origin: given $L \in \mathbb{N}$ and a step size $\delta > 0$, we define
$$\mathcal{A} = \left\{ \left(l + \tfrac{1}{2}\right)\delta : l = -L, \dots, L - 1 \right\}.$$
For complex Euclidean spaces, we define $Q$ to be the mid-rise quantizer associated with this alphabet of length $2L$. Throughout this paper we shall always deal with such $\mathcal{A}$ and $Q$.
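The alphabet and its mid-rise quantizer can be sketched as follows; the rounding rule (each cell of width $\delta$ mapping to its center, saturating at the ends) is the standard mid-rise convention, assumed here.

```python
import numpy as np

def midrise_alphabet(L, delta):
    # { (l + 1/2) * delta : l = -L, ..., L - 1 }: 2L uniformly spaced
    # levels, symmetric around the origin.
    return (np.arange(-L, L) + 0.5) * delta

def Q(w, L, delta):
    # Mid-rise quantizer: each cell [l*delta, (l+1)*delta) maps to its
    # center (l + 1/2)*delta, saturating at the ends of the alphabet.
    l = np.clip(np.floor(np.asarray(w) / delta), -L, L - 1)
    return (l + 0.5) * delta
```

Inside the non-saturating range, the quantization error is at most $\delta/2$ per sample, which is the bound the stability analyses rely on.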

Theorem 2.2.

Given the generator, base vector, eigenvalues, eigenvectors, and the corresponding UGF as in Definition 1.6, with the order $r$ fixed. Suppose

  • ,

  • ,

  • ,

  • , and

  • there exists $u$ from the stable $r$-th order ΣΔ quantization satisfying

where , then the following statements are true:

  • Recursivity: For all , there exists such that .

  • Signal reconstruction: is a frame.

  • Error estimate: For the dual frame , the reconstruction error satisfies

  • Efficient data storage: Suppose the length of the quantization alphabet is , then the total bits used to record the quantized samples are bits. Furthermore, suppose is fixed as , then as a function of bits used at each entry, satisfies


    where , independent of .

Remark 2.3.

Harmonic frames are unitarily generated frames whose generator is a diagonal matrix with integer entries and whose base vector is the normalized constant vector, so Theorem 2.2 is applicable to harmonic frames as well.
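A quick numerical check of this remark, under the same assumed UGF convention $\varphi_j = e^{2\pi i \Omega j/m}\varphi_0$: a diagonal generator with integer entries and a constant base vector reproduces the harmonic frame rows.

```python
import numpy as np

k, m = 4, 16
Omega = np.diag(np.arange(k)).astype(complex)   # diagonal generator
phi0 = np.ones(k) / np.sqrt(k)                  # constant base vector

lam, V = np.linalg.eigh(Omega)
Phi = np.array([V @ (np.exp(2j * np.pi * lam * j / m) * (V.conj().T @ phi0))
                for j in range(m)])

# Harmonic frame rows: (1/sqrt(k)) * exp(2*pi*1j*j*l/m), l = 0..k-1
H = np.array([[np.exp(2j * np.pi * j * l / m) / np.sqrt(k) for l in range(k)]
              for j in range(m)])
```

The two constructions coincide entrywise, and the resulting frame is tight with frame operator $(m/k)\,I$.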

We shall prove Theorem 2.2 in several steps. First, we split into one main term and many residual terms in Section 3.1. Then, we compute the cancellation among residual terms in Section 3.2. We compute the lower frame bound of in Section 3.3 before proving the theorem itself.

3. Proof of Theorem 2.2

First, we claim that scales down to the usual backward-difference matrix under the under-sampling matrix :

Lemma 3.1.

Given with ,

where is the -dimensional backward difference matrix.


Note that, for ,

For , . ∎
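The scaling-down phenomenon that Lemma 3.1 describes can be checked numerically in a simplified, non-cyclic setting: composing a running block sum with the backward difference and then under-sampling reproduces the lower-dimensional backward difference. The matrices below are our illustrative stand-ins for the operators in the lemma.

```python
import numpy as np

def backward_diff(n):
    # (D y)_i = y_i - y_{i-1}, with the convention y_{-1} = 0.
    return np.eye(n) - np.diag(np.ones(n - 1), -1)

def block_sum(m, rho):
    # (A y)_n = y_n + y_{n-1} + ... + y_{n-rho+1} (running block sum),
    # i.e., A = I + N + ... + N^{rho-1} for the down-shift N.
    return sum(np.diag(np.ones(m - t), -t) for t in range(rho))

def subsample(m, rho):
    S = np.zeros((m // rho, m))
    for i in range(m // rho):
        S[i, rho * (i + 1) - 1] = 1.0
    return S

m, rho = 12, 4
lhs = subsample(m, rho) @ block_sum(m, rho) @ backward_diff(m)
rhs = backward_diff(m // rho) @ subsample(m, rho)
```

Algebraically, $A D = (I + N + \dots + N^{\rho-1})(I - N) = I - N^{\rho}$ telescopes, so sub-sampling the result differences entries that are $\rho$ apart, exactly the small backward difference after under-sampling.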

The most important estimate in Theorem 2.2 is (6), where we find the reconstruction error under adapted decimation . It can be written as follows: given a signal and the matrix , the reconstruction error satisfies

where the fourth equality follows from Lemma 3.1, and is the -to-2 norm of . Thus, in order to obtain this estimate, we need to answer two questions:

  • Is a frame? What is the lower frame bound of ?

  • What is ?

The first question will be answered positively in Proposition 3.11, and the estimate in the third question is given in Lemma 3.13.

Aside from the reconstruction error estimate, we also need to calculate the bits needed to record the decimated sample , which will be done in Lemma 3.14.

3.1. The Effect of Adapted Decimation on the Frame

We start by introducing the following notation:

Definition 3.2.

Given , the -by- constant matrix has constant on all entries.

The following two lemmas are needed for us to describe in Proposition 3.5.

Lemma 3.3.

Given with generator , we have

where and , where

can be diagonalized by a unitary matrix



For any , the -th row of can be written as


where is a diagonal matrix.


Thus, . ∎

Lemma 3.4.

where .


For any ,

Combining Lemmas 3.3 and 3.4, one has the following expansion:

Proposition 3.5.

Given ,


First, we claim that, for , .

For , by Lemma 3.3. For ,

As for the effect of , we claim that for .

For , by Lemma 3.4. For ,

From the two assertions above, we get

3.2. Cancellation Between Residual Terms of

From (8), we can divide into two parts: being the main term, and the rest being residual terms. In this section, we shall investigate the behavior of the residual terms.

Define a double-sequence recursively by

Let and . We first examine the form of each before calculating the cancellation between and .

Lemma 3.6.

For any and ,


First, it can easily be seen that for all by induction on . Then, by definition and induction on ,

Lemma 3.7.

For and ,


We shall prove this by induction on . For and ,

For ,

For and ,

As for ,

where the third equality follows from the fact that . ∎

Proposition 3.8.

For ,