1.1. Background and Motivation
Analog-to-digital (A/D) conversion is a process where bandlimited signals, e.g., audio signals, are digitized for storage and transmission, which is feasible thanks to the classical sampling theorem. In particular, the theorem indicates that discrete sampling is sufficient to capture all features of a given bandlimited signal, provided that the sampling rate is higher than the Nyquist rate.
Given a function $f \in L^1(\mathbb{R})$, its Fourier transform is defined as
\[ \hat{f}(\xi) = \int_{\mathbb{R}} f(t)\, e^{-2\pi i \xi t}\, dt. \]
The Fourier transform can also be uniquely extended to $L^2(\mathbb{R})$ as a unitary transformation.
Given $f \in L^2(\mathbb{R})$, we say $f$ is bandlimited if its Fourier transform is supported in a compact interval.
An important component of A/D conversion is the following theorem:
Theorem 1.2 (Classical Sampling Theorem).
Given a bandlimited $f \in L^2(\mathbb{R})$ with $\operatorname{supp} \hat{f} \subseteq [-\Omega, \Omega]$, for any $g \in L^2(\mathbb{R})$ satisfying
\[ \hat{g}(\xi) = 1 \quad \text{for } \xi \in [-\Omega, \Omega], \]
with $\hat{g}$ compactly supported and any sampling density $\lambda$ such that $\operatorname{supp} \hat{g} \subseteq (-\lambda + \Omega, \lambda - \Omega)$, one has
\[ f(t) = \frac{1}{\lambda} \sum_{n \in \mathbb{Z}} f\!\left(\frac{n}{\lambda}\right) g\!\left(t - \frac{n}{\lambda}\right), \]
where the convergence is both uniform on compact sets of $\mathbb{R}$ and in $L^2(\mathbb{R})$.
In particular, for $\lambda = 2\Omega$ and $g(t) = 2\Omega \operatorname{sinc}(2\Omega t)$, where $\operatorname{sinc}(t) = \sin(\pi t)/(\pi t)$, the following identity holds in $L^2(\mathbb{R})$:
\[ f(t) = \sum_{n \in \mathbb{Z}} f\!\left(\frac{n}{2\Omega}\right) \operatorname{sinc}(2\Omega t - n). \]
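The sampling identity can be illustrated numerically. The sketch below (an example of ours, not from the paper) takes $f(t) = \operatorname{sinc}(t)^2$, which is bandlimited to $[-1, 1]$ since its Fourier transform is the triangle function, samples it at the Nyquist rate $2$, and checks that a truncated Shannon series recovers a point value:

```python
import numpy as np

def f(t):
    # f(t) = sinc(t)^2 is bandlimited: its Fourier transform is the
    # triangle function supported on [-1, 1].  (np.sinc is normalized.)
    return np.sinc(t) ** 2

def reconstruct(t, n_terms=500):
    # Truncated Shannon reconstruction from the samples f(n/2), taken at
    # the Nyquist rate 2 for a signal bandlimited to [-1, 1].
    n = np.arange(-n_terms, n_terms + 1)
    return np.sum(f(n / 2) * np.sinc(2 * t - n))

err = abs(reconstruct(0.3) - f(0.3))
```

The truncation error decays like the tail of the series, so a few hundred terms already give point values accurate to several digits.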
However, the discrete nature of digital data storage makes it impossible to store the samples $f(n/\lambda)$ exactly. Instead, quantized samples $q_n$, chosen from a pre-determined finite alphabet $\mathcal{A}$, are stored. This results in the following reconstructed signal:
\[ \tilde{f}(t) = \frac{1}{\lambda} \sum_{n \in \mathbb{Z}} q_n\, g\!\left(t - \frac{n}{\lambda}\right). \]
As for the choice of the quantized samples $q_n$, we shall discuss the following two schemes:
Pulse Code Modulation (PCM):
Quantized samples are taken as the direct round-off of the current sample, i.e.,
\[ q_n = Q\!\left(f\!\left(\tfrac{n}{\lambda}\right)\right), \]
where $Q$ denotes rounding to the nearest element of the alphabet.
Sigma-Delta (ΣΔ) Quantization: A sequence of auxiliary variables $(u_n)$ is introduced for this scheme. $u_n$ is defined recursively as
\[ u_n = u_{n-1} + f\!\left(\tfrac{n}{\lambda}\right) - q_n, \qquad q_n = Q\!\left(u_{n-1} + f\!\left(\tfrac{n}{\lambda}\right)\right), \]
where $Q$ denotes rounding to the nearest element of the alphabet.
ΣΔ quantization was introduced in 1963 and is still widely used, due to its advantages over PCM. Specifically, ΣΔ quantization is robust against hardware imperfection, a decisive weakness of PCM. For ΣΔ quantization, and the more general noise shaping schemes to be explained below, the boundedness of the auxiliary variables $(u_n)$ turns out to be essential, as most analyses of ΣΔ quantization problems rely on it for error estimation. Schemes with bounded auxiliary variables are said to be stable.
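Stability is easy to observe for the first-order loop. The sketch below (our illustration, assuming the standard update $u_n = u_{n-1} + x_n - q_n$ with a 1-bit alphabet $\{-1, +1\}$) verifies empirically the inductive bound $|u_n| \le 1$ whenever $|x_n| \le 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=10_000)   # input samples with |x_n| <= 1

u = 0.0
u_hist = []
for xn in x:
    v = u + xn
    q = 1.0 if v >= 0 else -1.0       # 1-bit alphabet {-1, +1}
    u = v - q                          # u_n = u_{n-1} + x_n - q_n
    u_hist.append(u)

# Stability: |u_n| <= 1 for all n, provable by induction since |v| <= 2
# implies |v - sign(v)| <= 1.
max_u = max(abs(v) for v in u_hist)
```

The same boundedness is what the error estimates below rely on.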
Despite its merits over PCM, ΣΔ quantization merely produces linear error decay with respect to the number of bits used, as opposed to the exponential error decay produced by its counterpart PCM. Thus, it is desirable to generalize ΣΔ quantization for higher-order error decay.
Given $r \in \mathbb{N}$, one can consider an $r$-th order ΣΔ quantization scheme, as investigated by Daubechies and DeVore:
Theorem 1.3 (Higher Order ΣΔ Quantization).
Consider the following stable $r$-th order ΣΔ quantization scheme
\[ (\Delta^r u)_n = f\!\left(\tfrac{n}{\lambda}\right) - q_n, \]
where $(q_n)$ and $(u_n)$ are the quantized samples and auxiliary variables, respectively, and $\Delta$ is the backward difference. Then, for all $t \in \mathbb{R}$,
\[ |f(t) - \tilde{f}(t)| \le C_r\, \lambda^{-r}. \]
The existence of such a scheme is also proven in the same paper. This improves the error decay rate from linear to an arbitrary polynomial degree while preserving the merits of a first-order ΣΔ quantization scheme.
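A higher-order scheme can be sketched concretely. The following is our illustration (a greedy rule, one common stable choice, not necessarily the construction of the cited paper) of an $r$-th order scheme in the difference form $(\Delta^r u)_n = x_n - q_n$, where each $q_n$ is the alphabet element closest to the running combination of past state values:

```python
from math import comb
import numpy as np

def midrise(w, delta, K):
    # Nearest element of the mid-rise alphabet
    # {±delta/2, ±3·delta/2, ..., ±(2K-1)·delta/2}.
    q = delta * (np.floor(w / delta) + 0.5)
    bound = (2 * K - 1) * delta / 2
    return float(np.clip(q, -bound, bound))

def sigma_delta_r(x, r, delta, K):
    # Greedy r-th order scheme: (Δ^r u)_n = x_n - q_n, i.e.
    # u_n = Σ_{j=1}^r (-1)^(j+1) C(r,j) u_{n-j} + x_n - q_n,
    # with q_n chosen to minimize |u_n|.
    coeffs = [(-1) ** (j + 1) * comb(r, j) for j in range(1, r + 1)]
    u = [0.0] * r          # zero initial conditions u_{-r}, ..., u_{-1}
    q = []
    for xn in x:
        w = sum(c * u[-j] for j, c in zip(range(1, r + 1), coeffs)) + xn
        qn = midrise(w, delta, K)
        q.append(qn)
        u.append(w - qn)
    return q, u[r:]

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=5000)
# K = 5 suffices for r = 3, delta = 1: |w| <= (2^3 - 1)·delta/2 + 1 = 4.5,
# so nearest-element rounding keeps |u_n| <= delta/2 by induction.
q, u = sigma_delta_r(x, r=3, delta=1.0, K=5)
max_u = max(abs(v) for v in u)
```

The price of higher order is a larger alphabet: the greedy rule above stays stable only when the alphabet covers the range of the bracketed sum.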
From here, a natural question arises: is it possible to generalize the ΣΔ quantization scheme further so that the reconstruction error decay matches the exponential decay of PCM? Two solutions have been proposed for this question. The first is to create new quantization schemes, known as noise shaping quantization schemes. A brief summary of their development will be provided in Section A.
The other possibility is to drastically enhance data storage efficiency while maintaining the same level of reconstruction accuracy; signal decimation belongs in this category. The process is as follows: given an $r$-th order ΣΔ quantization scheme, the quantized samples are suitably filtered and then down-sampled, keeping one sample out of every $\rho$.
The process in which we convert the quantized samples to this down-sampled sequence is called signal decimation.
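As a rough illustration of the filter-then-down-sample structure (our sketch; the paper's precise decimation operator differs in its choice of filter), one can apply an $r$-fold length-$\rho$ running average and then keep every $\rho$-th entry:

```python
import numpy as np

def decimate(q, rho, r):
    # Illustrative sketch only: smooth the quantized samples with an
    # r-fold length-rho running average, then down-sample by keeping
    # every rho-th entry.
    kernel = np.ones(rho) / rho
    y = np.asarray(q, dtype=float)
    for _ in range(r):
        y = np.convolve(y, kernel)[: len(y)]   # causal truncation
    return y[rho - 1 :: rho]

q = np.ones(12)                 # e.g. a constant quantized sequence
out = decimate(q, rho=3, r=2)   # 12 samples reduced to 4
```

The point of decimation is that the down-sampled sequence needs far fewer stored values per Nyquist interval, while the smoothing keeps the reconstruction error of the same order.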
Decimation has been known in the engineering community, and it was observed that decimation results in exponential error decay with respect to the bit-rate, even though the observation remained a conjecture until 2015, when Daubechies and Saab proved the following theorem:
Theorem 1.4 (Signal Decimation for Bandlimited Functions, ).
Given , , and , there exists a function such that
Moreover, the number of bits needed for each Nyquist interval is
From (3) and (4), we can see that the reconstruction error after decimation still decays polynomially with respect to the sampling frequency, while the number of bits needed drops from $\mathcal{O}(\lambda)$ to $\mathcal{O}(\log \lambda)$. Thus, the reconstruction error decays exponentially with respect to the bits used.
In , the author extended decimation to signals in finite-dimensional spaces. Such signals are sampled by finite frames, and a brief introduction to finite frames is given in Appendix A. Using the alternative decimation operator introduced in the same paper, it is proven that, up to the second order of ΣΔ quantization, results similar to Theorem 1.4 can be achieved:
Definition 1.5 (Alternative Decimation).
Given fixed , the -alternative decimation operator is defined to be , where
is the integration operator satisfying
Here, the cyclic convention is adopted: For any , .
is the sub-sampling operator satisfying
Definition 1.6 (Unitarily Generated Frames (UGF)).
Given a base vector and a Hermitian matrix , the unitarily generated frame has its rows as
Theorem 1.7 (Alternative Decimation for Finite Frames, ).
Given , , , , and as the generator, base vector, eigenvalues, eigenvectors, and the corresponding UGF, respectively, and . Suppose
then the dual frame combined with the $r$-th order ΣΔ quantization has a polynomial reconstruction error decay rate of degree  with respect to the oversampling rate :
Moreover, the total number of bits used to record the quantized samples is  bits, where the constant depends on . Suppose  is fixed as ; then, as a function of the bits used at each entry,  satisfies
The constant is independent of the oversampling rate .
1.2. Prior Works
1.2.1. Quantization for Bandlimited Functions
Despite its simple form and robustness, ΣΔ quantization only results in linear error decay with respect to the sampling period. It was later proven in  that a generalization of ΣΔ quantization, namely the $r$-th order ΣΔ quantization, exists for arbitrary $r$, and for such schemes the error decay is of polynomial order . Leveraging the different constants within this family of ΣΔ quantization schemes, sub-exponential decay can also be achieved. A different family of ΣΔ quantization schemes was shown  to have exponential error decay with a small exponent. In , the exponent was improved to .
1.2.2. Finite Frames
ΣΔ quantization can also be applied to finite frames. It is proven  that for any family of frames with bounded frame variation, the reconstruction error decays linearly with respect to the oversampling rate , where the frame is an  matrix. Among different choices of dual frames,  proposed that the so-called Sobolev dual achieves the minimum induced matrix 2-norm for reconstruction. By carefully matching the dual frame to the quantization scheme,  proved that using the -dual for random frames results in exponential decay with a near-optimal exponent, with high probability.
In , using the assumption that the noise in ΣΔ quantization is random, along with numerical experiments, it was asserted that decimation greatly reduces the number of bits needed while maintaining the reconstruction accuracy. In , a rigorous proof was given to show that this assertion is indeed valid, and that the reduction in bits used turns the linear decay into exponential decay with respect to the bit-rate.
As for the adaptation to finite-dimensional signals, the author proved in  that there exists a similar operator, called the alternative decimation operator, that behaves similarly to decimation for bandlimited signals. In particular, for the first and second orders of ΣΔ quantization, it is possible to achieve exponential reconstruction error decay with respect to the bit-rate as well. However, as with decimation, it merely improves the storage efficiency while maintaining the same level of reconstruction error. Thus, the error rate with respect to the oversampling rate remains the same (quadratic for the second order), which is still inferior to other noise shaping schemes.
2. Main Results
We have seen in Theorem 1.7 that alternative decimation is only useful up to the second order. Thus, we aim to extend our results to arbitrary orders, and the solution we present here is called the adapted decimation.
Definition 2.1 (Adapted Decimation).
Given , the -adapted decimation operator is defined to be
where is the usual backward difference matrix, satisfies , and has .
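The backward difference matrix appearing here is the standard one: $1$ on the diagonal, $-1$ on the subdiagonal, with the lower-triangular summation matrix as its inverse. A quick numerical check of this inverse relationship:

```python
import numpy as np

n = 6
# Backward difference matrix D: 1 on the diagonal, -1 on the subdiagonal,
# so (Dx)_k = x_k - x_{k-1} (with x_{-1} taken to be 0).
D = np.eye(n) - np.eye(n, k=-1)

# Its inverse is the lower-triangular summation (discrete integration)
# matrix S, with (Sx)_k = sum_{j <= k} x_j.
S = np.tril(np.ones((n, n)))
check = D @ S
```

This is the finite-dimensional analogue of differencing undoing integration, and it is why powers $D^{-r}$ act as $r$-fold discrete integration in the estimates below.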
It will be shown that, for unitarily generated frames satisfying the conditions specified in Theorem 2.2, an $r$-th order ΣΔ quantization coupled with the corresponding adapted decimation has an $r$-th order polynomial reconstruction error decay rate with respect to the ratio . As for data storage, decimation allows for highly efficient storage, making the error decay exponential with respect to the bits used.
For the rest of the paper, we shall also assume that our ΣΔ quantization scheme is stable, i.e., the auxiliary variables remain bounded as the dimension grows.
We also choose the quantization alphabet to be uniformly spaced and symmetric around the origin: given $K \in \mathbb{N}$ and a spacing $\delta > 0$, we define
\[ \mathcal{A} = \left\{ \pm \tfrac{2k-1}{2}\,\delta \;:\; k = 1, \dots, K \right\}. \]
For complex Euclidean spaces, we define the associated mid-rise quantizer of length $2K$ coordinate-wise. Throughout this paper we shall always deal with such alphabets.
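The mid-rise alphabet can be written out explicitly; the sketch below (our illustration of the standard construction, with hypothetical parameter values) builds it and checks the basic round-off bound of half a step:

```python
import numpy as np

def alphabet(K, delta):
    # Mid-rise alphabet with 2K levels, spacing delta, symmetric about 0:
    # {±delta/2, ±3·delta/2, ..., ±(2K-1)·delta/2}.
    half = delta * (np.arange(K) + 0.5)
    return np.concatenate([-half[::-1], half])

A = alphabet(K=2, delta=0.5)   # four levels: -0.75, -0.25, 0.25, 0.75

# Nearest-element rounding incurs error at most delta/2 whenever
# |x| <= K * delta.
x = 0.3
qx = A[np.argmin(np.abs(A - x))]
```

Note there is no level at $0$: mid-rise quantizers always commit to a nonzero output, which is what makes the recursive schemes above well defined.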
Given , , , , and as the generator, base vector, eigenvalues, eigenvectors, and the corresponding UGF, respectively, and fixed. Suppose
there exists  from the stable $r$-th order ΣΔ quantization satisfying
where , then the following statements are true:
Recursivity: For all , there exists such that .
Signal reconstruction: is a frame.
Error estimate: For the dual frame , the reconstruction error satisfies
Efficient data storage: Suppose the length of the quantization alphabet is ; then the total number of bits used to record the quantized samples is  bits. Furthermore, suppose  is fixed as ; then, as a function of the bits used at each entry,  satisfies
where , independent of .
Harmonic frames are unitarily generated frames whose generator is the diagonal matrix with entries  and whose base vector is , so Theorem 2.2 is applicable to harmonic frames as well.
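A harmonic frame is easy to construct and test numerically. The sketch below (our example, assuming the common convention with rows $e^{2\pi i n k / N}/\sqrt{d}$, i.e. the orbit of the constant base vector under the diagonal generator $\operatorname{diag}(0, 1, \dots, d-1)$) verifies the tight-frame property:

```python
import numpy as np

d, N = 4, 10  # dimension d, number of frame vectors N >= d
n = np.arange(N).reshape(-1, 1)
k = np.arange(d).reshape(1, -1)
# The n-th row is exp(2πi·n·k/N)/sqrt(d) for k = 0, ..., d-1.
F = np.exp(2j * np.pi * n * k / N) / np.sqrt(d)

# Tight-frame check: F*F = (N/d)·I, so both frame bounds equal N/d,
# i.e. the oversampling rate N/d.
G = F.conj().T @ F
```

The geometric series identity $\sum_{n=0}^{N-1} e^{2\pi i n m / N} = N\,\delta_{m \equiv 0}$ (for $|m| < N$) is what makes the Gram computation collapse to a multiple of the identity.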
3. Proof of Theorem 2.2
First, we claim that scales down to the usual backward-difference matrix under the under-sampling matrix :
Given with ,
where is the -dimensional backward difference matrix.
Note that, for ,
For , . ∎
The most important estimate in Theorem 2.2 is (6), where we find the reconstruction error under adapted decimation . It can be written as follows: given a signal and the matrix , the reconstruction error satisfies
where the fourth equality follows from Lemma 3.1, and  is the $\infty$-to-2 norm of . Thus, in order to obtain this estimate, we need to answer two questions:
Is a frame? What is the lower frame bound of ?
What is ?
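Assuming the norm in question is the $\infty$-to-2 operator norm, as is typical in ΣΔ error estimates, it can be computed by brute force for small real matrices (our illustration for intuition, not a tool used in the proofs): since $\|Vx\|_2$ is convex, its maximum over the cube $\|x\|_\infty \le 1$ is attained at a sign vector.

```python
import itertools
import numpy as np

def norm_inf_to_2(V):
    # ||V||_{inf->2} = max over ||x||_inf <= 1 of ||Vx||_2.  For real V,
    # the maximum of this convex function over the cube is attained at a
    # vertex, so searching {±1}^n suffices (feasible only for small n).
    n = V.shape[1]
    return max(
        np.linalg.norm(V @ np.array(s))
        for s in itertools.product([-1.0, 1.0], repeat=n)
    )

val = norm_inf_to_2(np.eye(2))  # attained at x = (±1, ±1)
```

For the identity on $\mathbb{R}^2$ the value is $\sqrt{2}$, strictly larger than the spectral norm $1$, which is why the $\infty$-to-2 norm has to be estimated separately.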
Aside from the reconstruction error estimate, we also need to calculate the bits needed to record the decimated sample , which will be done in Lemma 3.14.
3.1. The Effect of Adapted Decimation on the Frame
We start by introducing the following notation:
Given , the -by- constant matrix has constant on all entries.
The following two lemmas are needed for us to describe in Proposition 3.5.
Given with generator , we have
where and , where can be diagonalized by a unitary matrix
can be diagonalized by a unitary matrix.
For any , the -th row of can be written as
where is a diagonal matrix.
Thus, . ∎
For any ,
3.2. Cancellation Between Residual Terms of
From (8), we can divide into two parts: being the main term, and the rest being residual terms. In this section, we shall investigate the behavior of the residual terms.
Define a double-sequence recursively by
Let and . We first examine the form of each before calculating the cancellation between and .
For any and ,
First, it can easily be seen that for all by induction on . Then, by definition and induction on ,
For and ,
We shall prove this by induction on . For and ,
For and ,
As for ,
where the third equality follows from the fact that . ∎