Rank Minimization for Snapshot Compressive Imaging

07/20/2018 ∙ by Yang Liu, et al. ∙ Duke University ∙ Tsinghua University ∙ Bell Labs

Snapshot compressive imaging (SCI) refers to compressive imaging systems where multiple frames are mapped into a single measurement, with video compressive imaging and hyperspectral compressive imaging as two representative applications. Though exciting results of high-speed videos and hyperspectral images have been demonstrated, the poor reconstruction quality precludes SCI from wide applications. This paper aims to boost the reconstruction quality of SCI by exploiting the high-dimensional structure in the desired signal. We build a joint model to integrate the nonlocal self-similarity of video/hyperspectral frames and the rank minimization approach with the SCI sensing process. Following this, an alternating minimization algorithm is developed to solve this non-convex problem. We further investigate the special structure of the sampling process in SCI to tackle the computational workload and memory issues in SCI reconstruction. Results on both simulation and real data (captured by four different SCI cameras) demonstrate that our proposed algorithm leads to significant improvements over current state-of-the-art algorithms. We hope our results will encourage researchers and engineers to pursue compressive imaging further for real applications.


1 Introduction

Compressive sensing (CS) [1, 2] has inspired practical compressive imaging systems that capture high-dimensional data such as videos [3, 4, 5, 6, 7, 8] and hyperspectral images [9, 10, 11, 12, 13]. In video CS, high-speed frames are modulated at a frequency higher than the capture rate of the camera, which operates at a low frame rate. To achieve an ultra-high frame rate [14], only a single measurement per pixel is available for multiple frames in these high-dimensional compressive imaging systems. In this manner, a number of high-speed frames can be recovered from each captured measurement frame, where the number depends on the coding strategy; e.g., 148 frames are reconstructed from a snapshot in [5]. If the CS imaging system samples measurements at 30 frames per second (fps), we can thus achieve a video frame rate higher than 4,000 fps. In hyperspectral image CS, the wavelength-dependent coding is implemented by a coded aperture (physical mask) and a disperser [10, 11]; more than 30 hyperspectral images have been reconstructed from a single snapshot measurement. These systems are referred to as snapshot compressive imaging (SCI) systems.
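The sensing process described above can be illustrated with a toy NumPy sketch (not the authors' code): each high-speed frame is multiplied by its mask and the coded frames are summed into one snapshot. The sizes and the random masks/frames below are arbitrary stand-ins.

```python
import numpy as np

# Toy sketch of the video-SCI forward model: B high-speed frames are each
# modulated by a (binary) mask and then summed into a single 2D snapshot.
rng = np.random.default_rng(0)
H, W, B = 256, 256, 8            # spatial size and number of coded frames (illustrative)
frames = rng.random((H, W, B))   # stand-in for the high-speed scene
masks = rng.integers(0, 2, size=(H, W, B)).astype(float)  # stand-in shifting binary masks

# Single snapshot measurement: element-wise coding, then temporal summation.
measurement = np.sum(masks * frames, axis=2)
print(measurement.shape)         # one 2D frame encodes all B frames
```

A camera sampling such snapshots at 30 fps with B = 148 coded frames per snapshot would thus deliver an effective rate above 4,000 fps, as stated above.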

Though these SCI systems have led to exciting results, the poor quality of the reconstructed images precludes them from wide applications. Therefore, algorithms with high reconstruction quality are desired. This paper fills the gap by boosting the performance of optimization-based reconstruction algorithms. It is worth noting that, different from traditional CS [15] using random sensing matrices, the sensing matrix in SCI has a special structure and is not random. Though this raises challenges in the theoretical analysis [16], we can exploit this structure to speed up the algorithmic development (details in Sec. 4.3). Most importantly, this structure is inherent in the hardware of SCI systems, such as video CS [5] and spectral CS [11].

1.1 Motivation

Fig. 1: Flowchart of our proposed algorithm (DeSCI) for SCI reconstruction. Left: the sensing (compressive sampling) process of video SCI [5]. Middle: the proposed rank minimization based reconstruction algorithm, where the projection and WNNM for patch groups are performed iteratively. Right: our reconstruction result, with that of the GAP-TV method [17] shown in the upper part for comparison.

Without loss of generality, we use video CS as an example below to describe the motivation and contribution of this work. Hardware review and results for hyperspectral images are presented along with those for videos in the corresponding sections. Different from traditional CS [15, 18], the desired signal in SCI usually lies in high dimensions; for instance, a 148-frame video clip is recovered from a single measurement frame of the same spatial resolution (in pixels). This paper aims to address the following key challenges in SCI reconstruction.

  • How to exploit the structure information in high-dimensional videos and hyperspectral images?

  • Are there some approaches for other (image/video processing) tasks that can be used in SCI to improve the reconstruction quality?

One of the state-of-the-art approaches for SCI, the Gaussian mixture model (GMM) based algorithms [19, 20], only exploits the sparsity of video patches. On widely used videos (e.g., the Kobe dataset in Fig. 5), these GMM based algorithms cannot provide reconstructed video frames with a PSNR (peak signal-to-noise ratio) above 30 dB. As in simulation, for real data captured by SCI cameras, the GMM results suffer from blur and other unpleasant artifacts (Fig. 13). While the blur is mainly due to the limitation of the sparse priors used in GMM, the other artifacts might be due to system noise. Motivated by these limitations, a better reconstruction algorithm needs to exploit high-dimensional structural information, for example, the nonlocal similarity across the temporal and spectral domains in addition to the spatial domain. Moreover, an algorithm that is robust to noise is highly desirable, as system noise is unavoidable in real SCI systems.

A reconstruction framework that takes advantage of more comprehensive structural information can potentially outperform existing algorithms. One recent proposal in CS is to use algorithms designed for other data processing tasks such as denoising [21], which employed advanced denoising algorithms in the approximate message passing framework to achieve state-of-the-art results in image CS. This motivates us to develop advanced reconstruction algorithms for SCI by leveraging advanced techniques in video denoising [22]. Meanwhile, recent research on rank minimization has led to significant advances in other image and video processing tasks [22, 23, 24, 25, 26]. These advances have great potential to boost the performance of reconstruction algorithms for SCI, which is what we investigate in this paper.

1.2 Contributions

While rank minimization approaches have been investigated extensively for image processing, extending them to videos and hyperspectral images, especially to SCI, is nontrivial. In particular, to achieve high-speed videos, the measurement in SCI is not a video frame but a linear combination of a number of frames. Hence, rank minimization cannot be imposed directly as in image processing. To address this challenge, we propose a new reconstruction framework that incorporates rank minimization as an intermediate step during reconstruction. Specifically, by integrating the compressive sampling model of SCI with weighted nuclear norm minimization (WNNM) [24] for video patch groups (see details in Sec. 4.2), a joint model for SCI reconstruction is formulated. To solve this problem, the alternating direction method of multipliers (ADMM) [27] is employed to develop an iterative optimization algorithm for SCI reconstruction.

Fig. 1 depicts the flowchart of our proposed algorithm for SCI reconstruction. After the measurement is captured by the SCI system (left part of Fig. 1), our proposed algorithm, dubbed DeSCI (decompress SCI, middle part of Fig. 1), iteratively performs the projection (which projects the measurement/residual to the signal space to fit the sampling process of the SCI system) and WNNM denoising (imposing structural signal priors) on video patch groups (details in Sec. 4.2). In the right part, we show the reconstruction result of our proposed DeSCI algorithm on the Kobe data used in [19], with the result of the GAP-TV (generalized alternating projection based total variation) method [17] shown in the upper part for comparison. It can be seen clearly that our DeSCI algorithm provides better image/video quality than GAP-TV.

Moreover, our proposed DeSCI algorithm boosts the reconstruction quality of real data captured by four different SCI cameras, i.e., CACTI (coded aperture compressive temporal imaging) [5], color-CACTI [6], CASSI (coded aperture snapshot spectral imaging) [11], and the high-speed stereo camera [8]; please refer to Fig. 13 to Fig. 20 for visual comparisons. These real-data results clearly demonstrate the advantages of our proposed algorithm, e.g., robustness to noise. We thus believe that our findings have significant practical value for both research and applications, and we hope these encouraging results will inspire researchers and engineers to pursue compressive imaging further.

1.3 Related work and organization of this paper

As SCI reconstruction is an ill-posed problem, different priors have been employed to solve it, which can be categorized into total variation (TV) [17], sparsity in transform domains [4, 5, 6], sparsity in over-complete dictionaries [3, 12], and GMM based models [19, 20]. The group sparsity based algorithms [28] and the nonlocal self-similarity model [29], which have led to significant advances in image processing, have not been used in these SCI reconstruction algorithms; they are investigated in this paper. Most recently, deep learning techniques have been utilized for video CS [30, 31]. We do not aim to compete with these algorithms, as they are usually complicated and require data to train the neural networks. Furthermore, some of them require the sensing matrix to be spatially repetitive [30], which is very challenging (or unrealistic) in real applications. In addition, only a limited improvement (around 2 dB) over GMM has been obtained using the deep learning techniques in [30], whereas our proposed algorithm improves significantly over GMM. Specifically, we apply the rank minimization approach to the nonlocal similar patches in videos and hyperspectral images. In this manner, the reconstruction results are improved dramatically, which would pave the way for wide applications of SCI systems, such as high-speed videos [6], hyperspectral images [12], and three-dimensional high-accuracy high-speed indoor localization [8].

The rest of this paper is organized as follows. Sec. 2 reviews the two standard SCI systems, namely, the video SCI and hyperspectral image SCI systems. The mathematical model of SCI is introduced in Sec. 3. Sec. 4 develops the rank minimization based algorithm for SCI reconstruction. Extensive simulation results are presented in Sec. 5 to demonstrate the efficacy of the proposed algorithm, and real data results are shown in Sec. 6. Sec. 7 concludes the paper and discusses future research directions.

2 Review of snapshot compressive imaging systems

The last decade has seen a number of SCI systems [9, 10, 3, 4, 5, 32, 14, 33, 6, 34, 35, 8, 36] with the development of compressive sensing [1, 37, 38]. The underlying principle is encoding the high-dimensional data on a 2D sensor with dispersion for spectral imaging [9, 10, 33], temporal-variant mask for high-speed imaging [3, 4, 5, 14], and angular variation for light-field imaging [32]. Recently, several variants explore more than three dimensions of the scene [6, 35, 8, 39, 40], which paves the way for plenoptic imaging [41].

We validate the proposed DeSCI method on two typical snapshot compressive imaging systems, i.e., the CACTI system [5] and the CASSI system [9, 10], shown in Fig. 2 and Fig. 3, respectively. Similar approaches can be adapted to other compressive imaging systems with minor modifications, since we only need to change the sensing matrix for various coding strategies, and the nonlocal self-similarity always holds for natural scenes.

2.1 Snapshot-video compressive imaging system

In video snapshot compressive imagers, i.e., the CACTI system [5], the high-speed scene is collected by the objective lens and spatially coded by the temporal-variant mask, such as the shifting mask [5] or different patterns on the digital micromirror device (DMD) or the spatial light modulator (SLM) [3, 4, 7], as shown in Fig. 2. Then the coded scene is detected by the monochrome or color CCD (Charge-Coupled Device) for grayscale [5] and color [6] video capturing, respectively. A snapshot on the CCD encodes tens of temporal frames of the high-speed scene. The number of coded frames for a snapshot is determined by the number of variant codes (of the mask) within the integration time.

2.2 Snapshot-spectral compressive imaging system

In spectral snapshot compressive imagers, i.e., the CASSI system [9, 10], the spectral scene is collected by the objective lens and spatially coded by a fixed mask, as shown in Fig. 3. Then the coded scene is spectrally dispersed by the dispersive element, such as the prism or the grating. The spatial-spectral coded scene is detected by the CCD. A snapshot on the CCD encodes tens of spectral bands of the scene. The number of coded frames for a snapshot is determined by the dispersion property of the dispersive element and the pixel size of the mask and the CCD [10].
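The spectral coding above can be sketched numerically as well. The snippet below assumes, for illustration, a single-disperser CASSI geometry where the disperser shears each band by one pixel before the detector integrates over wavelength; the cube, mask, and one-pixel shear are our stand-ins, not the calibrated system model.

```python
import numpy as np

# Toy sketch of a CASSI-style forward model: every spectral band is coded by
# the SAME fixed mask, then sheared by one pixel per band (the dispersion)
# before the detector sums over wavelength.
rng = np.random.default_rng(1)
H, W, L = 64, 64, 16                       # spatial size and number of bands (illustrative)
cube = rng.random((H, W, L))               # stand-in hyperspectral scene
mask = rng.integers(0, 2, size=(H, W)).astype(float)

coded = cube * mask[:, :, None]            # spatial coding, identical for all bands
snapshot = np.zeros((H, W + L - 1))        # detector widened by the dispersion
for l in range(L):                         # one-pixel shear per band
    snapshot[:, l:l + W] += coded[:, :, l]
print(snapshot.shape)
```

The number of recoverable bands is then set by how far the disperser spreads the spectrum relative to the mask/detector pixel pitch, as stated above.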

Fig. 2: Schematic of the coded aperture compressive temporal imaging (CACTI) system [5]. A snapshot on the CCD encodes tens of temporal frames of the scene coded by the spatial-variant mask, e.g., the shifting mask or different patterns on the digital micromirror device (DMD). The mask/DMD and the mono/color detector, i.e., CCD, are in the conjugate image plane of the scene.
Fig. 3: Schematic of the coded aperture snapshot spectral imaging (CASSI) system [10]. A snapshot on the CCD encodes tens of spectral bands of the scene spatially coded by the mask and spectrally coded by the dispersive element. The mask, dispersive element and the CCD are in the conjugate image plane of the scene.

3 Mathematical model of snapshot compressive imaging

Mathematically, the measurement in the SCI systems can be modeled by [5]

    y = Φx + g,    (1)

where Φ ∈ R^{n×nB} is the sensing matrix, x ∈ R^{nB} is the desired signal, and g ∈ R^n denotes the noise. Unlike traditional CS, the sensing matrix considered here is not a dense matrix. In SCI, e.g., video CS as in CACTI [5, 6] and spectral CS as in CASSI [11], the matrix Φ has a very specific structure and can be written as

    Φ = [D_1, D_2, …, D_B],    (2)

where {D_k}_{k=1}^{B} are n × n diagonal matrices.

Taking the SCI in CACTI [5] as an example, consider that B high-speed frames {X_k}_{k=1}^{B} ∈ R^{n_x×n_y} are modulated by the masks {C_k}_{k=1}^{B} ∈ R^{n_x×n_y}, correspondingly (Fig. 2). The measurement Y ∈ R^{n_x×n_y} is given by

    Y = Σ_{k=1}^{B} X_k ⊙ C_k + G,    (3)

where ⊙ denotes the Hadamard (element-wise) product. All B pixels (in the B frames) at the position (i, j), i = 1, …, n_x; j = 1, …, n_y, are collapsed to form one pixel in the measurement (in one shot) as

    y_{i,j} = Σ_{k=1}^{B} c_{i,j,k} x_{i,j,k} + g_{i,j}.    (4)

By defining

    x = [x_1^T, …, x_B^T]^T,    (5)

where x_k = vec(X_k), and D_k = diag(vec(C_k)), for k = 1, …, B, we have the vector formulation of Eq. (1), where y = vec(Y) and g = vec(G). Therefore, x ∈ R^{nB} with n = n_x n_y, Φ ∈ R^{n×nB} as in (2), and the compressive sampling rate in SCI is equal to 1/B. (Multiple measurements have also been investigated in [42, 43] for hyperspectral image CS, and our algorithm can be used in those systems with minor modifications; the code design for multiple-shot hyperspectral image CS has also been investigated in [43]. However, this is out of the scope of this paper.) It is worth noting that, due to the special structure of Φ in (2), ΦΦ^T = Σ_{k=1}^{B} D_k^2 is a diagonal matrix. This fact will be used to derive the efficient algorithm in Sec. 4.3 for handling the massive data in SCI.

A natural question is whether it is theoretically possible to recover x from the measurement y defined in Eq. (1) when B > 1. Most recently, this has been addressed in [16] using the compression-based compressive sensing regime [44] via the following theorem, stated in terms of the encoder/decoder pair of a compression code.

Theorem 1 ([16], informal)

Assume the desired signal x belongs to a class that a rate-r compression code can represent with distortion δ. Then, for an appropriate choice of a free parameter, the solution of the compressible signal pursuit optimization applied to the SCI measurement in Eq. (1) satisfies a reconstruction-error bound of the form (6) with high probability.

Details of the compressible signal pursuit optimization, the precise statement, and the proof can be found in [16]. Most importantly, Theorem 1 characterizes the performance of SCI recovery by connecting the parameters of the (compression/decompression) code, its rate r and its distortion δ, to the number of frames B and the reconstruction quality. This theoretical finding strongly encourages our algorithmic design for SCI systems.

4 Rank minimization for signal reconstruction in snapshot compressive imaging

In this section, we first briefly review rank minimization algorithms, then develop the joint model in Sec. 4.2. The proposed algorithm is derived in Sec. 4.3.

4.1 Rank minimization

In image/video processing, since the matrix formed by nonlocal similar patches in a natural image is of low rank, a series of low-rank matrix approximation methods have been proposed for various tasks [24, 45, 46, 47]. The main goal of low-rank matrix approximation is to recover the underlying low-rank structure of a matrix from its degraded/corrupted observation. Within these frameworks, nuclear norm minimization (NNM) [23] is the most representative one and will be used in our work. Specifically, given a data matrix Z, the goal of NNM is to find a low-rank matrix L that minimizes the following objective function,

    L̂ = argmin_L ||Z − L||_F^2 + λ ||L||_*,    (7)

where ||·||_* is the nuclear norm, i.e., the sum of the singular values of a given matrix, ||·||_F denotes the Frobenius norm, and λ is the regularization parameter. Despite the theoretical guarantee of the singular value thresholding model [23], it has been observed that the recovery performance of such a convex relaxation degrades in the presence of noise, and the solution can seriously deviate from the original solution of the rank minimization problem [48]. To mitigate this issue, Gu et al. [24] proposed the weighted nuclear norm minimization (WNNM) model, which is essentially the reweighted ℓ_1-norm [49] of the singular values of the desired matrix; WNNM has led to state-of-the-art image denoising results. Though the WNNM problem is nonconvex in general, for non-descendingly ordered weights, as used in this paper, the optimal solution is available in closed form via the weighted soft-thresholding operator [25]. In the following, we investigate how to use WNNM in SCI systems.

4.2 Integrating WNNM to SCI

To be concrete, all video frames are divided into overlapping patches of size p × p, and each patch is denoted by a vector z_i ∈ R^{p^2}. For each patch z_i, its M most similar patches are selected from a surrounding (searching) window spanning W × W pixels in space and T adjacent frames in time, to form a set S_i. The patches in S_i are then stacked into a matrix Z_i ∈ R^{p^2×M}, i.e.,

    Z_i = [z_{i,1}, z_{i,2}, …, z_{i,M}].    (8)

This matrix, consisting of patches with similar structures, is called a group, where z_{i,m} denotes the m-th patch in the group. Since all patches in each data matrix have similar structures, the constructed data matrix Z_i is of low rank.

By using this rank minimization as a constraint, the SCI problem in (1) can be formulated as

    x̂ = argmin_x ||y − Φx||_2^2 + λ Σ_i rank(Z_i),    (9)

where λ is a parameter to balance the two terms, and recall that each Z_i is constructed from x.

Given the fact that WNNM [24] provides better results than NNM, we modify the model in (9) to

    x̂ = argmin_x ||y − Φx||_2^2 + λ Σ_i ||Z_i||_{w,*}.    (10)

Here,

    ||Z_i||_{w,*} = Σ_j w_j σ_j(Z_i)    (11)

is the weighted nuclear norm of the matrix Z_i with weights defined in the vector w, where w_j is the weight assigned to σ_j(Z_i), the j-th singular value of Z_i. The model in (10) is denoted as SCI-WNNM below.
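The patch-group construction of (8) can be illustrated with a brute-force block-matching sketch. The function, its parameter names, and the exhaustive search are ours for illustration; the actual implementation would restrict and stride the search for speed.

```python
import numpy as np

def group_similar_patches(video, ref, patch=8, search=10, K=40):
    """Toy block matching: collect the K patches closest (in Euclidean
    distance) to the reference patch within a spatial search window,
    across all frames, and stack them as the columns of a group matrix."""
    H, W, B = video.shape
    i0, j0, t0 = ref
    ref_patch = video[i0:i0 + patch, j0:j0 + patch, t0].ravel()
    candidates = []
    for t in range(B):                         # nonlocal in time as well as space
        for i in range(max(0, i0 - search), min(H - patch, i0 + search) + 1):
            for j in range(max(0, j0 - search), min(W - patch, j0 + search) + 1):
                p = video[i:i + patch, j:j + patch, t].ravel()
                candidates.append((np.sum((p - ref_patch) ** 2), p))
    candidates.sort(key=lambda c: c[0])        # most similar first
    return np.stack([p for _, p in candidates[:K]], axis=1)

rng = np.random.default_rng(4)
video = rng.random((64, 64, 8))
Z = group_similar_patches(video, ref=(20, 20, 3))
print(Z.shape)  # (64, 40): 8x8 patches as 64-dim columns, 40 patches per group
```

Because the K columns are near-duplicates of one reference patch, the resulting p² × M matrix is approximately low rank, which is exactly the property the models (9) and (10) exploit.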

4.3 Solving the SCI-WNNM problem

Under the ADMM [27] framework, we introduce an auxiliary variable θ to the problem in (10),

    (x̂, θ̂) = argmin_{x,θ} ||y − Φx||_2^2 + λ Σ_i ||Z_i||_{w,*},  s.t.  x = θ,    (12)

where now each Z_i is constructed from θ. Eq. (12) can be translated into three sub-problems:

    x^{(t+1)} = argmin_x (1/2) ||y − Φx||_2^2 + (ρ/2) ||x − (θ^{(t)} + b^{(t)})||_2^2,    (13)
    θ^{(t+1)} = argmin_θ λ Σ_i ||Z_i||_{w,*} + (ρ/2) ||θ − (x^{(t+1)} − b^{(t)})||_2^2,    (14)
    b^{(t+1)} = b^{(t)} − (x^{(t+1)} − θ^{(t+1)}),    (15)

where b is the multiplier and ρ the penalty parameter. We derive the solutions to these sub-problems below and, without confusion, discard the iteration index t.

Solve x: Given θ and b, Eq. (13) is a quadratic form and has a closed-form solution

    x = (Φ^T Φ + ρ I)^{-1} [Φ^T y + ρ (θ + b)],    (16)

where I is an identity matrix of the appropriate dimensions. Since Φ is a fat matrix (n × nB), Φ^T Φ is a large (nB × nB) matrix, and thus the matrix inversion formula is employed to simplify the calculation:

    (Φ^T Φ + ρ I)^{-1} = ρ^{-1} I − ρ^{-1} Φ^T (I + ρ^{-1} Φ Φ^T)^{-1} Φ ρ^{-1}.    (17)

Plugging (17) into (16), we have

    x = (θ + b) + Φ^T (Φ Φ^T + ρ I)^{-1} [y − Φ(θ + b)].    (18)

As mentioned earlier, Φ Φ^T is a diagonal matrix in our imaging systems. Let

    Φ Φ^T = diag{ψ_1, …, ψ_n},    (19)

we have

    Φ Φ^T + ρ I = diag{ψ_1 + ρ, …, ψ_n + ρ},    (20)
    (Φ Φ^T + ρ I)^{-1} = diag{1/(ψ_1 + ρ), …, 1/(ψ_n + ρ)}.    (21)

Let v = θ + b and [Φv]_j denote the j-th element of the vector Φv; (18) becomes

    x = (θ + b) + Φ^T [ (y_1 − [Φv]_1)/(ψ_1 + ρ), …, (y_n − [Φv]_n)/(ψ_n + ρ) ]^T.    (22)

Note that Φv can be computed in one shot, and the diagonal elements {ψ_j} are pre-calculated and stored via

    ψ_j = Σ_{k=1}^{B} c_{j,k}^2,    (23)

with c_{j,k} the j-th element of vec(C_k) and C_k defined in (3). In this way, the last term in (22) can be computed element-wise, and x can thus be updated very efficiently.
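The element-wise update of (22) can be sketched as follows; variable names and sizes are ours, and frames are kept as rows so that applying Φ and Φ^T reduces to mask-wise sums and products, never forming Φ explicitly.

```python
import numpy as np

# Sketch of the efficient x-update of Eq. (22): because Phi Phi^T is diagonal
# with entries psi_j = sum_k c_{j,k}^2, the matrix inversion reduces to an
# element-wise division.
rng = np.random.default_rng(5)
n, B, rho = 1000, 8, 0.01
masks = rng.integers(0, 2, size=(B, n)).astype(float)   # rows: vec(C_k)
v = rng.random((B, n))                                  # current theta + b, frame-wise
y = np.sum(masks * rng.random((B, n)), axis=0)          # stand-in measurement

psi = np.sum(masks**2, axis=0)                          # diag(Phi Phi^T), Eq. (23), precomputed
residual = y - np.sum(masks * v, axis=0)                # y - Phi(theta + b), "one shot"
correction = residual / (psi + rho)                     # element-wise inversion, Eq. (21)
x = v + masks * correction[None, :]                     # Phi^T applied mask-wise, Eq. (22)
print(x.shape)
```

The whole update costs O(nB) per iteration, instead of the O((nB)^3) a dense solve of (16) would require.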

Solve θ: Let x̃ = x − b. Eq. (14) can be considered as the WNNM denoising problem (for videos rather than images)

    θ̂ = argmin_θ (ρ/2) ||θ − x̃||_2^2 + λ Σ_i ||Z_i||_{w,*}.    (24)

Recall that Z_i is the patch group constructed from θ, and let Z̃_i be the patch group constructed from x̃ corresponding to Z_i. The structure of (24) is very complicated, and in order to obtain a tractable solution, a general assumption is made. Specifically, x̃ is treated as a noisy version of θ, i.e., x̃ = θ + n, where n denotes zero-mean white Gaussian noise whose elements independently follow n_j ~ N(0, σ_n^2). It is worth noting that θ can be recovered after reconstructing the {Z_i}, and there are in total N patch groups. Invoking the law of large numbers in probability theory, we have the following relation with a probability tending to 1 at each iteration [50]:

    (1/(nB)) ||x̃ − θ||_2^2 = (1/K) Σ_{i=1}^{N} ||Z̃_i − Z_i||_F^2,    (25)

where K denotes the total number of pixels in all patch groups. The proof can be found in [51]. As both nB and K are large in our case and the overlapping patches are averaged, we can translate (24) into the following problem:

    {Ẑ_i} = argmin_{Z_i} Σ_{i=1}^{N} [ (ρ/2) ||Z̃_i − Z_i||_F^2 + λ ||Z_i||_{w,*} ].    (26)

Note that there is a scale difference between the λ in (26) and that in (24). As mentioned above, θ can be recovered after reconstructing the {Z_i}, and each Z_i can be solved independently by

    Ẑ_i = argmin_{Z_i} (ρ/2) ||Z̃_i − Z_i||_F^2 + λ ||Z_i||_{w,*}.    (27)

Identifying the effective noise variance σ_n^2 with the ratio λ/ρ, (27) can be solved in closed form by the generalized soft-thresholding algorithm [52, 25] detailed below.

Considering the singular value decomposition (SVD) Z̃_i = U Σ V^T and the weight vector w, Ẑ_i is given by

    Ẑ_i = U S_w(Σ) V^T,    (28)
    [S_w(Σ)]_{jj} = max(Σ_{jj} − w_j, 0).    (29)

The remaining problem is to determine the weight vector w. For natural images/videos, we have the general prior knowledge that the larger singular values of Z_i are more important than the smaller ones, since they represent the energy of the major components of Z_i. In denoising, the larger the singular values, the less they should be shrunk, i.e.,

    w_j = C √M / (σ_j(Z_i) + ε),    (30)

where C is a constant, M is the number of patches in Z_i, and ε is a tiny positive constant. Since Z_i is not available, σ_j(Z_i) is not available either. We estimate it by

    σ̂_j(Z_i) = sqrt( max( σ_j^2(Z̃_i) − M σ_n^2, 0 ) ),    (31)

where σ_n is updated in each iteration. There are some heuristic approaches for determining σ_n from the current estimates [24]. In principle, as the iteration number increases, the measurement error decreases and σ_n gets smaller. In our SCI problem, we have found that progressively decreasing the noise level, starting from a large value, performs well in our experiments (details in Sec. 5.1.1). This is consistent with the analysis in [24].

The complete algorithm, i.e., DeSCI, is summarized in Algorithm 1.

Require: y, Φ.
1:  Initialize θ, b, ρ, λ, σ_n, the patch size, the search window size, and Max-Iter.
2:  Pre-calculate and save the diagonal elements {ψ_j} of ΦΦ^T via Eq. (23).
3:  for t = 1 to Max-Iter do
4:     Update x by Eq. (22).
5:     Update x̃ = x − b.
6:     for each patch z_i in x̃ do
7:        Find similar patches to construct the patch group Z̃_i.
8:        Perform SVD on Z̃_i.
9:        Estimate Ẑ_i using Eqs. (28)-(31).
10:     end for
11:     Aggregate {Ẑ_i} to form θ.
12:     Update the noise estimate σ_n.
13:     Update b by Eq. (15).
14:  end for
15:  Output: the recovered video (or hyperspectral images) x̂ = θ.
Algorithm 1 The DeSCI Algorithm for SCI reconstruction
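The loop of Algorithm 1 can be condensed into a highly simplified sketch. To stay short, the weighted singular-value shrinkage is applied to whole frames instead of per patch group (steps 6-11), so this is a structural illustration of the alternation, not the full DeSCI method; all parameter values are placeholders.

```python
import numpy as np

def desci_sketch(y, masks, iters=30, rho=0.01, lam=0.05):
    """Simplified sketch of Algorithm 1: alternate the efficient projection
    of Eq. (22) with a weighted singular-value shrinkage standing in for
    the per-group WNNM step. masks: (B, n) rows vec(C_k); y: (n,)."""
    B, n = masks.shape
    psi = np.sum(masks**2, axis=0)           # diag(Phi Phi^T), precomputed once
    theta = np.stack([y] * B) * masks        # crude initialization
    b = np.zeros_like(theta)
    for _ in range(iters):
        # Projection step (Eq. (22)).
        v = theta + b
        residual = y - np.sum(masks * v, axis=0)
        x = v + masks * (residual / (psi + rho))[None, :]
        # Denoising step: weighted soft-thresholding of singular values.
        U, s, Vt = np.linalg.svd(x - b, full_matrices=False)
        w = lam / (s + 1e-8)                 # inverse-magnitude weights, Eq. (30)-like
        theta = U @ np.diag(np.maximum(s - w, 0.0)) @ Vt
        b = b - (x - theta)                  # multiplier update (Eq. (15))
    return theta

# Usage on synthetic data.
rng = np.random.default_rng(7)
masks = rng.integers(0, 2, size=(8, 64)).astype(float)
truth = rng.random((8, 64))
y = np.sum(masks * truth, axis=0)
rec = desci_sketch(y, masks)
```

The full algorithm differs only inside the denoising step, where the block matching, per-group SVDs, and aggregation of Algorithm 1 replace the single frame-level shrinkage used here.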

Relation to generalized alternating projection. The generalized alternating projection (GAP) algorithm [53] has been investigated extensively for video CS [5, 6, 17] with different priors. The difference between the ADMM developed above and GAP lies in the constraint on the measurement. In ADMM, we aim to minimize ||y − Φx||_2^2, while GAP imposes y = Φx. Specifically, GAP reformulates the problem in (12) as

    (x̂, θ̂) = argmin_{x,θ} (1/2) ||x − θ||_2^2 + λ Σ_i ||Z_i||_{w,*},  s.t.  y = Φx.    (32)

This is solved by a series of alternating projection problems:

    (x^{(t+1)}, θ^{(t+1)}) = argmin_{x,θ} (1/2) ||x − θ||_2^2 + λ_t Σ_i ||Z_i||_{w,*},  s.t.  y = Φx,    (33)

where λ_t is a constant that changes in each iteration; Z_i is constructed from θ, and Z̃_i is constructed from x, respectively.

Eq. (33) is equivalent to the unconstrained form

    (x^{(t+1)}, θ^{(t+1)}) = argmin_{x,θ} (1/2) ||x − θ||_2^2 + λ_t Σ_i ||Z_i||_{w,*} + I_M(x),    (34)

where I_M(x) is the indicator function of the manifold M = {x : y = Φx} and λ_t is a parameter to balance the first two terms. Eq. (34) is solved by alternately updating x and θ. Given x, θ is solved by a WNNM denoising algorithm as derived before.

Given θ, the update of x is simply a Euclidean projection of θ onto the linear manifold y = Φx; this is solved by

    x = θ + Φ^T (Φ Φ^T)^{-1} (y − Φθ),    (35)

which, using the diagonal Φ Φ^T of (19), is computed element-wise as

    x = θ + Φ^T [ (y_1 − [Φθ]_1)/ψ_1, …, (y_n − [Φθ]_n)/ψ_n ]^T.    (36)

Comparing with the update equation of x in (22), we can see that (22) and (36) are equivalent if we set b = 0 and ρ = 0 in (22). This is expected, since letting ρ → 0 in (13) imposes that x lies on the linear manifold y = Φx, which is exactly the constraint used in GAP. This works well in the noiseless and low-noise cases. However, when the noise level is high, ADMM outperforms GAP, as the strong equality constraint biases the results.

An accelerated GAP (GAP-acc) was also proposed by Liao et al. [53]. The linear manifold can be adaptively adjusted, so that (35) is modified to

    y^{(t+1)} = y^{(t)} + (y − Φθ^{(t)}),    (37)
    x^{(t+1)} = θ^{(t)} + Φ^T (Φ Φ^T)^{-1} (y^{(t+1)} − Φθ^{(t)}).    (38)

This accelerated GAP speeds up convergence in the noiseless case in our experiments. We have also tried the approximate message passing (AMP) algorithm [54] used in [21] to update x. Unfortunately, AMP does not lead to good results due to the special structure of Φ; this may be because the convergence of AMP depends heavily on the Gaussianity of the sensing matrix in CS.
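The GAP projection and its accelerated variant can be sketched as follows; variable names, the toy sizes, and the trivial "identity denoiser" in the loop are ours, inserted only to show where the prior step would go.

```python
import numpy as np

# Sketch of the GAP Euclidean projection onto {x : y = Phi x} (Eq. (35)-(36))
# and the accelerated manifold update of Eqs. (37)-(38).
rng = np.random.default_rng(6)
B, n = 8, 500
masks = rng.integers(0, 2, size=(B, n)).astype(float)   # rows: vec(C_k)
psi = np.sum(masks**2, axis=0) + 1e-8                   # diag(Phi Phi^T), regularized
y = np.sum(masks * rng.random((B, n)), axis=0)          # stand-in measurement

def gap_project(theta, y_t):
    """x = theta + Phi^T (Phi Phi^T)^{-1} (y_t - Phi theta)."""
    residual = y_t - np.sum(masks * theta, axis=0)
    return theta + masks * (residual / psi)[None, :]

theta = np.zeros((B, n))
y_t = y.copy()
for _ in range(20):
    x = gap_project(theta, y_t)                          # projection, Eq. (38)
    theta = x                                            # a WNNM denoiser would go here
    y_t = y_t + (y - np.sum(masks * theta, axis=0))      # manifold update, Eq. (37)
```

Feeding the accumulated measurement residual back through y_t is what gives the acceleration: each projection targets a manifold shifted by all residuals so far, rather than the original y alone.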

5 Simulation results

In this section, we validate the proposed DeSCI algorithm on simulated datasets, including videos and hyperspectral images, and compare it with other state-of-the-art algorithms for SCI reconstruction.

5.1 Video snapshot compressive imaging

We first conduct simulations on the video SCI problem. The coding strategy employed in [5], specifically the shifting binary mask, is used in our simulation. Two datasets, namely Kobe and Traffic, used in [19] are employed, and we use another two datasets, Runner [55] and Drop [56], to show the wide applicability. Eight (B = 8) consecutive video frames are modulated by shifting masks and then collapsed into a single measurement. We compare our proposed DeSCI algorithm with other leading algorithms, including GMM-TP [19], MMLE-GMM, MMLE-MFA [20], and GAP-TV [17]. The performance of GAP-wavelet proposed in [6] is similar to GAP-TV, as demonstrated in [17], and is thus omitted here. Due to the special requirements on the mask in deep learning based methods [30], we do not compare with them; however, as mentioned earlier, only a limited improvement (around 2 dB) over GMM has been obtained with the deep learning techniques in [30], whereas our algorithm improves significantly over GMM. All algorithms are implemented in MATLAB, and the GMM and GAP-TV codes are downloaded from the authors' websites. The MATLAB code of the proposed DeSCI algorithm is available at https://github.com/liuyang12/DeSCI.

5.1.1 Parameter setting

In our DeSCI algorithm, the parameter ρ and the iteration number Max-Iter are fixed for all noise levels. The image patch size, the searching window size, the number of adjacent frames used for denoising, and the number of patches in each patch group are fixed per dataset. The parameter λ is set based on the signal-to-noise ratio (SNR) of the SCI measurements, with one value per measurement SNR in {40 dB, 30 dB, 20 dB, 10 dB, 0 dB}. For noiseless measurements, the penalty is set so that, as derived in Sec. 4.3, ADMM is equivalent to GAP. Our algorithm is terminated when the difference between successive updates of θ is sufficiently small or when Max-Iter is reached. When the SNR of the SCI measurement is larger than 30 dB, GAP-acc is used to update x. Both PSNR and structural similarity (SSIM) [57] are employed to evaluate the quality of the reconstructed videos.

Fig. 4: The PSNR of reconstructed video frames with different σ_n settings over the iterations of DeSCI. σ_n starts from a large value (100) and is then halved every 60 iterations. Both PSNR and SSIM gradually increase with the iteration number.

As mentioned earlier, the noise estimate used in each iteration of our DeSCI algorithm is important to the performance. We found that first setting σ_n to a large value and then progressively decreasing it leads to good results. One example on the Kobe data used in [19] is shown in Fig. 4. We have found that this sequential setting consistently leads to good results in our experiments.
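The schedule read off Fig. 4 (start at 100 and halve every 60 iterations) can be written as a one-liner; these values are the illustrative ones from the figure, not tuned per-dataset settings.

```python
# Sketch of the noise-level schedule for DeSCI's WNNM step: start from a
# large sigma_n and halve it every 60 iterations, as in Fig. 4.
def sigma_schedule(iteration, sigma0=100.0, period=60):
    """Piecewise-constant, geometrically decaying noise level."""
    return sigma0 * 0.5 ** (iteration // period)

print([sigma_schedule(t) for t in (0, 59, 60, 120)])  # [100.0, 100.0, 50.0, 25.0]
```

Early iterations with a large σ_n shrink aggressively and remove coding artifacts; later iterations with a small σ_n preserve the fine details already recovered.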

5.1.2 Video compressive imaging results

Algorithm Kobe Traffic Runner Drop Average
GMM-TP 24.47, 0.5246 25.08, 0.7652 29.75, 0.6995 34.76, 0.6356 28.52, 0.6562
MMLE-GMM 27.33, 0.6962 25.68, 0.7798 33.68, 0.8224 39.86, 0.7369 31.64, 0.7588
MMLE-MFA 24.63, 0.5291 22.66, 0.6232 30.83, 0.7331 35.66, 0.6690 28.45, 0.6386
GAP-TV 26.45, 0.8448 20.89, 0.7148 28.81, 0.9092 34.74, 0.9704 27.72, 0.8598
DeSCI 33.25, 0.9518 28.72, 0.9251 38.76, 0.9693 43.22, 0.9925 35.99, 0.9597
TABLE I: The average results of PSNR in dB (left entry in each cell) and SSIM (right entry in each cell) by different algorithms on 4 datasets.

We hereby demonstrate that the proposed DeSCI algorithm performs much better than current state-of-the-art algorithms. Comparison of DeSCI, GMM-TP, MMLE-GMM, MMLE-MFA and GAP-TV on four datasets is shown in Table I. It can be seen clearly that the proposed DeSCI outperforms other algorithms on every dataset. Specifically, the average gains of DeSCI over GMM-TP, MMLE-GMM, MMLE-MFA and GAP-TV are as much as {7.47, 4.35, 7.54, 8.27}dB on PSNR and {0.3035, 0.2009, 0.3211, 0.0999} on SSIM.

Fig. 5: Reconstruction frames of DeSCI and other algorithms (GMM-TP, MMLE-GMM, MMLE-MFA, and GAP-TV).

On average, MMLE-GMM is the runner-up on PSNR, while GAP-TV is the runner-up on SSIM. This may be due to the fact that the TV prior is imposed globally and thus preserves more structural information of the video (leading to a higher SSIM). Exemplar reconstructed frames are shown in Fig. 5. It can be observed that the GMM based algorithms suffer from blurring artifacts and GAP-TV suffers from blocky artifacts. By contrast, our proposed DeSCI algorithm provides not only the fine details but also the large-scale sharp edges and clear motions. The reconstructed videos are shown in the supplementary material (SM).

Fig. 6: Frame-wise PSNR (a,c) and SSIM (b,d) of DeSCI and other algorithms for the Kobe (a-b) and Traffic (c-d) datasets.

One phenomenon in SCI reconstruction is that the image quality (PSNR and SSIM) of the first and last reconstructed frames of each measurement drops. One reason is that previous reconstruction algorithms do not consider the nonlocal self-similarity in video frames. In Fig. 6, we plot the frame-wise PSNR and SSIM of DeSCI and the other algorithms for the Kobe and Traffic datasets. It can be seen that the proposed DeSCI smooths out this quality drop: the self-similarity exists in every frame of the video and thus helps the reconstruction of these frames. We also notice that the PSNR of the reconstructed frames drops in the last 16 frames of the Kobe data (Fig. 6(a)); this is because complicated motions, i.e., the slam dunk, exist in these frames. Though the PSNRs of all algorithms drop, the SSIM of our proposed DeSCI is much smoother (Fig. 6(b)) than that of the other algorithms.

5.1.3 Robustness to noise

As mentioned in Sec. 4.3, DeSCI using GAP as the projection leads to fast results in the noiseless case, as verified by the previous results. However, GAP imposes that the solution of should be on the manifold , which is too strong a constraint in the noisy case. By contrast, ADMM minimizes the measurement error, i.e., , and is thus robust to noise [58, 27]. To verify this, we perform experiments on the Kobe dataset by adding different levels of white Gaussian noise. The results are summarized in Fig. 7, where we can see that in the noiseless case ADMM and GAP perform the same, which is consistent with our theoretical analysis. As the measurement SNR decreases, ADMM outperforms GAP in both PSNR and SSIM. Therefore, DeSCI using ADMM as the projection is recommended for realistic systems with noise.
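The two projections differ only in one scalar, which the following sketch makes explicit. The function names and the (H, W, B) mask layout are illustrative assumptions, not the paper's code; the key structural fact it relies on is stated in Sec. 4.3, namely that in SCI the operator ΦΦᵀ is diagonal, so its inverse is an elementwise division (the sketch assumes the diagonal is strictly positive, e.g. non-binary masks):

```python
import numpy as np

def A(x, masks):
    """SCI forward operator: sum the mask-modulated frames into one snapshot."""
    return (x * masks).sum(axis=2)

def At(y, masks):
    """Adjoint operator: spread the measurement back through each mask."""
    return y[:, :, None] * masks

def projection(v, y, masks, rho=0.0):
    """One projection update, shared by GAP (rho = 0) and ADMM (rho > 0).

    Because each pixel is sensed once per frame, Phi Phi^T is diagonal with
    entries R = sum_b masks_b^2, so the matrix inverse reduces to an
    elementwise division.
    """
    R = (masks ** 2).sum(axis=2)            # diagonal of Phi Phi^T
    residual = (y - A(v, masks)) / (R + rho)
    return v + At(residual, masks)
```

For rho = 0 the output lands exactly on the measurement manifold (A(x) = y), while for rho > 0 the residual is only shrunk by the factor rho / (R + rho), which is why the ADMM projection tolerates noisy measurements.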

Fig. 7: Comparison of ADMM and GAP for DeSCI with noisy measurements. A single measurement of the Kobe dataset is used.

5.1.4 DeSCI with other denoising algorithms

Fig. 8: Reconstruction frames of DeSCI using VBM4D and WNNM for video denoising. We can see that DeSCI-VBM4D still suffers from some undesired artifacts while DeSCI-WNNM can provide fine details as well as large-scale sharp edges.

Our proposed DeSCI is an iterative algorithm: in each iteration, we first perform the projection via ADMM or GAP and then perform denoising via the low-rank minimization method, namely WNNM. A natural question is how the performance changes if the derived WNNM is replaced with other state-of-the-art video denoising algorithms. We have already seen that DeSCI-WNNM performs much better than TV-based denoising. We therefore incorporate VBM4D [28], a representative state-of-the-art video denoising algorithm, into our proposed framework, dubbed DeSCI-VBM4D. Running DeSCI-VBM4D on the Kobe and Traffic datasets yields PSNRs of {30.60, 26.60}dB and SSIMs of {0.9260, 0.8958}, respectively. That is, DeSCI-WNNM outperforms DeSCI-VBM4D by more than 2dB in PSNR and more than 0.025 in SSIM. Exemplar reconstruction frames are shown in Fig. 8, where we can see that DeSCI-VBM4D still suffers from some undesired artifacts while DeSCI-WNNM provides small-scale fine details as well as large-scale sharp edges. This clearly verifies the superiority of the proposed reconstruction framework along with the WNNM denoising for SCI reconstruction.
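The pluggable denoising step can be sketched with the closed-form weighted singular value thresholding used by WNNM [24]. This is a simplified illustration: the constant C, the epsilon, and the exact weight formula are assumptions following the spirit of [24], not the paper's tuned implementation, and the input is assumed to be one matrix of vectorized similar patches produced by block matching:

```python
import numpy as np

def wnnm_denoise_group(Y, noise_sd, C=2.8, eps=1e-8):
    """Low-rank estimate of a noisy patch-group matrix Y.

    Sketch of the weighted nuclear norm minimization step: singular values
    are soft-thresholded with weights inversely proportional to their
    magnitude, so dominant (structural) components are penalized less than
    small (noise-dominated) ones.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    n = Y.shape[1]                                   # number of similar patches
    w = C * np.sqrt(n) * noise_sd ** 2 / (s + eps)   # assumed weight rule
    s_hat = np.maximum(s - w, 0.0)                   # weighted soft threshold
    return (U * s_hat) @ Vt
```

Swapping this function for a call into VBM4D (operating on the whole video instead of per group) is, structurally, all that DeSCI-VBM4D changes.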

We have also tried integrating WNNM and VBM4D with other denoising priors, such as the TV prior. We observe that the TV prior helps DeSCI-VBM4D marginally (less than 0.2dB) but barely helps DeSCI-WNNM. This again verifies that our proposed DeSCI-WNNM already exploits both the global and the local priors of videos, through the nonlocal self-similarity and the low-rank estimation.

5.1.5 Computational complexity

As mentioned in Sec. 4.3, the projection step can be updated very efficiently; the time-consuming step is the weighted nuclear norm minimization for video denoising, in which both the block matching and the low-rank estimation via SVD demand a high computational workload. While the block matching can be performed once per (about) 20 iterations, the SVD needs to be performed in every iteration for each patch group. Specifically, a single round of block matching and of patch estimation via SVD takes 90 seconds and 11 seconds, respectively. Recent advances in deep learning have shown great potential in various image processing tasks, and we envision that the block matching might be sped up via generative adversarial networks (GANs) [59], reducing its cost significantly. Regarding the low-rank estimation via SVD, truncated approaches [60] can be employed to speed up the operation. Using all these advanced tools, we believe each iteration of our algorithm could be performed within 10 seconds, and DeSCI could provide good results in a few minutes.
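The truncated speed-up of [60] can be sketched with a basic randomized SVD. The oversampling parameter and the absence of power iterations are simplifying assumptions; when each patch-group matrix is effectively low-rank, this replaces the full per-group SVD at a fraction of the cost:

```python
import numpy as np

def randomized_svd(A, k, n_oversample=10, seed=0):
    """Rank-k randomized SVD in the style of Halko et al. [60].

    Sketch the column space with a random Gaussian test matrix, then run an
    exact SVD on the much smaller projected matrix.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + n_oversample))
    Q, _ = np.linalg.qr(A @ Omega)           # orthonormal range approximation
    B = Q.T @ A                              # small (k + p) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]
```

The dominant cost drops from a full SVD of the patch-group matrix to a QR and an SVD of a (k + p)-column sketch, which is what makes a sub-10-second iteration plausible.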

Algorithm | bird | toy
GAP-TV | 30.36 dB, 0.9251 | 24.66 dB, 0.8608
DeSCI | 32.40 dB, 0.9452 | 25.91 dB, 0.9094
TABLE II: The average PSNR (left entry in each cell) and SSIM (right entry in each cell) of GAP-TV and DeSCI on the simulated bird and toy hyperspectral datasets.
Fig. 9: Reconstructed spectra of simulated bird hyperspectral data. A snapshot measurement encoding 24 spectral bands is shown on the top-right. The original RGB image of the scene is shown on the top-left with a size of pixels. The spectra of four birds are shown on the middle and bottom rows. The correlation of the reconstructed spectra and the ground truth is shown in the legends.
Fig. 10: Reconstructed frames of simulated bird hyperspectral data.

5.2 Snapshot compressive hyperspectral image

We further demonstrate that DeSCI can be used for snapshot hyperspectral compressive imaging systems, such as CASSI [9]. Both simulated data with shifting masks and real data from CASSI systems [9] are presented. We first show the simulation results in this section; real data results are demonstrated in Sec. 6.3. We generate the simulated hyperspectral measurements by summing the spectral frames from the hyperspectral datasets (the bird spectral data is from [12]; the toy spectral data is from the CAVE multispectral image database, http://www1.cs.columbia.edu/CAVE/databases/multispectral/) modulated by shifting random masks using Eq. (3).
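The shifting-mask simulation can be sketched as follows. The shift axis, the one-pixel-per-band step, and the wraparound behavior of `np.roll` are illustrative assumptions rather than the calibrated dispersion of a real CASSI system, and Eq. (3) should be consulted for the exact model:

```python
import numpy as np

def cassi_measurement(cube, mask, step=1):
    """Simulate a snapshot measurement from a hyperspectral cube (H, W, B).

    Each spectral band is modulated by the coded aperture shifted by `step`
    pixels per band, then all modulated bands are summed into one frame.
    Note np.roll wraps around, whereas a real disperser shifts without
    wraparound.
    """
    H, W, B = cube.shape
    y = np.zeros((H, W))
    for b in range(B):
        shifted = np.roll(mask, b * step, axis=1)  # per-band mask shift
        y += shifted * cube[:, :, b]
    return y
```

Running this with a random binary mask on a 24-band (or 31-band) cube produces the single-frame measurements used as inputs to GAP-TV and DeSCI in the experiments below.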

Bird simulated hyperspectral data. The bird hyperspectral images consist of 24 spectral bands, each of size pixels. The reconstructed spectra of the simulated bird hyperspectral data and exemplar frames are shown in Fig. 9 and Fig. 10, respectively. The average PSNR and SSIM are listed in Table II. It can be seen clearly that both GAP-TV and DeSCI recover the spectra of the four birds reliably, with correlation over 0.9999, as shown in the legends of Fig. 9. DeSCI provides more details in the reconstruction (see Fig. 10, where the feathers of the orange bird are less smoothed out than with GAP-TV) and higher quantitative indices (PSNR and SSIM in Table II).

Fig. 11: Reconstructed spectra of toy hyperspectral data. A snapshot measurement encoding 31 spectral bands from simulated data is shown on the top-right. The toy data is from CAVE hyperspectral dataset. The original RGB image of the scene is shown on the top-left with a size of pixels. The spectra of four color regions are shown on the middle and bottom rows. The correlation of the reconstructed spectra and the ground truth is shown in the legends.
Fig. 12: Reconstructed frames of simulated toy hyperspectral data.

Toy simulated hyperspectral data. The toy hyperspectral images consist of 31 spectral bands, each of size pixels. The reconstructed spectra of the simulated toy hyperspectral data and exemplar frames are shown in Fig. 11 and Fig. 12, respectively. The average PSNR and SSIM are listed in Table II. It can be seen clearly that DeSCI provides better reconstructed spectra than GAP-TV, as shown in the legends of Fig. 11. Besides, the reconstructed frames of DeSCI preserve fine details of the toy dataset, for example the resolution targets and the fringe on the clothes in Fig. 12, and achieve higher PSNR and SSIM, as shown in Table II.

Fig. 13: Real data: Performance comparison on chopper wheel high-speed video reconstruction. A snapshot measurement encoding frames captured from the real SCI system [5] is shown on the top-left. Four algorithms (GMM-TP, MMLE-GMM, MMLE-MFA and GAP-TV) are compared with the proposed DeSCI algorithm. Close-ups of the letter ‘D’ (in blue) and motion blur (in red) are shown on the bottom-right.

6 Real data results

In this section, we demonstrate the efficacy of the proposed DeSCI algorithm with real data captured from various SCI systems. Both grayscale and color video data and hyperspectral images are used for reconstruction and DeSCI provides significantly better reconstruction results.

6.1 Grayscale high-speed video

The grayscale high-speed video data are captured by the coded aperture compressive temporal imaging (CACTI) system [5]. A snapshot measurement of size pixels encodes frames of the same size. The same data are used in the two GMM papers [19, 20]. We compare the results of DeSCI with the three GMM-based algorithms (GMM-TP, MMLE-GMM and MMLE-MFA) and re-use the figures from [20]. (We re-use the figures in [20] for two reasons. First, we use exactly the same dataset as the GMM paper [20]. Second, only the pre-trained model of is provided by the authors [19]; however, for real data, for the grayscale high-speed video and for the color high-speed video.)

Chopper wheel grayscale high-speed video. The results on the chopper wheel dataset are shown in Fig. 13. It can be seen clearly that all other leading algorithms suffer from motion blur artifacts (see the letter ‘D’ and the corresponding motion blur in each frame, and the close-ups in Fig. 13). By contrast, DeSCI preserves both fine details and large-scale sharp edges.

Hand lens grayscale high-speed video. Similar results on the hand lens dataset are shown in Fig. 14. DeSCI preserves the clear background of the motionless hand (close-ups of the hand) and the sharp edges of the moving lens (close-ups of the lens).

Fig. 14: Real data: Performance comparison on hand lens high-speed video reconstruction. A snapshot measurement encoding frames captured from the real SCI system [5] is shown on the top-left. Four algorithms (GMM-TP, TwIST, GAP and GAP-TV) are compared with the proposed DeSCI algorithm. Close-ups of the hand (in blue) and the lens (in red) are shown on the bottom-right.

UCF grayscale high-speed video. DeSCI can also boost the performance of other snapshot high-speed compressive imaging systems. A snapshot measurement of size pixels encodes 10 frames of the same size from [8]. We compare DeSCI with the TwIST algorithm used in [8]; the results are shown in Fig. 15, where three of the 10 frames are shown. DeSCI not only preserves the sharp edges of the scene (close-ups of the ‘C’ character and the moving ball) but also resolves fine details of the background. As shown in the middle close-ups, the characters on the book can be seen clearly in the DeSCI reconstruction, whereas TwIST blurs these details and no characters on the book can be identified.

Fig. 15: Real data: Performance comparison on UCF high-speed video reconstruction (raw data and code from [8]). Close-ups shown on right.

6.2 Color high-speed video

The color high-speed video data are from the high-speed motion, color and depth system [6]. A snapshot Bayer RGB measurement of size pixels encodes frames of the same size. Since we do not have the pre-trained GMM model of , we do not compare DeSCI with the three GMM-based algorithms. As discussed in Sec. 6.1, the GMM-based algorithms perform similarly to GAP-TV with respect to motion blur artifacts.

Triball color high-speed video. The results on the triball dataset are shown in Fig. 16. It can be seen clearly that DeSCI eliminates the motion blur artifacts present in the GAP-TV results (see the close-up of the dark red ball in Fig. 16). Compared with the GAP-wavelet results shown in [6], DeSCI recovers a much smoother background. The orange ball still suffers from motion blur artifacts because its moving direction is the same as the shifting direction of the mask in all frames, and shift-mask-based SCI systems fail to resolve motions along the mask-shifting direction.

Hammer color high-speed video. The results on the hammer dataset are shown in Fig. 17. It can be seen clearly that DeSCI recovers a much smoother background than the GAP-wavelet algorithm used in [6] and sharper edges of the hammer than GAP-TV (see the close-ups of the hammer in Fig. 17).

Fig. 16: Real data: Performance comparison on triball high-speed color video reconstruction. A snapshot Bayer RGB measurement encoding frames captured from the real SCI system [6] is shown on the top-left. A close-up of the dark red ball is shown on the left. GAP-TV is compared with the proposed DeSCI algorithm.
Fig. 17: Real data: Performance comparison on hammer high-speed color video reconstruction. Two snapshot Bayer RGB measurements encoding frames captured from the real SCI system [6] are shown on the top-left. A close-up of the hammer is shown on the left. GAP-TV is compared with the proposed DeSCI algorithm.

6.3 Hyperspectral image data

We apply DeSCI to real snapshot hyperspectral compressive imaging data and show that it provides significantly better results, preserving the spectra and reducing the artifacts induced by the compressed measurements. The real hyperspectral data are from the CASSI system [9].

Bird hyperspectral data. The bird data [9] consist of 24 spectral bands, each of size pixels. The reconstructed spectra and exemplar frames of the real bird hyperspectral data are shown in Fig. 18 and Fig. 19, respectively. We shift the reconstructed spectra by two bands to align with the optical calibration, so only 21 spectral bands are shown in Fig. 18, compared with the 24 bands shown for the simulated reconstruction in Fig. 9. It can be seen clearly that DeSCI preserves the spectral properties of the scene, with correlation over 0.99. Besides, the reconstructed frames of DeSCI are clearer than those of GAP-TV, whose results suffer from blurry artifacts resulting from the coded measurements. We notice some over-smoothing in the DeSCI results on this real data in Fig. 19, which might be due to the system noise.

Fig. 18: Real data: Reconstructed spectra of real bird hyperspectral data. A snapshot measurement encoding 24 spectral bands captured from real CASSI system [9] is shown on the top-right. The original RGB image of the scene is shown on the top-left with a size of pixels. The spectra of four birds are shown on the middle and bottom rows. The correlation of the reconstructed spectra and the ground truth is shown in the legends.
Fig. 19: Real data: Reconstructed frames of real bird hyperspectral data.

Object hyperspectral data. The object data [9] consist of 33 spectral bands, each of size pixels. The reconstructed frames of DeSCI and the comparison of exemplar frames with the TwIST algorithm used in [9] and GAP-TV are shown in Fig. 20 and Fig. 21, respectively. Since there is no ground truth for the object data, we cannot compare the reconstructed spectra to the ground truth, nor report the correlation. Our reconstructed frames are clear and free of blur: each frame of a spectral band contains either a whole object or nothing, as shown in Fig. 20. Because the plastic objects of the same color in this scene are made of the same material, a whole object shares the same spectral signature. In contrast to the yellow band of the TwIST and GAP-TV results shown in Fig. 21, DeSCI reconstructs the banana object without fragments. DeSCI significantly reduces the mask artifacts induced by the compressed measurements.

Fig. 20: Real data: Reconstructed frames of real object hyperspectral data. A snapshot measurement encoding 33 spectral bands captured from real CASSI system [9] is shown on the top-right. The original RGB image of the scene is shown on the top-left with a size of pixels.
Fig. 21: Real data: Comparison of exemplar reconstructed frames of real object hyperspectral data (TwIST, GAP-TV and the proposed DeSCI).

7 Concluding remarks

We have proposed a new algorithm to reconstruct high-speed frames from a single measurement captured by snapshot compressive imaging systems, with videos and hyperspectral images as two exemplar applications. The rank minimization approach is incorporated into the forward model of the snapshot compressive imaging system, and a joint optimization problem is formulated. An alternating minimization algorithm is developed to solve this joint model. Our proposed algorithm exploits the nonlocal self-similarity in video (hyperspectral) frames and leads to significant improvements over existing algorithms. Extensive results on both simulation and real data have demonstrated the superiority of the proposed algorithm.

Most recently, deep learning algorithms have been used for CS inversion [61, 62, 63] and, as mentioned in the introduction, deep learning has been employed for video CS in [30, 31] (but with some constraints). While most of these algorithms try to learn an end-to-end inversion network via convolutional neural networks (CNNs), Yang et al. [62] incorporate the CNN into the ADMM framework. This inspires one direction of our future work: integrating the proposed ADMM framework with deep learning based denoising algorithms [64, 65].

Our proposed algorithm can also be used in other snapshot (or multi-shot) compressive imaging systems, for example, depth compressive imaging [66], polarization compressive imaging [67], X-ray compressive imaging [68, 69, 70] and three-dimensional high-speed compressive imaging [7]. This will be another direction of our future work. We believe our results will encourage researchers and engineers to pursue compressive imaging further in real applications.

Acknowledgments

The authors would like to thank Dr. Patrick Llull for capturing the real video data of CACTI, Dr. Tsung-Han Tsai for capturing the real hyperspectral image data of CASSI, and Mr. Yangyang Sun and Dr. Shuo Pang for providing the UCF video data. This work was supported by the National Natural Science Foundation of China (grant Nos. 61327902, 61722110, 61627804, and 61631009).

References

  • [1] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, April 2006.
  • [2] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, February 2006.
  • [3] Y. Hitomi, J. Gu, M. Gupta, T. Mitsunaga, and S. K. Nayar, “Video from a single coded exposure photograph using a learned over-complete dictionary,” in IEEE International Conference on Computer Vision (ICCV), 2011, pp. 287–294.
  • [4] D. Reddy, A. Veeraraghavan, and R. Chellappa, “P2C2: Programmable pixel compressive camera for high speed imaging,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 329–336.
  • [5] P. Llull, X. Liao, X. Yuan, J. Yang, D. Kittle, L. Carin, G. Sapiro, and D. J. Brady, “Coded aperture compressive temporal imaging,” Optics Express, vol. 21, no. 9, pp. 10 526–10 545, 2013.
  • [6] X. Yuan, P. Llull, X. Liao, J. Yang, D. J. Brady, G. Sapiro, and L. Carin, “Low-cost compressive sensing for color video and depth,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 3318–3325.
  • [7] Y. Sun, X. Yuan, and S. Pang, “High-speed compressive range imaging based on active illumination,” Optics Express, vol. 24, no. 20, pp. 22 836–22 846, Oct 2016.
  • [8] ——, “Compressive high-speed stereo imaging,” Optics Express, vol. 25, no. 15, pp. 18 182–18 190, 2017.
  • [9] M. E. Gehm, R. John, D. J. Brady, R. M. Willett, and T. J. Schulz, “Single-shot compressive spectral imaging with a dual-disperser architecture,” Optics Express, vol. 15, no. 21, pp. 14 013–14 027, 2007.
  • [10] A. Wagadarikar, R. John, R. Willett, and D. J. Brady, “Single disperser design for coded aperture snapshot spectral imaging,” Applied Optics, vol. 47, no. 10, pp. B44–B51, 2008.
  • [11] A. Wagadarikar, N. Pitsianis, X. Sun, and D. Brady, “Video rate spectral imaging using a coded aperture snapshot spectral imager,” Optics Express, vol. 17, no. 8, pp. 6368–6388, 2009.
  • [12] X. Yuan, T.-H. Tsai, R. Zhu, P. Llull, D. J. Brady, and L. Carin, “Compressive hyperspectral imaging with side information,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 6, pp. 964–976, September 2015.
  • [13] X. Cao, T. Yue, X. Lin, S. Lin, X. Yuan, Q. Dai, L. Carin, and D. J. Brady, “Computational snapshot multispectral cameras: Toward dynamic capture of the spectral world,” IEEE Signal Processing Magazine, vol. 33, no. 5, pp. 95–108, Sept 2016.
  • [14] L. Gao, J. Liang, C. Li, and L. V. Wang, “Single-shot compressed ultrafast photography at one hundred billion frames per second,” Nature, vol. 516, no. 7529, pp. 74–77, 2014.
  • [15] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 83–91, 2008.
  • [16] S. Jalali and X. Yuan, “Compressive imaging via one-shot measurements,” in IEEE International Symposium on Information Theory (ISIT), 2018.
  • [17] X. Yuan, “Generalized alternating projection based total variation minimization for compressive sensing,” in 2016 IEEE International Conference on Image Processing (ICIP), Sept 2016, pp. 2539–2543.
  • [18] X. Yuan, H. Jiang, G. Huang, and P. Wilford, “SLOPE: Shrinkage of local overlapping patches estimator for lensless compressive imaging,” IEEE Sensors Journal, vol. 16, no. 22, pp. 8091–8102, November 2016.
  • [19] J. Yang, X. Yuan, X. Liao, P. Llull, G. Sapiro, D. J. Brady, and L. Carin, “Video compressive sensing using Gaussian mixture models,” IEEE Transaction on Image Processing, vol. 23, no. 11, pp. 4863–4878, November 2014.
  • [20] J. Yang, X. Liao, X. Yuan, P. Llull, D. J. Brady, G. Sapiro, and L. Carin, “Compressive sensing by learning a Gaussian mixture model from measurements,” IEEE Transaction on Image Processing, vol. 24, no. 1, pp. 106–119, January 2015.
  • [21] C. A. Metzler, A. Maleki, and R. G. Baraniuk, “From denoising to compressed sensing,” IEEE Transactions on Information Theory, vol. 62, no. 9, pp. 5117–5144, 2016.
  • [22] H. Ji, C. Liu, Z. Shen, and Y. Xu, “Robust video denoising using low rank matrix completion.” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 1791–1798.
  • [23] J.-F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
  • [24] S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 2862–2869.
  • [25] S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, and L. Zhang, “Weighted nuclear norm minimization and its applications to low level vision,” International Journal of Computer Vision, vol. 121, no. 2, pp. 183–208, 2017.
  • [26] G. Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by low-rank representation,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 663–670.
  • [27] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, January 2011.
  • [28] M. Maggioni, G. Boracchi, A. Foi, and K. O. Egiazarian, “Video denoising, deblocking, and enhancement through separable 4-d nonlocal spatiotemporal transforms,” IEEE Transactions on Image Processing, vol. 21, pp. 3952–3966, 2012.
  • [29] W. Dong, G. Shi, X. Li, Y. Ma, and F. Huang, “Compressive sensing via nonlocal low-rank regularization,” IEEE Transactions on Image Processing, vol. 23, no. 8, pp. 3618–3632, 2014.
  • [30] M. Iliadis, L. Spinoulas, and A. K. Katsaggelos, “Deep fully-connected networks for video compressive sensing,” Digital Signal Processing, vol. 72, pp. 9–18, 2018.
  • [31] K. Xu and F. Ren, “CSVideoNet: A real-time end-to-end learning framework for high-frame-rate video compressive sensing,” arXiv: 1612.05203, Dec 2016.
  • [32] K. Marwah, G. Wetzstein, Y. Bando, and R. Raskar, “Compressive light field photography using overcomplete dictionaries and optimized projections,” ACM Transactions on Graphics, vol. 32, no. 4, 2013.
  • [33] X. Lin, Y. Liu, J. Wu, and Q. Dai, “Spatial-spectral encoded compressive hyperspectral imaging,” ACM Transactions on Graphics, vol. 33, no. 6, pp. 1–11, 2014.
  • [34] D. J. Brady, A. Mrozack, K. MacCabe, and P. Llull, “Compressive tomography,” Advances in Optics and Photonics, vol. 7, no. 4, p. 756, 2015.
  • [35] T.-H. Tsai, P. Llull, X. Yuan, D. J. Brady, and L. Carin, “Spectral-temporal compressive imaging,” Optics Letters, vol. 40, no. 17, pp. 4054–4057, Sep 2015.
  • [36] D. J. Brady, W. Pang, H. Li, Z. Ma, Y. Tao, and X. Cao, “Parallel cameras,” Optica, vol. 5, no. 2, 2018.
  • [37] E. Candès, J. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, 2006.
  • [38] ——, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Transactions on Information Theory, 2006.
  • [39] L. Wang, Z. Xiong, G. Shi, F. Wu, and W. Zeng, “Adaptive nonlocal sparse representation for dual-camera compressive hyperspectral imaging,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 10, pp. 2104–2111, 2017.
  • [40] L. Wang, Z. Xiong, H. Huang, G. Shi, F. Wu, and W. Zeng, “High-speed hyperspectral video acquisition by combining nyquist and compressive sampling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
  • [41] E. H. Adelson and J. R. Bergen, The Plenoptic Function and the Elements of Early Vision.   Cambridge, MA: MIT Press, 1991, pp. 3–20.
  • [42] D. Kittle, K. Choi, A. Wagadarikar, and D. J. Brady, “Multiframe image estimation for coded aperture snapshot spectral imagers,” Applied Optics, vol. 49, no. 36, pp. 6824–6833, December 2010.
  • [43] H. Arguello and G. R. Arce, “Rank minimization code aperture design for spectrally selective compressive imaging,” IEEE Transactions on Image Processing, vol. 22, no. 3, pp. 941–954, 2013.
  • [44] S. Jalali and A. Maleki, “From compression to compressed sensing,” Applied and Computational Harmonic Analysis, vol. 40, no. 2, pp. 352–385, 2016.
  • [45] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2233–2246, 2012.
  • [46] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM, vol. 58, no. 3, pp. 11:1–37, 2011.
  • [47] E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational mathematics, vol. 9, no. 6, pp. 717–772, 2009.
  • [48] Y. Xie, S. Gu, Y. Liu, W. Zuo, W. Zhang, and L. Zhang, “Weighted Schatten -norm minimization for image denoising and background subtraction,” IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4842–4857, 2016.
  • [49] E. J. Candés, M. B. Wakin, and S. P. Boyd, “Enhancing sparsity by reweighted minimization,” Journal of Fourier analysis and applications, vol. 14, no. 5, pp. 877–905, 2008.
  • [50] Z. Zha, X. Liu, X. Huang, X. Hong, H. Shi, Y. Xu, Q. Wang, L. Tang, and X. Zhang, “Analyzing the group sparsity based on the rank minimization methods,” The IEEE International Conference on Multimedia & Expo (ICME), pp. 883–888, 2017.
  • [51] J. Zhang, D. Zhao, and W. Gao, “Group-based sparse representation for image restoration,” IEEE Transactions on Image Processing, vol. 23, no. 8, pp. 3336–3351, Aug 2014.
  • [52] W. Zuo, D. Meng, L. Zhang, X. Feng, and D. Zhang, “A generalized iterated shrinkage algorithm for non-convex sparse coding,” in IEEE International Conference on Computer Vision (ICCV), 2013, pp. 217–224.
  • [53] X. Liao, H. Li, and L. Carin, “Generalized alternating projection for weighted- minimization with applications to model-based compressive sensing,” SIAM Journal on Imaging Sciences, vol. 7, no. 2, pp. 797–823, 2014.
  • [54] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proceedings of the National Academy of Sciences, vol. 106, no. 45, pp. 18 914–18 919, 2009.
  • [55] “Runner data,” https://www.videvo.net/video/elite-runner-slow-motion/4541/.
  • [56] “Drop data,” http://www.phantomhighspeed.com/Gallery.
  • [57] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
  • [58] W. Yin, S. Osher, D. Goldfarb, and J. Darbon, “Bregman iterative algorithms for -minimization with applications to compressed sensing,” SIAM J. Imaging Sci, pp. 143–168, 2008.
  • [59] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems 27, 2014, pp. 2672–2680.
  • [60] N. Halko, P. Martinsson, and J. Tropp, “Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions,” SIAM Review, vol. 53, no. 2, pp. 217–288, 2011.
  • [61] K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, and A. Ashok, “ReconNet: Non-iterative reconstruction of images from compressively sensed measurements,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016, pp. 449–458.
  • [62] Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep ADMM-Net for compressive sensing MRI,” in Advances in Neural Information Processing Systems 29, 2016, pp. 10–18.
  • [63] X. Yuan and Y. Pu, “Parallel lensless compressive imaging via deep convolutional neural networks,” Optics Express, vol. 26, no. 2, pp. 1962–1977, Jan 2018.
  • [64] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, July 2017.
  • [65] L. Zhang and W. Zuo, “Image restoration: From sparse and low-rank priors to deep priors [lecture notes],” IEEE Signal Processing Magazine, vol. 34, no. 5, pp. 172–179, 2017.
  • [66] P. Llull, X. Yuan, L. Carin, and D. Brady, “Image translation for single-shot focal tomography,” Optica, vol. 2, no. 9, pp. 822–825, 2015.
  • [67] T.-H. Tsai, X. Yuan, and D. J. Brady, “Spatial light modulator based color polarization imaging,” Optics Express, vol. 23, no. 9, pp. 11 912–11 926, May 2015.
  • [68] L. Wang, J. Huang, X. Yuan, K. Krishnamurthy, J. Greenberg, V. Cevher, D. J. Brady, M. Rodrigues, R. Calderbank, and L. Carin, “Signal recovery and system calibration from multiple compressive poisson measurements,” SIAM Journal on Imaging Sciences, vol. 8, no. 3, pp. 1923–1954, 2015.
  • [69] J. Huang, X. Yuan, and R. Calderbank, “Collaborative compressive x-ray image reconstruction,” in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, April 2015, pp. 3282–3286.
  • [70] ——, “Multi-scale bayesian reconstruction of compressive x-ray image,” in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, April 2015, pp. 1618–1622.