Current video coding techniques, such as HEVC SullivanOHW12 , are designed to have low-complexity decoders for broadcasting applications; this is based on the assumption that large amounts of resources are available at the encoder. However, many emerging real-time encoding applications, including low-power sensor network applications or surveillance cameras, call for an opposite system design that can work with very limited computing and power resources at the encoder. Distributed video coding (DVC) hoangvan2012
is an alternate solution for a low-complexity encoder, in which the encoding complexity is substantially reduced by shifting the most computationally-intensive module of motion estimation/motion compensation to the decoder. Nonetheless, other than the encoding process, the processes of image/video acquisition also need to be considered to further reduce the complexity of the encoder hoangvan2012 because current image/video applications capture large amounts of raw image/video data, most of which are thrown away in the encoding process to achieve a highly compressed bitstream. In this context, compressive sensing (CS) has drawn interest since it provides a general signal acquisition framework at a sub-Nyquist sampling rate while still enabling perfect or near-perfect signal reconstruction donoho2006 . More precisely, a sparse signal that has most entries equal to zero (or nearly zero) can be sub-sampled via linear projection onto sensing bases; it can be reconstructed later by a sophisticated recovery algorithm, which basically seeks its $K$-sparse approximation (i.e., the $K$ largest-magnitude coefficients). Consequently, CS leads to simultaneous signal acquisition and compression to form an extremely simple encoder. Despite this simplicity, the recovery performance depends heavily on the recovery algorithm, for which two important factors are the proper design of the sparsifying transforms and the deployment of appropriate denoising tools.
Although many CS recovery algorithms have been developed, including NESTA (Nesterov’s algorithm) becker2011 , gradient projection for sparse reconstruction (GPSR) figueiredo2007 , Bayesian compressive sensing ji2008 ; he2009 ; he2010 , smooth projected Landweber (SPL) gan2007 , and total variation (TV)-based algorithms li2013 ; zhang2010 ; xu2012 , their reconstruction quality still leaves much room for improvement, especially at a low subrate. For better CS recovery, Candes candes2008 proposed a weighted scheme based on the magnitude of signals to get closer to the $\ell_0$ norm, while still using the $\ell_1$ norm in the optimization problems. In a similar manner, Asif et al. asif2013 adaptively assigned weight values according to the homotopy of signals. As another approach, the authors in mun2009 ; van2013 ; van2014 utilized local smoothing filters, such as Wiener or Gaussian filters, to reduce blocking artifacts and enhance the quality of the recovered images. Despite these improvements, the performance of the aforementioned approaches is still far from satisfactory because much of the useful prior information of the image/video signals (e.g., the non-local statistics) was not taken into full account.
More recent investigations have sought to design a sparsifying transform to sparsify the image/video signal to the greatest degree because the CS recovery performance can be closer to that of sampling at the full Nyquist rate if the corresponding transform signal is sufficiently sparse donoho2006 . The direct usage of predetermined transform bases, such as the discrete wavelet transform (DWT) he2009 ; mun2009 , discrete cosine transform (DCT) he2010 ; mun2009 , or gradient transform li2013 ; zhang2010 ; xu2012 ; van2014 , is appealing due to their low complexity. However, predetermined transform bases cannot produce sufficient sparsity (i.e., the number of zero or close-to-zero coefficients is limited) for the signal of interest, thereby limiting their recovery performance. Because image and video signals are rich in nonlocal similarities (i.e., a pixel can be similar to other pixels that are not located close to it), usage of those nonlocal similarities buades2005 can generate a higher sparsity level to achieve better recovery performance; this is known as a patch-based sparse representation approach elad2006 . Note that this approach originally showed much success in image denoising elad2006 ; dabov2007 ; dabov2009bm3d ; chatterjee2012 and researchers have incorporated this idea into CS frameworks. Xu and Yin xu2014fast proposed a fast patch method for whole-image sensing and recovery under a learned dictionary, while Zhang et al. zhang2012image took advantage of hybrid sparsifying bases by iteratively applying a gradient transform and a three-dimensional (3D) transform dabov2007 . By using the concept of decomposition, the authors in canh2016compressive also used a 3D transform for cartoon images to enhance the recovery quality. The 3D transform can be considered a global sparsifying transform because it is used for all patches of the recovered images. Dong et al. dong2014compressive
, motivated by the success of data-dependent transforms for patches (referred to as local sparsifying transforms) such as principal component analysis (PCA) or singular-value decomposition (SVD), proposed a method to enhance the sparsity level with the logdet function to bring the $\ell_1$ norm closer to the $\ell_0$ norm, similar to the work of Candes candes2008 . Metzler et al. metzler2016denoising acquired a local sparsifying transform via block matching dabov2009bm3d and demonstrated the effectiveness of applying denoising tools to the CS recovery of the approximate message passing (AMP) method. However, because frame sensing accesses the entire image at once, the works described in xu2014fast ; zhang2012image ; dong2014compressive ; metzler2016denoising require extensive computation and huge amounts of memory for storing the sensing matrix fowler2012block ; thus, these approaches are not suitable as sensing schemes for real-time encoding applications or large-scale images/video.
Alternatively, block compressive sensing (BCS) has been developed to deal more efficiently with large-sized natural images and video by sensing each block separately using a block sensing matrix with a much smaller size. The compressive sensor can instantly generate the measurement data of each block through its linear projection rather than waiting until the entire image is measured, as is done in frame sensing. The advantages of BCS are discussed in gan2007 ; fowler2012block ; dadkhah2014block ; dinh2016iterative . However, in BCS, the recovery performance has yet to be substantially improved in comparison to that of frame sensing. To address this problem on the sensing side, a Gaussian regression model between the coordinates of pixels and their gray levels can be used to achieve better performance compared to traditional Gaussian matrices han2015novel . Additionally, Fowler et al. fowler2011multiscale developed an adaptive subrate method (i.e., multi-scale BCS) to exploit the different roles of wavelet bands. On the recovery side, for example, Dinh et al. dinh2014weighted designed overlapped recovery with a weighted scheme to reduce the blocking artifacts caused by block recovery. Chen et al. chen2011compressed used the Tikhonov regularization and residual image information to enhance the smooth projected Landweber gan2007 . Furthermore, to enrich the details of recovered images, the K-SVD algorithm elad2006 was used in zhang2014image . By sharing the same idea in xu2014fast ; zhang2012image ; dong2014compressive ; metzler2016denoising ; zhang2014image where nonlocal similarities are exploited to design the local sparsifying transform, group-based sparse representation (GSR) zhang2014group can achieve better recovery performance (in terms of the peak signal-to-noise-ratio (PSNR)) than other algorithms that were previously designed for BCS. 
However, its recovered images still contain many visual artifacts because the nonlocal search and patch collection rely on the initial recovered images produced by chen2011compressed , which often have poor quality at low subrates. Consequently, more effort is required to improve both the objective and subjective quality.
This paper attempts to improve the recovery performance of the BCS framework by using TV minimization, which is good at preserving edges li2013 , with multiple techniques consisting of reducing blocking artifacts in the gradient domain, denoising the Lagrangian multipliers, and enhancing the detailed information with patch-based sparse representation. Furthermore, the proposed recovery methods are easily extendible to compressive sensing and encoding problems of video do2009distributed ; mun2011residual ; tramel2011video ; van2014block ; kang2009distributed . Specifically, our main contributions are summarized as follows.
For BCS of images, we propose a method, referred to as multi-block gradient processing, that addresses the blocking artifacts caused by block-by-block independent TV processing during recovery. Furthermore, based on our observation that both image information (e.g., edges and details) and high-frequency artifacts and staircase artifacts are still prevalent in the Lagrangian multiplier of the TV optimization, we propose a method to reduce such artifacts by denoising the Lagrangian multiplier directly with a nonlocal means (NLM) filter. Because the direct application of the NLM filter is not effective in preserving local details with low contrast buades2005 , we further propose enriching these low-contrast details through an additional refinement process that uses patch-based sparse representation. We propose using both global and local sparsifying transforms because the single usage of either transform limits the effective sparse basis and achievement of a sufficient sparsity level for noisy data. The proposed recovery method demonstrates improvements for BCS of images compared to previous works he2009 ; he2010 ; mun2009 ; van2013 ; chen2011compressed ; zhang2014image ; zhang2014group .
For BCS of videos, we extend the proposed method to a compressive video sensing problem known as block distributed compressive video sensing (DCVS). An input video sequence is divided into groups of pictures (GOP), each of which consists of one key frame and several non-key frames. These undergo block sensing by a Gaussian sensing matrix. The proposed method first recovers the key frame using the proposed recovery method. Then, for non-key frames, side information is generated by exploiting measurements of the non-key and previously recovered frames in the same GOP. Improved quality of the non-key frames is sought by joint minimization of the sparsifying transforms and side information regularization. Our experimental results demonstrate that the proposed method performs better than existing recovery methods designed for block DCVS, including BCS-SPL using motion compensation (MC-BCS-SPL) mun2011residual or BCS-SPL using multi-hypothesis prediction (MH-BCS-SPL) tramel2011video .
The rest of this paper is organized as follows. Section 2 briefly presents works related to the BCS framework with some discussion. The proposed recovery method for BCS of images is described in Section 3, and its extension to the block DCVS model is addressed in Section 4. Section 5 evaluates the effectiveness of the proposed methods compared to other state-of-the-art recovery methods. Finally, our conclusions are drawn in Section 6.
2 Block compressive sensing
In the BCS framework, a large-sized image $u$ is first divided into multiple non-overlapping (small) blocks. Let a vector $u_k$ of length $n$ denote the $k$th block, which is vectorized by raster scanning. Its measurement vector $b_k$ of length $m$ is generated through the following linear projection by a sensing matrix $\Phi_B \in \mathbb{R}^{m \times n}$:
$$b_k = \Phi_B u_k. \qquad (1)$$
A ratio $m/n$ denotes the subrate (or sub-sampling rate, i.e., the measurement rate). BCS is memory-efficient as it only needs to store a small sensing matrix instead of a full one corresponding to the whole image size. In this sense, block sampling is more suitable for low-complexity applications.
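The block sensing step described above can be illustrated with a short sketch. It is only a minimal example under stated assumptions: the function name `block_sense`, the variable names `phi`, `u_k`, and `b_k`, the Gaussian scaling by 1/sqrt(m), and the toy 8x8 image are all illustrative choices rather than details taken from the paper, and the image dimensions are assumed to be multiples of the block size.

```python
import random

def block_sense(image, block, subrate, seed=0):
    """Sense every non-overlapping block x block tile with one shared matrix."""
    rng = random.Random(seed)
    n = block * block                   # length of a vectorized block
    m = max(1, round(subrate * n))      # measurements per block (subrate = m / n)
    # Assumption: i.i.d. Gaussian entries scaled by 1/sqrt(m).
    phi = [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]
    rows, cols = len(image), len(image[0])
    measurements = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            # Raster-scan the block into a vector of length n.
            u_k = [image[r + i][c + j] for i in range(block) for j in range(block)]
            # Linear projection of the block onto the rows of the sensing matrix.
            b_k = [sum(p * x for p, x in zip(row, u_k)) for row in phi]
            measurements.append(b_k)
    return phi, measurements

# Toy 8x8 image, 4x4 blocks, subrate 0.25 -> 4 measurements per 16-pixel block.
image = [[float((r * 8 + c) % 7) for c in range(8)] for r in range(8)]
phi, meas = block_sense(image, block=4, subrate=0.25)
```

Only the small `m x n` matrix is stored, regardless of the image size, which is the memory advantage the text refers to; each block's measurements are also available as soon as that block has been read.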
The CS recovery performance heavily depends on the mutual coherence $\mu(\Phi_B)$ of the sensing matrix $\Phi_B$, which is computed as eldar2012compressed :
$$\mu(\Phi_B) = \max_{1 \le i < j \le n} \frac{|\langle \phi_i, \phi_j \rangle|}{\|\phi_i\|_2 \, \|\phi_j\|_2}. \qquad (2)$$
Here, $\phi_i$ and $\phi_j$ are two distinct columns of $\Phi_B$; $\langle \cdot, \cdot \rangle$ denotes the inner product of two vectors. According to the Welch bounds eldar2012compressed , $\mu(\Phi_B)$ is limited in the range of $\left[\sqrt{(n-m)/(m(n-1))},\ 1\right]$. Additionally, at a low subrate (i.e., $m \ll n$), it can be approximated as
$$\mu(\Phi_B) \in \left[\frac{1}{\sqrt{m}},\ 1\right]. \qquad (3)$$
Note that a low mutual coherence is preferred for CS, and that the lower bound of (3) is inversely proportional to $\sqrt{m}$. A low mutual coherence is harder to achieve with a small block size, although a small block size is attractive in terms of memory requirements. This explains the limited recovery quality of BCS with a small block size, despite the great amount of research that has been conducted on this topic he2009 ; he2010 ; mun2009 ; dinh2014weighted ; chen2011compressed ; zhang2014image ; dinh2014measurement ; kulkarni2016reconnet .
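To make the block-size argument concrete, the following sketch computes the mutual coherence of random Gaussian matrices and compares it with the Welch lower bound at a fixed subrate. The helper names, the subrate of 0.25, and the two block sizes are assumptions chosen for illustration only.

```python
import math
import random

def mutual_coherence(phi):
    """Largest normalized inner product between two distinct columns."""
    m, n = len(phi), len(phi[0])
    cols = [[phi[i][j] for i in range(m)] for j in range(n)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(cols[i], cols[j]))
            best = max(best, abs(dot) / (norms[i] * norms[j]))
    return best

def welch_bound(m, n):
    """Welch lower bound on the coherence of an m x n matrix (m <= n)."""
    return math.sqrt((n - m) / (m * (n - 1)))

rng = random.Random(1)
subrate = 0.25
results = {}
for block in (4, 8):                      # hypothetical block sizes
    n = block * block
    m = int(subrate * n)
    phi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    results[block] = (welch_bound(m, n), mutual_coherence(phi))
    print(block, results[block])
```

At the same subrate, the smaller block has fewer measurements `m`, so its Welch bound is higher; this is the sense in which a small block size makes low coherence harder to reach.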
The works in he2009 ; he2010 sought to obtain structured sparsity of signals in the Bayesian framework with a Markov-chain Monte-Carlo he2009 or variational Bayesian framework he2010 . However, for practical image sizes, these approaches demand impractical computational complexity; thus, the search must be terminated early, and the recovered images suffer from high-frequency oscillatory artifacts zhang2012image . The recovery methods in mun2009 ; chen2011compressed are much faster than the previous ones, but they are limited by their predetermined transforms. In a different direction, non-iterative recovery was demonstrated in kulkarni2016reconnet with a convolutional neural network being used for the measurement data. The work in zhang2014image ; zhang2014group uses a single patch-based sparse representation; therefore, its reconstructed quality is still not satisfactory. The initial images used for adaptive learning of sparsity contain a large amount of noise and artifacts; thus, using only one type of sparsifying transform limits the definition of the effective sparse basis and makes it difficult to achieve a sufficient sparsity level for noisy data. These observations motivated us to design a new recovery method for high recovery quality, as discussed in the next section.
3 Proposed recovery for block compressive sensing of images
In this section, we design a recovery method for BCS of images. For this, we modify the TV-based recovery method li2013 for BCS by introducing a multi-block gradient process to reduce blocking artifacts and by directly denoising the Lagrangian multiplier with a nonlocal filter to mitigate the artifacts generated by the TV-based methods. In addition, due to the limitations of the NLM filter in preserving the image texture, a patch-based sparse representation is designed to enrich the local details of recovered images.
3.1 CS recovery for BCS framework
The first recovery method for BCS was proposed by L. Gan gan2007 , who incorporated a Wiener filter and Landweber iteration with the hard thresholding process. S. Mun et al. mun2009 further improved this idea by adding directional transforms of DCT, dual-tree DWT (DDWT), or contourlet transform (CT). The visual quality of the best method mun2009 , namely the SPL with the dual-tree wavelet transform (SPLDDWT), is shown in Figure (a) for the image Leaves. The recovered image suffers from high-frequency oscillatory artifacts candes2006stable and has blurred edges. TV has been shown to be effective for frame CS li2013 in preserving edges and object boundaries in recovered images. As expected, the recovered image by TV, shown in Figure (b), looks much sharper than the image recovered via SPLDDWT mun2009 , although TV is applied to BCS for block-by-block recovery. However, it shows significant blocking artifacts, due to the block-independent TV processing. Motivated by this, we investigate a TV-based recovery scheme for BCS with an emphasis on improving the TV method that is applied to BCS such that it does not suffer from blocking artifacts. Our TV-based BCS recovery method has several implications, as follows.
3.2 Noise and artifacts reduction
As mentioned above, independent block-by-block TV processing makes images suffer heavily from blocking artifacts as in Figure (b). When TV is computed separately for individual blocks as in xu2012 , a good de-blocking filter should also be used to mitigate the blocking artifacts. When the BCS scheme gan2007 is in use, a block diagonal sensing matrix corresponding to a whole image of size (assuming it consists of blocks) and its measurement are given as
We design a method, referred to as multi-block total variation (MBTV), based on a multi-block gradient process, as depicted in Figure (a). This calculates the gradient for TV over multiple blocks such that the discontinuities at block boundaries can be reduced significantly by minimizing the gradient. Notice that, if this method is applied to all blocks in a recovered image, it is equivalent to a frame-based gradient calculation. The visual quality of the recovered image Leaves, which is illustrated in Figure (c), demonstrates that many of the blocking artifacts are reduced compared to the block-by-block TV-based recovery (Figure (b)). Here, we use a small sensing matrix () to visualize the significant improvements of MBTV. Note that the recovered images usually suffer from more blocking artifacts with a small sensing operator when they are independently recovered block-by-block, as shown in Figure (b).
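The reason a multi-block gradient helps can be seen in a toy case. The sketch below is an illustration only: it uses an anisotropic TV (sum of absolute forward differences) for simplicity, which may differ from the TV variant used in the paper, and the function names and the two-block test image are hypothetical.

```python
def tv_anisotropic(img):
    """Sum of absolute horizontal and vertical forward differences."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += abs(img[r][c + 1] - img[r][c])
            if r + 1 < rows:
                total += abs(img[r + 1][c] - img[r][c])
    return total

def blockwise_tv(img, block):
    """TV computed independently inside each block; boundaries are ignored."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = [row[c:c + block] for row in img[r:r + block]]
            total += tv_anisotropic(tile)
    return total

# Two flat 4x4 blocks with different intensities: a pure blocking artifact.
img = [[0.0] * 4 + [1.0] * 4 for _ in range(4)]
tv_block = blockwise_tv(img, block=4)   # gradient never crosses the boundary
tv_multi = tv_anisotropic(img)          # gradient taken over both blocks jointly
```

The block-wise cost is blind to the jump between the two blocks, so minimizing it cannot remove the artifact, whereas the jointly computed gradient penalizes the discontinuity at the block boundary.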
The proposed MBTV-based recovery is described in detail below. The constrained problem of TV-based CS is expressed as
Here, , and , where and denote the horizontal and vertical gradient operators, respectively. and are Lagrangian multipliers, and and are positive penalty parameters. The key idea of the augmented Lagrangian method is to seek a saddle point of that is also the solution of (6). At the th iteration, by applying the splitting technique afonso2010fast , (7) is solved iteratively via two so-called sub-problems, and , as shown below:
The solution of (8) is found by the shrinkage-like formula, where denotes the element-wise product:
Because (9) is a quadratic function, its solution can be achieved by calculating the first derivative of the sub-problem . However, to reduce the computational complexity of the Moore-Penrose inverse li2013 , we also use gradient descent, as proposed in li2013 ; van2014 :
The two Lagrangian multipliers are then updated by li2013 :
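For orientation, the generic augmented Lagrangian TV scheme of the TVAL3 type, on which this derivation appears to be based, can be written as follows. This is a sketch in assumed notation ($D_i$ for the local gradient operator at pixel $i$, $w_i$ for the auxiliary gradient variable, $\nu_i$ and $\lambda$ for the multipliers, $\beta$ and $\mu$ for the penalty parameters, $\eta$ for the step size); the paper's own symbols and equation numbering may differ.

```latex
\begin{aligned}
&\min_{w,u}\ \sum_i \|w_i\|_2 \quad \text{s.t.}\quad \Phi u = b,\ \ D_i u = w_i \ \ \forall i,\\[2pt]
&\mathcal{L}_A(w,u) = \sum_i \Big( \|w_i\|_2 - \nu_i^{T}(D_i u - w_i) + \tfrac{\beta}{2}\,\|D_i u - w_i\|_2^2 \Big)
   - \lambda^{T}(\Phi u - b) + \tfrac{\mu}{2}\,\|\Phi u - b\|_2^2,\\[2pt]
&w_i^{t+1} = \max\Big\{ \big\| D_i u^{t} - \tfrac{\nu_i^{t}}{\beta} \big\|_2 - \tfrac{1}{\beta},\ 0 \Big\}\,
   \frac{D_i u^{t} - \nu_i^{t}/\beta}{\big\| D_i u^{t} - \nu_i^{t}/\beta \big\|_2},\\[2pt]
&u^{t+1} = u^{t} - \eta\, d^{t},\qquad
   d^{t} = \sum_i D_i^{T}\big( \beta\,(D_i u^{t} - w_i^{t+1}) - \nu_i^{t} \big)
         + \Phi^{T}\big( \mu\,(\Phi u^{t} - b) - \lambda^{t} \big),\\[2pt]
&\nu_i^{t+1} = \nu_i^{t} - \beta\,\big( D_i u^{t+1} - w_i^{t+1} \big),\qquad
   \lambda^{t+1} = \lambda^{t} - \mu\,\big( \Phi u^{t+1} - b \big).
\end{aligned}
```

The $w$ update is the closed-form shrinkage solution (with the convention that the quotient is zero when its denominator vanishes), $d^{t}$ is the gradient of $\mathcal{L}_A$ with respect to $u$, and the last line is the standard multiplier update.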
Since TV basically assumes piecewise smoothness, it cannot avoid losing detailed information dong2013compressive ; this is a valid assumption for natural images in smooth regions but is less applicable in non-stationary regions near edges. Consequently, so-called staircase artifacts occur in the recovered image, as shown in Figure (c). Moreover, it is worth emphasizing that, even though the signal acquisition is assumed to be noise-free, image signals cannot be perfectly recovered because they cannot be described exactly by a $K$-sparse approximation, as shown in ji2008 . Therefore, if a compressible signal in a selected transform domain is $K$-term approximated by its $K$ largest entries in magnitude, then the remaining elements can be considered as recovery error or noise. By the central limit theorem, it is reasonable to assume that the noise is Gaussian if the sensing matrix is random ji2008 ; in this scenario, the application of a filter will help smooth the recovered images gan2007 ; mun2009 ; van2013 ; van2014 . Moreover, for image denoising, the idea of applying a denoising technique to a selected derived image (for example, the normal vectors of the level curves of the noisy image lysaker2004noise , the curvature image bertalmio2014denoising , or the combined spatial-transformed domain image knaus2013dual ) might be more effective than directly smoothing the corresponding original noisy image. Motivated by this idea, we suggest that smoothing the Lagrangian multiplier can effectively enhance the CS recovered image quality.
Lagrangian multipliers are used to find the optimum solution for a multivariate function of CS recovery. The Lagrangian multipliers that represent the gradient image and the measurement vector ( and , respectively) should have their own roles in solving the ill-posed CS problems. Specifically, is updated by the gradient image which naturally contains a rich image structure. Hence, in CS recovery, we note that in (14) can be seen as a noisy version of the gradient image. Indeed, the noise can actually be seen if Figure (a) and Figure (b) are compared. With full-Nyquist sampling (i.e., a subrate of $1.0$), there is no noise in ; however, there exists a large amount of noise if the subrate is lowered to . Also note that, according to the splitting technique, plays a role in estimating the solution . Therefore, a more exact will provide more accuracy to the sub-problem and ultimately produce a superior recovered image. Consequently, this suggests the importance of improving the quality of in order to obtain better quality with the augmented Lagrangian TV recovery. A proper process should be designed to mitigate noise and artifacts in . Rather than utilizing Wiener or Gaussian filters, which might easily over-smooth the recovered image van2013 , we employ the nonlocal means (NLM) filter buades2005 , which is well-known for its denoising ability while also preserving textures by employing an adaptive weighting scheme with a smoothing parameter that depends on the amount of noise in the signals. The new method to update the Lagrangian multiplier is designed as
Here denotes the NLM filtering operator buades2005 . We first update by the traditional method li2013 in order to estimate a temporary version (denoted as in Step ). Next, the NLM filter is applied to reduce the noise in Step . Figure (c) visualizes the efficiency of the proposed method. After NLM denoising, the Lagrangian multiplier is much cleaner and shows image structures better. Moreover, Figure (d) shows the recovered image “Leaves” at a subrate of , indicating a reduction of the high-frequency artifacts. The proposed combination of MBTV and the nonlocal Lagrangian multiplier (NLLM) is referred to as MBTV-NLLM and is summarized in Algorithm 1.
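The two-step multiplier update described above (a conventional update followed by NLM filtering) can be sketched on a one-dimensional signal. This is a simplified illustration and not the paper's implementation: the 1-D setting, the function names `nlm_1d` and `update_multiplier`, the patch and search radii, and the smoothing parameter `h` are all assumptions.

```python
import math

def nlm_1d(signal, patch_radius=1, search_radius=5, h=0.5):
    """Nonlocal means on a 1-D signal; patches are padded by edge replication."""
    n = len(signal)

    def patch(i):
        return [signal[min(max(i + d, 0), n - 1)]
                for d in range(-patch_radius, patch_radius + 1)]

    out = []
    for i in range(n):
        p_i = patch(i)
        num = den = 0.0
        for j in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
            # Weight decays with the mean squared difference between patches.
            d2 = sum((a - b) ** 2 for a, b in zip(p_i, patch(j))) / len(p_i)
            w = math.exp(-d2 / (h * h))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out

def update_multiplier(nu, grad_u, w, beta):
    """Step 1: conventional multiplier update; Step 2: NLM denoising."""
    nu_tmp = [v - beta * (g - wi) for v, g, wi in zip(nu, grad_u, w)]
    return nlm_1d(nu_tmp)

# Toy data: a step-like multiplier with small perturbations.
nu = [0.0, 0.1, -0.1, 0.0, 1.0, 0.9, 1.1, 1.0]
smoothed = nlm_1d(nu)
```

Because each output sample is a convex combination of input samples weighted by patch similarity, samples across the step contribute little to one another, which is why the filter tends to keep edges while averaging out small perturbations.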
At this point, we stress that utilizing the NLM filter for the Lagrangian multiplier yields better recovered images in terms of both subjective and objective qualities. In addition, NLLM has lower computational complexity than nonlocal regularization methods that directly use an NLM filter for noisy images, as in zhang2010 ; zhang2013improved , since the Lagrangian multipliers are only updated if the recovered images are significantly changed. One can refer to Trinh et al. van2014 for both a theoretical analysis and a numerical comparison between NLLM and nonlocal regularizations zhang2010 ; zhang2013improved . In this paper, we focus on a valid explanation for the gain of the NLLM method, as discussed below.
Mathematically, the error bound of our method (MBTV-NLLM) is smaller than that of traditional TV li2013 due to the following local convergence statement bertekas1982constrained . Assume that and are the solutions (see (11) and (16)) of the proposed method at the th iteration, where and are the solutions of TV li2013 . With a positive scalar , according to proposition bertekas1982constrained , the reconstructed errors of MBTV-NLLM and TV li2013 are
Here, is the Lagrangian multiplier corresponding to the solution . Hence, we can set up error bounds:
Note that the solution is expected to be a clean version (see Figure (a)). NLLM tries to reduce the noise in by applying the NLM filter (see (16)). This implies that is closer to than is (compare Figure (b) with Figure (c)), which can be represented as . Thus, we obtain
The above error coincides with recent reports needell2013stable , confirming that the spatial error is bounded by the gradient error. Although the NLM filter provides nonlocal benefits, it still has difficulties in preserving many details in images with low contrast. This drawback is caused by the fact that two very similar pixels on opposite sides produce inaccurate weights for the NLM filter. As a result, some artifacts near edges cannot be sufficiently mitigated without losing detailed information maleki2012suboptimality . In this paper, additional processing to enrich the local details through patch-based sparse representation is designed to solve the problem, as proposed in the next sub-section.
3.3 Refinement for recovered images with patch-based sparse representation
Classical predetermined transforms, such as DCT or wavelet transforms, cannot always attain a sparse representation of complex details. For example, sharp transitions and singularities in natural images are not expressed well by DCT. In the same way, 2D wavelets might perform poorly for textured or smooth regions dabov2007 . Recently, for image restoration applications such as denoising, inpainting, or deblurring, patch-based sparse representation has been actively investigated to deal with the complex variations of natural images. Suppose that (where is the total number of patches) denotes an column vector representing the th image patch extracted by a patch-extracting operator through from an image of size represented by an column vector . In this scenario, the image is synthesized as
where is the regular transpose. Moreover, assume to be sparse over a dictionary with its coefficient vector (that is, ). and denote the concatenation of dictionaries and , respectively. Then, in (22) is further expressed in a patch-based sparse representation as
The operator makes the patch-based sparse representation more compact elad2006 . Briefly, utilizing a patch-based sparsifying transform to de-correlate the signal and noise in the transform domain is presented by the five following basic steps dabov2007 ; dabov2009bm3d ; chatterjee2012 :
1. Group similar patches: use a nonlocal search to find patches that are similar to the reference patch and stack them in a group.
2. Forward transform: apply a sparsifying transform (i.e., a global or local sparsifying transform) to each group to obtain its transformed coefficients.
3. Thresholding process: separate signal and noise by keeping only the significant coefficients. The remaining coefficients are considered to be noise and are discarded. From the CS viewpoint, this step can be referred to as $K$-sparse approximation gan2007 .
4. Inverse transform: obtain the estimates for all grouped patches.
5. Weighting process: return the pixels of the patches to the original locations. The overlapping patches are appropriately weighted according to the number of times each pixel repeats.
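The five steps listed above can be sketched on a one-dimensional signal. This is a minimal illustration under several assumptions that are not taken from the paper: a fixed separable DCT stands in for the sparsifying transform, the grouping uses a plain squared-distance search over all patches, and the names `patch_denoise`, `size`, `group`, and `thresh` are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; rows are basis vectors."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def patch_denoise(signal, size=4, group=4, thresh=0.5):
    n = len(signal)
    starts = list(range(n - size + 1))
    patches = [signal[s:s + size] for s in starts]
    d_p, d_g = dct_matrix(size), dct_matrix(group)
    acc, cnt = [0.0] * n, [0] * n
    for ref in starts:
        # Step 1: group the patches closest to the reference patch.
        def dist(s):
            return sum((a - b) ** 2 for a, b in zip(patches[s], patches[ref]))
        idx = sorted(starts, key=dist)[:group]
        stack = [patches[s] for s in idx]
        # Step 2: separable transform across the group and along each patch.
        coef = matmul(matmul(d_g, stack), transpose(d_p))
        # Step 3: hard thresholding keeps only the significant coefficients.
        coef = [[c if abs(c) > thresh else 0.0 for c in row] for row in coef]
        # Step 4: inverse transform (both DCT matrices are orthonormal).
        est = matmul(matmul(transpose(d_g), coef), d_p)
        # Step 5: return samples to their positions and count the overlaps.
        for s, p in zip(idx, est):
            for k, v in enumerate(p):
                acc[s + k] += v
                cnt[s + k] += 1
    # Average overlapping estimates; keep the input where nothing was written.
    return [a / c if c else x for a, c, x in zip(acc, cnt, signal)]

signal = [0.0, 0.1, -0.1, 0.0, 0.05, 1.0, 0.9, 1.1, 1.0, 0.95, 1.05, 1.0]
restored = patch_denoise(signal)
```

With `thresh=0.0` no coefficient is discarded and the procedure returns the input up to rounding error, which is a convenient sanity check that the forward and inverse transforms and the overlap weighting are consistent.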
For the -sparse approximation (Steps , , and ), choosing a proper sparsifying basis will determine the recovered image quality. The authors in zhang2012image applied a global transform dabov2007 that combined D wavelet transform and D DCT transform for all grouped blocks. A predetermined global transform is advantageous in terms of its simplicity; however, it cannot reflect the various sparsity levels of all groups. Therefore, a local transform dabov2009bm3d ; chatterjee2012 was calculated for each individual group to adaptively support the various local sparsity levels. If the local sparsifying transform is poorly designed due to data heavily contaminated by noise, the recovered images will have serious visual artifacts.
Another important quality issue is determining how to collect the proper groups. Related to the nonlocal search discussed in buades2005 ; sutour2014adaptive , patch-based sparse representation still faces some explicit challenges with two outright problems sutour2014adaptive : (i) for singular structures, it might fail to find similar patches, thereby producing poor results; and (ii) due to noise, it may detect incorrect patches (i.e., selecting some patches that do not actually belong to the same underlying structure), which can eventually cause over-smoothing. Below, we show that the similarity between a recovered image and its original heavily depends on the variance of error.
Let us assume that the elements of the error vector are independent and come from a normal distribution with zero mean and variance . Here, represents a restored version of an original image after performing patch-based sparse representation. Since are also independent and come from the half-normal distribution with mean and variance , a new random variable has mean and variance of
Based on the Chebyshev inequality in probability, for a value,
With a sufficiently large image size (i.e., as becomes large), the probability of similarity between and in (27) approaches . The implication of this is twofold:
First, a good patch-based sparse representation should produce less estimation error (i.e., is small).
Second, as the noise becomes smaller, the patch-based sparse representation performs better.
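The probabilistic argument above can be checked numerically. The sketch below assumes i.i.d. zero-mean Gaussian errors with standard deviation `sigma` and studies the average absolute error; the half-normal mean $\sigma\sqrt{2/\pi}$ and variance $\sigma^2(1 - 2/\pi)$ are standard facts, while the function names and the chosen values of `sigma`, `n`, and `eps` are illustrative assumptions.

```python
import math
import random

def mean_abs_error_stats(sigma, n, seed=0):
    """Empirical mean absolute error versus its theoretical mean and variance."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, sigma) for _ in range(n)]
    empirical = sum(abs(e) for e in errors) / n
    mean = sigma * math.sqrt(2.0 / math.pi)                 # half-normal mean
    var_of_mean = sigma ** 2 * (1.0 - 2.0 / math.pi) / n    # variance of the average
    return empirical, mean, var_of_mean

def chebyshev_bound(var, eps):
    """Chebyshev upper bound on P(|X - E[X]| >= eps), capped at 1."""
    return min(1.0, var / eps ** 2)

empirical, mean, var = mean_abs_error_stats(sigma=1.0, n=100_000)
bound = chebyshev_bound(var, eps=0.02)
print(empirical, mean, bound)
```

The variance of the average shrinks as `1 / n`, so for a fixed tolerance the Chebyshev bound tends to zero as the image grows; a smaller `sigma` lowers both the mean error and the bound, which matches the two implications stated above.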
The probability of the estimation error in (27) is important and suggests the idea of combining the local and global sparsifying transforms. At the th iteration, the recovered image is first updated by the five aforementioned steps with a local sparsifying transform. The output of this stage is referred to as , as shown in Figure 4. Thanks to the local sparsifying transform, has less noise and fewer artifacts than the input image . Additionally, this stage generates a more desirable input version for the second stage, in which we determine the sparsity levels of signals via a global sparsifying transform in order to produce the output image . Our suggested design is described below.
Generally, improving the sparsity level of a signal via a proper transform in CS can be carried out using xu2014fast ; zhang2012image ; dong2014compressive ; metzler2016denoising ; zhang2014image ; zhang2014group ; van2014block
The unconstrained problem of (28), according to the patch-based sparse representation, is formulated to a more tractable optimization problem as
where is a slack variable ensuring that (29) is equal to (28). We further note that (29) is a mixed optimization problem that aims at minimizing the cost between the sparsifying coefficients of all of the patches and compressive sensing. For a very simple encoder, the sparsifying transform is moved to the decoder gan2007 , which means that measurements are directly acquired in the spatial domain (i.e., ). According to the modified augmented Lagrangian approach in afonso2011augmented , which is a closed form of a sparsifying transform, (29) is changed to:
Here, is a positive penalty parameter. The scaled vector is then updated as
Using the splitting technique afonso2010fast , we minimize (30) by alternately solving the and sub-problems. More precisely, is solved by seeking sparsity levels with the five steps of patch-based sparse representation shown in Figure 4. The sub-problem is solved by gradient descent with an optimized step , and the direction is calculated by the Barzilai-Borwein method li2013 ; van2013 ; van2014 :
is an identity matrix. The combination of local and global sparsifying transforms (CST) and MBTV-NLLM is referred to as MBTV-NLLM-CST. A summary of our proposed recovery method for the BCS framework is depicted in Algorithm 2. Figure 5 verifies the effectiveness of the suggested design. More precisely, MBTV-NLLM-CST (P) visualizes the reduction in error variance by the first stage in Figure 4 (i.e., using a local sparsifying transform), while MBTV-NLLM-CST (P) points out the noise reduction after the two stages in Figure 4. The gap between the two graphs of MBTV-NLLM-CST (P) and MBTV-NLLM-CST (P) indicates the gain from the additional global sparsifying transform. To better understand the nature of this gain, Figure 5 also shows two extra graphs corresponding to MBTV-NLLM with only a global sparsifying transform (MBTV-NLLM-GST) and MBTV-NLLM with only a local sparsifying transform (MBTV-NLLM-LST). We recall that the PSNR value decreases as the error variance increases; thus, from Figure 5, we can confirm empirically that our proposed algorithms converge to a feasible solution. This is valuable because proving the convergence of a recovery algorithm that deploys a sparsifying transform is not trivial zhang2014image ; zhang2014group . More interestingly, the results also reveal that the proposed method is better than previous ones zhang2014image ; zhang2014group , which use either a global sparsifying transform or a local sparsifying transform.
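The gradient descent with a Barzilai-Borwein step mentioned for the quadratic sub-problem can be sketched on a small least-squares problem. This is a generic illustration, not the paper's update: the objective is reduced to 0.5 * ||A u - b||^2, the first step size is a conservative assumption, and the function name and test data are hypothetical.

```python
def bb_least_squares(a, b, iters=200, tol=1e-12):
    """Minimize 0.5 * ||A u - b||^2 by gradient descent with BB step sizes."""
    m, n = len(a), len(a[0])

    def grad(u):
        r = [sum(a[i][j] * u[j] for j in range(n)) - b[i] for i in range(m)]
        return [sum(a[i][j] * r[i] for i in range(m)) for j in range(n)]

    u = [0.0] * n
    g = grad(u)
    # Conservative first step: 1 / ||A||_F^2 (an assumption for this sketch).
    step = 1.0 / sum(v * v for row in a for v in row)
    for _ in range(iters):
        if sum(v * v for v in g) < tol:      # squared gradient norm is tiny
            break
        u_new = [x - step * d for x, d in zip(u, g)]
        g_new = grad(u_new)
        s = [x1 - x0 for x1, x0 in zip(u_new, u)]    # change in the iterate
        y = [g1 - g0 for g1, g0 in zip(g_new, g)]    # change in the gradient
        sy = sum(p * q for p, q in zip(s, y))
        u, g = u_new, g_new
        if sy <= 0.0:                        # guard against breakdown
            break
        step = sum(p * p for p in s) / sy    # BB step: (s^T s) / (s^T y)
    return u

a = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [2.0, 1.0, 2.0]          # consistent system with solution u = (1, 1)
u = bb_least_squares(a, b)
```

The BB step approximates the inverse curvature along the last move using only two gradient evaluations, which avoids forming or inverting the normal-equation matrix.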
4 Block distributed compressive video sensing
In this section, we extend the proposed recovery method to block DCVS, as shown in Figure 6. The main advantage of this design over other existing ones, such as the design proposed in do2009distributed , is that it does not require full Nyquist sampling.
4.1 Key frame recovery
A key frame is recovered using the proposed recovery scheme in Algorithm 2, which was developed for still images. That is, an initial estimate is generated by MBTV-NLLM, and then a sparsifying transform is applied to enrich the local details of the reconstructed key frame.
4.2 Side information generation
In distributed video decoding, side information (SI) plays an important role because inaccurate SI strongly degrades the recovery quality of non-key frames. The frames that can be used for reconstructing a non-key frame (denoted by ) as the side information are
Here, denotes the th GOP in a video sequence, and the subscript indicates a non-key frame. However, the definition in (34) is impractical when finding proper SI frames for DCVS simply because the only information available at the decoder is the measurement data of non-key frames. According to the Johnson-Lindenstrauss lemma baraniuk2008simple , the selection in (34) can be equivalently written as
In a GOP, (35) allows all of the non-key frames similar to the current non-key frame in the measurement domain to be gathered. The selected SI frames are not much different from each other; thus, an initial non-key frame is computed as their average. Otherwise, it is considered to be a frame with the minimum value of . Our goal is to find the best initial non-key frame.
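One possible reading of this selection rule is sketched below: candidate frames are compared with the current non-key frame through their measurement vectors only, the close ones are averaged, and the single nearest one is used if none is close. The threshold `tau`, the pairing of each candidate's measurement vector with an already recovered frame, and every name in the code are assumptions made for illustration; the exact criterion in (35) is not reproduced here.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two measurement vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def initial_non_key_frame(y_cur, candidates, tau):
    """Build an initial estimate of a non-key frame from side-information frames.

    y_cur      : measurement vector of the current non-key frame
    candidates : list of (measurement_vector, recovered_frame) pairs,
                 with frames stored as flat lists of pixel values
    tau        : hypothetical threshold on the squared measurement distance
    """
    dists = [sq_dist(y_cur, y) for y, _ in candidates]
    close = [frame for d, (_, frame) in zip(dists, candidates) if d <= tau]
    if close:
        # Similar frames found: average them pixel by pixel.
        n = len(close)
        return [sum(pixels) / n for pixels in zip(*close)]
    # No frame is close enough: use the one with the smallest distance.
    best = min(range(len(candidates)), key=dists.__getitem__)
    return list(candidates[best][1])

# Illustrative data: three candidates, the first two resemble the current frame.
candidates = [
    ([1.0, 2.0], [10.0, 20.0]),
    ([1.2, 2.1], [12.0, 22.0]),
    ([5.0, 5.0], [50.0, 60.0]),
]
y_cur = [1.1, 2.0]
averaged = initial_non_key_frame(y_cur, candidates, tau=0.05)
nearest = initial_non_key_frame(y_cur, candidates, tau=0.0)
```

Comparing frames in the measurement domain is what makes the rule usable at the decoder, since the original pixels of the current non-key frame are never available there.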
4.3 Recovery of non-key frames
The non-key frame is refined by its measurement vector, sparsifying transform, and the SI frame denoted by :
Here, and are positive penalty parameters. At each iteration, the side information is updated, and then the two sub-problems and are solved. Additionally, the two vectors that represent the sparsifying transform and side information regularization (i.e., and , respectively) are updated as
In detail, the side information is first initialized by (35) and then updated at each iteration by a multi-hypothesis (MH) prediction using Tikhonov regularization tramel2011video . Additionally, the sub-problem is solved by the patch-based sparse representation with a combination of local and global sparsifying transforms, as explained previously. Finally, the sub-problem is solved by gradient descent with the optimal step size and direction estimated by the Barzilai-Borwein method li2013 ; van2013 ; van2014 :
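To make the multi-hypothesis update more concrete, the sketch below computes Tikhonov-regularized hypothesis weights in the measurement domain. It assumes the commonly used formulation w = argmin ||y - P w||^2 + lam^2 ||Gamma w||^2, where the columns of P are the hypotheses projected by the sensing matrix and Gamma is diagonal with entries ||y - p_j||; the text above does not spell out these details, so the formulation, the weight `lam`, and all names are assumptions for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(M, r):
    """Solve M x = r by Gaussian elimination with partial pivoting."""
    n = len(r)
    aug = [row[:] + [ri] for row, ri in zip(M, r)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[i][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def mh_tikhonov(y, proj_hyps, hyps, lam):
    """Weights and prediction for a multi-hypothesis combination.

    y         : measurement vector of the current block
    proj_hyps : hypotheses already projected into the measurement domain
    hyps      : the same hypotheses in the pixel domain (flat lists)
    lam       : hypothetical regularization weight
    """
    k = len(hyps)
    # Squared diagonal of Gamma: distance of each projected hypothesis to y.
    gamma_sq = [sum((a - b) ** 2 for a, b in zip(y, p)) for p in proj_hyps]
    # Normal equations: (P^T P + lam^2 Gamma^T Gamma) w = P^T y.
    G = [[dot(proj_hyps[i], proj_hyps[j]) for j in range(k)] for i in range(k)]
    for j in range(k):
        G[j][j] += lam ** 2 * gamma_sq[j]
    rhs = [dot(p, y) for p in proj_hyps]
    w = solve(G, rhs)
    pred = [sum(w[j] * hyps[j][i] for j in range(k)) for i in range(len(hyps[0]))]
    return w, pred

# Illustrative case: the first hypothesis matches the measurements exactly.
proj_hyps = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
hyps = [[10.0, 20.0, 30.0, 40.0], [50.0, 60.0, 70.0, 80.0]]
y = [1.0, 0.0, 1.0]
w, pred = mh_tikhonov(y, proj_hyps, hyps, lam=0.5)
```

The diagonal penalty grows with the measurement-domain mismatch of each hypothesis, so hypotheses that disagree with the observed measurements receive small weights; in the toy case the exactly matching hypothesis takes all of the weight.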
5 Experimental results
5.1 Test condition
The recovery performance of the proposed recovery schemes for the BCS framework is evaluated by extensive experiments using both natural images and video. The parameters in the proposed method are experimentally chosen to achieve the best reconstructed quality. The positive penalty parameters and of MBTV-NLLM are equal to and , respectively. The NLM filter has a patch size of and a search range of . The smoothing parameter is . The outer stopping criterion is defined as , while the inner stopping criterion is defined as . For patch-based sparse representations, the size of the groups is , and overlapping is used between patches with an overlapping step size of pixels. The training window for collecting groups is in size. The penalty parameters and of the refinement problems are set to , while is set to . Additionally, for video recovery, the scale vector is initially set to a zero vector, and the value of is set to . The testing conditions required for the other recovery methods are established based on their suggested recommendations he2009 ; he2010 ; mun2009 ; chen2011compressed ; zhang2014image ; zhang2014group . All of the experiments are performed in MATLAB R2011a running on a desktop Intel Corei RAMG with the Microsoft Windows operating system. For objective analysis, we use the PSNR (in units of dB). Additionally, the Feature SIMilarity (FSIM) zhang2011fsim is used for visual quality evaluation. FSIM is in the range of , where a value of 1 indicates the best quality.
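Last rows of Tables 1 and 2 (gains over each compared method; the values appear to alternate between PSNR gain in dB and FSIM gain):
| Gain by MBTV-NLLM | 5.65 | 0.110 | 4.19 | 0.079 | 3.05 | 0.049 | 2.70 | 0.041 | 0.66 | 0.011 | - | - |
| Gain by MBTV-NLLM-CST | 2.17 | 0.024 | 1.04 | 0.009 | 1.24 | 0.008 | 0.54 | 0.003 | - | - |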
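For reference, the PSNR used as the objective measure can be computed as in the minimal sketch below. It assumes 8-bit images with a peak value of 255 and images stored as flat lists of pixel values; the function name and the `peak` parameter are illustrative choices, not taken from the experiments.

```python
import math

def psnr(reference, recovered, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat lists of pixel values; `peak` is the assumed
    maximum pixel value (255 for 8-bit data).
    """
    mse = sum((a - b) ** 2 for a, b in zip(reference, recovered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Illustrative values: an error of one gray level at every pixel.
value = psnr([100.0, 100.0], [101.0, 99.0])
```

Because the mean squared error sits in the denominator of the logarithm, a lower error variance translates directly into a higher PSNR.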
5.2 Test results with still images
Eight well-known natural images are used, that is, Lena, Leaves, Monarch, Cameraman, House, Boat, and Pepper, as shown in Figure 7. For fair comparisons with previous works he2009 ; he2010 ; mun2009 ; chen2011compressed ; zhang2014image ; zhang2014group , the natural images are divided into non-overlapping blocks ( in size). They are compressively sensed by an i.i.d. random Gaussian sensing matrix. Table 1 compares five well-known existing CS recovery methods (i.e., the tree-structured CS with variational Bayesian analysis using DWT (TSDWT) he2009 , the tree-structured CS with variational Bayesian analysis using DCT (TSDCT) he2010 , SPLDDWT mun2009 , the SPL using the contourlet transform (SPLCT) mun2009 , and the multi-hypothesis CS method (MH) chen2011compressed ) with the proposed MBTV-NLLM when patch-based sparse representation is not employed. It is worth emphasizing that, for this test case, MH is the strongest of the existing methods. Nevertheless, the proposed MBTV-NLLM is competitive with MH and much better than the others. In the best case, MBTV-NLLM surpasses TSDWT, TSDCT, SPLCT, SPLDDWT, and MH by dB, dB, dB, dB, and dB, respectively. This demonstrates the effectiveness of the two proposed components: MBTV and denoising of the Lagrange multiplier. The last rows of Tables 1 and 2 show the gains achieved by the proposed MBTV-NLLM and MBTV-NLLM-CST, respectively, with respect to each individual method.
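The block-based sensing step described above can be sketched as follows: the image is cut into non-overlapping blocks, each block is vectorized, and every block is multiplied by the same i.i.d. Gaussian matrix. The block size, subrate, seed, the 1/sqrt(m) scaling of the matrix, and all function names are assumptions for illustration, and the image dimensions are assumed to be multiples of the block size.

```python
import random

def gaussian_sensing_matrix(block_size, subrate, seed=0):
    """Return an m x n matrix of i.i.d. Gaussian entries, n = block_size^2.

    m is the rounded product of the subrate and n (at least 1). The
    1/sqrt(m) scaling is a common normalization, assumed here.
    """
    n = block_size * block_size
    m = max(1, round(subrate * n))
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]

def split_blocks(image, block_size):
    """Split a 2-D image (list of rows) into vectorized non-overlapping blocks."""
    blocks = []
    for r in range(0, len(image), block_size):
        for c in range(0, len(image[0]), block_size):
            blocks.append([image[r + i][c + j]
                           for i in range(block_size)
                           for j in range(block_size)])
    return blocks

def sense(image, phi, block_size):
    """Apply the same sensing matrix to every block of the image."""
    return [[sum(p * x for p, x in zip(row, block)) for row in phi]
            for block in split_blocks(image, block_size)]

# Illustrative 8x8 image sensed in 4x4 blocks at a subrate of 0.25.
image = [[float(8 * i + j) for j in range(8)] for i in range(8)]
phi = gaussian_sensing_matrix(block_size=4, subrate=0.25)
measurements = sense(image, phi, block_size=4)
blocks = split_blocks(image, 4)
```

Reusing one small matrix for all blocks keeps the encoder's memory and computation low, which is the main practical motivation for block-based sensing.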
Further effectiveness of the proposed patch-based sparse representation is demonstrated in Table 2. In zhang2014image , the authors employed K-SVD elad2006 to design a recovery method that uses an adaptively learned sparsifying basis (RALS), while the group sparse representation (GSR) zhang2014group acquired the local sparsifying transform. GSR is certainly better than RALS because of the local sparsifying basis for each group. In our three proposed methods in Table 2, MBTV-NLLM-GST attains better reconstructed quality than RALS, while MBTV-NLLM-LST and MBTV-NLLM-CST outperform GSR. This is because the better initial image created by MBTV-NLLM has beneficial effects on grouping patches by facilitating a better non-local search and defining more appropriate sparsifying bases for each group (when using a local sparsifying transform). In particular, the PSNR of MBTV-NLLM-CST is as much as dB higher than GSR for the recovered image Leaves.
Furthermore, for a complex image with as much detail as is found in the image Lena, MBTV-NLLM-CST is not as successful as MBTV-NLLM-GST at a subrate of . The recovered image lacks spatial detail at a very low subrate such that the combination of local and global sparsifying transforms might make it slightly over-smoothed. The visual quality of the proposed schemes and previous work are compared in Figure 8 and Figure 9 using the image Monarch at subrate and Cameraman at subrate . This test shows that, while all conventional CS recovery schemes he2009 ; he2010 ; mun2009 ; chen2011compressed ; zhang2014image ; zhang2014group suffer from a large degree of high-frequency artifacts, including the state-of-the-art method (GSR), the three proposed schemes seem to work much better. However, the recovered image of MBTV-NLLM-GST still has some artifacts at a very low subrate (e.g., see the image Monarch at subrate ). This indicates that a global sparsifying transform cannot adequately express the sparsifying levels for all groups.
Figure 10 quantifies the effectiveness of MBTV-NLLM-CST according to block sizes utilizing three images (i.e., Lena, Leaves, and Cameraman). Increasing the size of the sensing matrix yields better quality in the recovered images in terms of the PSNR. For example, at subrate with a block size of , the recovered image Lena has a PSNR of dB. However, this value can be increased up to dB with a block size of . These results coincide with our analysis in Section 2 based on the restricted isometry property (RIP).
5.3 Test results with video for block DCVS
The effectiveness of the proposed CS recovery design is also evaluated with the first frames of three QCIF video sequences: News, Mother-daughter, and Salesman videotest . The GOP is set to . Input frames are split into non-overlapping blocks ( in size), each of which is subject to BCS by an i.i.d. random Gaussian sensing matrix. To achieve better quality, key frames are sensed with a subrate of , while non-key frames are sensed by a subrate ranging from to . Figure 11 shows the improvements of the reconstructed key frames of the proposed methods compared with MC-BCS-SPL and MH-BCS-SPL. All three proposed recovery schemes show far better visual quality than the previous block DCVS in mun2011residual ; tramel2011video . On average, over the three tested video sequences, MBTV-NLLM-CST shows dB and dB gains over MC-BCS-SPL and MH-BCS-SPL, respectively.
The improvements of non-key frames for various block DCVS are shown in Figure 12. Because 1) TV can preserve object edges, 2) the nonlocal Lagrangian multiplier can reduce staircasing artifacts, and 3) the patch-based sparsifying transforms can enrich detail information, our proposed CS recovery schemes also produce far better PSNR values. Compared with MC-BCS-SPL, MBTV-NLLM-CST demonstrates gains between dB and dB depending on the subrate. In the best case, our recovery scheme is better by an average of dB compared to MC-BCS-SPL over non-key frames.
Moreover, the visual quality of the first non-key frame of the News sequence is illustrated in Figure 13. Because we utilized temporal redundancy over the frames, detailed information could be preserved for all block DCVS schemes. The high values of FSIM, even at a subrate of , demonstrate how crucial it is to exploit the correlation of frames in compressive video sensing. However, MC-BCS-SPL mun2011residual and MH-BCS-SPL tramel2011video still suffer from high-frequency oscillatory artifacts. Meanwhile, the proposed schemes no longer appear to have artifacts (i.e., the FSIM values are very close to ).
5.4 Computational complexity
Excluding patch-based sparse representation, the main computational complexity of the proposed MBTV-NLLM comes from the high cost of the NLM filter. More specifically, if the search range and size of the similarity patches of the NLM filter are and , respectively, then, for an image of size , the computational complexity of this filter is . For natural images that are in size, with a subrate of , MBTV-NLLM takes around min. to recover in our simulation. This is comparable to other methods that also do not use patch-based sparsifying transforms. For example, with the image Leaves at a subrate of 0.1, MBTV-NLLM needs 63 s, MH takes s, and SPLDDWT consumes s. In contrast, TSDCT requires much more decoding time than the others, requiring about min. to recover.
Patch-based sparse representation acts as a computational bottleneck. For a patch size of , search range of , group size of (where is the number of similar patches in a group), and two constant values and , the patch-based sparse representation using a local sparsifying transform demands a computational complexity of . MBTV-NLLM-CST is more complex still because of the second stage containing the global sparsifying transform. As a result, for recovery of a QCIF video frame using MBTV-NLLM-CST, a key frame demands around min., and a non-key frame requires s. Therefore, complexity optimization of patch-based sparse representation is an important task for future work. Specifically, to reduce complexity, we may be able to integrate our reconstruction algorithms with a robust sensing matrix such as Gaussian regression-based han2015novel or multi-scale-based sensing matrices fowler2011multiscale .
6 Conclusion
This paper proposed recovery schemes for BCS of still images and video that can recover pictures with high-quality performance. For compressive imaging, the modified augmented Lagrangian total variation with a multi-block gradient process and nonlocal Lagrangian multiplier are used to generate an initial recovered image. Subsequently, the patch-based sparse representation enhances the local detailed information. Our design is also easily extendible to DCVS. More specifically, key frames are reconstructed to have improved quality and used to create initial versions of non-key frames. Subsequently, non-key frames are refined by patch-based sparsifying transform-aided side information regularization. Our experimental results demonstrated the improvements made by the proposed recovery schemes compared to representative state-of-the-art algorithms for both natural images and video.
Acknowledgments
This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) (No.--), by the MSIP G-ITRC support program (IITP-2016-R6812-16-0001) supervised by the IITP, and by the ERC via Grant EU FP 7 - ERC Consolidator Grant 615216 LifeInverse.
References
- (1) G. J. Sullivan, J.-R. Ohm, W. Han, T. Wiegand, Overview of the high efficiency video coding (HEVC) standard, IEEE Transactions on Circuits and Systems for Video Technology 22 (12) (2012) 1649–1668.
- (2) X. HoangVan, B. Jeon, Flexible complexity control solution for transform domain Wyner-Ziv video coding, IEEE Transactions on Broadcasting 58 (2) (2012) 209–220.
- (3) D. L. Donoho, Compressed sensing, IEEE Transactions on Information Theory 52 (4) (2006) 1289–1306.
- (4) S. Becker, J. Bobin, E. J. Candès, NESTA: A fast and accurate first-order method for sparse recovery, SIAM Journal on Imaging Sciences 4 (1) (2011) 1–39.
- (5) M. A. Figueiredo, R. D. Nowak, S. J. Wright, Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems, IEEE Journal of Selected Topics in Signal Processing 1 (4) (2007) 586–597.
- (6) S. Ji, Y. Xue, L. Carin, Bayesian compressive sensing, IEEE Transactions on Signal Processing 56 (6) (2008) 2346–2356.
- (7) L. He, L. Carin, Exploiting structure in wavelet-based Bayesian compressive sensing, IEEE Transactions on Signal Processing 57 (9) (2009) 3488–3497.
- (8) L. He, H. Chen, L. Carin, Tree-structured compressive sensing with variational Bayesian analysis, IEEE Signal Processing Letters 17 (3) (2010) 233–236.
- (9) L. Gan, Block compressed sensing of natural images, in: IEEE 15th International Conference on Digital Signal Processing, 2007, pp. 403–406.
- (10) C. Li, W. Yin, H. Jiang, Y. Zhang, An efficient augmented Lagrangian method with applications to total variation minimization, Computational Optimization and Applications 56 (3) (2013) 507–530.
- (11) X. Zhang, M. Burger, X. Bresson, S. Osher, Bregmanized nonlocal regularization for deconvolution and sparse reconstruction, SIAM Journal on Imaging Sciences 3 (3) (2010) 253–276.
- (12) J. Xu, J. Ma, D. Zhang, Y. Zhang, S. Lin, Improved total variation minimization method for compressive sensing by intra-prediction, Signal Processing 92 (11) (2012) 2614–2623.
- (13) E. J. Candès, M. B. Wakin, S. P. Boyd, Enhancing sparsity by reweighted ℓ1 minimization, Journal of Fourier Analysis and Applications 14 (5-6) (2008) 877–905.
- (14) M. S. Asif, J. Romberg, Fast and accurate algorithms for re-weighted ℓ1-norm minimization, IEEE Transactions on Signal Processing 61 (23) (2013) 5905–5916.
- (15) S. Mun, J. E. Fowler, Block compressed sensing of images using directional transforms, in: IEEE 16th International Conference on Image Processing (ICIP), 2009, pp. 3021–3024.
- (16) C. Van Trinh, K. Q. Dinh, B. Jeon, Edge-preserving block compressive sensing with projected Landweber, in: IEEE 20th International Conference on Systems, Signals and Image Processing (IWSSIP), 2013, pp. 71–74.
- (17) C. Van Trinh, K. Q. Dinh, V. A. Nguyen, B. Jeon, Total variation reconstruction for compressive sensing using nonlocal Lagrangian multiplier, in: IEEE 22nd European Signal Processing Conference (EUSIPCO), 2014, pp. 231–235.
- (19) M. Elad, M. Aharon, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Transactions on Image Processing 15 (12) (2006) 3736–3745.
- (20) K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian, Image denoising by sparse 3D transform-domain collaborative filtering, IEEE Transactions on Image Processing 16 (8) (2007) 2080–2095.
- (21) K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian, BM3D image denoising with shape-adaptive principal component analysis, in: SPARS’09-Signal Processing with Adaptive Sparse Structured Representations, 2009.
- (22) P. Chatterjee, P. Milanfar, Patch-based near-optimal image denoising, IEEE Transactions on Image Processing 21 (4) (2012) 1635–1649.
- (23) Y. Xu, W. Yin, A fast patch-dictionary method for whole image recovery, Inverse Problems and Imaging (IPI) 10 (2) (2016) 563–583.
- (24) J. Zhang, D. Zhao, C. Zhao, R. Xiong, S. Ma, W. Gao, Image compressive sensing recovery via collaborative sparsity, IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2 (3) (2012) 380–391.
- (25) T. N. Canh, K. Q. Dinh, B. Jeon, Compressive sensing reconstruction via decomposition, Signal Processing: Image Communication 49 (2016) 63–78.
- (26) W. Dong, G. Shi, X. Li, Y. Ma, F. Huang, Compressive sensing via nonlocal low-rank regularization, IEEE Transactions on Image Processing 23 (8) (2014) 3618–3632.
- (27) C. A. Metzler, A. Maleki, R. G. Baraniuk, From denoising to compressed sensing, IEEE Transactions on Information Theory 62 (9) (2016) 5117–5144.
- (28) J. E. Fowler, S. Mun, E. W. Tramel, et al., Block-based compressed sensing of images and video, Foundations and Trends in Signal Processing 4 (4) (2012) 297–416.
- (29) M. Dadkhah, M. J. Deen, S. Shirani, Block-based CS in a CMOS image sensor, IEEE Sensors Journal 14 (8) (2014) 2897–2909.
- (30) K. Q. Dinh, B. Jeon, Iterative weighted recovery for block-based compressive sensing of image/video at low subrates, IEEE Transactions on Circuits and Systems for Video Technology, In press.
- (31) H. Han, L. Gan, S. Liu, Y. Guo, A novel measurement matrix based on regression model for block compressed sensing, Journal of Mathematical Imaging and Vision 51 (1) (2015) 161–170.
- (32) J. E. Fowler, S. Mun, E. W. Tramel, Multiscale block compressed sensing with smoothed projected Landweber reconstruction, in: IEEE 19th European Signal Processing Conference (EUSIPCO), 2011, pp. 564–568.
- (33) K. Q. Dinh, H. J. Shim, B. Jeon, Weighted overlapped recovery for blocking artefacts reduction in block-based compressive sensing of images, Electronics Letters 51 (1) (2014) 48–50.
- (34) C. Chen, E. W. Tramel, J. E. Fowler, Compressed-sensing recovery of images and video using multihypothesis predictions, in: IEEE Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2011, pp. 1193–1198.
- (35) J. Zhang, C. Zhao, D. Zhao, W. Gao, Image compressive sensing recovery using adaptively learned sparsifying basis via L0 minimization, Signal Processing 103 (2014) 114–126.
- (36) J. Zhang, D. Zhao, W. Gao, Group-based sparse representation for image restoration, IEEE Transactions on Image Processing 23 (8) (2014) 3336–3351.
- (37) T. T. Do, Y. Chen, D. T. Nguyen, N. Nguyen, L. Gan, T. D. Tran, Distributed compressed video sensing, in: IEEE 16th International Conference on Image Processing (ICIP), 2009, pp. 1393–1396.
- (38) S. Mun, J. E. Fowler, Residual reconstruction for block-based compressed sensing of video, in: IEEE Data Compression Conference (DCC), 2011, pp. 183–192.
- (39) E. W. Tramel, J. E. Fowler, Video compressed sensing with multihypothesis, in: Data Compression Conference (DCC), 2011, IEEE, 2011, pp. 193–202.
- (40) C. Van Trinh, V. A. Nguyen, B. Jeon, Block-based compressive sensing of video using local sparsifying transform, in: IEEE 16th International Workshop on Multimedia Signal Processing (MMSP), 2014, pp. 1–5.
- (41) L.-W. Kang, C.-S. Lu, Distributed compressive video sensing, in: IEEE International Conference on Acoustics, Speech and Signal Processing, (ICASSP), 2009, pp. 1169–1172.
- (42) Y. C. Eldar, G. Kutyniok, Compressed sensing: Theory and applications, Cambridge University Press, 2012.
- (43) K. Q. Dinh, C. Van Trinh, V. A. Nguyen, Y. Park, B. Jeon, Measurement coding for compressive sensing of color images, IEIE Transactions on Smart Processing & Computing 3 (1) (2014) 10–18.
- (44) K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, A. Ashok, ReconNet: Non-iterative reconstruction of images from compressively sensed random measurements, in: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- (45) E. J. Candes, J. K. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics 59 (8) (2006) 1207–1223.
- (46) M. V. Afonso, J. M. Bioucas-Dias, M. A. Figueiredo, Fast image recovery using variable splitting and constrained optimization, IEEE Transactions on Image Processing 19 (9) (2010) 2345–2356.
- (47) W. Dong, X. Yang, G. Shi, Compressive sensing via reweighted TV and nonlocal sparsity regularisation, Electronics Letters 49 (3) (2013) 184–186.
- (48) M. Lysaker, S. Osher, X.-C. Tai, Noise removal using smoothed normals and surface fitting, IEEE Transactions on Image Processing 13 (10) (2004) 1345–1357.
- (49) M. Bertalmío, S. Levine, Denoising an image by denoising its curvature image, SIAM Journal on Imaging Sciences 7 (1) (2014) 187–211.
- (50) C. Knaus, M. Zwicker, Dual-domain image denoising, in: IEEE 20th International Conference on Image Processing (ICIP), 2013, pp. 440–444.
- (51) J. Zhang, S. Liu, R. Xiong, S. Ma, D. Zhao, Improved total variation based image compressive sensing recovery by nonlocal regularization, in: IEEE International Symposium on Circuits and Systems (ISCAS), 2013, pp. 2836–2839.
- (52) D. P. Bertsekas, Constrained optimization and Lagrange multiplier methods, Academic Press, 1982.
- (53) D. Needell, R. Ward, Stable image reconstruction using total variation minimization, SIAM Journal on Imaging Sciences 6 (2) (2013) 1035–1058.
- (54) A. Maleki, M. Narayan, R. G. Baraniuk, Suboptimality of nonlocal means for images with sharp edges, Applied and Computational Harmonic Analysis 33 (3) (2012) 370–387.
- (55) C. Sutour, C.-A. Deledalle, J.-F. Aujol, Adaptive regularization of the NL-means: Application to image and video denoising, IEEE Transactions on Image Processing 23 (8) (2014) 3506–3521.
- (56) M. V. Afonso, J. M. Bioucas-Dias, M. A. Figueiredo, An augmented Lagrangian approach to the constrained optimization formulation of imaging inverse problems, IEEE Transactions on Image Processing 20 (3) (2011) 681–695.
- (57) R. Baraniuk, M. Davenport, R. DeVore, M. Wakin, A simple proof of the restricted isometry property for random matrices, Constructive Approximation 28 (3) (2008) 253–263.
- (58) L. Zhang, L. Zhang, X. Mou, D. Zhang, FSIM: A feature similarity index for image quality assessment, IEEE Transactions on Image Processing 20 (8) (2011) 2378–2386.
- (59) Test video sequences.