Compressive sensing (CS), an emerging sampling and reconstruction strategy, can recover the original signal from dramatically fewer measurements at a sub-Nyquist sampling rate [CS]. As CS has the potential to significantly improve sampling speed and sensor energy efficiency, it has been applied in many practical applications, including single-pixel imaging [singlecamera], fast magnetic resonance imaging [MRI], high-speed video cameras [Video] and image encryption [cqli:meet:JISA19]. To deal with high-dimensional natural images efficiently, block-based CS has been proposed as a lightweight CS approach [BCS, SPL, BCS-1]. In this strategy, a scene under view is partitioned into small blocks, which are then sampled and reconstructed independently. Since meaningful information is usually not uniformly distributed in an image, block partitioning also enables a fairer allocation of sensing resources over the whole image [BCS-Salie].
Although block-based CS enjoys the advantages of low-cost sampling, lightweight reconstruction, and the capability of adaptively assigning sensing resources, it usually suffers from reduced reconstruction quality due to blocking artifacts [BCS-1, BCS-2]. To address this issue, methods based on an iterative block-based CS algorithm (BCS) have been proposed [BCS, BCS-1]. In each iteration, a projection operation builds an approximation of each block, while a denoising operation acts on the full image reassembled from the approximated blocks. Results demonstrate that the recovered image blocks are improved and blocking artifacts are ameliorated as the iterations progress. This approach, however, may increase reconstruction time, since the small blocks still need to be concatenated into a full-size image to remove blocking artifacts.
Inspired by the powerful learning ability of deep neural networks in image representation [DNN-C, DNN-S], several network-based CS methods have been proposed [Icassp, Reconnet, Ldamp, Im-recon, ISTA], which are significantly faster than traditional CS reconstruction algorithms. Using a fully connected layer to mimic CS sampling, these network models can jointly optimize the sampling matrix and the reconstruction process, improving the quality of the recovered images. Although the CS network models are carefully constructed to enhance learning capability, a separate model has to be trained for each sampling rate, ignoring the mutual relationships among the rates. Consequently, blocking artifacts still exist in existing deep network methods [Reconnet, Im-recon, ISTA], especially when the employed sampling rates are very low. Moreover, most network-based image CS methods are trained as black boxes, ignoring the structural insight of CS reconstruction algorithms, which decreases reconstruction accuracy.
In this paper, we propose a multi-channel deep learning architecture that casts the BCS algorithm into a learning network. It benefits from the speed and learning capacity of deep networks while retaining the advantages of previous BCS algorithms. To facilitate description, we term the multi-channel deep architecture BCS-Net; it consists of a channel-specific sampling network and a unified deep reconstruction network. The channel-specific architecture is specifically designed to handle block-based allocation of sensing resources. Blocks with various sampling rates can then be fed into the same deep reconstruction network, which exploits inter-block correlation to remove blocking artifacts. We further divide the reconstruction network into a fixed number of residual layers, each of which corresponds to an iteration of the BCS algorithm. To enable training, a modified version of the well-known DnCNN network [DnCNN] replaces the denoising operation of the traditional BCS approach, as it easily propagates gradients.
The contributions of this paper are summarized below:
A multi-channel sampling architecture is designed specifically for block-based image CS. Using this architecture, block-wise CS measurement processes with a variety of sampling rates can be integrated into a single model to exploit the correlation among blocks with different CS sampling rates.
A deep reconstruction architecture based on the BCS algorithm is proposed using block-wise approximation and full-image-based denoising.
The performance of the proposed approaches is verified by extensive experiments on three widely used benchmark datasets. The results show that the proposed multi-channel deep network significantly outperforms state-of-the-art CS methods and network-based ones in terms of both subjective and objective metrics.
The remainder of this paper is organized as follows. In Sec. II, the related work on CS methods and network-based methods is reviewed. Section III introduces the idea of block-wise approximation and full-image-based denoising in the BCS algorithm, and presents an extended version of the well-known Damp algorithm. The proposed multi-channel deep architecture is presented in Sec. IV, and test results on its performance are given in Sec. V. The last section concludes the paper.
II The related work
In this section, we present the background of CS theory, then review the representative work on block-based image CS and deep network approaches.
II-A Preliminaries of CS theory
Compressive sensing consists of two main steps: a sampling/measuring process and a reconstruction process. Let $x$ and $\Phi$ denote a sparse signal of length $n$ and an $m \times n$ measurement/sampling matrix, respectively. Then, the sampling process can be represented as
$$y = \Phi x, \qquad (1)$$
where $y$ is the $m$-length measurement vector sampled from $x$. If signal $x$ is not sparse but compressible, the sampling process has to be further deduced from Eq. (1). That is, $y = \Phi \Psi s + e$, where $e$ represents measurement noise, $\Psi$ denotes the sparse transform, $s$ is the coefficient vector in the corresponding transform domain, $x = \Psi s$, and $s$ is considered a sparse vector. In general, a natural image is not a strictly sparse signal, but it is often approximately sparse in some sparse transform domain.
Reconstructing the signal requires much more computation than the sampling process. It has been proven in [CS] that if the sampling matrix $\Phi$ obeys the restricted isometry property (RIP), it is possible to recover $x$ by solving an $\ell_1$-norm optimization problem, $\min \|s\|_1$ subject to $\|y - \Phi \Psi s\|_2 \le \epsilon$, even if $m \ll n$. Here $\epsilon$ is a small constant, and one has $\epsilon > 0$ when $x$ is a compressible signal.
In the past two decades, a number of CS reconstruction algorithms have been developed, including basis pursuit [BP], orthogonal matching pursuit [OMP], and the latest iteration-based Damp algorithm [Damp]. Although these algorithms enjoy solid mathematical foundations, they usually need long reconstructing time due to high computational complexity.
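As a concrete illustration of iterative CS recovery, the following is a minimal sketch of ISTA, a simple iterative soft-thresholding solver for the $\ell_1$ problem above (our illustration, not the Damp algorithm itself; all sizes and the regularization weight are hypothetical):

```python
import numpy as np

def ista_l1(Phi, y, lam=0.01, n_iter=1000):
    """Minimal ISTA solver for  min_x 0.5*||y - Phi x||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2      # 1/L, L = Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        g = x + step * Phi.T @ (y - Phi @ x)      # gradient step on the data term
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(0)
n, m, k = 256, 96, 8                              # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)    # sub-Nyquist sampling operator, m << n
y = Phi @ x_true                                  # noiseless measurements
x_hat = ista_l1(Phi, y)
```

Even this simple solver recovers the sparse signal from far fewer measurements than its length, but only after many iterations, which hints at the high reconstruction cost noted above.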
II-B Block-based image CS
Block-based CS is more effective for processing natural images because of the high dimensionality of such signals [BCS, BCS-1, BCS-Small]. The scene under view is partitioned into relatively small non-overlapping blocks, so the measurement matrices only need to match the small block size. The image is then sampled and reconstructed on a block-by-block basis [BCS-1]. This block-independent approach reduces the computational complexity of reconstruction and greatly simplifies the sampling process.
As shown in Fig. 1, the meaningful information in different blocks of an image is non-uniform. Depending on the amount of information contained in each block, different sampling rates can be adopted to reduce the overall sampling rate. Taking the two images in Fig. 1 as an example, lower CS sampling rates can be allocated to the block marked “C” in the image “Cameraman” and to block “G” in the image “Parrot”.
Consequently, sensing resources should be allocated to blocks judiciously instead of equally. In [BCS-Salie], lower sampling rates are allocated to non-salient image blocks and higher ones to salient blocks, exploiting characteristics of the human visual system. Concretely, a low-resolution sensor is used to produce an initially sampled image, such that adaptive sampling can be achieved for the input scene. In [adaptive], the CS procedure is initialized with a low fixed-rate pre-sampling and an initial recovery; the important regions are then extracted by computing the saliency map of the initially recovered image. The adaptive CS strategy is further validated on real video sequences [adaptive-tip]. In [Asymmetric], we also propose an asymmetric approach to ensure fairer allocation of sampling rates among image blocks.
Because block partitioning breaks the global correlation of the whole image, block-based CS is prone to producing reconstructed images of low quality. In [BCS], an iterative BCS reconstruction approach is proposed to remove the blocking artifacts. The recovered images in the BCS algorithm are approximated on a block-by-block basis, but the hard-thresholding denoising in each iteration is imposed on the full image, not on individual blocks. As a result, the artifacts incurred by block partitioning are smoothed as the iterations progress. However, this substantially increases reconstruction complexity because of the full-image denoising, which runs counter to the motivation of a lightweight design.
II-C Deep network approaches for image CS
The tremendous success of deep learning in computer vision [DNN-C, DNN-S] has attracted the application of deep neural networks to image CS [Ldamp, Reconnet, ISTA, CSnet]. Once an imaging system acquires CS measurements, the reconstruction is performed by a deep network. Compared with traditional CS, deep network approaches generally enjoy much faster reconstruction, while still achieving high-quality recovered images owing to their significant learning capability.
In [Reconnet], the network-based ReconNet approach is introduced to learn the inverse mapping from block-wise CS measurements to the desired image blocks. It is further improved in [Im-recon], where the measurement matrix and the reconstruction process are jointly learned. In [ISTA, Ldamp], traditional CS algorithms and deep networks are blended by treating the parameters of the algorithm as weights to be learned. Unfortunately, full-image reconstruction is prone to overfitting due to the potentially overwhelming number of parameters in the sampling layer. Block-independent image recovery therefore has to be performed instead of reconstructing the full image directly, and blocking artifacts can consequently be observed in the recovered images. To address the issue, the recovered images are fed into the BM3D denoiser [BM3D] to remove blocking artifacts [Reconnet, Im-recon]. However, the benefit of using a traditional off-the-shelf denoiser has not been convincingly demonstrated.
For each sampling rate, a corresponding network model has to be trained to learn the inverse CS mapping. This is undesirable for block-based CS, since a large number of model parameters must be stored. In [Ldamp], a single neural network architecture is applied to a variety of measurement matrices. Unfortunately, its performance improvement over traditional CS methods is less significant because the measurement matrices cannot participate in the network training. Inspired by the multi-scale super-resolution method in [Enhanced], [Multi] introduces a multi-scale CS approach, where the main network is shared across multiple sampling rates. However, this method only reuses a portion of the parameters, and each CS sampling rate still corresponds to a specific network model. Moreover, it does not consider block partitioning or the corresponding problem of blocking artifacts.
III Block-wise approximation and full-image-based denoising
In this section, we first introduce the key idea of the popular BCS algorithm [BCS], i.e., block-wise approximation with full-image-based denoising. Then we propose an extended version of the well-known Damp approach [Damp] by exploiting this idea.
III-A Block-wise approximation and full-image-based denoising in BCS
The BCS algorithm is an iterative reconstruction approach specialized for block-based image CS. It solves the image reconstruction problem by combining an approximation step, i.e., projection onto the convex set, with hard-thresholding denoising in each iteration, as shown in
$$\hat{x}_j^{(t)} = x_j^{(t)} + \Phi_j^{+}\big(y_j - \Phi_j x_j^{(t)}\big), \qquad x^{(t+1)} = \mathcal{D}\big(\hat{X}^{(t)}\big), \qquad (2)$$
where $\Phi_j$ is the measurement matrix corresponding to a block and $\Phi_j^{+}$ is its pseudo-inverse, $\hat{x}_j^{(t)}$ denotes the approximation of the recovered block $x_j^{(t)}$, and all $\hat{x}_j^{(t)}$ are reassembled into a full image $\hat{X}^{(t)}$. Here we assume that the original image is divided into $n_B$ blocks, and $y_j$ is the measurement vector sampled from block $j$. The reconstruction starts from some initial approximation $x^{(0)}$ and forms the recovered image $x^{(t)}$ at iteration $t$. In the BCS approach, $\mathcal{D}$ represents hard thresholding, which is widely used for removing Gaussian noise. In the special case when $\Phi_j$ has orthonormal rows, we can deduce that $\Phi_j^{+} = \Phi_j^{T}$, where $\Phi_j^{T}$ is the transpose of $\Phi_j$.
We can conclude that the key idea of the BCS algorithm is block-wise approximation, as illustrated in the first equation in (2). The denoising operation, in contrast, is imposed on the full image, not on each block, as shown in the second equation in (2). In this way, blocking artifacts can be removed while block-wise CS sampling is maintained.
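The projection-then-denoise structure of (2) can be sketched numerically. The following toy implementation is ours, with a 3x3 box filter standing in for the hard-thresholding denoiser of the BCS algorithm; block size, image size and sampling rate are illustrative:

```python
import numpy as np

def box_denoise(img):
    """Stand-in full-image denoiser (3x3 box filter). The BCS algorithm
    instead uses hard thresholding in a sparse transform domain."""
    H, W = img.shape
    p = np.pad(img, 1, mode='edge')
    return sum(p[i:i + H, j:j + W] for i in range(3) for j in range(3)) / 9.0

def bcs_reconstruct(ys, Phis, H, W, B, n_iter=20):
    """Sketch of the BCS loop: per-block projection onto the
    measurement-consistent set, then denoising of the FULL image."""
    x = np.zeros((H, W))
    pinvs = [np.linalg.pinv(P) for P in Phis]
    for _ in range(n_iter):
        for j, (y_j, P, Pp) in enumerate(zip(ys, Phis, pinvs)):
            r, c = divmod(j, W // B)
            xb = x[r*B:(r+1)*B, c*B:(c+1)*B].ravel()
            xb = xb + Pp @ (y_j - P @ xb)            # block-wise approximation
            x[r*B:(r+1)*B, c*B:(c+1)*B] = xb.reshape(B, B)
        x = box_denoise(x)                           # full-image denoising
    return x

# toy demo: a smooth 32x32 image sampled block-by-block at rate 0.5
rng = np.random.default_rng(0)
H = W = 32; B = 16; m = 128
true = np.outer(np.linspace(0, 1, H), np.linspace(0, 1, W))
Phis, ys = [], []
for j in range(4):
    r, c = divmod(j, W // B)
    blk = true[r*B:(r+1)*B, c*B:(c+1)*B].ravel()
    P = rng.standard_normal((m, B * B)) / np.sqrt(m)
    Phis.append(P)
    ys.append(P @ blk)
x_rec = bcs_reconstruct(ys, Phis, H, W, B)
```

Note that the denoiser sees the whole reassembled image, so inter-block seams are smoothed even though sampling remains strictly block-wise.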
III-B BCS-Damp algorithm
Inspired by the BCS algorithm, in this subsection we propose an extension of the Damp algorithm, called BCS-Damp, to reconstruct images sampled with block partitioning.
Damp is a state-of-the-art CS reconstruction algorithm, which, like BCS, is iterative. Let $\Phi$ be the measurement matrix of image $x$, and let $y$ be the corresponding measurement vector. The Damp algorithm takes the form
$$x^{(t+1)} = \mathcal{D}_{\hat{\sigma}^{(t)}}\big(x^{(t)} + \Phi^{T} z^{(t)}\big), \qquad z^{(t)} = y - \Phi x^{(t)} + \frac{z^{(t-1)}}{m}\,\mathrm{div}\,\mathcal{D}_{\hat{\sigma}^{(t-1)}}\big(x^{(t-1)} + \Phi^{T} z^{(t-1)}\big), \qquad (3)$$
where $\hat{\sigma}^{(t)} = \|z^{(t)}\|_2 / \sqrt{m}$. The part $x^{(t)} + \Phi^{T} z^{(t)}$ in Eq. (3) can be written as $x + v^{(t)}$, where $x$ is the original image and $v^{(t)}$ can be regarded as Gaussian noise at iteration $t$; $m$ is the number of CS measurements. Here $\mathrm{div}$ denotes the partial-derivative operation, and $\mathrm{div}\,\mathcal{D}$ represents the divergence of the denoiser.
However, Damp is not specially designed for block-based CS and consequently does not account for blocking artifacts. With the idea of BCS in mind, that is, block-wise approximation to decrease computational complexity and full-image-based denoising to remove blocking artifacts, we propose an extension of Damp for block-based image CS. The proposed BCS-Damp algorithm is given by
$$z_j^{(t)} = y_j - \Phi_j x_j^{(t)} + \frac{z_j^{(t-1)}}{m_j}\,\mathrm{div}\,\mathcal{D}_{\hat{\sigma}^{(t-1)}}(\cdot), \qquad \hat{x}_j^{(t)} = x_j^{(t)} + \Phi_j^{T} z_j^{(t)}, \qquad x^{(t+1)} = \mathcal{D}_{\hat{\sigma}^{(t)}}\big(\hat{X}^{(t)}\big), \qquad (4)$$
where $x_j$ represents the $j$-th block of the image and $m_j$ is the number of measurements of block $j$. Here we retain the Onsager correction term of the Damp algorithm. The modification we introduce is that $\mathcal{D}$ does not run on the blocks, but on the full image $\hat{X}^{(t)}$ obtained by concatenating all approximated blocks $\hat{x}_j^{(t)}$, as shown in the last equation in (4).
In the following section, block-wise approximation with full-image-based denoising, along with the iterative structure of the BCS algorithm, will be further cast into a carefully designed deep network for removing artifacts and improving the recovered image.
IV Multi-channel deep network architecture
In this section, we propose a multi-channel deep network architecture, termed BCS-Net, to reconstruct images acquired by block-wise CS sampling. The proposed BCS-Net is composed of a multi-channel sampling network and a deep reconstruction network, as shown in Fig. 2. These two networks constitute an integrated end-to-end model, whose learnable parameters are jointly trained by our proposed two-stage training strategy.
IV-A Multi-channel sampling architecture
This subsection presents a $k$-channel sampling network architecture that mimics the adaptive sampling process of block-based CS, as shown in Fig. 3.
The proposed sampling network has $k$ channels, each of which corresponds to a sampling rate assigned to a specific image block. A higher value of $k$ indicates a finer division of the sampling rates of image blocks, so block partitioning can benefit from a fairer allocation of sensing resources, at the cost of a more complex sampling architecture. The full scene under view, $x$, is partitioned into $n_B$ non-overlapping image blocks of size $B \times B$. Using the schematic image in Fig. 3 as an example, $n_B$ is set to 9. According to CS theory, the sampling process for image block $x_j$ can be represented as $y_j = \Phi_j x_j$. Here $\Phi_j$ (of size $m_j \times B^2$) is the measurement matrix of block $x_j$, and $y_j$ is the corresponding measurement vector. We have $m_j = r_j B^2$, where $r_j$ is the sampling rate assigned to block $x_j$. With our sampling network, each channel is related to one sampling rate, and the $k$ channels can then correspond to a number of target sampling rates by linear combination of those rates.
For the $i$-th channel, we use a convolutional layer without bias or activation to mimic the sampling process. This convolutional layer employs $m_i$ convolution kernels of size $B \times B$, whose weights correspond to the rows of measurement matrix $\Phi_i$. Note that the measurements from different channels differ from each other, since each channel has its own convolutional kernels. However, the blocks within the same image are spatially correlated, so the measurements of those blocks across different channels are related to each other. If block $x_j$ is fed into the $i$-th channel, we have
$$y_j = W_i * x_j, \qquad (5)$$
where $W_i$ denotes the convolution kernels of channel $i$ and $*$ denotes convolution with stride $B$.
The size of the convolution kernels depends on the sampling rate and the size of the image blocks. We should note that, compared with a traditional measurement matrix, the weights of $W_i$ are learnable. From this perspective, it is more rigorous to refer to network-based CS approaches as CS-inspired, rather than as CS proper.
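The equivalence between a channel's $B \times B$, stride-$B$ convolution and a classical measurement matrix can be checked numerically; the sketch below is ours, and the sizes ($B = 32$, $m_i = 102$, roughly a 0.1 rate) are illustrative:

```python
import numpy as np

B, m_i = 32, 102                          # block size; measurements for this channel
rng = np.random.default_rng(1)
W_i = rng.standard_normal((m_i, B, B))    # m_i "learnable" kernels of size B x B

def sample_block(x_block, kernels):
    """A B x B convolution with stride B over one block is a single inner
    product per kernel, i.e. exactly y_j = Phi_i @ vec(x_j)."""
    return np.tensordot(kernels, x_block, axes=([1, 2], [0, 1]))

x_j = rng.standard_normal((B, B))
y_j = sample_block(x_j, W_i)
Phi_i = W_i.reshape(m_i, B * B)           # the equivalent measurement matrix
```

Flattening the kernels row-wise thus recovers an ordinary $m_i \times B^2$ sampling matrix, which is what the reconstruction network later inverts.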
We note that the $k$ channels in our sampling network correspond to $k$ sampling rates, and $k$ may be less than $n_B$, the total number of blocks in an image. With our multi-channel model, the blocks are measured via their respective channels, so the number of channels, $k$, is related to the number of blocks, $n_B$. Ideally, each block corresponds to a unique channel, in which case $k = n_B$. If several blocks within an image are similar to each other, such as blocks belonging to the background, they can be assigned the same sampling rate, in which case $k < n_B$. Fortunately, all channels can be trained sufficiently, since we have enough training images and thus each channel receives enough training blocks regardless of the allocation strategy employed.
Obviously, if the full image were not partitioned into blocks, too many parameters would be needed to store the weights, making the model prone to overfitting. This may also be the main reason that most existing deep network approaches reconstruct the full image with a block-by-block strategy. In this paper, we further investigate the problem of blocking artifacts caused by block partitioning.
IV-B Deep reconstruction architecture
In this subsection, we construct a deep reconstruction architecture that casts the idea of iterative block-wise approximation with full-image-based denoising into the network, achieving model-level removal of blocking artifacts. Our deep architecture is composed of an initial reconstruction network and a deep reconstruction network, as shown in Fig. 4.
The initial reconstruction network has $k$ inputs, each of which corresponds to a sampling channel as described in Section IV-A. The $i$-th input is connected to a convolutional layer that uses $B^2$ kernels of size $1 \times 1 \times m_i$ and generates $B^2$ values by convolving them with $y_j$. Here $y_j$ is the measurement vector of the block $x_j$ entering channel $i$. All these values are combined into one feature map, which we refer to as the initially reconstructed result $\tilde{x}_j$ of block $x_j$. From the network’s point of view, we have
$$\tilde{x}_j = \tilde{W}_i * y_j, \qquad (6)$$
where $\tilde{W}_i$ denotes the above-mentioned convolutional kernels corresponding to the $i$-th channel of the sampling network. As we can see from Fig. 4, our initial reconstruction network includes only one convolutional layer for simplicity, and the initially recovered image is then improved by our deep reconstruction network.
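A sketch of this initial step, in our notation: the $1 \times 1 \times m_i$ kernels act as a learned matrix mapping the $m_i$ measurements of a block back to its $B^2$ pixels (the weight scale here is an arbitrary stand-in for trained values):

```python
import numpy as np

B, m_i = 32, 102
rng = np.random.default_rng(2)
W_init = 0.01 * rng.standard_normal((B * B, m_i))   # B^2 kernels of size 1 x 1 x m_i

def initial_reconstruction(y_j, kernels, B):
    """The 1x1 convolution over the m_i-channel measurement vector yields
    B^2 values, reshaped into the initially reconstructed B x B block."""
    return (kernels @ y_j).reshape(B, B)

y_j = rng.standard_normal(m_i)
x0 = initial_reconstruction(y_j, W_init, B)
```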
The proposed deep reconstruction network is further divided into $P$ phases, so that the iterative BCS algorithm can be unrolled across these phases. Each phase corresponds to one iteration of the BCS algorithm, consisting of an approximation operation and a denoising operation. At phase $t$, the block-wise approximation is implemented with a formula slightly different from (2) in Sec. III-A (the matrices are now learned), as shown in
$$\hat{x}_j^{(t)} = x_j^{(t)} + \Phi_j^{+}\big(y_j - \Phi_j x_j^{(t)}\big) \qquad (7)$$
for each block $x_j$, where $\Phi_j$ is the measurement matrix specialized for block $x_j$ if $x_j$ is fed into the network via channel $i$ of our multi-channel sampling model. Note that the matrix $\Phi_j$ is learnable, and it may not be an orthogonal matrix. Thus, its pseudo-inverse $\Phi_j^{+}$ cannot be simplified to $\Phi_j^{T}$ as in the traditional BCS or Damp algorithms.
We then reassemble all approximated blocks $\hat{x}_j^{(t)}$ to build a full approximated image, $\hat{X}^{(t)}$, for further denoising. To enable training of the deep reconstruction network, we modify the well-known DnCNN network to implement the full-image denoising. Traditional denoising methods, such as hard thresholding in BCS and BM3D in the Damp algorithm, do not work in a deep network architecture, since they cannot propagate gradients. This restricts us to feed-forward convolutional neural networks. DnCNN is our choice, as it offers strong performance on image deblocking and Gaussian denoising. Our deep reconstruction network is composed of $P$ phases. Each phase has 5 convolutional layers, whose configuration is designed by referring to the DnCNN network. The first layer generates $d$ feature maps with kernels of size $f \times f \times 1$, and the last layer generates 1 feature map with one $f \times f \times d$ kernel. The three other layers each employ $d$ kernels of size $f \times f \times d$. Note that all convolutional layers use the ReLU activation function except the last one. In DnCNN, 20 convolutional layers form a deep network for image denoising; in our model, 5 convolutional layers form one phase, and $P$ phases are employed to handle both image denoising and image approximation. In our experiments, $d$, $f$ and $P$ are set to 64, 3 and 10, respectively. Let $\Theta_t$ be the parameters of the convolutional kernels in phase $t$. Then one has
$$x^{(t+1)} = \mathcal{D}_{\Theta_t}\big(\hat{X}^{(t)}\big), \quad t = 1, \ldots, P, \qquad (8)$$
where $\hat{X}^{(1)}$ is the approximated image in the first phase of the deep network.
IV-C Two-stage training
We divide the training process into two stages, so that the recovered images are improved by training the sampling matrices $\Phi_i$ while their pseudo-inverses $\Phi_i^{+}$ can still be utilized in the deep reconstruction process.
As illustrated in Section IV-A, the sampling matrices in our network are implemented with convolution operations. That is, the elements of $\Phi_i$ and $\Phi_i^{+}$ are taken from the convolution kernels $W_i$ of the $i$-th channel of the sampling network. However, we find that if both $\Phi_i$ and $\Phi_i^{+}$ participate in the training process, we cannot achieve a desirable recovered image. This is because, during training, $\Phi_i^{+}$ would have to be recomputed in real time after each back propagation. Back propagation is based on the gradient descent rule, which is hindered by this real-time computation of the pseudo-inverse in the deep reconstruction network.
In view of this, the first stage aims to obtain the trained parameters of the sampling network, that is, the optimal sampling matrices $\Phi_i$ and the corresponding pseudo-inverses $\Phi_i^{+}$. Observe that our deep architecture consists of an initial reconstruction network that does not involve $\Phi_i^{+}$ and a deep reconstruction network where $\Phi_i^{+}$ is utilized to improve the recovered images. We therefore combine the sampling network and the initial reconstruction part of our reconstruction network into a training network used to train the sampling matrices. Given the training images $x^{(1)}, \ldots, x^{(N)}$, the cost function is
$$L_1 = \frac{1}{2N} \sum_{i=1}^{N} \big\| \tilde{x}^{(i)} - x^{(i)} \big\|_2^2, \qquad (9)$$
where $\tilde{x}^{(i)}$ is the initial reconstruction of training image $x^{(i)}$ and $N$ is the number of images in the training dataset.
In the second stage, we further train the reconstruction network, consisting of the initial reconstruction part and the deep reconstruction part, where the parameters of the initial part come from the first stage. That is, the sampling weights are fixed while the reconstruction parameters are updated during training. Our reconstruction network directly learns the mapping between the CS measurements and the ground truth, and the loss minimizing the error between input and output is defined on full images instead of image blocks. The mean square error is adopted to design an end-to-end cost function
$$L_2 = \frac{1}{2N} \sum_{i=1}^{N} \big\| \mathcal{R}\big(\{y_j^{(i)}\}\big) - x^{(i)} \big\|_2^2, \qquad (10)$$
where $\mathcal{R}(\cdot)$ denotes the output of the reconstruction network and $y_j^{(i)}$ denotes the CS measurement vector of the $j$-th block of the $i$-th training image.
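As a toy illustration of the two-stage idea (linear stand-ins for both networks; sizes, learning rate and step counts are all hypothetical), stage one trains the sampling operator jointly with an initial decoder, and stage two freezes it, so its pseudo-inverse is a constant during the second optimization:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, N = 32, 8, 400
X = rng.standard_normal((N, n))          # toy training "images" (one row each)

# Stage 1: jointly train the sampling operator Phi and a linear initial
# decoder G by minimizing the MSE of the initial reconstruction.
Phi = 0.1 * rng.standard_normal((m, n))
G = 0.1 * rng.standard_normal((n, m))
lr = 0.05                                # hypothetical learning rate
for _ in range(500):
    R = X @ Phi.T @ G.T - X              # residual of the initial reconstruction
    grad_G = (R.T @ (X @ Phi.T)) / N
    grad_Phi = (G.T @ R.T @ X) / N
    G -= lr * grad_G
    Phi -= lr * grad_Phi

# Stage 2: freeze Phi (its pseudo-inverse is now a fixed constant) and
# train only a refinement operator H on top of the initial output.
Phi_pinv = np.linalg.pinv(Phi)           # computed once, never re-derived per step
H = np.eye(n)
X0 = X @ Phi.T @ G.T                     # fixed initial reconstructions
for _ in range(500):
    R2 = X0 @ H.T - X
    H -= lr * (R2.T @ X0) / N
```

The point of the split is visible in stage two: because `Phi` no longer changes, `Phi_pinv` is computed once instead of after every gradient step.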
V Performance evaluation
In this section, we conduct extensive experiments to evaluate the performance of the proposed BCS-Net and BCS-Damp schemes, and compare them with state-of-the-art methods, including traditional BCS [BCS], Damp [Damp], network-based Ista [ISTA], ReconNet [Reconnet] and its improved version, I-Recon [Im-recon], in terms of reconstruction quality, time complexity and visual effect.
V-A Training and testing
V-A1 Constructing a training set
The training images come from the training set (200 images) and the testing set (200 images) of the BSDS500 database [BSD500], from which we randomly crop 89600 images of size $96 \times 96$ as the training set. Each training image $x^{(i)}$ is further partitioned into 9 image blocks of size $32 \times 32$, so there are a total of 806400 blocks in our training set. Visual saliency of the scene was exploited in [saliency, BCS-Salie]. In the experiments, the method in [BCS-Salie] is used to compute the saliency map of the training images. Suppose that $V$ represents the amount of saliency information embodied in image $x$. Then we have $V = \sum_{p=1}^{n} M(p)$, where $n$ is the total number of pixels of image $x$, $M$ denotes the saliency map of $x$, and $M(p)$ is the saliency value at location $p$. Let $V_j$ be the saliency information of image block $x_j$, and let $\rho_j = V_j / V$ denote the proportion of saliency information in block $x_j$. We can then construct the training data pairs for our network, as shown in
$$\big(x^{(i)}, \{\rho_j^{(i)}\}\big), \quad j = 1, \ldots, 9, \qquad (11)$$
where there are a total of 89600 training image pairs.
For the three existing network-based approaches, Ista, ReconNet and I-Recon, 806400 image blocks are randomly cropped. These blocks, paired with themselves, constitute 806400 training block pairs, since these approaches are all based on block-independent image recovery.
V-A2 Training details
We set $k = 7$ for our $k$-channel network model, with the channel sampling rates spanning the range of target rates used in our experiments. Each image pair is further processed to determine the most appropriate channel for each block. For a given target sampling rate $r$, we calculate the sub-rate of block $x_j$ as
$$r_j = \rho_j \, r \, n / B^2, \qquad (12)$$
where $\rho_j$ is defined in Section V-A1, and $n$ and $B^2$ are the sizes of an image and of a block, respectively. In the training process, the space of sampling rates is divided into seven intervals, each of which corresponds to a channel. If $r_j$ falls within the interval corresponding to the $i$-th channel, then block $x_j$ is pushed into the network via channel $i$.
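A small numerical sketch of the allocation rule in (12); the helper name and the random stand-in saliency map are our own illustration:

```python
import numpy as np

def allocate_rates(saliency_map, target_rate, B=32):
    """Assign each B x B block a sub-rate proportional to its share of the
    image's total saliency: r_j = rho_j * r * n / B^2, as in Eq. (12)."""
    H, W = saliency_map.shape
    n = H * W
    blocks = saliency_map.reshape(H // B, B, W // B, B)
    V_j = blocks.sum(axis=(1, 3)).ravel()    # saliency mass per block
    rho = V_j / V_j.sum()                    # proportions rho_j, summing to 1
    return rho * target_rate * n / (B * B)

rng = np.random.default_rng(4)
sal = rng.random((96, 96))                   # stand-in saliency map
rates = allocate_rates(sal, target_rate=0.1)
```

Because the proportions sum to one, the mean of the block sub-rates equals the target rate, so no sensing budget is gained or lost by the reallocation.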
In the experiments, we train the network for 50 epochs. The batch size is set to 1, since each image has to be partitioned into 9 blocks and these blocks are reassembled in our multi-channel network. The mean square error between the original image and the output of the network is used as the loss for gradient back propagation. Adam optimization [adam] with a learning rate of 0.0001 is adopted to optimize the parameters. We use TensorFlow 1.4 [Tensor] to train the proposed multi-channel network on a desktop platform configured with one NVIDIA 1060 GPU, one Intel(R) Core(TM) i7-4790K CPU @ 4.00 GHz and 32 GB of memory. Training takes about 3 hours per epoch.
V-A3 Testing set
We test our multi-channel networks with three widely used benchmark datasets, Set5, Set11 and BSD100, where Set5 and Set11 are shown in Fig. 5 and Fig. 6, respectively.
Set5 consists of 5 gray images: “Baby”, “Bird”, “Butterfly”, “Head” and “Woman”. Set11 has 11 gray images, where “Fingerprint” and “Flintstones” are of size $512 \times 512$, and the other 9 images are all of size $256 \times 256$. BSD100 includes 100 images of size $481 \times 321$ or $321 \times 481$. These test images exhibit various spatial distributions of key visual information. For example, the main meaningful information in the images “Cameraman” and “Parrot” in Set11 is located in a single connected region. In contrast, the visual information of “Bird” in Set5 and of “Fingerprint”, “Flintstones” and “Peppers” in Set11 is uniformly distributed over the whole image. Note that all these test images are strictly separate from the training datasets.
V-B Results and analysis
V-B1 Comparisons with the state-of-the-art methods
In this subsection, we evaluate the performance of the proposed BCS-Net with adaptive allocation of sampling rates and of BCS-Damp, and compare them with the existing methods.
For our BCS-Net, all test images are preprocessed to simulate the initial CS pre-sampling as follows. Each original test image is first resized to one percent of its original size, mimicking a scene pre-sampled by a low-resolution imaging sensor. The saliency of this pre-sampled image is then computed, and the small-size saliency map is normalized and bilinearly interpolated to a map of the original size. The saliency information of the original test image is thus estimated. As a consequence, the sampling resources can be allocated to the blocks by using (12), instead of being allocated equally.
To guarantee that the average of the sampling rates assigned via the channels is equal, or close, to the target rate $r$, the following procedure is employed. As in the training process, the space of sampling rates is first divided into seven intervals. The $i$-th interval corresponds to channel $i$ and is represented as $[l_i, u_i)$, where $i = 1, \ldots, k$ and $k$ is the total number of channels in our multi-channel model. We then compute the sampling rate $r_j$ of block $x_j$ using (12), where $j = 1, \ldots, n_B$ and $n_B$ is the total number of blocks. If $r_j$ falls into the interval $[l_i, u_i)$, the value of $r_j$ is changed to $c_i$, the sampling rate corresponding to channel $i$. If the mean of the assigned rates over all blocks equals the target rate $r$, the allocation is final. Otherwise, we fine-tune by moving some blocks to a higher or lower channel according to the sign of the difference between the mean assigned rate and $r$. We note that the resulting average sampling rates are always very close to the target rates, so we ignore the differences between them in the experiments.
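The snapping-and-fine-tuning procedure above can be sketched as follows; the channel rates and the one-block-at-a-time adjustment rule are illustrative, not the exact settings of our experiments:

```python
import numpy as np

def assign_channels(rates, channel_rates, target):
    """Snap each block's sub-rate to the nearest channel rate, then move
    single blocks up/down a channel until the mean matches the target."""
    channel_rates = np.asarray(sorted(channel_rates))
    idx = np.abs(rates[:, None] - channel_rates[None, :]).argmin(axis=1)
    for _ in range(10 * len(rates)):
        diff = channel_rates[idx].mean() - target
        if abs(diff) < 1e-3:
            break                                   # close enough to the target
        j = idx.argmax() if diff > 0 else idx.argmin()
        idx[j] += -1 if diff > 0 else 1             # demote or promote one block
        idx = np.clip(idx, 0, len(channel_rates) - 1)
    return channel_rates[idx]

block_rates = np.array([0.02, 0.08, 0.12, 0.10, 0.18, 0.05, 0.09, 0.11, 0.15])
channels = [0.01, 0.03, 0.05, 0.10, 0.20, 0.30, 0.40]
assigned = assign_channels(block_rates, channels, target=0.10)
```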
The average PSNR (peak signal-to-noise ratio) and SSIM (structural similarity index) of BCS-Net are reported in Table I. A comparison of the running times for reconstructing the images in Set5 and Set11 is shown in Table II. The reported running times are averaged over all 16 test images in Set5 and Set11 at the sampling rate of 0.1. Note that 0.1 is the target sampling rate of an image, and our running time includes the time of reconstructing all blocks through the different channels of the multi-channel architecture.
The best performance is labeled in bold, the second best is italicized, and the third best is underlined.
As shown in Table I, our BCS-Net yields higher-quality recovered images, in terms of both PSNR and SSIM, than the other existing methods, including BCS, Damp, ReconNet, I-Recon and Ista, on Set5, Set11 and BSD100. From Table I, we can observe significant performance improvements of, for instance, 3.82 dB and 2.98 dB on Set5 and Set11 at the sampling rate of 0.1, and 2.95 dB on BSD100 at the sampling rate of 0.2. We notice that, as the sampling rate increases, the improvement of our scheme diminishes. The likely cause is that, when the sampling rate is as high as 0.1 or 0.2, the images recovered by the competing approaches are already of relatively high quality, leaving little room for improvement at even higher sampling rates.
We can also see from Table I that, among the existing methods, I-Recon achieves relatively better reconstruction quality at the extremely low sampling rates of 0.01, 0.03 and 0.05, while Ista usually performs better at sampling rates of 0.2 and above. However, our BCS-Net consistently outperforms the traditional BCS and Damp algorithms, as well as the network-based ReconNet, I-Recon and Ista. We attribute the performance improvement mainly to two factors: adaptive allocation of sampling rates in our multi-channel sampling network, and block-wise approximation with full-image denoising in our deep reconstruction network. BCS-Net has a slightly longer running time than the network-based ReconNet, I-Recon and Ista because of multi-channel sampling and block reassembling, but it runs far faster than the optimization-based BCS and Damp reconstruction algorithms.
The proposed BCS-Damp approach also outperforms the Damp and BCS algorithms, as shown in Table I. This is because, compared with the Damp algorithm, BM3D denoising in our BCS-Damp is imposed on the full image instead of on each block, so blocking artifacts can be ameliorated. Compared with the BCS approach, our BCS-Damp has better denoising performance, since BM3D denoising outperforms the hard thresholding employed in BCS. We notice that the proposed BCS-Damp even outperforms BCS-Net in terms of PSNR on Set11 at the very high sampling rate of 0.4. We regard this as another indication that network-based approaches offer greater advantages at relatively low sampling rates. Note that our BCS-Damp has a much longer running time than BCS and Damp, due to our full-image denoising strategy in the iteration process and the relatively high computational complexity of the BM3D denoising algorithm.
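The interleaving of block-wise measurement projection with full-image denoising described above can be sketched as follows. This is a toy illustration, not the paper's implementation: the image size, block size, measurement count, and random sampling matrix `Phi` are made-up assumptions, and BM3D is replaced by a simple box filter; the essential point is only that the denoiser acts on the reassembled full image, which is what smooths blocking artifacts across block boundaries:

```python
import numpy as np

rng = np.random.default_rng(0)
B = 4                      # block size (hypothetical)
H = W = 8                  # toy image size
m = 6                      # measurements per 16-pixel block (rate ~0.375)

x = rng.random((H, W))                              # toy "ground-truth" image
Phi = rng.standard_normal((m, B * B)) / np.sqrt(m)  # block sampling matrix

def blocks(img):
    # Partition the image into non-overlapping B x B blocks, flattened.
    return [img[i:i+B, j:j+B].reshape(-1)
            for i in range(0, H, B) for j in range(0, W, B)]

def assemble(vecs):
    # Reassemble flattened blocks into a full image.
    out = np.zeros((H, W))
    k = 0
    for i in range(0, H, B):
        for j in range(0, W, B):
            out[i:i+B, j:j+B] = vecs[k].reshape(B, B)
            k += 1
    return out

y = [Phi @ b for b in blocks(x)]   # block-wise CS sampling

def denoise(img):
    # Placeholder for BM3D: a mild 3x3 box filter on the FULL image.
    p = np.pad(img, 1, mode="edge")
    return sum(p[di:di+H, dj:dj+W] for di in range(3) for dj in range(3)) / 9.0

rec = np.zeros((H, W))
for _ in range(30):
    # Block-wise projection: least-squares correction so each block
    # stays consistent with its own measurements.
    proj = [b + Phi.T @ np.linalg.solve(Phi @ Phi.T, yk - Phi @ b)
            for b, yk in zip(blocks(rec), y)]
    # Full-image denoising on the reassembled image.
    rec = denoise(assemble(proj))
```

With BM3D in place of the box filter and many more iterations, this loop corresponds to the interleaved projection/denoising structure discussed above.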
V-B2 Performance evaluation without assigning sampling resources
In this subsection, we evaluate the performance of our BCS-Net scheme without assigning sampling rates (WA), since sensing resources may not be allocated adaptively in certain scenarios. That is, all blocks in an image are assigned the same sampling rate, and accordingly, they are all fed into our model through the channel corresponding to the target rate. In this case, our multi-channel architecture becomes a unified deep network for all target sampling rates. In the simulation, the number of channels is set to 7, and these seven channels cover the range of target sampling rates. The average PSNR and SSIM of the recovered Set5, Set11, and BSD100 images are compared in Table I. The detailed comparison of PSNR and SSIM for Set5 and Set11 is given in Tables III and IV, where we omit the results on the testing set BSD100 due to limited space.
We should note that our multi-channel model substantially reduces storage requirements, since we use a unified deep reconstruction network to serve all sampling rates. For instance, the Ista approach needs a separate model for each of the seven sampling rates, with about 0.34 million (M) parameters each and roughly 2.4M in total, while our reconstruction network has only 1.1M parameters.
Table I shows that the proposed WA scheme still achieves higher PSNR and SSIM than the best results of the existing BCS, Damp, ReconNet, I-Recon, and Ista, thanks to the strategy of block-wise approximation and full-image denoising. We can also see from Table I that our WA scheme always has lower PSNR than the proposed BCS-Net with adaptive allocation at the intermediate sampling rates on Set5, Set11, and BSD100. We notice, however, that for Set5 our WA scheme achieves slightly better SSIM than BCS-Net with adaptive allocation. From Tables III and IV, the quality gap between BCS-Net and BCS-Net (WA) is large for some images but small for others. For example, at the sampling rate of 0.1, the gap for the recovered “Cameraman” and “Parrot” is about 28.02−26.06=1.96 dB and 31.14−28.39=2.75 dB, respectively, but it drops to 25.29−25.03=0.26 dB for “Flintstones” and even to −0.13 dB for “Fingerprint”. This is because the foreground objects of “Cameraman” and “Parrot” can be clearly distinguished from the background; in other words, the main meaningful information in these images is confined to some local regions. As a consequence, the equal assignment of sampling rates is clearly less efficient than adaptive allocation. In contrast, the meaningful information in “Flintstones” and “Fingerprint” is almost uniformly distributed across the image, and thus the benefit of adaptive allocation is less obvious. From Table I, our WA scheme has the same PSNR and SSIM values as adaptive allocation at the target rates of 0.01 and 0.4. The reason is that, in our experiments, 0.01 and 0.4 are set as the minimal and maximal sampling rates, respectively; according to (12) in Section V-A2, the adaptive allocation strategy then degenerates to equal allocation, i.e., the WA scheme.
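The degeneration of adaptive allocation to equal allocation at the extreme target rates can be illustrated with a small sketch. Since Eq. (12) is not reproduced here, the proportional-to-saliency rule below is a hypothetical stand-in; only the clipping range [0.01, 0.4] is taken from the experimental setup described above:

```python
def allocate(saliency, target, r_min=0.01, r_max=0.4):
    """Hypothetical per-block sampling-rate allocation under a total budget.

    Each block receives a rate proportional to its saliency, clipped to
    [r_min, r_max]. When the target rate hits either bound, every block
    is pinned to that bound, i.e., the scheme degenerates to equal
    allocation (the WA scheme).
    """
    if target <= r_min:
        return [r_min] * len(saliency)
    if target >= r_max:
        return [r_max] * len(saliency)
    total = sum(saliency)
    rates = [target * len(saliency) * s / total for s in saliency]
    return [min(max(r, r_min), r_max) for r in rates]

sal = [0.9, 0.1, 0.5, 0.5]      # made-up per-block saliency scores
print(allocate(sal, 0.1))       # salient blocks receive higher rates
print(allocate(sal, 0.01))      # all blocks pinned to the minimal rate
```

At intermediate target rates, salient blocks receive more measurements than the image average, which is exactly where adaptive allocation gains over WA.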
V-B3 Comparison of visual effects
Taking the test image “Parrot” as an example, this section presents the visual quality of the images recovered by our BCS-Net and BCS-Damp and by the other existing methods.
As shown in Fig. 8(a), at the very low sampling rate of 0.03, the proposed BCS-Net achieves obviously better visual quality than the other approaches. We can observe significant blocking artifacts in the “Parrot” images recovered by the traditional Damp and by the network-based Ista, ReconNet, and I-Recon. This is because all of these algorithms reconstruct “Parrot” block by block, without accounting for the blocking artifacts caused by block partition. We also notice that the blocking artifacts of BCS and our proposed BCS-Damp are not as obvious as those of the other competing methods. The reason is that, in BCS and BCS-Damp, block-wise approximation is interleaved with full-image denoising, so the artifacts are gradually ameliorated as the iterations progress. Our BCS-Net combines the merits of the BCS algorithm and the deep network approach, and thus achieves the best performance.
As we can see from Fig. 8(b), when the sampling rate increases to 0.3, the “Parrot” images recovered by all seven approaches improve. However, we can still observe obvious blocking artifacts for the Damp and ReconNet algorithms, and some weak blocking artifacts for I-Recon and Ista. Compared with these competing approaches, our BCS-Net reconstructs more details and sharper edges, and exhibits no obvious blocking artifacts.
In this paper, we further studied the problem of block-based image compressive sensing and proposed a multi-channel deep neural network architecture, termed BCS-Net. The proposed architecture originates from the popular block-based CS algorithm, in which block-wise iterative approximation interleaved with full-image denoising is the key to improving the recovered image. We then cast this idea into a carefully designed deep network, so that our BCS-Net benefits both from the learning capacity of deep networks and from the hand-designed structure of the BCS algorithm. Extensive experimental results show that our BCS-Net with adaptive sensing-resource allocation achieves far better reconstruction quality and visual effects than state-of-the-art methods. At the same time, BCS-Net with the WA approach also delivers excellent reconstruction performance with a significantly reduced number of network parameters.