The continuous growth of modern data poses a great challenge to signal acquisition, storage, and transmission technologies zeng1_i ; zeng7_i ; zeng8_i . Sampling at the rate required by the Nyquist sampling theorem no longer meets the needs of practical applications wang2_i ; wang4_i ; wang6_i . In recent years, the theory of compressed sensing (CS) proposed by Candes r1 has shown that if a signal is sparse or compressible and the measurement matrix satisfies the Restricted Isometry Property (RIP), the original signal can be accurately recovered from far fewer measurements than the Nyquist sampling theorem requires, which saves a great deal of memory for data sampling, transmission, and storage zeng7_i ; wang4_i .
Traditional CS reconstruction methods fall into two main categories: convex relaxation methods r2 ; r3 and greedy matching pursuit methods r4 ; r8_co . Convex relaxation methods, which mainly include the Interior Point Method (IPM) r6 , Gradient Projection for Sparse Reconstruction (GPSR) r7 , and the Iterative Soft Thresholding Algorithm (ISTA) r8 , solve the optimization problem of compressed sensing based on gradient descent r2 . IPM uses the preconditioned conjugate gradients algorithm to compute the search direction. It can efficiently solve large dense problems, which arise in sparse signal recovery with orthogonal transforms, by exploiting fast algorithms for those transforms. GPSR is based on the gradient descent method. It introduces hidden variables that transform the non-differentiable objective function into a differentiable unconstrained convex function in order to reconstruct the original signal. ISTA uses a shrinkage-thresholding function to solve a simpler sub-problem in place of the original optimization problem and applies a fixed threshold to select the support set. However, these algorithms are computationally complex and slow.
To accelerate convergence, researchers have proposed greedy matching pursuit methods. Representative algorithms are the Orthogonal Matching Pursuit (OMP) algorithm r8_omp and Compressive Sampling Matching Pursuit (CoSaMP) r8_co . Because these methods converge faster and are easy to implement in practice, researchers continue to study them. Needell et al. propose the Regularized Orthogonal Matching Pursuit (ROMP) algorithm r9 . It is faster than OMP r8_omp , but less stable. Kang et al. propose an adaptive subspace OMP method r10 which uses prior knowledge of the target size and the coherence of the target distribution to change the structure of the subspace adaptively. Furthermore, it combines the advantages of OMP r8_omp , SP r10_sp , and SaMP r10_sasp to improve reconstruction performance. Davenport describes a variant of the CoSaMP algorithm r11 which uses the D-RIP (a condition on the CS matrix analogous to the restricted isometry property). This method focuses on recovering the signal itself rather than its dictionary coefficients. Zhang et al. combine CoSaMP r8_co and the genetic algorithm (GA) r11_ga and propose a new signal recovery framework r12 which has better reconstruction quality and effectively avoids premature convergence. Although these traditional algorithms improve reconstruction speed and quality, they still have high computational complexity and limited reconstruction accuracy.
Deep learning has been applied to a wide range of tasks, including image super-resolution r16_sda , CS image reconstruction zeng1_i , speaker recognition wang5_s ; wang7_s ; wang8_s ; zeng2_s , and digital forensics wang1_ad ; wang3_ad ; wang9_ad ; zeng3_ad ; zeng4_ad ; zeng5_ad ; zeng6_ad . Stacked Denoising Autoencoders (SDA) r16_sda is the first deep learning technology applied in the CS field. Mousavi et al. r17 apply SDA to solve the CS recovery problem, capturing statistical dependencies between different elements of image signals to improve image reconstruction quality. Since the convolutional neural network (CNN) r17_cnn has achieved great results in image processing, CNN is also applied in CS. Kulkarni et al. r18 are inspired by SRCNN r19 and propose a non-iterative reconstruction network (ReconNet), which uses CNN to learn the mapping from the CS measurement to the original image. In r20 , Bo et al. propose FompNet based on CNN, which is used as post-processing for a fast matching pursuit algorithm. In r21 , Zhang et al. are inspired by the Iterative Shrinkage-Thresholding Algorithm (ISTA) and propose ISTA-Net, which casts ISTA into deep network form for image CS reconstruction. After the deep Residual Network (ResNet) r22 was proposed, researchers introduced residual learning into the network to improve reconstruction quality. Yao et al. propose a deep residual reconstruction network (DR2-Net) r23 which increases the network depth of ReconNet to further improve image reconstruction quality. All the above methods use block-by-block measurement and restore each measured block separately, which ignores the association between image blocks. Therefore, they may produce serious block effects, especially at low measurement rates. Shi et al. r23_CS propose an end-to-end framework dubbed CSNet, which does not explicitly split the input image into blocks during measurement, but uses convolution to obtain information about each image block. CSNet significantly improves image reconstruction quality and achieves fast running speed.
In the reconstruction part, these methods only utilize standard CNN whose neuronal receptive fields are designed to the same size in each layer, which is inconsistent with the actual observation of the human visual system, hence hindering the representational ability of CNN.
Compared to the previous convolutional networks, a multi-scale network can extract richer feature information and improve the representational ability of CNN. In r24 , Prabhu et al. propose a multi-scale convolutional network termed U-Finger for fingerprint image denoising and inpainting. U-Finger obtains three different scales by downsampling and upsampling and merges the feature information of each scale, which effectively improves image denoising ability. In r25 , Dong et al. propose a second-order multi-scale super-resolution (SMSR) network that concatenates the output of each RACB module to obtain a multi-scale group and then concatenates the output of each group to obtain second-order multi-scale feature information. The SMSR network achieves high-quality super-resolution reconstruction, even for remote sensing images with highly complex spatial distributions. In r26 , Lian et al. propose the multi-scale residual reconstruction network (MSRNet) for CS image reconstruction. Although MSRNet also employs dilated convolution in the reconstruction part, it fuses the multi-scale feature information directly after extracting features by dilated convolution with different dilated factors, which does not fully utilize the multi-scale feature information.
To solve the above problems, we propose MsDCNN to learn the end-to-end mapping between the original images and the reconstructed images for CS image reconstruction in this paper. Firstly, by completely measuring the original image, we apply a fully convolutional network instead of a traditional CS matrix. The fully convolutional network directly measures the complete image and it doesn’t need to be cut into blocks, which effectively uses the image structure information. In the reconstruction period, we design the multi-scale feature extraction (MFE) network architecture which consists of multiple parallel convolutional channels to obtain multi-scale feature information. In each convolutional channel, we apply the dilated convolution with different dilation factors to obtain different receptive fields. Convolutional kernels of different receptive fields can extract different scale feature information after the convolutional operation. The MFE module can not only obtain multi-scale features which provide rich information for subsequent image reconstruction and improve the performance of image reconstruction but also avoid the increase of parameters. The contributions of our research work are mainly in three aspects:
We propose a novel multi-scale dilated convolutional neural network for high quality image compressed sensing. The MsDCNN combines and jointly trains the measurement and reconstruction modules to learn the end-to-end mapping between the original image and the reconstructed image, which outperforms many other state-of-the-art methods in the quality of reconstruction.
During the measurement period, we train a fully convolutional measurement network to obtain all measurements from the complete input image. Therefore, these adjacent measurement data are closely related to each other, which is totally different from traditional block-by-block measurements. The measurement method effectively uses the structural information of adjacent data to improve the quality of subsequent image reconstruction and eliminates the block effect.
During the reconstruction period, in order to improve the feature extraction ability of the traditional deep CS methods with a fixed size feature map, we propose MFE architecture to imitate the human visual system to capture multi-scale feature information, which consists of multiple parallel dilated convolutional channels. We apply dilated convolution with different dilation factors to increase the receptive fields, which capture multi-scale features in the image. Finally, we fuse multiple feature information to further improve the quality of image reconstruction.
|Notation|Description|
|$M$|the length of the measurement vector|
|$N$|the length of the original signal vector|
|$MR$|the measurement rate, defined as $MR = M/N$|
|$x$|an original signal vector with size of $N \times 1$|
|$y$|a measurement vector|
|$\Phi$|a CS matrix|
|$\Psi$|a sparse representation matrix|
|$s$|a sparse transform coefficients vector|
|$K$|the sparsity of $s$|
|$\delta$|the RIP parameter|
|$d$|the dilated factor|
|$B$|the size of blocks of the input image|
|$x_i$|the $i$-th block vector of the original signal|
|$y_i$|the $i$-th block vector of the measurement|
|$\Phi_B$|the block measurement matrix|
|$X$|the original input image matrix|
|$W_m$|the measurement matrix for image compression|
|$\tilde{X}$|the initial reconstruction image matrix|
|$X_i$|the $i$-th input image matrix|
|$\hat{X}$|the reconstruction image matrix|
|$T$|a tensor of multi-scale feature maps|
|$W$|the weight matrix of the convolutional layer|
|$b$|the bias vector of the convolutional layer|
|$\Theta$|the parameter set of the MsDCNN|
2 Related work
2.1 Compressed sensing theory
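The mathematical model of CS measurement is expressed as follows:
$$y = \Phi x$$
where $x \in \mathbb{R}^{N \times 1}$ is the original signal, $\Phi \in \mathbb{R}^{M \times N}$ with $M \ll N$ is the CS matrix, and $y \in \mathbb{R}^{M \times 1}$ is the measurement vector.
Because $M \ll N$, recovering $x$ from $y$ is generally an ill-posed inverse problem. However, as long as the signal is sparse or compressible, CS can still recover the signal $x$ from the measurement $y$. When the original signal is an image, the image matrix needs to be flattened into a vector row by row before the measurement is performed. After that, the output vector obtained by the reconstruction algorithm is reshaped into the reconstructed image matrix. In the sparse domain, the original signal can be represented as follows:
$$x = \Psi s$$
where $\Psi$ is the sparse representation matrix and $s$ is the vector of sparse transform coefficients.
Signal reconstruction is then essentially an $\ell_0$-norm minimization problem:
$$\min_{s} \|s\|_0 \quad \text{s.t.} \quad y = \Phi \Psi s$$
If, for some $\delta \in (0, 1)$, the matrix $\Phi \Psi$ satisfies the $K$-RIP condition
$$(1 - \delta)\|s\|_2^2 \le \|\Phi \Psi s\|_2^2 \le (1 + \delta)\|s\|_2^2$$
for every $K$-sparse vector $s$, then we can reconstruct the original signal from the CS measurement.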
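As a minimal sketch of the measurement model described in this subsection, the snippet below measures a sparse signal with a random Gaussian matrix and runs a plain OMP recovery. The sizes, the identity sparsifying basis, and the helper name `omp` are illustrative assumptions, not values or code from this paper.

```python
import numpy as np

def omp(A, y, k):
    """Plain Orthogonal Matching Pursuit: greedily pick up to k columns of A."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Select the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Least-squares fit of y on the selected columns.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    s_hat = np.zeros(A.shape[1])
    s_hat[support] = coef
    return s_hat

rng = np.random.default_rng(0)
N, M, K = 64, 32, 3                      # assumed toy sizes (signal, measurements, sparsity)
s = np.zeros(N)
s[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
Psi = np.eye(N)                          # assumption: signal is sparse in the identity basis
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random Gaussian CS matrix
x = Psi @ s                              # signal in the original domain
y = Phi @ x                              # CS measurement
s_hat = omp(Phi @ Psi, y, K)
x_hat = Psi @ s_hat                      # reconstructed signal
```

Whether the recovery is exact depends on the random draw and on the RIP-type conditions discussed above; the sketch only illustrates the data flow from measurement to reconstruction.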
2.2 Block-based CS measurement
Since the measurement is obtained by $y = \Phi x$, the size of the CS matrix $\Phi$ increases rapidly with the size of the input image. Direct measurement of the whole image therefore requires large storage space and expensive computation. To overcome this problem, the block-based CS method (BCS) r27 is proposed. The block measurement method of CS is shown in Fig. 1. In this method, the input image is divided into several non-overlapping blocks and each block is measured independently with a smaller CS matrix. Block measurement effectively reduces the dimension of the measurement data and the memory required for the calculation. Although this method reduces computational complexity, it ignores the correlation between adjacent blocks, which results in serious block effects.
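The block-based measurement can be sketched as follows. The block size, measurement rate, image size, and the function name `block_measure` are hypothetical choices made only for illustration.

```python
import numpy as np

def block_measure(image, Phi_B, B):
    """Measure each non-overlapping BxB block of `image` with the same matrix Phi_B."""
    H, W = image.shape
    assert H % B == 0 and W % B == 0, "image sides must be multiples of B"
    # (H/B, B, W/B, B) -> (H/B, W/B, B, B) -> one row-major flattened block per row.
    blocks = image.reshape(H // B, B, W // B, B).transpose(0, 2, 1, 3).reshape(-1, B * B)
    return blocks @ Phi_B.T              # shape: (number of blocks, measurements per block)

rng = np.random.default_rng(0)
B, MR = 8, 0.25                          # assumed toy block size and measurement rate
m_B = int(MR * B * B)                    # measurements per block
Phi_B = rng.standard_normal((m_B, B * B))
image = rng.standard_normal((16, 24))
Y_blocks = block_measure(image, Phi_B, B)
```

Each block is measured without any knowledge of its neighbours, which is the source of the block effects mentioned above.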
2.3 Dilated convolution
In image reconstruction, it is important to increase the receptive field. A large receptive field captures more image information and improves the quality of image reconstruction. In CNNs, large convolutional kernels, pooling layers, and deeper networks are generally used to increase the receptive field. However, increasing the kernel size or the number of layers raises the computational complexity of the network, resulting in a longer image reconstruction time. Although pooling does not increase the computational complexity, it discards much of the image information, which degrades the quality of the reconstructed image. Dilated convolution, in contrast, increases the receptive field of the network while keeping the computational complexity unchanged, and thus yields better reconstruction quality. For example, applying a dilated factor of 2 to a $3 \times 3$ convolution kernel yields a $5 \times 5$ kernel. There are still nine nonzero values in the dilated kernel, and the values in all other positions are zero. The receptive field grows from the original $3 \times 3$ to $5 \times 5$. Fig. 2 shows dilated convolution with different dilated factors.
Although dilated convolution increases the receptive field without increasing the number of network parameters, it also brings new problems. Because dilated convolution is a sparse sampling method, some pixels are not involved in the calculation when consecutive dilated convolutions are used. This breaks the continuity and relevance of the data and results in the gridding effect.
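The effect of the dilated factor on the kernel can be checked with a short sketch. The helper `dilate_kernel` is a hypothetical name; it inserts zeros between the weights of a square kernel, so the number of nonzero weights stays fixed while the covered area grows.

```python
import numpy as np

def dilate_kernel(kernel, d):
    """Insert d - 1 zeros between neighbouring weights of a square kernel."""
    k = kernel.shape[0]
    size = k + (k - 1) * (d - 1)         # effective kernel size after dilation
    out = np.zeros((size, size), dtype=kernel.dtype)
    out[::d, ::d] = kernel               # original weights land on a stride-d grid
    return out

base = np.ones((3, 3))
dilated_2 = dilate_kernel(base, 2)
dilated_3 = dilate_kernel(base, 3)
```

The zeros between the weights are also what makes stacked dilated layers skip pixels, which is the origin of the gridding effect.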
To solve the above problems, we propose a novel multi-scale dilated convolution neural network (MsDCNN) for image CS measurement and reconstruction, and the motivations of this paper are as follows:
Most of the previous CS methods use a block-based method to measure. Although the block-based method reduces computational complexity, it also brings some new problems. Since each image block is measured independently in the measurement part, correspondingly, we can only reconstruct each image block respectively. Finally, all reconstructed image blocks are stitched together to obtain a complete image, which causes a serious block effect and obtains bad reconstruction quality.
In the reconstruction part, most of the CS methods based on deep learning (DL) use CNN instead of Deep Neural Network (DNN), which further improves reconstruction performance. However, each convolutional layer uses convolutional kernels with the same receptive field to extract feature information and can therefore only collect single-scale spatial information, so the potential of CNN to reconstruct images with higher quality is not fully exploited.
3 The proposed method
In this section, we propose the MsDCNN to measure and reconstruct the images. As shown in Fig. 3, the MsDCNN consists of two components, which are the full convolution measurement network and the reconstruction network. The reconstruction network also includes initial reconstruction and deep reconstruction. Since the basic operations of the measurement and reconstruction networks are convolution and deconvolution, which can directly process the input and output image matrix, there is no need to take a matrix-vector conversion in both measurement and reconstruction processes. Then we will describe the details of the network.
3.1 Network architecture
3.1.1 Full convolution measurement instead of the CS matrix
The existing CS methods based on deep learning usually adopt the block-by-block measurement method. In the CS measurement, the image is divided into small blocks of size $B \times B$, and each block is measured separately by the compressed sampling expression $y_i = \Phi_B x_i$. Here $\Phi_B$ is usually a hand-designed Gaussian random measurement matrix. The size of the input image block must be fixed, which limits practical application, and block effects appear in the final reconstruction result. To overcome this shortcoming, the MsDCNN uses a fully convolutional network to obtain the measurements from the input image, as shown in Fig. 4:
$$Y = F_m(X) = W_m * X$$
where $X$ is the input image, $Y$ is the measurement, $F_m(\cdot)$ denotes the convolutional measurement, $*$ denotes the convolution operation, and $W_m$ denotes the measurement weights. In the measurement, each convolutional kernel outputs one measurement value per image block. For the measurement rate $MR$, the CS matrix $\Phi_B$ has $MR \cdot B^2$ rows, which yield $MR \cdot B^2$ measurement points. We therefore set the number of kernels to $MR \cdot B^2$ in the measurement layer. In addition, the convolutional kernels have no biases and the fully convolutional CS measurement layer has no activation function. The fully convolutional network replaces the hand-designed measurement matrix and adaptively learns the structure information of the input image. It can also adapt to input images of various sizes.
We summarize the advantages of using a fully convolutional layer for measurement as follows:
It makes full use of the connection between adjacent data and eliminates the block effect caused by block-by-block measurement.
It can process images of any size, which breaks the limitation that the fully connected layer can only measure fixed-size images.
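The two properties listed above can be illustrated with a small NumPy sketch of a bias-free, activation-free convolutional measurement layer. The kernel size, measurement rate, image sizes, and the name `conv_measure` are assumptions chosen for a quick check, and the random kernels stand in for weights that would be learned.

```python
import numpy as np

def conv_measure(image, kernels, stride):
    """Strided 'valid' convolution (cross-correlation), no bias and no activation."""
    n_k, B, _ = kernels.shape
    H, W = image.shape
    out_h = (H - B) // stride + 1
    out_w = (W - B) // stride + 1
    Y = np.zeros((n_k, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + B, j * stride:j * stride + B]
            # One measurement value per kernel for this spatial position.
            Y[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return Y

rng = np.random.default_rng(0)
B, MR = 8, 0.25                          # assumed toy values (the text sets the stride to 32)
n_kernels = int(MR * B * B)              # number of kernels = measurements per BxB region
kernels = rng.standard_normal((n_kernels, B, B))   # stand-in for learned weights
img_small = rng.standard_normal((16, 24))
img_large = rng.standard_normal((32, 40))
Y_small = conv_measure(img_small, kernels, stride=B)
Y_large = conv_measure(img_large, kernels, stride=B)
```

The same kernels are applied to both image sizes, and the output keeps its two-dimensional layout, so neighbouring measurement vectors correspond to neighbouring image regions.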
3.1.2 Initial reconstruction with deconvolutional network
After the full convolution measurement, the spatial resolution of the data is reduced and the image is compressed. To accurately reconstruct the original image, we first restore the original image size by deconvolution and generate the initial reconstructed image:
$$\tilde{X} = I(Y) = \mathrm{Deconv}(Y;\, W_I, b_I)$$
where $I(\cdot)$ denotes the initial reconstruction, $Y$ denotes the measurement, and $W_I$ and $b_I$ denote the weights and biases of the initial reconstruction layer, respectively. The deconvolution operation first inserts zeros into the measured data to expand it to the size of the original image, then transposes the convolution kernel used in the preceding full convolution measurement, and finally convolves the zero-filled data. Deconvolution can be regarded as the reverse process of convolution and can be used as up-sampling to restore the original image size, which prepares for the subsequent deep reconstruction. Although up-sampling by deconvolution cannot exactly restore the values of the original image, deconvolution has a good ability to learn image features from low level to high level. Therefore, deconvolution is applied as the initial reconstruction network, as shown in Fig. 3.
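A minimal sketch of the up-sampling step follows, under the assumption that the deconvolution kernel size equals its stride, so the output blocks do not overlap. The function name `deconv_upsample` and all sizes are hypothetical, and the random weights stand in for learned ones.

```python
import numpy as np

def deconv_upsample(Y, W, b, B):
    """Transposed convolution with kernel size == stride == B (non-overlapping output)."""
    n_k, h, w = Y.shape
    # The measurement vector at position (i, j) is expanded into one BxB block.
    blocks = np.einsum('kij,krc->irjc', Y, W)      # shape (h, B, w, B)
    return blocks.reshape(h * B, w * B) + b

rng = np.random.default_rng(0)
B, n_k = 8, 16                                     # assumed toy sizes
Y = rng.standard_normal((n_k, 2, 3))               # measurements on a 2x3 grid
W = rng.standard_normal((n_k, B, B))               # stand-in for learned deconvolution weights
b = 0.1                                            # scalar bias of the single output channel
X_init = deconv_upsample(Y, W, b, B)
```

The result has the size of the original image and serves only as a coarse starting point for the deep reconstruction network.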
3.1.3 Deep reconstruction network
In order to further improve the quality of image reconstruction, we imitate a human visual system using the multi-channel parallel network. Each channel applies convolutional kernels with different receptive fields to extract different scale information. So after getting the initial reconstructed image, we can obtain the multi-scale features information via the MFE module. In order to obtain richer feature information, multi-scale feature information is fused by the ‘Concat’ operation.
The fusion can be written as:
$$T = C(T_1, T_2, T_3)$$
where $C(\cdot)$ denotes the concatenation operation, which concatenates the feature maps $T_1$, $T_2$, and $T_3$ output by the convolutional channels to obtain the multi-scale feature information $T$. The number of channel feature maps of $T$ is the sum of those of $T_1$, $T_2$, and $T_3$. Finally, we obtain the final reconstructed image through two convolutional layers, as shown in Fig. 3:
$$\hat{X} = F(T) = W_2 * \mathrm{ReLU}(W_1 * T)$$
where $F(\cdot)$ consists of two convolution layers, $W_1$ and $W_2$ denote the weights of the two convolution layers, and ReLU r28 is used as the activation function. The last convolution layer has no activation function. $\hat{X}$ is the final reconstructed image.
3.2 The MFE module
The MFE is composed of multiple parallel convolutional channels. Each convolutional channel employs different dilated factors to obtain dilated convolutional kernels of different sizes which correspond to different receptive fields. To avoid the gridding effect caused by the continuous dilated convolution of the same dilated factor, we alternately set the dilated convolution and the normal convolution of the same receptive field in each convolutional channel. The MFE module can obtain different scale feature information after convolutional operation for each channel. The multi-scale feature information extracted by the MFE module contains structural information and image details, which provides sufficient information for subsequent multi-feature fusion. So textures, edges, and details of the image can be effectively reconstructed.
For this module, we consider three channel settings:
Single channel: This model is similar to previous reconstruction networks. It has only one channel, and the size of the convolutional kernel is set to $3 \times 3$.
Two channels: In the first channel, the dilated factor is 1, as in the single channel. In the second channel, we use dilated convolution with a dilated factor of 2, which enlarges the receptive field from $3 \times 3$ to $5 \times 5$. To keep the receptive field consistent and avoid consecutive dilated convolutions, we alternate dilated convolution with a dilated factor of 2 and normal convolution with a $5 \times 5$ kernel.
Three channels: On the basis of the two channels, we add a convolutional channel with a dilated factor of 3. The size of the convolutional kernel changes from $3 \times 3$ to $7 \times 7$, which gives a larger receptive field. Similarly, we alternate dilated convolution with a dilated factor of 3 and normal convolution with a $7 \times 7$ kernel.
We describe the three different MFE modules and the detailed structure parameters in Fig. 5.
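The idea of parallel channels with different dilated factors can be sketched in a heavily simplified form: one layer and one filter per channel, with all-ones weights, rather than the full module of Fig. 5. The names `conv2d_same` and `mfe_single_layer` are hypothetical.

```python
import numpy as np

def conv2d_same(x, kernel, d=1):
    """'Same'-size 2-D convolution (cross-correlation) with dilated factor d."""
    k = kernel.shape[0]
    pad = d * (k - 1) // 2               # zero padding keeps the output size equal to the input
    xp = np.pad(x, pad)
    out = np.zeros(x.shape, dtype=float)
    for r in range(k):
        for c in range(k):
            out += kernel[r, c] * xp[r * d:r * d + x.shape[0], c * d:c * d + x.shape[1]]
    return out

def mfe_single_layer(x, kernels, dilations=(1, 2, 3)):
    """One dilated convolution + ReLU per parallel channel, then concatenation."""
    feats = [np.maximum(conv2d_same(x, w, d), 0.0) for w, d in zip(kernels, dilations)]
    return np.stack(feats, axis=0)       # feature maps stacked along the channel axis

impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0                      # single bright pixel to expose each receptive field
kernels = [np.ones((3, 3))] * 3          # toy weights, not trained values
features = mfe_single_layer(impulse, kernels)
# Number of rows reached by the response of each channel.
extents = [int(np.ptp(np.nonzero(f)[0])) + 1 for f in features]
```

For the single-pixel input, the three channels respond over regions of growing extent, which is the multi-scale behaviour that the concatenation then exposes to the following layers.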
3.3 The details of network structure
In the measurement part, the size of the convolutional kernel is set to $32 \times 32$, and the stride is set to 32 for non-overlapping measurement. In the reconstruction part, the kernel size of the deconvolutional layer is $32 \times 32$ and the stride is also 32. Each channel of the MFE network consists of several convolutional layers; each convolutional layer uses 32 convolutional kernels, and the ReLU activation function is applied after each convolutional operation. To keep the output consistent with the input dimension, we apply padding. After the MFE, the feature information of different scales output by the channels is fused to obtain multi-scale features. Fig. 5 shows the structural parameters of the MFE in three different cases. In Fig. 5 (a), the MFE is a single channel, which is similar to most other reconstruction networks based on CNN. In Fig. 5 (b), the MFE has two channels, whose outputs are merged along the channel dimension to obtain 64 channel feature maps. In Fig. 5 (c), the MFE has three channels, which yields 96 channel feature maps. The final reconstructed image is obtained through the last two convolutional layers.
3.4 Network training
Given a training set $\{X_i\}_{i=1}^{n}$, our goal is to obtain a highly compressed measurement and accurately restore the original input image from the measurement. From a practical point of view, it is not feasible to train the measurement network alone, because it is difficult to assess the quality of the measurement without the reconstruction error as a reference. The measurement network and the reconstruction network are therefore trained together, so that the CS measurement and reconstruction modules form an end-to-end network structure. During training, each image serves as both the input and the label. The goal of training MsDCNN is to minimize the mean square error loss function:
$$L(\Theta) = \frac{1}{n} \sum_{i=1}^{n} \left\| F\big(C(I(F_m(X_i)))\big) - X_i \right\|_F^2$$
where $n$ represents the total number of training samples, $X_i$ represents the $i$-th input image, $\Theta$ denotes the parameters that need to be trained, $F_m(\cdot)$ denotes the convolutional measurement operation, $I(\cdot)$ denotes the deconvolution initial reconstruction operation, $C(\cdot)$ denotes the 'Concat' operation, which concatenates the multi-scale feature information extracted from the initial reconstruction, and $F(\cdot)$ produces the final output.
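A small sketch of the mean square error objective follows, assuming the loss sums the squared pixel errors of each image and averages over the training images. The exact normalization constant is an assumption, and `mse_loss` is a hypothetical helper name.

```python
import numpy as np

def mse_loss(reconstructed, originals):
    """Squared reconstruction error per image, averaged over the batch."""
    reconstructed = np.asarray(reconstructed, dtype=float)
    originals = np.asarray(originals, dtype=float)
    n = originals.shape[0]               # number of images in the batch
    return float(np.sum((reconstructed - originals) ** 2) / n)

originals = np.ones((2, 4, 4))           # two toy 4x4 "images"
loss_zero = mse_loss(originals, originals)
loss_off = mse_loss(np.zeros((2, 4, 4)), originals)
```

Because the label is the input image itself, the gradient of this loss reaches the measurement layer through the reconstruction network, which is what allows the two parts to be trained jointly.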
4 Experimental results and discussion
In this section, we will evaluate the performance of the proposed method for CS image reconstruction. Firstly, we describe the data set for training and the training details. Then, we verify the effectiveness of the multi-channel and dilated convolution in MFE. Finally, we will compare the proposed method with the state-of-the-art methods with real-world images. The source codes are released at https://github.com/CCNUZFW/MsDCNN.
4.1 Datasets for training
The experiment uses 200 images as a training set and 200 images as a validation set from the BSDS500 database r29 for network training. We also apply data augmentation (rotation or flipping) to enlarge the training data and improve the performance of the network model. We use the same test set as the ReconNet and DR2-Net methods as the benchmark.
4.2 Training details
We use the Adam optimization method to optimize all parameters of the network. For Adam’s other hyper-parameters, the exponential decay rates of the first and second moment estimates are set to 0.9 and 0.999, respectively. The number of epochs is set to 100; the learning rate is 0.001 for the first 50 epochs, 0.0001 for epochs 51 to 80, and 0.00001 for the remaining 20 epochs. Although increasing the number of epochs may improve the performance of the network model, it also increases the training time. Finally, we use the model obtained after the 100th training epoch for final testing. We implement our model using the MatConvNet r32 package on MATLAB 2016b and train the model on an NVIDIA Quadro M4000 GPU.
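The piecewise-constant learning-rate schedule described above can be written as a small function. This is a Python sketch for illustration only (the paper's implementation uses MatConvNet), and the 1-based epoch indexing and the name `learning_rate` are assumptions.

```python
def learning_rate(epoch):
    """Learning rate for a 1-based epoch index under the 100-epoch schedule."""
    if not 1 <= epoch <= 100:
        raise ValueError("epoch must be in 1..100")
    if epoch <= 50:
        return 1e-3      # epochs 1-50
    if epoch <= 80:
        return 1e-4      # epochs 51-80
    return 1e-5          # epochs 81-100

schedule = [learning_rate(e) for e in range(1, 101)]
```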
4.3 Effectiveness of multi-channel in the MFE module
To test the effectiveness of MFE modules under different channel numbers, three MFE modules are set up to test the performance separately. MsDCNN-1 is a single channel network that is similar to the previous simple reconstruction network. MsDCNN-2 indicates MFE has two channels which add dilated convolution with dilated factors of 1 and 2. MsDCNN-3 indicates MFE has three parallel convolutional channels with dilated factors of 1, 2, and 3.
To demonstrate that multiple channels improve image reconstruction quality, we evaluate the CS image reconstruction quality with two widely used image quality metrics, PSNR and SSIM, and compare performance at the measurement rates of 0.01, 0.04, and 0.10. Table 2 shows the average PSNR and SSIM of CS image reconstruction for different numbers of channels on the 5 test images in set5 r33 and the 14 test images in set14 r34 . The PSNR and SSIM of MsDCNN-2 and MsDCNN-3 are higher than those of the single-channel MsDCNN-1 without dilated convolution. Compared with MsDCNN-1, the average PSNR of MsDCNN-2 increases by 0.45 dB and that of MsDCNN-3 increases by 0.51 dB. Because a multi-channel network captures features of different scales from the same feature map, the fused multi-scale features contain richer image information than those of a single-channel network. Accordingly, the original image can be reconstructed with higher quality. The experimental results thus indicate that multi-channel dilated convolution is beneficial to image reconstruction quality.
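For reference, PSNR can be computed with the standard definition, as in the sketch below. The 8-bit peak value of 255 and the function name `psnr` are assumptions, and SSIM is omitted for brevity.

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float('inf')              # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

reference = np.full((8, 8), 100, dtype=np.uint8)
off_by_one = np.full((8, 8), 101, dtype=np.uint8)
value = psnr(reference, off_by_one)      # MSE of 1 gives 20 * log10(255) dB
```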
As the dilated factor increases, the receptive field expands correspondingly. However, when the number of parallel convolutional channels exceeds three, the improvement in reconstruction quality is small. In experiments, we design four parallel convolutional channels (the dilated factor of the fourth channel is 4, and the corresponding dilated convolutional kernel is $9 \times 9$). The average PSNR of the four channels is only 0.02 dB higher than that of the three channels. Although a larger dilated factor (e.g., 4) can still improve the quality of image reconstruction, the improvement is not obvious. The reason is that more spatial information of the image is lost as the receptive field continues to expand. In addition, as the number of channels increases, the number of network parameters also increases, which raises the computational complexity of the network. Therefore, we use 3 as the largest dilated factor in MFE.
4.4 Effectiveness of dilated convolution in the MFE
We use dilated convolution with different dilated factors in MFE to increase the receptive field. To show that dilated convolution is beneficial, we test three convolution settings for the two-channel MFE. The first variant of MsDCNN-2 uses consecutive dilated convolution in every convolutional layer of MFE (the continuous-dilated variant). The second variant uses only general convolution in MFE (the general-convolution variant). The third, which is the MsDCNN-2 adopted in this paper, alternates dilated convolution and general convolution in each parallel convolutional channel, as shown in Fig. 5(b) (the alternating variant).
First, we count the parameters of the three variants. Table 3 shows that the continuous-dilated variant has far fewer parameters than the other two, and that the alternating variant has fewer parameters than the general-convolution variant. We also compare the average time cost of the three variants, as shown in Table 4. The continuous-dilated variant needs the least time to reconstruct an image, and the alternating variant needs less time than the general-convolution variant. This indicates that dilated convolution enlarges the receptive field while keeping the number of parameters unchanged, and thus avoids the additional computational complexity that a larger receptive field would otherwise bring.
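The ordering of the parameter counts can be reproduced with a back-of-the-envelope sketch for one channel with a dilated factor of 2. The depth of four layers, the single input feature map, the layer order, and the inclusion of biases are assumptions made for illustration; the numbers below are not the values of Table 3.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Number of trainable parameters of one k x k convolutional layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def channel_params(kernel_sizes, c_in=1, width=32):
    """Total parameters of a stack of layers with `width` kernels each."""
    total, c = 0, c_in
    for k in kernel_sizes:
        total += conv_params(k, c, width)
        c = width                        # every later layer sees `width` feature maps
    return total

# A dilated 3x3 layer (dilated factor 2) stores only 3x3 weights,
# while a general layer with the same receptive field stores 5x5 weights.
dilated_only = channel_params([3, 3, 3, 3])
alternating = channel_params([3, 5, 3, 5])
general_only = channel_params([5, 5, 5, 5])
```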
Table 4: Average reconstruction time cost of the three MsDCNN-2 variants on each test set.
Then, we evaluate the reconstruction quality of the three MsDCNN-2 variants (consecutive dilated convolution, general convolution only, and alternating dilated and general convolution) using PSNR and SSIM, as shown in Table 5. The variant with consecutive dilated convolution has the worst reconstruction performance, because consecutive dilated convolution with the same dilated factor loses internal data structure and spatial information, which limits the reconstruction quality.
From Table 4 and Table 5, we can see that the reconstruction quality of the general-convolution variant is slightly higher than that of the alternating variant, but the alternating variant has a lower time cost. We therefore alternate dilated convolution and general convolution in each parallel convolutional channel, which effectively improves the quality of image reconstruction at almost no extra reconstruction time. The MsDCNN-2 used in this paper thus offers a trade-off between reconstruction quality and speed.
4.5 Effectiveness of the full convolution measurement
To verify the influence of the measurement method on the reconstruction quality, we compare reconstruction under three measurement methods: partial-DCT measurement, random Gaussian measurement, and full convolution measurement. For the first two measurement methods, since the input images differ in size, each input image is divided into multiple small blocks before measurement, and each block is flattened row by row into a column vector. After the measurement, a fully connected layer up-samples the measurement and reshapes it into an image block as the initial reconstruction, in the same way as ReconNet. The subsequent deep reconstruction network structure is the same as in MsDCNN. As shown in Table 6, DCT-MsDCNN-1 and DCT-MsDCNN-3 refer to single-channel and three-channel reconstruction algorithms under partial-DCT measurement, G-MsDCNN-1 and G-MsDCNN-3 refer to single-channel and three-channel reconstruction algorithms under random Gaussian measurement, and MsDCNN-3 refers to the proposed reconstruction algorithm under the full convolution measurement. Since ReconNet is very similar in structure to DCT-MsDCNN-1 and G-MsDCNN-1 and has the same number of network layers, it is listed as the baseline under random Gaussian measurement.
In this experiment, the Gray11 dataset is used as the testing dataset. As shown in Table 6, the reconstruction quality under the random Gaussian measurement is significantly better than that under the partial-DCT measurement, which is why the random Gaussian measurement is used by most benchmark algorithms. Compared with the Gaussian random measurement, the full convolution measurement significantly improves the reconstruction quality, and the PSNR is improved by 3.01 dB on average over the three sampling rates. There are three main reasons for this result. Firstly, the measurement network learns the data characteristics and adjusts its parameters to adapt to the input image, while the random Gaussian measurement is independent of the input signal. Secondly, whereas the Gaussian random measurement is independent of the reconstruction network, the end-to-end framework and training method tightly couple measurement and reconstruction, so the measurement network adaptively supports the reconstruction. Thirdly, the full convolution measurement does not divide the input image into blocks, which avoids the blocking effect.
The difference between G-MsDCNN-1 and ReconNet is less than 0.1, because they use the same measurement method and a similar reconstruction network structure. G-MsDCNN-3 is 0.68 higher than G-MsDCNN-1 on average, especially 1.29 higher at a measurement rate of 0.1, indicating that multi-channel can indeed improve the reconstruction quality. Therefore, this paper adopts the end-to-end framework of full convolution measurement and multi-channel reconstruction to improve the quality of reconstruction.
4.6 Comparisons with the state-of-the-art methods
We compare our methods MsDCNN-2 and MsDCNN-3 with the traditional algorithm TVAL3 r35 and with DL-based algorithms including ReconNet, DR2-Net, and MSRNet. The reconstruction networks of these DL-based algorithms have 6, 12, and 7 layers, respectively, and our reconstruction network has 7 layers.
Firstly, we use PSNR as the evaluation metric to compare these methods quantitatively. Table 7 shows the PSNR of the different algorithms at different measurement rates. The results of TVAL3 r35 and MSRNet are taken from r26 . As shown in this experiment, the DL-based CS algorithms outperform the traditional algorithm, and our method has the best performance at all measurement rates. Although our network has only one more layer than ReconNet, the PSNR of our method is higher than that of ReconNet by about 3.5 dB. Moreover, our reconstruction network has the same number of layers as MSRNet and far fewer than DR2-Net, yet its PSNR is clearly higher than that of both MSRNet and DR2-Net. At a low measurement rate of 0.01, the advantage of our method is particularly prominent. We also compare MsDCNN-2 with MsDCNN-3 and find that, in most cases, the quality of three-channel reconstruction is slightly higher than that of two-channel reconstruction. These results further show that multi-channel parallel dilated convolution is beneficial to image reconstruction quality.
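For reference, the PSNR metric used in this comparison can be computed as in the minimal sketch below. It uses the standard definition, 10 log10(peak^2 / MSE); the peak value of 255 assumes 8-bit grayscale images, and the function name and toy inputs are hypothetical rather than taken from the experiments.

```python
import math

def psnr(reference, reconstructed, peak=255.0):
    """Return the PSNR in dB between two equally sized 2-D images (lists of rows).

    peak is the maximum possible pixel value; 255 assumes 8-bit images.
    """
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        if len(ref_row) != len(rec_row):
            raise ValueError("images must have the same number of columns")
        for a, b in zip(ref_row, rec_row):
            squared_error += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy example: every pixel is off by one gray level, so the MSE is 1.
original = [[10, 20], [30, 40]]
shifted = [[11, 21], [31, 41]]
value = psnr(original, shifted)
```

Since PSNR is logarithmic in the mean squared error, a gain of about 3 dB corresponds to roughly halving the MSE.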
Then, we compare the time cost of these deep learning methods. Table 8 shows that, when reconstructing a single image, the average time cost of our method is higher than that of the state-of-the-art methods. This is because those methods divide the image into small blocks before feeding it to the network. The dimension of an image block is much smaller than that of the complete image, so reconstructing the blocks separately is cheaper than reconstructing the complete image. We can also see that the time cost of reconstructing an image increases slightly as the number of channels increases, because each additional parallel convolutional channel introduces additional parameters. Although our time cost is slightly higher than that of the other methods, the reconstruction quality of the image is greatly improved.
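The growth in parameters with the number of parallel channels can be illustrated with a simple count. The sketch below assumes one 2-D convolution per parallel channel with 64 input and 64 output feature maps and 3 x 3 kernels; these widths and the helper names are hypothetical and are not the configuration of MsDCNN.

```python
def conv_params(in_channels, out_channels, kernel, bias=True):
    """Number of learnable parameters in one 2-D convolution layer."""
    weights = kernel * kernel * in_channels * out_channels
    return weights + (out_channels if bias else 0)

def parallel_block_params(in_channels, out_channels, kernel, branches):
    """Parameters of `branches` parallel convolutions with identical shapes."""
    return branches * conv_params(in_channels, out_channels, kernel)

# Hypothetical layer shape: 64 -> 64 feature maps, 3 x 3 kernels.
two_channel = parallel_block_params(64, 64, 3, branches=2)
three_channel = parallel_block_params(64, 64, 3, branches=3)
extra = three_channel - two_channel  # cost of adding one more channel
```

Note that the dilation rate does not change the parameter count of a convolution; under these assumptions the extra cost comes only from the number of parallel channels.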
Finally, we compare our method MsDCNN-3 with TVAL3, ReconNet, and DR2-Net in terms of visual quality, as shown in Fig. 6, Fig. 7, and Fig. 8, in which the CS measurement rates are 0.01, 0.04, and 0.10, respectively. The reconstructed images show that our method achieves the best visual quality. Even at a very low measurement rate of 0.01, MsDCNN-3 effectively eliminates the block effect and retains sharper edges and finer detail.
In Fig. 8, although the measurement rate is 0.10, the images reconstructed by ReconNet and DR2-Net still show a block effect in the region and along the edges of the person, whereas the image reconstructed by our method shows no visible distortion compared with the original image. As the measurement rate decreases, Fig. 7 shows that the images reconstructed by ReconNet and DR2-Net become blurry, with serious block effects in high-frequency areas and degraded edges that noticeably harm the visual quality. When the measurement rate is 0.01 in Fig. 6, the images reconstructed by ReconNet and DR2-Net have severe block effects, and it is difficult even to judge their semantic content. In contrast, the semantic content of the image can be clearly inferred from the image reconstructed by our method.
In this paper, we propose a multi-scale dilated convolutional neural network for image CS measurement and reconstruction, in which fully convolutional measurement is used for CS image measurement and the MFE module performs multi-scale feature extraction for deep reconstruction. In the measurement part, we use fully convolutional measurement instead of the previous block-by-block measurement, so that the measurement matrix is learned automatically by a trained measurement network. The trained fully convolutional measurement effectively eliminates the block effect caused by block-by-block measurement and preserves more structural information for the subsequent image reconstruction. In the reconstruction part, we propose the MFE architecture, which imitates the human visual system in capturing multi-scale feature information. The MFE contains multiple parallel convolutional channels, and dilated convolutions are applied to obtain multi-scale receptive fields, so that multi-scale feature information in the image is captured to improve reconstruction performance. Experimental results show that the proposed end-to-end CS network achieves significantly better performance than the existing state-of-the-art methods. In future work, we will apply residual networks to obtain the multi-scale feature information of the image, adopt a weighted fusion method to fuse the feature information of different scales and further improve the quality of image reconstruction, and move our experimental implementations to the latest deep learning frameworks. Furthermore, mathematical proofs of the reconstruction conditions of deep learning-based CS methods remain a problem worthy of further study.
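As a small illustration of how dilated convolutions yield multi-scale receptive fields, the sketch below applies the standard relation k + (k - 1)(d - 1) for the spatial extent covered by a k x k kernel with dilation rate d. The 3 x 3 kernel and the dilation rates 1, 2, and 3 are assumptions chosen for the example, not the settings of the MFE module.

```python
def effective_kernel(kernel, dilation):
    """Spatial extent covered by a dilated convolution kernel.

    A kernel of size k with dilation d spans k + (k - 1) * (d - 1) pixels
    along each axis while keeping the same number of weights.
    """
    return kernel + (kernel - 1) * (dilation - 1)

# Hypothetical parallel channels: 3 x 3 kernels with dilation rates 1, 2, 3.
dilations = (1, 2, 3)
fields = {d: effective_kernel(3, d) for d in dilations}
# fields maps each dilation rate to the side length of the covered region.
```

Under these assumptions, the three parallel channels cover 3 x 3, 5 x 5, and 7 x 7 regions with the same number of weights per kernel.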
The research work of this paper was supported by the National Natural Science Foundation of China (No. 62177022, 61901165, 61501199), the Collaborative Innovation Center for Informatization and Balanced Development of K-12 Education by MOE and Hubei Province (No. xtzd2021-005), and the Self-determined Research Funds of CCNU from the Colleges’ Basic Research and Operation of MOE (No. CCNU22QN013).
Data Availability Statement
The datasets analyzed during the current study are available from the IEEE Transactions on Pattern Analysis and Machine Intelligence paper “Contour detection and hierarchical image segmentation” r29 .
References
- (1) Ahn, N., Kang, B., Sohn, K. A. (2018). Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European conference on computer vision (ECCV) (pp. 252-268).
- (2) Arbelaez, P., Maire, M., Fowlkes, C., Malik, J. (2010). Contour detection and hierarchical image segmentation. IEEE transactions on pattern analysis and machine intelligence, 33(5), 898-916.
- (3) Bo, L., Lu, H., Lu, Y., Meng, J., Wang, W. (2017, October). FompNet: Compressive sensing reconstruction with deep learning over wireless fading channels. In 2017 9th International Conference on Wireless Communications and Signal Processing (WCSP) (pp. 1-6). IEEE.
- (4) Candès, E. J., Romberg, J., Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on information theory, 52(2), 489-509.
- (5) Davenport, M. A., Needell, D., Wakin, M. B. (2013). Signal space CoSaMP for sparse recovery with redundant dictionaries. IEEE Transactions on Information Theory, 59(10), 6820-6829.
- (6) Deng, Z., Zhu, L., Hu, X., Fu, C. W., Xu, X., Zhang, Q., … Heng, P. A. (2019). Deep multi-model fusion for single-image dehazing. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 2453-2462).
- (7) Dong, C., Loy, C. C., He, K., Tang, X. (2014, September). Learning a deep convolutional network for image super-resolution. In European conference on computer vision (pp. 184-199). Springer, Cham.
- (8) Dong, X., Wang, L., Sun, X., Jia, X., Gao, L., Zhang, B. (2020). Remote sensing image super-resolution using second-order multi-scale networks. IEEE Transactions on Geoscience and Remote Sensing, 59(4), 3473-3485.
- (9) Fang, L., Wang, C., Li, S., Rabbani, H., Chen, X., Liu, Z. (2019). Attention to lesion: Lesion-aware convolutional neural network for retinal optical coherence tomography image classification. IEEE transactions on medical imaging, 38(8), 1959-1970.
- (10) Gan, L. (2007, July). Block compressed sensing of natural images. In 2007 15th International conference on digital signal processing (pp. 403-406). IEEE.
- (11) Han, X., Zhao, G., Li, X., Shu, T., Yu, W. (2019). Sparse signal reconstruction via expanded subspace pursuit. Journal of Applied Remote Sensing, 13(4), 046501.
- (12) He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
- (13) Kang, L., Huang, J. J., Huang, J. X. (2018, August). Adaptive subspace OMP for infrared small target image. In 2018 14th IEEE International Conference on Signal Processing (ICSP) (pp. 445-449). IEEE.
- (14) Kattenborn, T., Leitloff, J., Schiefer, F., Hinz, S. (2021). Review on Convolutional Neural Networks (CNN) in vegetation remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 173, 24-49.
- (15) Katoch, S., Chauhan, S. S., Kumar, V. (2021). A review on genetic algorithm: past, present, and future. Multimedia Tools and Applications, 80(5), 8091-8126.
- (16) Kulkarni, K., Lohit, S., Turaga, P., Kerviche, R., Ashok, A. (2016). Reconnet: Non-iterative reconstruction of images from compressively sensed measurements. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 449-458).
- (17) Lai, W. S., Huang, J. B., Ahuja, N., Yang, M. H. (2018). Fast and accurate image super-resolution with deep laplacian pyramid networks. IEEE transactions on pattern analysis and machine intelligence, 41(11), 2599-2613.
- (18) Li, C., Liu, X., Yu, K., Wang, X., Zhang, F. (2020). Debiasing of seismic reflectivity inversion using basis pursuit de-noising algorithm. Journal of Applied Geophysics, 177, 104028.
- (19) Li, C., Yin, W., Jiang, H., Zhang, Y. (2013). An efficient augmented Lagrangian method with applications to total variation minimization. Computational Optimization and Applications, 56(3), 507-530.
- (20) Li, J., Fang, F., Mei, K., Zhang, G. (2018). Multi-scale residual network for image super-resolution. In Proceedings of the European conference on computer vision (ECCV) (pp. 517-532).
- (21) Li, W., Niu, M., Zhang, Y., Huang, Y., Yang, J. (2020). Forward-looking scanning radar superresolution imaging based on second-order accelerated iterative shrinkage-thresholding algorithm. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 620-631.
- (22) Li, Z., Peng, C., Yu, G., Zhang, X., Deng, Y., Sun, J. (2018). Detnet: Design backbone for object detection. In Proceedings of the European conference on computer vision (ECCV) (pp. 334-350).
- (23) Lian, Q., Fu, L., Chen, S., Shi, B. (2019). A compressed sensing algorithm based on multi-scale residual reconstruction network. Acta Autom, 45(11), 2082-2091.
- (24) Lin, T., Ma, S., Ye, Y., Zhang, S. (2021). An ADMM-based interior-point method for large-scale linear programming. Optimization Methods and Software, 36(2-3), 389-424.
- (25) Liu, J. K., Du, X. L. (2018). A gradient projection method for the sparse signal reconstruction in compressive sensing. Applicable Analysis, 97(12), 2122-2131.
- (26) Mousavi, A., Patel, A. B., Baraniuk, R. G. (2015, September). A deep learning approach to structured signal recovery. In 2015 53rd annual allerton conference on communication, control, and computing (Allerton) (pp. 1336-1343). IEEE.
- (27) Mujahid, A., Awan, M. J., Yasin, A., Mohammed, M. A., Damaševičius, R., Maskeliūnas, R., Abdulkareem, K. H. (2021). Real-time hand gesture recognition based on deep learning YOLOv3 model. Applied Sciences, 11(9), 4164.
- (28) Needell, D., Vershynin, R. (2009). Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit. Foundations of computational mathematics, 9(3), 317-334.
- (29) Prabhu, R., Yu, X., Wang, Z., Liu, D., Jiang, A. A. (2019). U-finger: Multi-scale dilated convolutional network for fingerprint image denoising and inpainting. In Inpainting and Denoising Challenges (pp. 45-50). Springer, Cham.
- (30) Saha, T., Srivastava, S., Khare, S., Stanimirović, P. S., Petković, M. D. (2019). An improved algorithm for basis pursuit problem and its applications. Applied Mathematics and Computation, 355, 385-398.
- (31) Schnass, K. (2018). Average performance of orthogonal matching pursuit (OMP) for sparse approximation. IEEE Signal Processing Letters, 25(12), 1865-1869.
- (32) Shi, W., Jiang, F., Liu, S., Zhao, D. (2019). Image compressed sensing using convolutional neural network. IEEE Transactions on Image Processing, 29, 375-388.
- (33) Tirer, T., Giryes, R. (2020). Generalizing CoSaMP to signals from a union of low dimensional linear subspaces. Applied and Computational Harmonic Analysis, 49(1), 99-122.
- (34) Wang, Z., Yang, Y., Zeng, C., Kong, S., Feng, S., Zhao, N. (2022). Shallow and deep feature fusion for digital audio tampering detection. EURASIP Journal on Advances in Signal Processing, 2022(1), 1-20.
- (35) Wang, Z., Zuo, C., Zeng, C. (2021). SAE based unified double JPEG compression detection system for Web image forensics. International Journal of Web Information Systems. 17(2), 84-98.
- (36) Wang, Z. F., Wang, J., Zeng, C. Y., Min, Q. S., Tian, Y., Zuo, M. Z. (2018, July). Digital audio tampering detection based on ENF consistency. In 2018 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR) (pp. 209-214). IEEE.
- (37) Wang, Z. F., Zhu, L., Min, Q. S., Zeng, C. Y. (2017, July). Double compression detection based on feature fusion. In 2017 International Conference on Machine Learning and Cybernetics (ICMLC) (Vol. 2, pp. 379-384). IEEE.
- (38) Wang, Z., Duan, S., Zeng, C., Yu, X., Yang, Y., Wu, H. (2020, November). Robust Speaker Identification of IoT based on Stacked Sparse Denoising Auto-encoders. In 2020 International Conferences on Internet of Things (iThings) (pp. 252-257). IEEE.
- (39) Wang, Z., Liu, Q., Yao, H., Chen, J. (2015, October). Virtual chime-bells experimental system based on multi-modal fusion. In 2015 International Conference of Educational Innovation through Technology (EITT) (pp. 64-67). IEEE.
- (40) Wang, Z., Duan, S., Zeng, C., Yu, X., Yang, Y., Wu, H. (2020, November). Robust Speaker Identification of IoT based on Stacked Sparse Denoising Auto-encoders. In 2020 International Conferences on Internet of Things (iThings) (pp. 252-257). IEEE.
- (41) Wang, Z., Zeng, C., Duan, S., Ouyang, H., Xu, H. (2020, August). Robust Speaker Recognition Based on Stacked Auto-encoders. In International Conference on Network-Based Information Systems (pp. 390-399). Springer, Cham.
- (42) Wang, Z., Liu, Q., Chen, J., Yao, H. (2015, October). Recording source identification using device universal background model. In 2015 International Conference of Educational Innovation through Technology (EITT) (pp. 19-23). IEEE.
- (43) Yao, H., Dai, F., Zhang, S., Zhang, Y., Tian, Q., Xu, C. (2019). Dr2-net: Deep residual reconstruction network for image compressive sensing. Neurocomputing, 359, 483-493.
- (44) Yao, S., Guan, Q., Wang, S., Xie, X. (2018). Fast sparsity adaptive matching pursuit algorithm for large-scale image reconstruction. EURASIP journal on wireless communications and networking, 2018(1), 1-8.
- (45) Zarei, A., Asl, B. M. (2021). Automatic seizure detection using orthogonal matching pursuit, discrete wavelet transform, and entropy based features of EEG signals. Computers in Biology and Medicine, 131, 104250.
- (46) Zeng, C., Ye, J., Wang, Z., Zhao, N., Wu, M. (2022). Cascade neural network-based joint sampling and reconstruction for image compressed sensing. Signal, Image and Video Processing, 16(1), 47-54.
- (47) Zeng, C., Yan, K., Wang, Z., Yu, Y., Xia, S., Zhao, N. (2022). Abs-CAM: a gradient optimization interpretable approach for explanation of convolutional neural networks. Signal, Image and Video Processing, 1-8.
- (48) Zeng, C. Y., Ma, C. F., Wang, Z. F., Ye, J. X. (2018, July). Stacked autoencoder networks based speaker recognition. In 2018 International Conference on Machine Learning and Cybernetics (ICMLC) (Vol. 1, pp. 294-299). IEEE.
- (49) Zeng, C., Zhu, D., Wang, Z., Wu, M., Xiong, W., Zhao, N. (2021). Spatial and temporal learning representation for end-to-end recording device identification. EURASIP Journal on Advances in Signal Processing, 2021(1), 1-19.
- (50) Zeng, C., Yang, Y., Wang, Z., Kong, S., Feng, S. (2022). Audio Tampering Forensics Based on Representation Learning of ENF Phase Sequence. International Journal of Digital Crime and Forensics (IJDCF), 14(1), 1-19.
- (51) Zeng, C., Zhu, D., Wang, Z., Wang, Z., Zhao, N., He, L. (2020). An end-to-end deep source recording device identification system for web media forensics. International Journal of Web Information Systems, 16(4), 413-425.
- (52) Zeng, C., Zhu, D., Wang, Z., Yang, Y. (2020, August). Deep and shallow feature fusion and recognition of recording devices based on attention mechanism. In International Conference on Intelligent Networking and Collaborative Systems (pp. 372-381). Springer, Cham.
- (53) Zeng, C., Wang, Z., Wang, Z., Yan, K., Yu, Y. (2021, September). Image Compressed Sensing and Reconstruction of Multi-Scale Residual Network Combined with Channel Attention Mechanism. In Journal of Physics: Conference Series (Vol. 2010, No. 1). IOP Publishing.
- (54) Zeng, C., Wang, Z., Wang, Z. (2020, November). Image Reconstruction of IoT based on Parallel CNN. In 2020 International Conferences on Internet of Things (iThings) (pp. 258-263). IEEE.
- (55) Zhang, J., Ghanem, B. (2018). ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1828-1837).
- (56) Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L. (2017). Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE transactions on image processing, 26(7), 3142-3155.
- (57) Zhang, L. (2015, April). Image adaptive reconstruction based on compressive sensing via CoSaMP. In 2015 2nd International Conference on Information Science and Control Engineering (pp. 760-763). IEEE.
- (58) Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y. (2018). Image super-resolution using very deep residual channel attention networks. In Proceedings of the European conference on computer vision (ECCV) (pp. 286-301).