1 Introduction
With the continuous growth of modern data, signal acquisition, storage, and transmission technologies face great challenges zeng1_i ; zeng7_i ; zeng8_i . The Nyquist sampling theorem no longer meets the needs of practical applications wang2_i ; wang4_i ; wang6_i . In recent years, the theory of compressed sensing (CS) proposed by Candès r1 shows that if the signal is sparse or compressible and the measurement matrix satisfies the Restricted Isometry Property (RIP) condition, the original signal can be accurately recovered from far fewer measurements than the Nyquist sampling theorem requires, which saves a lot of memory for data sampling, transmission, and storage zeng7_i ; wang4_i .
Traditional CS reconstruction methods fall into two main categories: convex relaxation methods r2 ; r3 and greedy matching pursuit methods r4 ; r8_co . Convex relaxation methods, which mainly include the Interior Point Method (IPM) r6 , Gradient Projection for Sparse Reconstruction (GPSR) r7 , and the Iterative Soft Thresholding Algorithm (ISTA) r8 , solve the CS optimization problem based on gradient descent r2 . IPM uses the preconditioned conjugate gradient algorithm to compute the search direction. It can efficiently handle the large dense problems that arise in sparse signal recovery with orthogonal transforms by exploiting fast algorithms for those transforms. GPSR is based on the gradient descent method. It introduces hidden variables that transform the nondifferentiable objective function into a differentiable unconstrained convex function to reconstruct the original signal. ISTA uses a shrinkage thresholding function to solve a sequence of subproblems instead of the original optimization problem and sets a fixed threshold to select the support set that meets the conditions. However, these algorithms are computationally complicated and slow.
To accelerate convergence, researchers have proposed greedy matching pursuit methods. Representative algorithms are the Orthogonal Matching Pursuit (OMP) algorithm r8_omp and Compressive Sampling Matching Pursuit (CoSaMP) r8_co . As these methods converge faster and are easy to implement in practice, researchers continue to study them. Needell et al. propose the Regularized Orthogonal Matching Pursuit (ROMP) algorithm r9 . It is faster than OMP r8_omp , but its stability is worse. Kang et al. propose an adaptive subspace OMP method r10 , which utilizes prior knowledge of the target size and the coherence of the target distribution to change the structure of the subspace adaptively. Furthermore, it takes advantage of OMP r8_omp , SP r10_sp , and SaMP r10_sasp to improve reconstruction performance. Davenport describes a variant of the CoSaMP algorithm r11 which uses the D-RIP (a condition on the CS matrix analogous to the restricted isometry property). This method mainly focuses on recovering the signal rather than its dictionary coefficients. Zhang et al. combine CoSaMP r8_co and the genetic algorithm (GA) r11_ga and propose a new signal recovery framework r12 which has better reconstruction quality and effectively avoids premature convergence. Although traditional algorithms have improved in reconstruction speed and quality, they still have high computational complexity, and the reconstruction accuracy is limited.
Recently, methods based on deep learning (DL) have been widely applied in multimedia tasks ranging from image classification r13 , object detection r14 , recognition r15 , image super-resolution r16_sda , CS image reconstruction zeng1_i , and speaker recognition wang5_s ; wang7_s ; wang8_s ; zeng2_s , to digital forensics wang1_ad ; wang3_ad ; wang9_ad ; zeng3_ad ; zeng4_ad ; zeng5_ad ; zeng6_ad . Stacked Denoising Autoencoders (SDA) r16_sda is the first deep learning technique applied in the CS field. Mousavi et al. r17 apply SDA to solve the CS recovery problem, capturing statistical dependencies between different elements of image signals to improve image reconstruction quality. Since the convolutional neural network (CNN) r17_cnn has achieved great results in image processing, CNN has also been applied to CS. Kulkarni et al. r18 , inspired by SRCNN r19 , propose a non-iterative reconstruction network (ReconNet), which uses a CNN to learn the mapping from CS measurements to the original image. In r20 , Bo et al. propose FompNet based on CNN, which is used as post-processing for the fast matching pursuit algorithm. In r21 , Zhang et al. , inspired by the Iterative Shrinkage-Thresholding Algorithm (ISTA), propose ISTA-Net, which casts ISTA into deep network form for image CS reconstruction. After the deep Residual Network (ResNet) r22 was proposed, researchers introduced residual learning into the network to improve reconstruction quality. Yao et al. propose a deep residual reconstruction network (DR2-Net) r23 , which increases network depth based on ReconNet to further improve image reconstruction quality. All the above methods utilize block-by-block measurement, and each measurement block is restored separately, which ignores the association between image blocks. Therefore, they may produce serious block effects, especially at low measurement rates. Shi et al. r23_CS propose an end-to-end framework dubbed CSNet, which does not directly block the input image during measurement but uses convolution to obtain information about each image block. CSNet significantly improves image reconstruction quality and achieves fast running speed.
In the reconstruction part, these methods only utilize standard CNNs whose neuronal receptive fields are designed with the same size in each layer, which is inconsistent with the actual observation of the human visual system and hence hinders the representational ability of CNNs. Compared with previous convolutional networks, a multiscale network can extract richer feature information and improve the representational ability of CNN. In r24 , Prabhu et al. propose a multiscale convolutional network termed U-Finger for fingerprint image denoising and inpainting. U-Finger obtains three different scales by downsampling and upsampling and merges the feature information of each scale, which effectively improves image denoising ability. In r25 , Dong et al. propose a second-order multiscale super-resolution network (SMSR) that concatenates the output of each RACB module to obtain a multiscale group and concatenates the output of each group to obtain second-order multiscale feature information. The SMSR network achieves super-resolution reconstruction with high quality, even when dealing with remote sensing images that have highly complex spatial distributions. In r26 , Lian et al. propose a multiscale residual reconstruction network (MSRNet) for CS image reconstruction. Although MSRNet also employs dilated convolution in the reconstruction part, the multiscale feature information is directly fused after feature extraction by dilated convolutions with different dilated factors, which does not fully utilize the multiscale feature information.
To solve the above problems, in this paper we propose MsDCNN to learn the end-to-end mapping between the original images and the reconstructed images for CS image reconstruction. Firstly, we apply a fully convolutional network instead of a traditional CS matrix to completely measure the original image. The fully convolutional network directly measures the complete image without cutting it into blocks, which effectively uses the image structure information. In the reconstruction stage, we design the multiscale feature extraction (MFE) network architecture, which consists of multiple parallel convolutional channels, to obtain multiscale feature information. In each convolutional channel, we apply dilated convolution with different dilation factors to obtain different receptive fields. Convolutional kernels with different receptive fields can extract feature information at different scales after the convolutional operation. The MFE module not only obtains multiscale features, which provide rich information for subsequent image reconstruction and improve reconstruction performance, but also avoids an increase in parameters. The contributions of our research work are mainly in three aspects:

We propose a novel multiscale dilated convolutional neural network for high-quality image compressed sensing. The MsDCNN combines and jointly trains the measurement and reconstruction modules to learn the end-to-end mapping between the original image and the reconstructed image, which outperforms many other state-of-the-art methods in reconstruction quality.

During the measurement period, we train a fully convolutional measurement network to obtain all measurements from the complete input image. Therefore, adjacent measurement data are closely related to each other, which is totally different from traditional block-by-block measurement. This measurement method effectively uses the structural information of adjacent data to improve the quality of subsequent image reconstruction and eliminates the block effect.

During the reconstruction period, in order to improve the feature extraction ability of traditional deep CS methods with fixed-size feature maps, we propose the MFE architecture to imitate the human visual system to capture multiscale feature information, which consists of multiple parallel dilated convolutional channels. We apply dilated convolution with different dilation factors to increase the receptive fields, which capture multiscale features in the image. Finally, we fuse the multiple feature information to further improve the quality of image reconstruction.
Notations | Description
M | the length of the measurement vector
N | the length of the original signal vector
MR | the measurement rate, defined as MR = M/N
x | an original signal vector with size of N × 1
y | a measurement vector
Φ | a CS matrix
Ψ | a sparse representation matrix
s | a sparse transform coefficient vector
k | the sparsity of s
δ_k | the RIP parameter
d | the dilated factor
B | the size of blocks of the input image
x_i | the i-th block vector of the original signal x
y_i | the i-th block vector of the measurement y
Φ_B | the block measurement matrix
X | the original input image matrix
Y | the measurement matrix for image compression
X_init | the initial reconstruction image matrix
X_i | the i-th input image matrix
X̂ | the reconstruction image matrix
F | a tensor of multiscale feature maps
W_i | the weight matrix of the i-th convolutional layer
b_i | the bias vector of the i-th convolutional layer
Θ | the parameter set of the MsDCNN
2 Related work
2.1 Compressed sensing theory
The mathematical model of CS measurement is expressed as follows:
y = Φx, (1)
where x ∈ ℝ^N is the original signal vector, Φ ∈ ℝ^(M×N) (M ≪ N) is the CS matrix, and y ∈ ℝ^M is the measurement vector.
Generally, the above data reconstruction is an ill-posed inverse problem. However, as long as the signal is sparse or compressible, CS can still recover the signal x from the measurement y. When the original signal is an image, the image matrix needs to be flattened into a vector row by row before the measurement is performed. Afterwards, the vector output by the reconstruction algorithm is reshaped into the reconstructed image matrix. In the sparse domain, the original signal can be represented as follows:
x = Ψs, (2)
where Ψ is the sparse representation matrix and s is the sparse transform coefficient vector whose sparsity is k.
The task of the above signal reconstruction is essentially an ℓ0-norm minimization problem:
min_s ‖s‖_0  s.t.  y = ΦΨs. (3)
With a constant δ_k ∈ (0, 1), Φ satisfies the k-RIP condition if, for every k-sparse vector x,
(1 − δ_k)‖x‖_2^2 ≤ ‖Φx‖_2^2 ≤ (1 + δ_k)‖x‖_2^2. (4)
Then we can reconstruct the original signal from the CS measurement.
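The pipeline of Eqs. (1)-(4) can be illustrated with a minimal NumPy sketch. The dimensions here are illustrative, the sparse basis is taken as Ψ = I, and the solver is a generic OMP, not a method proposed in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, k = 256, 64, 5              # signal length, measurements, sparsity

# A k-sparse signal s; with Psi = I the original signal x equals s
x = np.zeros(N)
x[rng.choice(N, k, replace=False)] = rng.standard_normal(k)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random Gaussian CS matrix
y = Phi @ x                                       # Eq. (1): y = Phi x

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit: greedy atom selection + least squares."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

x_hat = omp(Phi, y, k)
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))  # small relative error
```

With M = 64 measurements for a length-256 signal of sparsity 5, the greedy solver recovers the signal accurately, which is the behavior the RIP condition above guarantees with high probability for random Gaussian matrices.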
2.2 Block-based CS measurement
Since the measurement can be obtained by y = Φx, the size of the CS matrix increases rapidly as the input image size increases. Direct measurement of the whole image requires large storage space and expensive computation. To overcome this problem, the block-based CS (BCS) method r27 was proposed. The block measurement of CS is shown in Fig. 1. In this method, the input image is divided into several non-overlapping blocks, and each block uses an independent and smaller CS matrix. Block measurement effectively reduces the dimension of the measurement data and the memory required for the computation. Although this method effectively reduces computational complexity, it ignores the correlation between adjacent blocks, resulting in serious block effects.
2.3 Dilated convolution
In the process of image reconstruction, it is significant to increase the receptive field. A large receptive field can capture more image information and improve the quality of image reconstruction. In CNNs, a large convolutional kernel, a pooling layer, or a deeper network is generally introduced to increase the receptive field of the network. However, as the size of the convolutional kernel and the number of network layers increase, the network's computational complexity increases, resulting in a longer image reconstruction time. Although pooling does not increase the computational complexity of the network, it loses a lot of image information, degrading the quality of the reconstructed image. Fortunately, dilated convolution not only increases the receptive field of the network but also maintains the computational complexity, which yields better image reconstruction quality. For example, using the dilated factor d = 2 to expand a 3×3 convolutional kernel yields a 5×5 kernel. There are still nine non-zero values in the convolutional kernel, and the values in the other positions are zero. The receptive field is thus changed from the original 3×3 to 5×5. Fig. 2 shows dilated convolution with different dilated factors.
Although dilated convolution increases the receptive field without increasing the number of network parameters, it also brings a new problem. Since dilated convolution is a sparse sampling method, when consecutive dilated convolutions are used, some pixels are not involved in the calculation, which loses the continuity and relevance of the data information and results in the gridding effect.
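The kernel expansion described above can be reproduced directly: inserting d−1 zeros between the taps of a 3×3 kernel gives the dilated kernel. This is only a sketch; frameworks implement dilation without materializing the zeros:

```python
import numpy as np

def dilate_kernel(kernel, d):
    """Insert d-1 zeros between kernel taps (explicit kernel dilation)."""
    k = kernel.shape[0]
    size = d * (k - 1) + 1                  # effective receptive field
    out = np.zeros((size, size), dtype=kernel.dtype)
    out[::d, ::d] = kernel                  # original taps land on a d-grid
    return out

k3 = np.ones((3, 3))
k5 = dilate_kernel(k3, 2)           # 3x3 kernel, dilation 2 -> 5x5 field
print(k5.shape)                     # (5, 5)
print(int(np.count_nonzero(k5)))    # still 9 non-zero weights
```

The zero rows and columns of the expanded kernel are exactly the pixels skipped by the sparse sampling, which is why stacking several such kernels with the same factor produces the gridding effect.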
2.4 Motivations
To solve the above problems, we propose a novel multiscale dilated convolutional neural network (MsDCNN) for image CS measurement and reconstruction, and the motivations of this paper are as follows:

Most of the previous CS methods use a block-based method for measurement. Although the block-based method reduces computational complexity, it also brings new problems. Since each image block is measured independently in the measurement part, each image block can correspondingly only be reconstructed separately. Finally, all the reconstructed image blocks are stitched together to obtain the complete image, which causes a serious block effect and degrades the reconstruction quality.

In the reconstruction part, most CS methods based on DL use a CNN instead of a Deep Neural Network (DNN), which further improves reconstruction performance. However, each convolutional layer uses convolutional kernels with the same receptive field to extract feature information, which can only collect single-scale spatial information, so the potential of CNN to reconstruct images with higher quality is not fully utilized.
3 The proposed method
In this section, we propose the MsDCNN to measure and reconstruct images. As shown in Fig. 3, the MsDCNN consists of two components: the fully convolutional measurement network and the reconstruction network. The reconstruction network further includes initial reconstruction and deep reconstruction. Since the basic operations of the measurement and reconstruction networks are convolution and deconvolution, which directly process the input and output image matrices, no matrix-vector conversion is needed in either the measurement or the reconstruction process. We now describe the details of the network.
3.1 Network architecture
3.1.1 Full convolution measurement instead of the CS matrix
Existing CS methods based on deep learning usually adopt the block-by-block measurement method. In the CS measurement, the image is divided into small blocks, and each block is measured respectively by the compressed sampling expression y_i = Φ_B x_i. Here Φ_B is usually an artificially designed Gaussian random measurement matrix. The size of the input image block must be fixed, which limits practical application, and block effects are correspondingly produced in the final reconstruction result. To overcome this shortcoming, the MsDCNN uses a fully convolutional network to obtain the measurement Y from the input image X, as shown in Fig. 4.
Y = W_m ∗ X, (5)
where Y denotes the convolutional measurement, ∗ denotes the convolutional operation, and W_m denotes the weights of the measurement layer. In the measurement, each convolutional kernel outputs one measurement value per spatial position. For a measurement rate MR = M/N, the CS matrix has M rows, which yields M measurement values, so we set the number of kernels in the measurement layer to M. In addition, there are no biases in the convolutional kernels, and the fully convolutional CS measurement layer has no activation function. The fully convolutional network replaces the artificial measurement matrix and adaptively learns the structure information of the input image. Moreover, the fully convolutional network can adapt to input images of various sizes.
We summarize the advantages of using a fully convolutional layer for measurement as follows:

It makes full use of the connection between adjacent data and eliminates the block effect caused by block-by-block measurement.

It can process images of any size, which breaks the limitation that a fully connected layer can only measure fixed-size images.
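As a sanity check on the first advantage, the following NumPy sketch shows that a B×B convolution with stride B (no bias, no activation) computes exactly the block measurement y_i = Φ_B x_i, where Φ_B stacks the flattened kernels. The weights here are random stand-ins for the learned ones, and the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
B, n_kernels = 32, 102            # block size; ~0.1 * 32 * 32 measurements
H = W = 96                        # toy input size (any multiple of B works)

X = rng.standard_normal((H, W))
kernels = rng.standard_normal((n_kernels, B, B))  # stand-in learned weights

def conv_measure(X, kernels, B):
    """BxB convolution with stride B: one measurement vector per block."""
    h, w = X.shape[0] // B, X.shape[1] // B
    Y = np.empty((len(kernels), h, w))
    for i in range(h):
        for j in range(w):
            patch = X[i*B:(i+1)*B, j*B:(j+1)*B]
            Y[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return Y

Y = conv_measure(X, kernels, B)

# With stride == kernel size, this equals block-based y_i = Phi_B x_i,
# where Phi_B stacks the flattened kernels row by row.
Phi_B = kernels.reshape(n_kernels, -1)
x0 = X[:B, :B].reshape(-1)
print(np.allclose(Y[:, 0, 0], Phi_B @ x0))   # True
```

The difference from BCS is therefore not the arithmetic of a single block but that the kernels are shared across all positions, learned jointly with the reconstruction network, and slide over images of any size.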
3.1.2 Initial reconstruction with deconvolutional network
After the fully convolutional measurement, the resolution of the image becomes lower and the size of the image is compressed. To accurately reconstruct the original image, we first restore the dimension to the original image size by deconvolution and generate the initial reconstructed image.
X_init = W_init ⊛ Y + b_init, (6)
where X_init denotes the initial reconstruction, ⊛ denotes the deconvolution operation, and W_init and b_init denote the weights and biases of the initial reconstruction layer, respectively. The deconvolution operation first adds zeros to the measured image to expand its dimension to the size of the original image, then transposes the convolution kernel of the previous fully convolutional measurement, and finally convolves the zero-filled image. Deconvolution is equivalent to the inverse process of convolution and can be used as upsampling to restore the dimension to the original image size, which prepares for the subsequent deep reconstruction. Although the upsampling method of deconvolution cannot accurately restore the values of the original image, deconvolution has a good ability to learn image features from low level to high level. Therefore, deconvolution is applied as the initial reconstruction network, as shown in Fig. 3.
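The arithmetic of this initial reconstruction at a single spatial position can be sketched as follows. W_init and b_init are random stand-ins, not learned weights, and the block/measurement sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
B, m = 32, 102                     # block size and measurements per block

W_init = rng.standard_normal((m, B, B))   # stand-in deconvolution kernels
b_init = np.zeros((B, B))                 # stand-in bias (learned in practice)
y = rng.standard_normal(m)                # one spatial position of Y

# A stride-B transposed convolution scatters each length-m measurement
# vector back to a BxB block: a weighted sum of the kernels plus a bias.
X_init_block = np.tensordot(y, W_init, axes=(0, 0)) + b_init
print(X_init_block.shape)   # (32, 32): measurement upsampled to block size
```

Because the stride equals the kernel size, the scattered blocks tile the image without overlap, so the output already has the original spatial dimensions and can be fed to the deep reconstruction network.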
3.1.3 Deep reconstruction network
In order to further improve the quality of image reconstruction, we imitate the human visual system using a multichannel parallel network. Each channel applies convolutional kernels with different receptive fields to extract information at different scales. So after getting the initial reconstructed image, we obtain the multiscale feature information via the MFE module. In order to obtain richer feature information, the multiscale feature information is fused by the 'Concat' operation.
F = Concat(F_1, F_2, F_3), (7)
where Concat(·) denotes the concatenation operation, which concatenates the feature maps output by each convolutional channel along the channel dimension to obtain the multiscale feature information F. The number of channel feature maps of F is the sum of those of F_1, F_2, and F_3. Finally, we get the final reconstructed image through two convolutional layers, as shown in Fig. 3.
X̂ = W_2 ∗ ReLU(W_1 ∗ F), (8)
where the deep reconstruction includes two convolutional layers, W_1 and W_2 denote the weights of the two convolutional layers, and the ReLU r28 is used as the activation function after the first layer; the last convolutional layer has no activation function. X̂ is the final reconstructed image.
3.2 The MFE module
The MFE is composed of multiple parallel convolutional channels. Each convolutional channel employs a different dilated factor to obtain dilated convolutional kernels of different sizes, which correspond to different receptive fields. To avoid the gridding effect caused by continuous dilated convolution with the same dilated factor, we alternately set dilated convolution and normal convolution of the same receptive field in each convolutional channel. The MFE module obtains feature information at a different scale from each channel after the convolutional operation. The multiscale feature information extracted by the MFE module contains structural information and image details, which provides sufficient information for the subsequent multi-feature fusion, so textures, edges, and details of the image can be effectively reconstructed.
For this module, the three channel settings are as follows:

Single channel: This model is similar to the previous network reconstruction models, and the size of the convolutional kernel is set to 3×3; there is only a single channel.

Two channels: In the first channel, the dilated factor is 1, like the single channel. In the second channel, we use dilated convolution with a dilated factor of 2. The receptive field enlarges from 3×3 to 5×5. In order to keep the receptive field consistent and avoid using continuous dilated convolution, we alternately set dilated convolution with a dilated factor of 2 and normal convolution of the same 5×5 receptive field.

Three channels: On the basis of the two channels, we add a convolutional channel with a dilated factor of 3. The receptive field of the kernel changes from 3×3 to 7×7, which obtains a larger receptive field. Similarly, we alternately set dilated convolution with a dilated factor of 3 and normal convolution of the same 7×7 receptive field.
We describe the three different MFE modules and the detailed structure parameters in Fig. 5.
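The channel bookkeeping of the three settings can be checked with a few lines, assuming 3×3 kernels and 32 feature maps per channel as described in Sec. 3.3; the formula d·(k−1)+1 for the receptive field of a dilated kernel is standard:

```python
def receptive_field(kernel=3, dilation=1):
    """Effective receptive field of a dilated kernel: d*(k-1)+1."""
    return dilation * (kernel - 1) + 1

channels = {1: 32, 2: 32, 3: 32}          # dilation factor -> feature maps
rfs = {d: receptive_field(3, d) for d in channels}
print(rfs)                                 # {1: 3, 2: 5, 3: 7}
print(sum(channels.values()))              # 96 feature maps after 'Concat'
```

Each channel thus looks at the same initial reconstruction through a different window size, and the concatenation stacks the 32-map outputs into the 96-map multiscale tensor F used by the final reconstruction layers.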
3.3 The details of network structure
In the measurement part, the size of the convolutional kernel is set to 32×32, and the stride is set to 32 for non-overlapping measurement. In the reconstruction part, the kernel size of the deconvolutional layer is 32×32 and the stride is also 32. Each convolutional layer of the MFE network utilizes 32 convolutional kernels, and the ReLU activation function is used after each convolutional operation. In order to keep the output consistent with the input dimension, we perform a padding operation. After executing MFE, the different-scale feature information output by every channel is fused to obtain the multiscale features. Fig. 5 shows the structural parameters of MFE in the three different cases. In Fig. 5 (a), the MFE is a single channel, which is similar to most other reconstruction networks based on CNN. In Fig. 5 (b), the MFE has two channels, and along the channel dimension it merges all outputs to obtain 64 channel feature maps. In Fig. 5 (c), the MFE has three channels, which produces 96 channel feature maps. The final reconstructed image is then obtained through the last two convolutional layers.
3.4 Network training
Given a training set {X_i, i = 1, …, T}, our goal is to obtain a highly compressed measurement and accurately restore the original input image from the measurement. From a practical point of view, it is not feasible to train the measurement network alone, because it is difficult to assess the quality of the measurement without the reconstruction error as a reference. Therefore, the measurement network and the reconstruction network are trained together, which means the CS measurement and reconstruction modules form an end-to-end network structure. Both the input and the label are images when training the network. The network is optimized by minimizing the mean squared error loss function:
L(Θ) = (1/T) Σ_{i=1}^{T} ‖X̂_i − X_i‖_2^2, (9)
where T represents the total number of training samples, X_i represents the i-th input image, Θ denotes the parameters that need to be trained, and X̂_i = f(X_i; Θ) is the final output obtained by applying the convolutional measurement, the deconvolution initial reconstruction, the 'Concat' operation that fuses the multiscale feature information, and the final reconstruction layers to X_i.
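Eq. (9) amounts to a plain mean-squared-error loss over the training set; a minimal sketch, with random arrays standing in for training images and network outputs:

```python
import numpy as np

def mse_loss(recon, targets):
    """Mean squared reconstruction error over T training samples, Eq. (9)."""
    T = len(targets)
    return sum(np.sum((r - t) ** 2) for r, t in zip(recon, targets)) / T

rng = np.random.default_rng(3)
X = [rng.standard_normal((96, 96)) for _ in range(4)]        # "labels"
X_hat = [x + 0.1 * rng.standard_normal(x.shape) for x in X]  # "outputs"

print(mse_loss(X, X))        # perfect reconstruction gives zero loss
print(mse_loss(X_hat, X))    # imperfect reconstruction gives positive loss
```

In the actual training this scalar is minimized with respect to Θ by Adam, with gradients flowing through the reconstruction layers back into the measurement kernels, which is what couples the two modules end to end.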
4 Experimental results and discussion
In this section, we will evaluate the performance of the proposed method for CS image reconstruction. Firstly, we describe the dataset for training and the training details. Then, we verify the effectiveness of the multichannel design and the dilated convolution in MFE. Finally, we will compare the proposed method with state-of-the-art methods on real-world images. The source codes are released at https://github.com/CCNUZFW/MsDCNN.
4.1 Datasets for training
The experiment uses 200 images as the training set and 200 images as the validation set from the BSDS500 database r29 . We also use data augmentation (rotation and flipping) to increase the amount of data available for network training and improve the performance of the network model. We use the same test sets as the ReconNet and DR2-Net methods as the benchmark.
4.2 Training details
In this subsection, we use the method described in r30 to initialize the weights. We use the Adam r31 optimization method to optimize all parameters of the network. For Adam's other hyperparameters, the exponential decay rates of the first and second moment estimates are set to 0.9 and 0.999, respectively. The number of epochs is set to 100; the learning rate of the first 50 epochs is 0.001, that of epochs 51 to 80 is 0.0001, and that of the remaining 20 epochs is 0.00001. Although increasing the number of epochs may improve the performance of the network model, it also increases the training time. Finally, we take the 100th training epoch for the final testing. We implement our model using the MatConvNet r32 package on MATLAB 2016b and train the model on an NVIDIA Quadro M4000 GPU.
4.3 Effectiveness of multichannel in the MFE module
To test the effectiveness of the MFE module under different channel numbers, three MFE modules are set up and tested separately. MsDCNN1 is a single-channel network, similar to the previous simple reconstruction networks. MsDCNN2 indicates that the MFE has two parallel convolutional channels with dilated factors of 1 and 2. MsDCNN3 indicates that the MFE has three parallel convolutional channels with dilated factors of 1, 2, and 3.
Table 2: Average PSNR (dB)/SSIM under different channel numbers.
Image set | MR | MsDCNN1 PSNR/SSIM | MsDCNN2 PSNR/SSIM | MsDCNN3 PSNR/SSIM
Set5 | 0.01 | 23.74/0.6219 | 24.05/0.6391 | 24.15/0.6453
Set5 | 0.04 | 27.89/0.7816 | 28.54/0.8127 | 28.58/0.8136
Set5 | 0.10 | 31.10/0.8709 | 31.64/0.8867 | 31.75/0.8892
Set14 | 0.01 | 22.47/0.5408 | 22.74/0.5568 | 22.79/0.5612
Set14 | 0.04 | 25.57/0.6758 | 26.09/0.6982 | 26.07/0.6989
Set14 | 0.10 | 27.96/0.7860 | 28.36/0.7981 | 28.47/0.8006
To demonstrate that the multichannel design is beneficial for improving image reconstruction quality, we evaluate CS image reconstruction quality with two widely used image quality metrics, PSNR and SSIM, and compare performances at measurement rates of 0.01, 0.04, and 0.10. Table 2 shows the average PSNR and SSIM of CS image reconstruction for different channel numbers on the 5 test images in Set5 r33 and the 14 test images in Set14 r34 . It can be seen that the PSNR and SSIM of MsDCNN2 and MsDCNN3 are higher than those of the single-channel MsDCNN1 without dilated convolution. The average PSNR of MsDCNN2 increases by 0.45 dB, and that of MsDCNN3 increases by 0.51 dB, compared with MsDCNN1. This is because multichannel networks can capture features at different scales from the same feature map, so the fused multiscale features contain richer image information than those of a single-channel network. Accordingly, the original image can be reconstructed with higher quality. The experimental results also indicate that multichannel dilated convolution is beneficial for improving image reconstruction quality.
As the dilated factor increases, the receptive field expands correspondingly. However, when the number of parallel convolutional channels exceeds three, the improvement in reconstruction quality is small. In experiments, we design four parallel convolutional channels (the dilated factor of the fourth channel is 4, with a correspondingly larger dilated convolutional kernel). The average PSNR of the four channels is only 0.02 dB higher than that of the three channels. Although a larger dilated factor can still improve the quality of image reconstruction, the improvement is not obvious. The reason is that more spatial information of the image is lost as the receptive field continues to expand. Moreover, as the number of channels increases, the parameters of the network also increase, which increases the computational complexity of the network. Therefore, we use 3 as the largest dilated factor in MFE.
4.4 Effectiveness of dilated convolution in the MFE
We use dilated convolution with different dilated factors in the MFE to increase the receptive field. In order to show that this scheme is beneficial, we set different convolutional kernels in the MFE and compare three variants of MsDCNN2: a continuous-dilated variant that uses dilated convolution in every convolutional layer of the MFE, a general-convolution variant that uses only normal convolution in the MFE, and the proposed alternating variant that alternately sets dilated convolution and normal convolution in each parallel convolutional channel, as shown in Fig. 5(b).
Table 3: Number of parameters of the three MsDCNN2 variants.
Model | Continuous dilated | General convolution | Alternating (ours)
Parameters | 56k | 105k | 88k
Firstly, we calculate the number of parameters of the three variants. As can be seen from Table 3, the continuous-dilated variant has far fewer parameters than the general-convolution variant, and the alternating variant also has fewer parameters than the general-convolution variant. We then compare the average time cost of the three variants, as shown in Table 4. The continuous-dilated variant takes the least time to reconstruct an image, and the alternating variant takes less time than the general-convolution variant. This indicates that dilated convolution increases the receptive field while keeping the number of parameters unchanged, thus reducing the computational cost caused by the extended receptive field.
Table 4: Average time cost of reconstructing an image for the three MsDCNN2 variants.
Image set | MR | Continuous dilated | General convolution | Alternating (ours)
Set5 | 0.01 | 40 | 61 | 44
Set5 | 0.04 | 44 | 53 | 46
Set5 | 0.10 | 46 | 47 | 47
Set14 | 0.01 | 126 | 149 | 129
Set14 | 0.04 | 139 | 150 | 147
Set14 | 0.10 | 125 | 150 | 126
Table 5: Average PSNR/SSIM of the three MsDCNN2 variants.
Image set | MR | Continuous dilated | General convolution | Alternating (ours)
Set5 | 0.01 | 23.87/0.6305 | 24.11/0.6422 | 24.05/0.6391
Set5 | 0.04 | 28.15/0.7928 | 28.55/0.8314 | 28.54/0.8127
Set5 | 0.10 | 31.69/0.8857 | 31.85/0.8912 | 31.64/0.8867
Set14 | 0.01 | 22.57/0.5486 | 22.78/0.5590 | 22.74/0.5568
Set14 | 0.04 | 25.78/0.6844 | 26.11/0.6995 | 26.09/0.6982
Set14 | 0.10 | 28.45/0.7978 | 28.53/0.8026 | 28.36/0.7981
Then, we evaluate the reconstruction quality of the three variants using PSNR and SSIM, as shown in Table 5. The continuous-dilated variant has the worst reconstruction performance. That is because continuous dilated convolution with the same dilated factor leads to the loss of internal data structure and spatial information, so the reconstruction quality is limited.
Through the analysis of Table 4 and Table 5, we can see that the reconstruction quality of the general-convolution variant is slightly higher than that of the alternating variant, but the alternating variant costs less time. So we alternately set dilated convolution and general convolution in each parallel convolutional channel, which not only effectively improves the quality of image reconstruction but also costs almost no extra reconstruction time. The alternating MsDCNN2 used in this paper is therefore a good compromise in reconstruction performance.
4.5 Effectiveness of the full convolution measurement
In order to verify the influence of the measurement method on reconstruction quality, we compare the reconstruction algorithms under three measurement methods: partial-DCT measurement, random Gaussian measurement, and full convolution measurement. For the first two measurement methods, since the sizes of input images differ, the input image is divided into multiple small blocks before measurement, and each block is expanded row by row into a column vector. After the measurement, a fully connected layer upsamples the measurement, which is reshaped into an image block as the initial reconstruction, in the same way as ReconNet. The subsequent deep reconstruction network structure is the same as in MsDCNN. As shown in Table 6, DCT-MsDCNN1 and DCT-MsDCNN3 refer to the single-channel and three-channel reconstruction algorithms under partial-DCT measurement, G-MsDCNN1 and G-MsDCNN3 refer to the single-channel and three-channel reconstruction algorithms under random Gaussian measurement, and MsDCNN3 refers to the proposed reconstruction algorithm under full convolution measurement. Since ReconNet is very similar in structure to DCT-MsDCNN1 and G-MsDCNN1, with the same number of network layers, it is listed as the baseline under random Gaussian measurement.
In this experiment, the Gray11 dataset is used as the testing dataset. As shown in Table 6, the reconstruction quality under random Gaussian measurement is significantly better than under partial DCT measurement, which is why random Gaussian measurement is used by most benchmark algorithms. Compared with random Gaussian measurement, full convolution measurement further improves the reconstruction quality significantly: the PSNR is improved by 3.01 dB on average over the three sampling rates. There are three main reasons for this result. Firstly, the measurement network has learned the data characteristics and adjusted its parameters to adapt to the input images, whereas the random Gaussian measurement is independent of the input signal. Secondly, unlike the random Gaussian measurement, which is independent of the reconstruction network, the end-to-end network framework and training method closely couple measurement and reconstruction, so the measurement network adaptively promotes the reconstruction. Thirdly, full convolution measurement does not divide the input image into blocks, which avoids the blocking effect.
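By contrast, the full convolution measurement processes the whole image with strided kernels; a toy NumPy version follows (the kernel size, stride, and filter count are illustrative assumptions, and in the actual network the kernels are learned end-to-end rather than drawn at random):

```python
import numpy as np

def conv_measure(image, num_filters=6, ksize=8, stride=8, seed=0):
    """Whole-image CS measurement as a strided convolution.

    The kernels slide over the full image instead of acting on
    independently flattened blocks, so no explicit blocking of the
    input is required; each kernel plays the role of one row of a
    (learnable) measurement matrix.
    """
    h, w = image.shape
    rng = np.random.default_rng(seed)
    kernels = rng.normal(size=(num_filters, ksize, ksize))
    oh = (h - ksize) // stride + 1
    ow = (w - ksize) // stride + 1
    out = np.empty((num_filters, oh, ow))
    for f in range(num_filters):
        for i in range(oh):
            for j in range(ow):
                patch = image[i * stride:i * stride + ksize,
                              j * stride:j * stride + ksize]
                out[f, i, j] = np.sum(kernels[f] * patch)
    return out

y = conv_measure(np.ones((32, 32)))
print(y.shape)              # (6, 4, 4)
print(y.size / (32 * 32))   # 0.09375
```

With these illustrative settings, 6 filters applied at stride 8 give 96 measurements for a 32x32 image, i.e. a measurement rate of about 0.094.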
The difference between GMsDCNN1 and ReconNet is less than 0.1 dB, because they use the same measurement method and a similar reconstruction network structure. GMsDCNN3 is 0.68 dB higher than GMsDCNN1 on average, and 1.29 dB higher at a measurement rate of 0.1, indicating that multiple channels can indeed improve the reconstruction quality. Therefore, this paper adopts the end-to-end framework of full convolution measurement and multi-channel reconstruction to improve the reconstruction quality.
Table 6: Mean PSNR (dB) on the Gray11 dataset under different measurement methods

Methods      MR=0.10  MR=0.04  MR=0.01
ReconNet     22.68    19.99    17.27
DCTMsDCNN1   16.09    15.55    15.12
DCTMsDCNN3   16.62    15.98    15.21
GMsDCNN1     22.58    19.81    17.16
GMsDCNN3     23.87    20.29    17.43
MsDCNN3      26.43    23.96    20.22
4.6 Comparisons with the state-of-the-art methods
Table 7: PSNR (dB) of different algorithms at different measurement rates

Image Name  Methods    MR=0.10  MR=0.04  MR=0.01
Barbara     TVAL3      21.88    18.98    11.94
            ReconNet   21.89    20.38    18.61
            DR2-Net    22.69    20.70    18.65
            MSRNet     23.04    21.01    18.60
            MsDCNN2    24.26    23.53    21.75
            MsDCNN3    24.28    23.53    21.78
Boats       TVAL3      23.86    19.20    11.86
            ReconNet   24.15    21.36    18.61
            DR2-Net    25.58    22.11    18.67
            MSRNet     26.32    22.58    18.65
            MsDCNN2    28.83    25.98    21.88
            MsDCNN3    28.96    26.05    21.88
Flinstones  TVAL3      18.88    14.88     9.75
            ReconNet   18.92    16.30    13.96
            DR2-Net    21.09    16.93    14.01
            MSRNet     21.72    17.28    13.83
            MsDCNN2    22.91    20.07    16.57
            MsDCNN3    22.98    20.08    16.64
Lena        TVAL3      24.16    19.46    11.87
            ReconNet   23.83    21.28    17.87
            DR2-Net    25.39    22.13    17.97
            MSRNet     26.28    22.76    18.06
            MsDCNN2    28.46    25.92    22.31
            MsDCNN3    28.53    25.91    22.44
Monarch     TVAL3      21.16    16.73    11.09
            ReconNet   21.10    18.19    15.39
            DR2-Net    23.10    18.93    15.33
            MSRNet     23.98    19.26    15.41
            MsDCNN2    27.22    23.89    17.81
            MsDCNN3    27.46    23.80    18.00
Peppers     TVAL3      22.64    18.21    11.35
            ReconNet   22.15    19.56    16.82
            DR2-Net    23.73    20.32    16.90
            MSRNet     24.91    20.90    17.10
            MsDCNN2    26.24    24.34    20.56
            MsDCNN3    26.59    24.31    20.63
Mean PSNR   TVAL3      22.84    18.39    11.31
            ReconNet   22.68    19.99    17.27
            DR2-Net    24.32    20.80    17.44
            MSRNet     25.16    21.41    17.54
            MsDCNN2    26.32    23.96    20.15
            MsDCNN3    26.43    23.96    20.22
We compare our methods MsDCNN2 and MsDCNN3 with the traditional algorithm TVAL3 r35 and with DL-based algorithms including ReconNet, DR2-Net, and MSRNet. The reconstruction networks of these DL-based algorithms have 6, 12, and 7 layers, respectively, and our reconstruction network has 7 layers.
Firstly, we use PSNR as the evaluation metric to quantitatively compare these methods. Table 7 shows the PSNR of the different algorithms at different measurement rates. The results of TVAL3 r35 and MSRNet are provided by the literature r26 . As the experiment shows, the DL-based CS algorithms outperform the traditional algorithm, and our method achieves the best performance at all measurement rates. Although our network has more layers than ReconNet, the PSNR of our method is higher than that of ReconNet by about 3.5 dB. Moreover, our reconstruction network has the same number of layers as MSRNet and far fewer than DR2-Net, yet its PSNR clearly improves on both. At the low measurement rate of 0.01, the advantage of our method is particularly prominent. We also compare MsDCNN2 and MsDCNN3 and find that, most of the time, the quality of three-channel reconstruction is slightly higher than that of two-channel reconstruction. These results further show that multi-channel parallel dilated convolution is beneficial to improving image reconstruction quality.
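For completeness, the PSNR values in Table 7 follow the standard definition for 8-bit images; a minimal reference implementation (not the paper's evaluation code):

```python
import math
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(reconstructed, dtype=np.float64)) ** 2)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)

a = np.zeros((8, 8))
b = np.full((8, 8), 16.0)    # constant error of 16 gray levels
print(round(psnr(a, b), 2))  # 24.05
```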
Then, we compare the time complexity of these deep learning methods. As Table 8 shows, when reconstructing a single image, the average time cost of our method is higher than that of the state-of-the-art methods. This is because those methods divide the image into small blocks before feeding it to the network; the dimension of an image block is much smaller than that of the complete image, so it is easier to reconstruct the blocks separately than to reconstruct the complete image. We can also see that the time cost of reconstructing an image increases slightly as the number of channels increases, because each additional parallel convolutional channel brings additional parameters. Although our time cost is slightly higher than that of the other methods, the reconstruction quality is greatly improved.
Table 8: Time cost of reconstructing a single image

Methods   MR=0.01  MR=0.04  MR=0.10
ReconNet  10.7     10.0     10.1
DR2-Net   31.7     11.7     31.4
MSRNet    12.1     12.4     11.7
MsDCNN1   30.0     28.3     28.1
MsDCNN2   34.0     36.4     35.3
MsDCNN3   41.2     43.1     42.2
Finally, we compare our method MsDCNN3 with TVAL3, ReconNet, and DR2-Net in terms of visual effect, as shown in Fig. 6, Fig. 7, and Fig. 8, where the CS measurement rates are 0.01, 0.04, and 0.10, respectively. The reconstructed images show that our method achieves the best visual quality. Even at the very low measurement rate of 0.01, MsDCNN3 effectively eliminates the block effect and retains sharper edges and finer detail.
In Fig. 8, although the measurement rate is 0.10, the images reconstructed by ReconNet and DR2-Net still show a block effect over the region and edges of the person, whereas the image reconstructed by our method shows no visible distortion compared with the original. As the measurement rate decreases to 0.04 (Fig. 7), the images reconstructed by ReconNet and DR2-Net become blurry, with serious block effects in high-frequency areas and along edges that strongly degrade the visual effect. When the measurement rate is 0.01 (Fig. 6), the images reconstructed by ReconNet and DR2-Net suffer such severe block effects that it is difficult even to judge their semantic content, whereas the semantics of the image reconstructed by our method can still be clearly inferred.
5 Conclusion
In this paper, we propose a multi-scale dilated convolutional neural network for image CS measurement and reconstruction, in which a fully convolutional network performs the CS image measurement and the MFE module performs multi-scale feature extraction for the deep reconstruction. In the measurement part, we use the fully convolutional measurement method instead of the previous block-by-block measurement, so the measurement matrix can be learned automatically by training the measurement network. The trained fully convolutional measurement effectively eliminates the block effect caused by block-by-block measurement and preserves more structural information for subsequent image reconstruction. In the reconstruction part, we propose the MFE architecture, which imitates the human visual system to capture multi-scale feature information: multiple parallel convolutional channels with dilated convolutions provide multi-scale receptive fields, and the multi-scale features thus captured improve the reconstruction performance. Experimental results show that the proposed end-to-end CS network achieves significant performance gains over the existing state-of-the-art methods. For future work, we will apply residual networks to obtain multi-scale feature information of the image, adopt weighted fusion to combine the feature information of different scales to further improve reconstruction quality, and move our experimental implementation to the latest deep learning frameworks. Furthermore, mathematical proofs of the reconstruction conditions of deep learning-based CS methods remain a problem worthy of further study.
Acknowledgments
The research work of this paper was supported by the National Natural Science Foundation of China (No. 62177022, 61901165, 61501199), the Collaborative Innovation Center for Informatization and Balanced Development of K-12 Education by MOE and Hubei Province (No. xtzd2021005), and the Self-determined Research Funds of CCNU from the Colleges' Basic Research and Operation of MOE (No. CCNU22QN013).
Data Availability Statement
The datasets analyzed during the current study are available in the IEEE Transactions on Pattern Analysis and Machine Intelligence paper "Contour detection and hierarchical image segmentation" r29 .
References
 (1) Ahn, N., Kang, B., Sohn, K. A. (2018). Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 252-268).
 (2) Arbelaez, P., Maire, M., Fowlkes, C., Malik, J. (2010). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 898-916.
 (3) Bo, L., Lu, H., Lu, Y., Meng, J., Wang, W. (2017, October). FompNet: Compressive sensing reconstruction with deep learning over wireless fading channels. In 2017 9th International Conference on Wireless Communications and Signal Processing (WCSP) (pp. 1-6). IEEE.
 (4) Candès, E. J., Romberg, J., Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2), 489-509.
 (5) Davenport, M. A., Needell, D., Wakin, M. B. (2013). Signal space CoSaMP for sparse recovery with redundant dictionaries. IEEE Transactions on Information Theory, 59(10), 6820-6829.
 (6) Deng, Z., Zhu, L., Hu, X., Fu, C. W., Xu, X., Zhang, Q., … Heng, P. A. (2019). Deep multi-model fusion for single-image dehazing. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 2453-2462).
 (7) Dong, C., Loy, C. C., He, K., Tang, X. (2014, September). Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision (pp. 184-199). Springer, Cham.
 (8) Dong, X., Wang, L., Sun, X., Jia, X., Gao, L., Zhang, B. (2020). Remote sensing image super-resolution using second-order multi-scale networks. IEEE Transactions on Geoscience and Remote Sensing, 59(4), 3473-3485.
 (9) Fang, L., Wang, C., Li, S., Rabbani, H., Chen, X., Liu, Z. (2019). Attention to lesion: Lesion-aware convolutional neural network for retinal optical coherence tomography image classification. IEEE Transactions on Medical Imaging, 38(8), 1959-1970.
 (10) Gan, L. (2007, July). Block compressed sensing of natural images. In 2007 15th International Conference on Digital Signal Processing (pp. 403-406). IEEE.
 (11) Han, X., Zhao, G., Li, X., Shu, T., Yu, W. (2019). Sparse signal reconstruction via expanded subspace pursuit. Journal of Applied Remote Sensing, 13(4), 046501.
 (12) He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
 (13) Kang, L., Huang, J. J., Huang, J. X. (2018, August). Adaptive subspace OMP for infrared small target image. In 2018 14th IEEE International Conference on Signal Processing (ICSP) (pp. 445-449). IEEE.
 (14) Kattenborn, T., Leitloff, J., Schiefer, F., Hinz, S. (2021). Review on Convolutional Neural Networks (CNN) in vegetation remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 173, 24-49.
 (15) Katoch, S., Chauhan, S. S., Kumar, V. (2021). A review on genetic algorithm: past, present, and future. Multimedia Tools and Applications, 80(5), 8091-8126.
 (16) Kulkarni, K., Lohit, S., Turaga, P., Kerviche, R., Ashok, A. (2016). ReconNet: Non-iterative reconstruction of images from compressively sensed measurements. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 449-458).
 (17) Lai, W. S., Huang, J. B., Ahuja, N., Yang, M. H. (2018). Fast and accurate image super-resolution with deep Laplacian pyramid networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(11), 2599-2613.
 (18) Li, C., Liu, X., Yu, K., Wang, X., Zhang, F. (2020). Debiasing of seismic reflectivity inversion using basis pursuit denoising algorithm. Journal of Applied Geophysics, 177, 104028.
 (19) Li, C., Yin, W., Jiang, H., Zhang, Y. (2013). An efficient augmented Lagrangian method with applications to total variation minimization. Computational Optimization and Applications, 56(3), 507-530.
 (20) Li, J., Fang, F., Mei, K., Zhang, G. (2018). Multi-scale residual network for image super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 517-532).
 (21) Li, W., Niu, M., Zhang, Y., Huang, Y., Yang, J. (2020). Forward-looking scanning radar super-resolution imaging based on second-order accelerated iterative shrinkage-thresholding algorithm. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 620-631.
 (22) Li, Z., Peng, C., Yu, G., Zhang, X., Deng, Y., Sun, J. (2018). DetNet: Design backbone for object detection. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 334-350).
 (23) Lian, Q., Fu, L., Chen, S., Shi, B. (2019). A compressed sensing algorithm based on multi-scale residual reconstruction network. Acta Autom, 45(11), 2082-2091.
 (24) Lin, T., Ma, S., Ye, Y., Zhang, S. (2021). An ADMM-based interior-point method for large-scale linear programming. Optimization Methods and Software, 36(2-3), 389-424.
 (25) Liu, J. K., Du, X. L. (2018). A gradient projection method for the sparse signal reconstruction in compressive sensing. Applicable Analysis, 97(12), 2122-2131.
 (26) Mousavi, A., Patel, A. B., Baraniuk, R. G. (2015, September). A deep learning approach to structured signal recovery. In 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 1336-1343). IEEE.
 (27) Mujahid, A., Awan, M. J., Yasin, A., Mohammed, M. A., Damaševičius, R., Maskeliūnas, R., Abdulkareem, K. H. (2021). Real-time hand gesture recognition based on deep learning YOLOv3 model. Applied Sciences, 11(9), 4164.
 (28) Needell, D., Vershynin, R. (2009). Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit. Foundations of Computational Mathematics, 9(3), 317-334.
 (29) Prabhu, R., Yu, X., Wang, Z., Liu, D., Jiang, A. A. (2019). U-finger: Multi-scale dilated convolutional network for fingerprint image denoising and inpainting. In Inpainting and Denoising Challenges (pp. 45-50). Springer, Cham.
 (30) Saha, T., Srivastava, S., Khare, S., Stanimirović, P. S., Petković, M. D. (2019). An improved algorithm for basis pursuit problem and its applications. Applied Mathematics and Computation, 355, 385-398.
 (31) Schnass, K. (2018). Average performance of orthogonal matching pursuit (OMP) for sparse approximation. IEEE Signal Processing Letters, 25(12), 1865-1869.
 (32) Shi, W., Jiang, F., Liu, S., Zhao, D. (2019). Image compressed sensing using convolutional neural network. IEEE Transactions on Image Processing, 29, 375-388.
 (33) Tirer, T., Giryes, R. (2020). Generalizing CoSaMP to signals from a union of low dimensional linear subspaces. Applied and Computational Harmonic Analysis, 49(1), 99-122.
 (34) Wang, Z., Yang, Y., Zeng, C., Kong, S., Feng, S., Zhao, N. (2022). Shallow and deep feature fusion for digital audio tampering detection. EURASIP Journal on Advances in Signal Processing, 2022(1), 1-20.
 (35) Wang, Z., Zuo, C., Zeng, C. (2021). SAE based unified double JPEG compression detection system for Web image forensics. International Journal of Web Information Systems, 17(2), 84-98.
 (36) Wang, Z. F., Wang, J., Zeng, C. Y., Min, Q. S., Tian, Y., Zuo, M. Z. (2018, July). Digital audio tampering detection based on ENF consistency. In 2018 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR) (pp. 209-214). IEEE.
 (37) Wang, Z. F., Zhu, L., Min, Q. S., Zeng, C. Y. (2017, July). Double compression detection based on feature fusion. In 2017 International Conference on Machine Learning and Cybernetics (ICMLC) (Vol. 2, pp. 379-384). IEEE.
 (38) Wang, Z., Duan, S., Zeng, C., Yu, X., Yang, Y., Wu, H. (2020, November). Robust Speaker Identification of IoT based on Stacked Sparse Denoising Autoencoders. In 2020 International Conferences on Internet of Things (iThings) (pp. 252-257). IEEE.
 (39) Wang, Z., Liu, Q., Yao, H., Chen, J. (2015, October). Virtual chime-bells experimental system based on multimodal fusion. In 2015 International Conference of Educational Innovation through Technology (EITT) (pp. 64-67). IEEE.
 (40) Wang, Z., Zeng, C., Duan, S., Ouyang, H., Xu, H. (2020, August). Robust Speaker Recognition Based on Stacked Autoencoders. In International Conference on Network-Based Information Systems (pp. 390-399). Springer, Cham.
 (41) Wang, Z., Liu, Q., Chen, J., Yao, H. (2015, October). Recording source identification using device universal background model. In 2015 International Conference of Educational Innovation through Technology (EITT) (pp. 19-23). IEEE.
 (42) Yao, H., Dai, F., Zhang, S., Zhang, Y., Tian, Q., Xu, C. (2019). DR2-Net: Deep residual reconstruction network for image compressive sensing. Neurocomputing, 359, 483-493.
 (43) Yao, S., Guan, Q., Wang, S., Xie, X. (2018). Fast sparsity adaptive matching pursuit algorithm for large-scale image reconstruction. EURASIP Journal on Wireless Communications and Networking, 2018(1), 1-8.
 (44) Zarei, A., Asl, B. M. (2021). Automatic seizure detection using orthogonal matching pursuit, discrete wavelet transform, and entropy based features of EEG signals. Computers in Biology and Medicine, 131, 104250.
 (45) Zeng, C., Ye, J., Wang, Z., Zhao, N., Wu, M. (2022). Cascade neural network-based joint sampling and reconstruction for image compressed sensing. Signal, Image and Video Processing, 16(1), 47-54.
 (46) Zeng, C., Yan, K., Wang, Z., Yu, Y., Xia, S., Zhao, N. (2022). Abs-CAM: a gradient optimization interpretable approach for explanation of convolutional neural networks. Signal, Image and Video Processing, 1-8.
 (47) Zeng, C. Y., Ma, C. F., Wang, Z. F., Ye, J. X. (2018, July). Stacked autoencoder networks based speaker recognition. In 2018 International Conference on Machine Learning and Cybernetics (ICMLC) (Vol. 1, pp. 294-299). IEEE.
 (48) Zeng, C., Zhu, D., Wang, Z., Wu, M., Xiong, W., Zhao, N. (2021). Spatial and temporal learning representation for end-to-end recording device identification. EURASIP Journal on Advances in Signal Processing, 2021(1), 1-19.
 (49) Zeng, C., Yang, Y., Wang, Z., Kong, S., Feng, S. (2022). Audio Tampering Forensics Based on Representation Learning of ENF Phase Sequence. International Journal of Digital Crime and Forensics (IJDCF), 14(1), 1-19.
 (50) Zeng, C., Zhu, D., Wang, Z., Wang, Z., Zhao, N., He, L. (2020). An end-to-end deep source recording device identification system for web media forensics. International Journal of Web Information Systems, 16(4), 413-425.
 (51) Zeng, C., Zhu, D., Wang, Z., Yang, Y. (2020, August). Deep and shallow feature fusion and recognition of recording devices based on attention mechanism. In International Conference on Intelligent Networking and Collaborative Systems (pp. 372-381). Springer, Cham.
 (52) Zeng, C., Wang, Z., Wang, Z., Yan, K., Yu, Y. (2021, September). Image Compressed Sensing and Reconstruction of Multi-Scale Residual Network Combined with Channel Attention Mechanism. In Journal of Physics: Conference Series (Vol. 2010, No. 1). IOP Publishing.
 (53) Zeng, C., Wang, Z., Wang, Z. (2020, November). Image Reconstruction of IoT based on Parallel CNN. In 2020 International Conferences on Internet of Things (iThings) (pp. 258-263). IEEE.
 (54) Zhang, J., Ghanem, B. (2018). ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1828-1837).
 (55) Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L. (2017). Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7), 3142-3155.
 (56) Zhang, L. (2015, April). Image adaptive reconstruction based on compressive sensing via CoSaMP. In 2015 2nd International Conference on Information Science and Control Engineering (pp. 760-763). IEEE.
 (57) Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y. (2018). Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 286-301).