I. Introduction
Advanced techniques such as 3D video [1], 360-degree panorama video [2], and light field imaging [3] have received increasing attention and been widely researched owing to their practical value. However, the information carrier of these techniques is mainly the image, so Internet congestion may occur because of the explosive growth of image data on social media and other new media. As this trend continues, image/video transmission will become the main source of Internet congestion [4], so different kinds of images, especially natural images, should be heavily compressed to alleviate this problem.
Image compression aims at reducing the amount of data so as to benefit image storage and transmission. Still image compression has developed from early standards such as JPEG and JPEG2000 to Google's WebP and BPG. In earlier times, many works [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17] mainly put their emphasis on post-processing to reduce coding artifacts and thereby improve coding efficiency; the advantage of this approach is that it does not require changing any part of an existing coding standard. Lately, several works [18, 19, 20, 21, 22, 23, 24, 25] employ convolutional neural networks (CNN) to remove the blurring and quantization artifacts caused by image compression. Among these, a notable work [25] is an effective compression framework based on two collaborative convolutional neural networks, where one network compactly represents the image and the other serves as post-processing to reduce coding distortion. This method performs well at very low bitrates, but it does not explore how to improve coding efficiency at high bitrates. Its practical application is therefore limited, because coding at low bitrate is required only when the bandwidth is very narrow. Meanwhile, this method directly trains the collaborative networks without considering, during backpropagation, the effect of quantization on the neural network that precedes the standard codec, so it is a suboptimal solution for image compression.
Recently, image compression with deep neural networks (DNN) has achieved great breakthroughs [26, 27, 28, 29, 30, 31, 32, 33]; some of these methods have exceeded JPEG2000 and can even compete with BPG. These methods target the challenging problem that the quantization function within the compression loss is non-differentiable. The pioneering work [26] leverages recurrent neural networks to compress images at full resolution, where a binarization layer with stochastic binarization is used to backpropagate gradients. In [28, 29], the quantizer in a general nonlinear transform coding framework is replaced by additive independent identically distributed uniform noise, which makes the compression objective optimizable by gradient descent. In [30], the derivative of a smooth identity function is used as an approximation of the rounding function's derivative in a compressive autoencoder, so no modification is required when passing gradients from the decoder to the encoder. Most recently, a soft assignment is formed by converting the Euclidean distance between a vector and each quantization center into a probability via the softmax function [31]. Soft quantization is then defined from this soft assignment, and this smooth relaxation is used as an approximation of the quantization function, so that the compression loss of the autoencoder network can be optimized by stochastic gradient descent.
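The softmax-based soft quantization of [31] can be sketched in a few lines. This is a simplified NumPy illustration with hypothetical function names; the actual method operates on learned centers inside an autoencoder:

```python
import numpy as np

def soft_quantize(z, centers, sigma=1.0):
    # Soft assignment: softmax over negative (scaled) squared distances to centers.
    d = (z[..., None] - centers) ** 2          # distance of each value to each center
    w = np.exp(-sigma * d)
    w /= w.sum(axis=-1, keepdims=True)          # soft assignment probabilities
    # Soft quantization: the expectation over centers, which is differentiable.
    return (w * centers).sum(axis=-1)

def hard_quantize(z, centers):
    # Non-differentiable nearest-center assignment used at test time.
    d = np.abs(z[..., None] - centers)
    return centers[np.argmin(d, axis=-1)]
```

As the annealing parameter `sigma` grows, the soft quantizer approaches the hard one, which is what makes the relaxation usable during training.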
Our intuitive idea is to learn a projection from the resampled vectors to the quantized vectors, so that we can jointly train our RSN and IDN networks. However, we find it difficult to learn this projection directly with a DNN. Fortunately, the projection can be well imitated by a neural network mapping from the resampled vectors to the decoded image. Therefore, we propose an image resampling compression (IRSC) method that learns a virtual codec network (VCN) to supervise the resampling network (RSN), resolving the non-differentiability of the quantization function within the compression loss. For simplicity, we give a diagram of the deep neural network based compression framework (DNNC) for a one-dimensional signal, as shown in Fig. 1.
Our IRSC method can be used not only within the DNNC framework but also within a standard-compliant image compression framework (SCIC). First, an input image is measured by the RSN network to get resampled vectors. Second, these vectors are quantized directly in the resampling feature space for DNNC, or, for SCIC, their transform coefficients are quantized after the discrete cosine transform (DCT) to further improve coding efficiency. At the encoder, the quantized vectors or transform coefficients are losslessly compressed by arithmetic coding. At the decoder, the decoded vectors are used by the image decoder network (IDN) to restore the input image. Both the SCIC and DNNC frameworks are built on an autoencoder architecture whose encoder is the RSN network and whose decoder is the IDN network. The encoder condenses the input's dimensionality inside the network, and quantization further reduces dimensionality, regardless of whether the resampled vectors are processed by the DCT. The decoder of the autoencoder reproduces the input image from the quantized vectors. The difference between SCIC and DNNC mainly comes from whether a classical transform such as the DCT is explicitly applied to reduce the statistical correlation of the resampled vectors.
Obviously, the main difference between our SCIC and [25] is that our VCN network bridges the gap in gradient backpropagation between the RSN and IDN caused by the quantization function's non-differentiability. Another difference is that our IRSC method is not restricted to image compression at very low bitrates: because the VCN network can backpropagate gradients well from decoder to encoder, our method can conduct full-resolution image resampling. The third important difference is that our IRSC method can also be applied to the DNNC framework. Although our IRSC, like [26, 27, 28, 29, 30, 31, 32], handles the non-differentiability of the quantization function for image compression, its application is not restricted to DNN-based image compression.
The rest of this paper is arranged as follows. First, we review traditional post-processing methods and neural network based artifact removal techniques, as well as DNN-based image compression methods, in Section 2. We then introduce the proposed method in Section 3, followed by experimental results in Section 4. Finally, we conclude in Section 5.
II. Related Work
We first review traditional artifact removal methods, including loop filtering and post-processing filtering. Then, we look back on several state-of-the-art artifact removal approaches based on neural networks. Finally, we give an overview of DNN-based image compression methods.
II-A. Traditional artifact removal approaches
Within the coding loop, loop filtering can be explicitly embedded to improve coding efficiency and reduce the artifacts caused by coarse quantization. For example, adaptive deblocking filtering [6] is designed as a loop filter and integrated into the H.264/MPEG-4 AVC video coding standard; it does not require an extra frame buffer at the decoder. The advantage of deblocking filtering inside the coding loop is that it guarantees that an established level of image quality is coded and conveyed over the transmission channel. However, this kind of filtering usually has comparatively high computational complexity. Moreover, loop filtering must also be performed at the decoder to stay synchronized with the encoder, which prevents adaptive decoding by turning the loop filter on or off when balancing visual quality against computational cost.
In order to avoid these drawbacks and keep filtering compatible with standard codecs, a more flexible alternative is post-processing. For instance, a wavelet-based algorithm uses a three-scale over-complete wavelet to deblock via a theoretical analysis of blocking artifacts [7]. Later, through an analysis of the image's total variation, adaptive bilateral filters were used as a deblocking method to process two different kinds of regions [8]. In contrast, by defining a new metric to evaluate blocking artifacts, quantization noise on blocks is removed by non-local means filtering [9]. The above methods target deblocking. However, coarse quantization in the block-based DCT domain usually causes visually unpleasant ringing artifacts as well as blocking artifacts, so both deblocking and artifact removal should be carefully considered for better visual quality. In [10], both hard-thresholding and empirical Wiener filtering are carried out on the shape-adaptive DCT for denoising and deblocking.
Unlike the above-mentioned methods [6, 7, 8, 9, 10], many works incorporate priors or expert knowledge into their models. In [13], compression artifacts are reduced by integrating a quantization noise model with a block similarity prior model. In [14], the maximum a posteriori criterion is used for compressed-image post-processing by treating post-processing as an inverse problem. In [15], an artifact-reduction approach is developed using dictionary learning and total variation regularization. In [17], image deblocking is formulated as an optimization problem with a constrained non-convex low-rank model. In [18], JPEG prior knowledge and sparse coding expertise are combined for JPEG-compressed images. In [34], sparse coding is carried out jointly in the DCT and pixel domains for compressed-image restoration. Image denoising is a more general technique that is not designed for a specific task; it can be applied to remove additive Gaussian noise, environmental noise, compression artifacts, etc. For example, an advanced image denoising strategy achieves collaborative filtering based on a sparse representation in the transform domain [11]. In [35], self-learning based image decomposition with an over-complete dictionary is applied to single-image denoising, which can be used to alleviate coding artifacts. Although the above methods perform well on artifact removal, they usually have fairly high computational complexity because of their iterative optimization algorithms, which are time-consuming.
II-B. CNN-based post-processing for standard compression
Due to neural networks' strong capacity, they have been successfully applied to several low-level tasks such as image super-resolution, image smoothing, and edge detection [36]. Following this trend, many works [19, 20, 21, 22, 23, 24, 25] study CNN-based post-processing to improve the user's visual experience. In [19], an artifacts reduction convolutional neural network is presented to effectively deal with various compression artifacts. To get better results, a 12-layer deep convolutional neural network with hierarchical skip connections is trained with a multi-scale loss function [21]. Meanwhile, a deeper CNN model is used for image deblocking to obtain further improvements [22]. However, these methods are trained by minimizing mean square error, so the reconstructed image usually loses high-frequency detail and may be blurry around visually sensitive discontinuities. In order to generate more detail, a conditional generative adversarial framework is trained to remove compression artifacts and make the generated image as realistic as possible [24]. Although the above methods greatly alleviate ringing and blocking artifacts, their improvements are usually limited. This raises a new question: is it possible to represent the image compactly so that a codec can compress it more efficiently? The pioneering work [25] addresses this by directly training two collaborative neural networks: a compact convolutional neural network and a reconstruction convolutional neural network. This method performs well at very low bitrates, but it does not consider how to solve the problem at high bitrates, which strictly restricts its wide application.
For video coding post-processing, there are several recent works on this issue [20, 23]. For example, a deep CNN-based decoder is presented to reduce coding artifacts and enhance the details of HEVC-compressed videos at the same time [20]. In [23], a convolutional neural network with a scalable structure is used to reduce the distortion of I and B/P frames in HEVC for quality enhancement. Although these approaches [19, 21, 22, 24, 20, 23] greatly reduce coding artifacts by post-processing, they are limited compared with [25], since their inputs are natural images/videos that have not been compactly represented.
II-C. Deep neural networks based image compression
To achieve variable-rate image compression, a general framework based on convolutional and deconvolutional LSTM recurrent networks is presented in [26]. This method can handle compression of 32x32 thumbnails, but it may not be suitable for full-resolution lossy image compression. To resolve this problem, the authors carefully designed a full-resolution lossy image compression method composed of a recurrent neural network based encoder and decoder, a binarizer, and a neural network for entropy coding [27]. In the same period, a nonlinear transform coding optimization framework was introduced to jointly optimize an entire model in terms of the trade-off between coding rate and reconstruction distortion [28, 29]. Later, a compressive autoencoder architecture was efficiently trained for high-resolution images using a convolutional sampling layer and sub-pixel convolution [30]. After that, using the same neural network architecture, soft assignments with the softmax function were leveraged to softly relax quantization so as to optimize the rate-distortion loss [31]. Meanwhile, bitrate is allocated for image compression by learning a content-weighted importance map, which is used as a continuous estimate of entropy to control the compression bitrate. Although these methods have greatly improved coding efficiency, their compressed images often lack pleasing details, especially at very low bitrates.
Due to the huge progress of generative models, image generation is becoming better and better. In particular, generative adversarial networks (GAN) have been widely researched and achieve more stable results than previous methods for image generation, style transfer, super-resolution, etc. [36]. Following this trend, an adversarial loss is introduced into an adaptive image compression approach to achieve visually realistic reconstructions [33]. Most recently, a semantic label map is leveraged as supplementary information to help a GAN's generator produce more realistic images, especially at extremely low bitrates [37]. Although DNN-based image compression has made great progress, there is still much room for development in image/video compression. More importantly, a general image compression method with DNN is needed that serves both standard-compliant and DNN-based image compression.
III. Methodology
Given an input image, we use the RSN network to obtain resampled vectors in a low-dimensional space. For simplicity, the RSN network is expressed as a nonlinear function with its own parameter set. After resampling, these vectors are quantized, which is described as a mapping function governed by a quantization parameter; this function will be detailed later. The quantized vectors are losslessly encoded by arithmetic coding to facilitate channel transmission. Because the vectors lose some information through quantization, there is coding distortion between the input image and the decoded image. At the receiver, the parameterized IDN network learns a nonlinear function to restore the input image from the decoded vectors.
Since the quantization function is non-differentiable, it cannot be directly optimized by gradient descent. Several approaches [26, 27, 28, 29, 30, 31, 32, 33, 37] offer solutions to this problem. Differently, we learn an approximation function from the resampled vectors to the decoded image with the VCN network, and then use its derivative to approximate the quantization function's derivative during backpropagation. As a consequence, we can optimize our RSN and IDN networks in an end-to-end fashion with the learned VCN network. To verify the proposed method's generality, we apply it to both the SCIC framework and the DNNC framework, which are detailed next. Note that we employ the mapping function in Fig. 2 rather than directly using the quantization function in Fig. 1. In the SCIC framework, this mapping takes the resampled vectors to the decoded lossy resampled vectors through several steps: a transform such as the DCT, quantization, arithmetic coding, dequantization, and the inverse transform.
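To make this idea concrete, here is a toy one-dimensional sketch (all function names are ours, for illustration only): the forward pass runs the real, non-differentiable codec, while the backward pass routes gradients through the derivative of a smooth surrogate, the role played by the learned VCN network in our framework.

```python
import numpy as np

def codec(v, step=0.25):
    # The real, non-differentiable codec step (here: uniform quantization).
    return np.round(v / step) * step

def surrogate(v):
    # A smooth stand-in trained to mimic the codec; in our framework this
    # role is played by the learned VCN network. Identity is a toy choice.
    return v

def surrogate_grad(v):
    # Derivative of the surrogate, used in place of the codec's (undefined)
    # derivative during backpropagation.
    return np.ones_like(v)

def backprop_through_codec(upstream_grad, v):
    # Chain rule with the surrogate's derivative substituted for the codec's.
    return upstream_grad * surrogate_grad(v)
```

The encoder thus receives a usable gradient signal even though the forward pass still produces the genuinely quantized values.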
III-A. Standard-compliant image compression framework
To make our framework suitable for different scenarios, we use mixed-resolution image compression, so that our method maintains high coding efficiency from low to high bitrates. Specifically, full-resolution resampling in the RSN network is designed for image compression at high bitrates. When compressing an image below a certain low bitrate, each pixel's quality is very low and the image cannot be well restored from full-resolution resampled vectors, because each pixel is assigned very few bits. Furthermore, almost no bits remain for image details, so only image structures are preserved after decoding. Therefore, a downsampling layer in the RSN network is leveraged to greatly reduce the amount of image information, so that each pixel of the low-resolution resampled image can be assigned more bits than in the full-resolution case. As a result, relative to full resolution, we obtain high-quality but low-resolution images at the decoder, which the IDN network uses to restore a high-quality, full-resolution image. The details of how to choose between low-resolution and full-resolution resampling are presented in the experimental section.
III-A1. Objective function
Our objective compression function for the SCIC framework can be written as follows:
(1) 
where the first term is the image decoding loss for the IDN network and the second term is the virtual codec loss for the VCN network. The last term is a structural dissimilarity (DSSIM) loss, which is explicitly used to regularize the RSN network. The RSN, IDN, and VCN networks each have their own parameter sets, and a linear upsampling operator is applied so that the resampled output keeps the same size as the input image for low-resolution resampling; this operator reduces to the identity when compression uses full-resolution resampling.
In order to decode the image with the IDN network, as shown in Fig. 2, we use a data loss and a gradient difference loss to regularize the IDN network. Meanwhile, our VCN network is trained with a data loss as well as a gradient difference loss between its output and the decoded lossy vectors. It has been reported that the L1 norm supervises a convolutional neural network's training better than the L2 norm [16, 36]. For example, future-image prediction from video sequences is learned via a loss function with the L1 norm [16]. Moreover, both a gradient difference loss and a data loss with the L1 norm are used to supervise simultaneous color-depth image super-resolution and concurrent edge detection and image smoothing with a conditional GAN [36]. Accordingly, we use the L1 norm for our data loss and gradient difference loss. The data loss can be defined as:
(2) 
Gradient difference loss can be written as:
(3) 
where the gradients are computed between each pixel and the pixels in its 8-neighbourhood.
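These two L1 losses can be sketched as follows (our own NumPy illustration, not the paper's code; the 8-neighbourhood gradients are realized here with array shifts):

```python
import numpy as np

def l1_data_loss(pred, target):
    # Mean absolute error between the two images (L1 data loss).
    return np.abs(pred - target).mean()

def gradient_difference_loss(pred, target):
    # L1 difference between spatial gradients, taken over the eight shifts
    # of the 8-neighbourhood of each pixel.
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    loss = 0.0
    for dy, dx in shifts:
        gp = pred - np.roll(pred, (dy, dx), axis=(0, 1))
        gt = target - np.roll(target, (dy, dx), axis=(0, 1))
        loss += np.abs(gp - gt).mean()
    return loss / len(shifts)
```

Both terms are zero when prediction and target coincide, and the gradient term specifically penalizes mismatched edges rather than absolute intensities.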
Usually, it is hoped that the decoded resampled vectors can be viewed directly by receivers even without the IDN network's processing, so the resampled vectors' structural information should be similar to that of the input image. As a consequence, a DSSIM loss is used to further supervise the learning of the RSN network, in addition to the loss from the IDN network. Following [38], the DSSIM loss between the resampled vectors and the input image can be defined as follows:
(4) 
(5) 
where the two constants are small stabilizing values set as in [38]. The mean and variance are computed over a neighborhood window centered at each pixel of the respective image, and the covariance is computed between the corresponding neighborhood windows of the two images. As is well known, the structural similarity index (SSIM) is a differentiable function, so the DSSIM loss can be optimized with gradient descent. Besides, a DSSIM loss between the VCN output and the compressed lossy image is explicitly used to regularize the VCN network when a downsampling layer is employed in the RSN network, since the mapping from the resampled vectors to the compressed lossy image should be well learned for efficient backpropagation at very low bitrates.

III-A2. Network
In the RSN network, seven convolutional layers are used to resample the input, as shown in Fig. 2. Within this network, the spatial size of the convolution kernels is 9x9 in the first and last layers, which gives the network a large receptive field. The other five convolutional layers use 3x3 kernels to further enlarge the receptive field. These convolutional layers increase the nonlinearity of the network, as ReLU activates their output features. The first six convolutional layers each output 128 feature maps, while the last layer outputs a single feature map to stay consistent with the input image. Each convolutional layer operates with a stride of 1, except that the second layer uses a stride of 2 to downsample the feature maps, so that from the third to the seventh convolutional layer the convolutions are carried out in a low-dimensional space to reduce computational complexity. It is worth noting that the second layer instead uses a stride of 1, so that the input is resampled at full resolution, when the coding bitrate is beyond certain values. All convolutional layers are followed by a ReLU activation, except the last one.

In the IDN network, we use seven convolutional layers to extract features, each activated by ReLU. The kernel size is 9x9 in the first layer and 3x3 in the remaining six, and each of these layers outputs 128 feature maps. After these layers, one deconvolution layer with a 9x9 kernel and a stride of 2 upscales the feature maps from low resolution to high resolution for low-resolution resampling compression, so that the output image matches the size of the ground-truth image. However, for full-resolution resampling, the last deconvolution layer is replaced by a convolutional layer with a 9x9 kernel and a stride of 1.
In our method, the VCN network is designed with the same structure as the IDN network, because they address the same class of low-level image processing problems. The role of the VCN network is to degrade the resampled vectors into a decoded, lossy, but full-resolution image. In contrast, the IDN network restores the input image from the quantized resampled vectors so that the user receives a high-quality image at the decoder.
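For reference, the RSN configuration described above can be summarized as a layer table and used to check the spatial size of the resampled output under 'same' padding (the helper names and dict layout are ours, for illustration):

```python
# RSN layer spec as described: 9x9 kernels in the first and last layers,
# five 3x3 layers in between, 128 feature maps except a single output map.
RSN_LAYERS = [
    {"k": 9, "s": 1, "out": 128},
    {"k": 3, "s": 2, "out": 128},   # stride 1 instead in full-resolution mode
    {"k": 3, "s": 1, "out": 128},
    {"k": 3, "s": 1, "out": 128},
    {"k": 3, "s": 1, "out": 128},
    {"k": 3, "s": 1, "out": 128},
    {"k": 9, "s": 1, "out": 1},     # single map to match the gray input
]

def output_size(h, layers):
    # Spatial size after the stack, assuming 'same' padding: ceil(h / stride).
    for layer in layers:
        h = -(-h // layer["s"])
    return h
```

With the stride-2 second layer, a 160x160 training patch is resampled to 80x80; switching that layer to stride 1 yields the full-resolution variant.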
III-B. Deep neural networks based compression framework
Here, we adopt the autoencoder architecture of [30] for the DNNC framework, but the sub-pixel convolutional layer for color images is replaced by a deconvolutional layer for gray-image compression. The encoder network within our framework is the RSN network, while the decoder network is the IDN network. From [30], it can easily be seen that the major components of the autoencoder architecture are convolutional layers with a stride of 2, sub-pixel convolution, and ResNet blocks.
Previous approaches such as [26, 27, 28, 29, 30, 31, 32, 33, 37] use specific approximation functions to make quantization differentiable, so that their frameworks can be trained end to end. One direct way is to replace the quantization function with a differentiable surrogate such as a stochastic rounding function [26, 27] or soft-to-hard quantization [31], or to replace the quantization step with additive uniform noise [28, 29]. The alternative is to use an approximation function's gradient only during backpropagation [30], while the forward pass still uses classic quantization so as not to change the gradients of the decoder network.
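The additive-noise relaxation of [28, 29] can be sketched in a few lines (a simplified illustration, not the authors' code): during training, rounding is replaced by uniform noise of the same width, while testing uses hard rounding.

```python
import numpy as np

def noisy_quantize(v, rng):
    # Training-time relaxation: replace rounding with additive U(-0.5, 0.5)
    # noise, which is differentiable in v and has the same error width.
    return v + rng.uniform(-0.5, 0.5, size=v.shape)

def hard_round(v):
    # Test-time quantizer: true rounding to the nearest integer.
    return np.round(v)
```

Because the noise has the same support as the rounding error, the training objective is a close proxy for the true rate-distortion behavior at test time.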
We provide a novel way to resolve this problem: learning a virtual codec (the VCN network), so that the gradient of the quantization function, from the IDN network back to the RSN network, can be approximated by the VCN network's gradient during backpropagation. The objective compression loss function can be defined as:
(6) 
(7) 
(8) 
in which the notation is similar to Eq. (1). Here, the first term is the image decoding loss for the IDN network and the second is the virtual codec loss for the VCN network.
Different from Eq. (1), there is no DSSIM loss on the resampled vectors, because they are intended to work like a wavelet transform, which decomposes the input image into low-frequency and high-frequency components. Thus, we do not impose an SSIM-based restriction on the resampled vectors within the DNNC framework. As shown in Fig. 3, the resampled vectors are listed in zigzag scanning order, from which we can see that the RSN network decomposes the input into multiple components, each containing particular information about the image. We can restore the input image well from these vectors when they are losslessly transmitted over the channel. To compress them further, quantization is applied and the quantized vectors are encoded by arithmetic coding. Without learning the quantization parameters, we first normalize the resampled vectors to between 0 and 1, and then rescale and round them to integers. This can be written as:
(9) 
where the minimum and maximum values are taken over the training data's resampled vectors using the pre-trained network. Accordingly, the dequantization can be written as:
(10) 
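Equations (9) and (10) amount to min-max normalization followed by rescaling and rounding, and the corresponding rescaling back. A minimal sketch, where the choice of 256 integer levels is our illustrative assumption:

```python
import numpy as np

def quantize(v, vmin, vmax, levels=256):
    # Eq. (9): normalize to [0, 1] with the training-set min/max,
    # then rescale and round to integers in {0, ..., levels - 1}.
    u = (v - vmin) / (vmax - vmin)
    return np.round(u * (levels - 1)).astype(np.int64)

def dequantize(q, vmin, vmax, levels=256):
    # Eq. (10): map the integers back to the original value range.
    return q / (levels - 1) * (vmax - vmin) + vmin
```

The round trip recovers each value up to half a quantization step, which is the distortion the IDN network is trained to compensate.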
When the number of feature maps is set to a constant value as in [30], the resampled vectors always tend to contain some redundancy, which leads to high-bitrate coding. Thus, we vary the number of feature maps to control the compression bitrate, e.g., for compact image resampling. Meanwhile, we set the quantization parameter to a constant value in our DNNC framework, so our DNNC framework does not need to learn the quantization parameter.
III-C. Learning algorithm for both of our frameworks
Because it is difficult to train the whole framework directly at once, we decompose the learning of the three convolutional neural networks into three sub-problems. First, we initialize the parameter sets of the RSN, IDN, and VCN networks. Because both of our frameworks are built on an autoencoder, we can initialize these networks by pre-training autoencoder networks that contain the RSN and IDN networks without quantization; in fact, our two frameworks reduce to a classical autoencoder when there is no quantization. After initialization, we use the RSN network to obtain initial resampled vectors from the input image, which are then lossily encoded by the standard codec, or quantized by the rounding function, to produce the initial training data. Next, the first sub-problem is to train the IDN network by updating its parameter set. The resampled vectors and the IDN-decoded image are then used for the second sub-problem, learning the VCN's parameter set. After the VCN's learning, we fix its parameters and carry out the third sub-problem by optimizing the RSN network's parameters. After the RSN network's learning, the next iteration begins with training the IDN network again, once the updated resampled vectors have been compressed by the standard codec (SCIC framework) or quantized by the rounding function (DNNC framework). The whole training process is summarized in Algorithm 1. It is worth mentioning that the function of the VCN network is to bridge the gap between the RSN and IDN networks; once training of the whole framework is finished, the VCN network is no longer used, that is, only the RSN and IDN parameter sets are used during testing.
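The alternation over the three sub-problems can be sketched as follows (the `forward`/`fit` object interface is hypothetical shorthand for illustration, not the paper's API):

```python
def train_framework(x, rsn, idn, vcn, codec, iterations=3):
    """Sketch of Algorithm 1: alternate the three sub-problem updates.

    codec is the standard codec for SCIC or the rounding function for DNNC.
    Returns the list of completed iteration indices.
    """
    log = []
    for it in range(iterations):
        v = rsn.forward(x)       # resample the input image
        v_hat = codec(v)         # lossy codec / rounding on the resampled vectors
        idn.fit(v_hat, x)        # sub-problem 1: train IDN to restore x
        y = idn.forward(v_hat)   # IDN-decoded image
        vcn.fit(v, y)            # sub-problem 2: train VCN on (resampled, decoded)
        rsn.fit(x, vcn)          # sub-problem 3: train RSN through the fixed VCN
        log.append(it)
    return log
```

After training, only `rsn` and `idn` are kept for testing; `vcn` exists solely to carry gradients across the codec during these updates.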
IV. Experiment and Analysis
To validate the versatility and effectiveness of the proposed method, we apply our image resampling compression method to both the SCIC and DNNC frameworks. Our JPEG-compliant image compression framework, denoted "Ours(J)", is compared with JPEG, JPEG2000, and several combinatorial methods consisting of standard JPEG compression followed by artifact removal [15], [10], [17], [19]; these are respectively denoted "DicTV", "Foi's", "CONCOLOR", and "AR-CNN". Among these, "AR-CNN" is a CNN-based artifact removal method. Meanwhile, we compare our learning algorithm with the closely related learning algorithm proposed in [25]. To clearly observe the differences between the two algorithms, we train our RSN and IDN networks with the learning algorithm of [25] and denote the results "Jiang's"; here, the RSN and IDN networks correspond to the "ComCNN" and "RecCNN" networks in [25]. All other learning details of this approach, such as the training dataset and batch size, are kept consistent with ours; only the learning algorithm differs. In other words, "Jiang's" directly trains the RSN and IDN networks iteratively, while our method trains the VCN network to bridge the gap, i.e., the quantization function's non-differentiability, between the RSN and IDN networks. Moreover, we compare our DNNC framework's compression results, denoted Ours(D), with JPEG, JPEG2000, and Ours(J). In these comparisons, two objective measurements, SSIM and Peak Signal-to-Noise Ratio (PSNR), are used to evaluate the efficiency of the different image compression methods.
IV-A. Training details
Our training dataset is built from 291 images taken from [39] and [40]: 91 images come from [39] (https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/), while the others (https://www.ifp.illinois.edu/~jyang29/codes/ScSR.rar) use BSDS500's training set. The training dataset consists of 1681 image patches of size 160x160, formed by cropping, downsampling, and assembling smaller patches whose size is less than 160x160. During training, each batch of image patches is rotated and flipped randomly. Moreover, the General100 dataset is used as the validation dataset.
To verify the effectiveness of the proposed method, we use several testing datasets: Set5, Set7, Set14, and LIVE1. Among them, Set7 was built from seven testing images by [25], while the other datasets are widely used for image super-resolution, artifact removal, and image compression. Because some of the comparative methods mentioned above require the image size to be an integer multiple of 8, all testing images are cropped to an integer multiple of 8. All training, validation, and testing datasets can be downloaded from https://github.com/VirtualCodecNetwork.
Our frameworks are implemented on the TensorFlow platform. Our models are trained using the Adam optimizer with beta1=0.9 and beta2=0.999. The initial learning rate of each of the three convolutional neural networks is set to 0.0001, and the learning rate is halved once the training step reaches 3/5, and halved again at 4/5, of the total steps.
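The learning-rate schedule described above can be sketched as follows (a minimal illustration; the integer step granularity is our assumption):

```python
def learning_rate(step, total_steps, base_lr=1e-4):
    # Halve the rate once training reaches 3/5 of the total steps,
    # and halve it again at 4/5, as described above.
    lr = base_lr
    if step >= total_steps * 3 // 5:
        lr /= 2
    if step >= total_steps * 4 // 5:
        lr /= 2
    return lr
```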
IV-B Quantitative and qualitative evaluation of the SCIC framework against several state-of-the-art methods
Our image resampling within the SCIC framework covers both full-resolution resampling and low-resolution resampling. Thus, we first need to choose between full-resolution and low-resolution resampling at each bit-per-pixel (bpp) point. The results of Our(J) on the validation dataset General100 with different iteration numbers and different resampling modes are shown in Fig. 4, where Our(J)L3 and Our(J)F3 respectively denote Our(J) with low-resolution and full-resolution resampling using Algorithm 1 with three iterations. The other variants, Our(J)L1, Our(J)L2, Our(J)F1, and Our(J)F2, are denoted analogously.
IV-B1 Objective Comparisons
From Fig. 4, it can be observed that at relatively high bitrates, Our(J) with more iterations achieves larger SSIM and PSNR gains than Our(J) with fewer iterations, regardless of whether full-resolution or low-resolution resampling is used. Meanwhile, Our(J) should be trained with fewer iterations for low-bitrate coding. Our(J)L outperforms Our(J)F on the objective measurements below a certain low bitrate of about 0.4 bpp, since the image cannot be well restored from full-resolution resampled vectors when very few bits are assigned to each pixel, that is, when each pixel's quality is very low.
In our experiments, the low-resolution resampled vectors are compressed with QF set to 2, 6, 10, 20, 30, 40, 50, and 60, while the full-resolution resampled images are compressed with QF set to 2, 6, 10, 20, 30, 50, and 60. Based on the performance on the General100 validation dataset, the final Our(J) results combine Our(J)L1 (QF=2, 6, 10, 20, 30, 40) and Our(J)F3 (QF=10, 20, 30, 50, 60), as displayed in Fig. 6. We also report the objective quality comparisons and the effect of the iteration number on the performance of our JPEG-compliant image compression with Algorithm 1 on the testing datasets mentioned above, as shown in Fig. 5.
As displayed in Fig. 6, Our(J) performs best on all the testing datasets at both low and high bitrates in terms of SSIM, compared to JPEG, JPEG2000, and the combinatorial methods DicTV [15], Foi's [10], CONCOLOR [17], and ARCNN [19]. In terms of PSNR, Our(J) achieves larger gains over JPEG than DicTV [15], Foi's [10], CONCOLOR [17], and ARCNN [19] in most cases. Among these combinatorial methods, CONCOLOR has better objective measurements than DicTV, Foi's, and ARCNN, while DicTV has the worst performance.
When testing on Set5 and Set7, Our(J) competes with or even surpasses JPEG2000 in terms of PSNR, but its PSNR is lower than JPEG2000's on Set14 and LIVE1. Since our compressive loss explicitly uses the DSSIM loss for the RSN network, the image's structures are protected in our resampled vectors, which gives Our(J) better structural preservation than the other methods.
From Fig. 6, it can also be clearly seen that Our(J) achieves larger SSIM and PSNR gains than Jiang's [25] over the whole bitrate range, which demonstrates that our algorithm outperforms the one of [25]. It also indicates that our virtual codec network can effectively bridge the gap between the RSN and IDN networks. Note that Jiang's [25] only considers image compression at low bitrates, while our method can satisfy clients' requirements at various bitrates.
IV-B2 Visual Comparisons
Before comparing decoded reconstructions across compression methods, we first compare our resampled vector produced by RSN's downsampling with Jiang's compact representation [25], as shown in Fig. 8 (b1-b3, d1-d3), from which we can see that our resampled vector highlights the image's key features more accurately. Apart from this downsampling comparison, we also compare our full-resolution resampling with Jiang's downsampled compact representation at high bitrate, as displayed in Fig. 9. Meanwhile, we compare our compressed resampled vector with Jiang's compressed compact representation in Fig. 8 (c1-c3, e1-e3) and Fig. 9 (c1-c3, e1-e3). From these comparisons, it can be concluded that a downsampled compact representation cannot carry more of the image's information once the bitrate assigned to its compression exceeds a certain value. This further confirms that our full-resolution resampling is meaningful and efficient for image compression at high bitrates.
From panels (f1-m1, f2-m2, f3-m3) of Fig. 8 and Fig. 9, it can be noticed that Our(J) preserves more of the image's structural details than the other methods: JPEG, JPEG2000, DicTV [15], Foi's [10], CONCOLOR [17], and ARCNN [19]. Meanwhile, our method is free of the blocking and ringing artifacts produced by JPEG and JPEG2000. Among the combinatorial approaches [15, 10, 17, 19], CONCOLOR and ARCNN have better visual quality than the others, with ARCNN's decoded images having slightly higher visual quality than CONCOLOR's.
IV-C Quantitative and qualitative evaluation of the DNNC framework against the SCIC framework and standard codecs
IV-C1 Objective Comparisons
To further demonstrate the effectiveness of our DNNC framework, we compare its results with those of the SCIC framework as well as standard codecs. From Fig. 10, it can be seen that Our(D)'s SSIM measurements are better than JPEG's on all the testing datasets, especially at low bitrates, and Our(J) performs better than JPEG2000. Our(D) can even compete with JPEG2000 when testing on Set5 and Set7; however, its SSIM measurements on Set14 and LIVE1 are lower than those of Our(J) and JPEG2000. Meanwhile, the coding efficiency of Our(D) is better than JPEG's in terms of PSNR at low bitrates, but lower at high bitrates. Besides, Our(J) competes with JPEG2000 in terms of PSNR when testing on Set5 and Set7, but JPEG2000's PSNR is larger than Our(J)'s on Set14 and LIVE1.
IV-C2 Visual Comparisons
The visual comparisons are displayed in Fig. 11, from which we can see that both Our(D) and Our(J) are free of blocking artifacts and ringing artifacts around discontinuities, compared to standard codecs such as JPEG and JPEG2000. We can also observe that JPEG2000 has better visual quality than JPEG, but both fall short of Our(D) and Our(J). Although both Our(D) and Our(J) compress images with high quality, they differ in structural and textural preservation at image boundaries. Besides, images compressed by Our(J) are smoother than those compressed by Our(D).
V Conclusion
In this paper, an image resampling compression method is proposed to compress images efficiently. We generalize this method to both the SCIC framework and the DNNC framework. Because learning the whole framework directly is intractable, we decompose this challenging optimization problem into three learning subproblems. Furthermore, because our coding frameworks are built on an auto-encoder architecture, whose output reproduces its input, we can initialize our networks from pre-trained auto-encoder networks. Experimental results have shown that the proposed method is versatile and effective.
References
 [1] W. Lie, C. Hsieh, and G. Lin, “Key-frame-based background sprite generation for hole filling in depth image-based rendering,” IEEE Transactions on Multimedia, vol. PP, no. 99, pp. 1–1, 2017.
 [2] V. Gaddam, M. Riegler, C. Griwodz, and P. Halvorsen, “Tiling in interactive panoramic video: approaches and evaluation,” IEEE Transactions on Multimedia, vol. 18, no. 9, pp. 1819–1831, 2016.
 [3] G. Wu, B. Masia, A. Jarabo, Y. Zhang, L. Wang, and Q. Dai, “Light field image processing: an overview,” IEEE Journal of Selected Topics in Signal Processing, vol. PP, no. 99, pp. 1–1, 2017.
 [4] Cisco Systems, “Cisco visual networking index: Global mobile data traffic forecast update 2016–2021 white paper,” http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11520862.pdf, 2017.
 [5] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980, 2014.
 [6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, “Adaptive deblocking filter,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 614–619, 2003.
 [7] A. Liew and H. Yan, “Blocking artifacts suppression in block-coded images using overcomplete wavelet representation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 4, pp. 450–461, 2004.
 [8] N. Francisco, N. Rodrigues, E. DaSilva, and S. DeFaria, “A generic post-deblocking filter for block-based image compression algorithms,” Signal Processing: Image Communication, vol. 27, no. 9, pp. 985–997, 2012.
 [9] C. Wang, J. Zhou, and S. Liu, “Adaptive non-local means filter for image deblocking,” Signal Processing: Image Communication, vol. 28, no. 5, pp. 522–530, 2013.
 [10] A. Foi, V. Katkovnik, and K. Egiazarian, “Pointwise shape-adaptive DCT for high-quality denoising and deblocking of grayscale and color images,” IEEE Transactions on Image Processing, vol. 16, no. 5, pp. 1395–1411, 2007.
 [11] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
 [12] S. Yoo, K. Choi, and J. Ra, “Post-processing for blocking artifact reduction based on inter-block correlation,” IEEE Transactions on Multimedia, vol. 16, no. 6, pp. 1536–1548, 2014.
 [13] X. Zhang, R. Xiong, X. Fan, S. Ma, and W. Gao, “Compression artifact reduction by overlapped-block transform coefficient estimation with block similarity,” IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 4613–4626, 2013.
 [14] D. Sun and W. Cham, “Postprocessing of low-bitrate block DCT coded images based on a fields of experts prior,” IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2743–2751, 2007.
 [15] H. Chang, M. Ng, and T. Zeng, “Reducing artifacts in JPEG decompression via a learned dictionary,” IEEE Transactions on Signal Processing, vol. 62, no. 3, pp. 718–728, 2014.
 [16] M. Mathieu, C. Couprie, and Y. LeCun, “Deep multi-scale video prediction beyond mean square error,” arXiv:1511.05440, 2015.
 [17] J. Zhang, R. Xiong, C. Zhao, Y. Zhang, S. Ma, and W. Gao, “CONCOLOR: Constrained non-convex low-rank model for image deblocking,” IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 1246–1259, 2016.

 [18] Z. Wang, D. Liu, S. Chang, Q. Ling, Y. Yang, and T. Huang, “D3: Deep dual-domain based fast restoration of JPEG-compressed images,” in IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, United States, Jun. 2016.
 [19] C. Dong, Y. Deng, L. Change, and X. Tang, “Compression artifacts reduction by a deep convolutional network,” in IEEE Conference on Computer Vision and Pattern Recognition, Boston, Massachusetts, Jun. 2015.

 [20] T. Wang, M. Chen, and H. Chao, “A novel deep learning-based method of improving coding efficiency from the decoder-end for HEVC,” in IEEE Data Compression Conference (DCC), Boston, Massachusetts, Apr. 2017.
 [21] L. Cavigelli, P. Hager, and L. Benini, “CAS-CNN: A deep convolutional neural network for image compression artifact suppression,” in IEEE Conference on Neural Networks, Anchorage, AK, USA, May 2017.
 [22] K. Li, B. Bare, and B. Yan, “An efficient deep convolutional neural networks model for compressed image deblocking,” in IEEE International Conference on Multimedia and Expo, Hong Kong, China, Jul. 2017.
 [23] R. Yang, M. Xu, and Z. Wang, “Decoder-side HEVC quality enhancement with scalable convolutional neural network,” in IEEE International Conference on Multimedia and Expo, Hong Kong, China, Jul. 2017.
 [24] L. Galteri, L. Seidenari, M. Bertini, and B. Del, “Deep generative adversarial compression artifact removal,” arXiv:1704.02518, 2017.
 [25] F. Jiang, W. Tao, S. Liu, J. Ren, X. Guo, and D. Zhao, “An end-to-end compression framework based on convolutional neural networks,” IEEE Transactions on Circuits and Systems for Video Technology, vol. PP, no. 99, pp. 1–1, 2017.
 [26] G. Toderici, S. Malley, S. Hwang, D. Vincent, D. Minnen, S. Baluja, and et al., “Variable rate image compression with recurrent neural networks,” in International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, May 2016.
 [27] G. Toderici, D. Vincent, N. Johnston, S. Hwang, D. Minnen, J. Shor, and M. Covell, “Full resolution image compression with recurrent neural networks,” in IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul. 2017.
 [28] J. Ballé, V. Laparra, and E. Simoncelli, “End-to-end optimization of nonlinear transform codes for perceptual quality,” in IEEE Picture Coding Symposium, Nuremberg, Germany, Dec. 2016.
 [29] J. Ballé, V. Laparra, and E. Simoncelli, “End-to-end optimized image compression,” in International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, May 2016.
 [30] L. Theis, W. Shi, A. Cunningham, and F. Huszár, “Lossy image compression with compressive autoencoders,” arXiv:1703.00395, 2017.
 [31] E. Agustsson, F. Mentzer, M. Tschannen, L. Cavigelli, R. Timofte, L. Benini, and L. Gool, “Soft-to-Hard vector quantization for end-to-end learning compressible representations,” in Neural Information Processing Systems, Long Beach, Dec. 2017.
 [32] M. Li, W. Zuo, S. Gu, D. Zhao, and D. Zhang, “Learning convolutional networks for content-weighted image compression,” arXiv:1703.10553, 2017.
 [33] O. Rippel and L. Bourdev, “Real-time adaptive image compression,” arXiv:1705.05823, 2017.
 [34] X. Liu, X. Wu, J. Zhou, and D. Zhao, “Data-driven soft decoding of compressed images in dual transform-pixel domain,” IEEE Transactions on Image Processing, vol. 25, no. 4, pp. 1649–1659, 2016.
 [35] D. Huang, L. Kang, Y. Wang, and C. Lin, “Self-learning based image decomposition with applications to single image denoising,” IEEE Transactions on Multimedia, vol. 16, no. 1, pp. 83–93, 2013.
 [36] L. Zhao, J. Liang, H. Bai, A. Wang, and Y. Zhao, “Simultaneously color-depth super-resolution with conditional generative adversarial network,” arXiv:1708.09105, 2017.
 [37] E. Agustsson, F. Mentzer, M. Tschannen, R. Timofte, and L. V. Gool, “Generative adversarial networks for extreme learned image compression,” arXiv:1804.02958, 2018.
 [38] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
 [39] J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Transactions on Image Processing, vol. 19, no. 11, pp. 2861–2873, 2010.
 [40] P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection and hierarchical image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 898–916, 2011.