Image inverse problems center on the recovery of an unknown image from a given corrupted measurement. The problem is ill-posed because a specific corrupted image can correspond to a multitude of possible high-quality images. The problem has been extensively explored over the past several decades, and deep convolutional neural networks (ConvNets) currently set the state of the art in tasks such as denoising and single-image super-resolution. The commonly suggested and very effective path to the inverse problem is as follows: given many examples of pairs of an original image and its corrupted version, one can train a deep network to map the degraded image to its source, for example, [14, 16, 5, 9, 6, 8, 17, 13, 10, 3].
Ulyanov et al. proposed a new strategy, namely Deep Image Prior (DIP), for single-image inverse problems where common strategies are no longer feasible because only one corrupted image (without the original) is available for model training. Mataev et al. further improved the performance of DIP by adding an extra regularization term (Regularization by Denoising).
Although Ulyanov et al. and Mataev et al. showed that DIP and its variants are very effective tools for handling various inverse problems, a stopping method must be found before DIPs can be applied to real-world problems where human supervision is not available. Currently, DIPs stop when humans assess their outputs as good enough, or when a maximum iteration count is reached [11, 7]. A stopping method should output a measurement that indicates how well a DIP has reconstructed the image of interest, so that the training algorithm can stop itself when the measurement reaches its maximum.
In this work, we propose a stopping method, namely the Orthogonal Stopping Criterion (OSC), which adds a pseudo noise to the corrupted image and measures the pseudo-noise component in the recovered image at each iteration, based on the orthogonality between signal and noise. The growth-rate derivative of the measurement reaches its maximum when the DIP starts focusing on reconstructing the pseudo noise, at which point training should be stopped, because a DIP resists "bad" solutions and descends much more quickly toward natural-looking images. We use DIP as the baseline (https://github.com/DmitryUlyanov/deep-image-prior) and demonstrate the performance of OSC on several problems, such as denoising, super-resolution, and inpainting.
Inverse tasks such as denoising, super-resolution, and inpainting can be expressed as the energy minimization problem of equation (1), $x^* = \arg\min_x E(x; x_0) + R(x)$, where $E(x; x_0)$ is a task-dependent data term, $x_0$ is the noisy/low-resolution/occluded image, $x$ is the reconstructed image, and $R(x)$ is a regularizer.
In this work, we handle the inverse tasks by equation (2), $\min_x E(x; x_0 + n) + R(x)$, where $n$ is the pseudo noise, and all reconstructions obtained during the minimization are collected in the set $X$. The minimization is stopped according to equation (3), where the criterion measures the growth-rate derivative of the pseudo-noise component in $x_i$. The pseudo-noise component is highly correlated with $n$, because $n$ is orthogonal to all components in $x_0$, including the ground truth and the corruptions. Since the reconstruction of $n$ needs many more iterations than the natural-looking image in $x_0$, the natural-looking image of interest will be reconstructed by the DIP before the growth-rate derivative of the pseudo-noise component reaches its maximum, as long as reconstructing $n$ is harder than reconstructing the natural-looking image and easier than (or as hard as) reconstructing the other corruptions.
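As a minimal sketch of this setup (assuming images are NumPy float arrays in $[0, 1]$; the function name `osc_target` and the fixed seed are ours, not from the paper), forming the OSC optimization target amounts to adding zero-mean Gaussian pseudo noise to the corrupted observation and keeping the noise for later measurement:

```python
import numpy as np

def osc_target(x0, sigma=1 / 25, seed=0):
    """Add Gaussian pseudo noise (zero mean, std 1/25 by default, as in
    the experiments) to the corrupted observation x0. The noise itself
    is returned as well, because the stopping criterion needs it to
    measure the pseudo-noise component in each reconstruction."""
    rng = np.random.default_rng(seed)
    n = rng.normal(0.0, sigma, size=x0.shape)
    return x0 + n, n
```

The DIP is then trained against the returned target instead of $x_0$, with everything else unchanged.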
Given a series of reconstructed images $\{x_i\}$, we compute the pseudo-noise component by equation (4), $c_i = \frac{1}{N}\langle x_i, n\rangle$, where $N$ is the number of elements in $n$ and $x_i$ indicates the image reconstructed in the $i$-th iteration.
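This per-iteration measurement can be sketched as a normalized inner product (a hedged sketch; the function name is ours, and we assume image and noise arrays of the same shape):

```python
import numpy as np

def pseudo_noise_component(x_i, n):
    """Projection of the reconstructed image x_i onto the pseudo noise n.

    Because n is (approximately) orthogonal to the ground-truth image and
    to the original corruption, this normalized inner product isolates how
    much of n has leaked into the reconstruction at iteration i."""
    N = n.size  # number of elements in the noise image
    return float(np.sum(x_i * n)) / N
```

Recording this value at every iteration yields the component curve whose curvature drives the stopping decision.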
Figure 1 shows the pseudo-noise component ($c$) curve, the Peak Signal-to-Noise Ratio (PSNR) curve of DIP, the PSNR curve of OSC, and the curvature curves. All curves are normalized according to their own minimum and maximum, except the PSNR curve of OSC, which uses the minimum and maximum of the DIP PSNR curve. The curvature curve records the growth-rate derivative of the $c$ curve. To obtain the curvature at a specific index $i$, we find three points on the $c$ curve that define a new coordinate system (shown as dashed lines): $(i - L, \bar{c}_{i-L})$, $(i, \bar{c}_i)$, and $(i + L, \bar{c}_{i+L})$, where $\bar{c}$ denotes the $c$ curve smoothed by a moving average, $L$ defines the length of curve used for the curvature calculation, and $w$ is the averaging window size. After mapping the curve between $i - L$ and $i + L$ to the dashed-line coordinate system, we fit a parabola to it and use the coefficient of the quadratic term as the curvature at index $i$, which approximates the growth-rate derivative of the pseudo-noise component. From Figure 1, although the curvature-maximum PSNR is not the maximum over the whole OSC iteration, it is close: the ratio of the curvature-maximum PSNR to the maximum exceeds 95% in most of our experiments. Figure 1 also makes clear that the presence of the pseudo noise harms the maximum PSNR, but only insignificantly. The OSC method is listed in Algorithm 1.
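The curvature estimate can be sketched as follows, as a simplified version under our assumptions: we smooth the component curve with a moving average of size `w`, then fit a parabola to the $2L + 1$ points around index $i$ in local coordinates and read off the quadratic coefficient (the function name and the exact coordinate mapping are ours, not necessarily identical to the paper's construction):

```python
import numpy as np

def curvature(c, i, L, w):
    """Approximate the growth-rate derivative of the component curve c
    at index i: smooth c with a moving average of size w, fit a parabola
    to the segment of half-length L around i (shifted to local
    coordinates t = -L..L), and return the quadratic coefficient."""
    c_s = np.convolve(c, np.ones(w) / w, mode="same")
    t = np.arange(-L, L + 1)          # local x coordinates around i
    seg = c_s[i - L : i + L + 1]      # curve segment centered at i
    return np.polyfit(t, seg, 2)[0]   # coefficient of the t^2 term
```

Training stops at the iteration where this value peaks; for an exactly quadratic segment the fitted coefficient equals half the second derivative, which is the sense in which it approximates the growth-rate derivative.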
$n$ is a zero-mean Gaussian pseudo noise with standard deviation 1/25 by default. All OSC experiments are identical to DIP's except for the use of the pseudo noise. DIP experiments are stopped at the suggested iteration count or when the PSNR reaches its maximum. OSC experiments are stopped when the curvature reaches its maximum.
3.1 Denoising and generic reconstruction
Figure 2 shows the restoration of a JPEG-compressed image, where we repeat the experiment using DIP and OSC. Figure 2 (b) is the image at the suggested stopping iteration, and (c) is obtained by OSC. The image automatically selected by OSC is better than the DIP result, without any human supervision.
Figure 3 shows the denoising results of DIP and OSC, where (c) is selected by PSNR, (d) is the result of the suggested iteration count, which is chosen based on human inspection, and (e) is the result of OSC. The result of OSC is close to the PSNR-maximum image and better than the suggested-iteration result.
We ran the denoising experiments on the Kate, Snail, and F16 images; the results are shown in Table 1. The JPEG-corrupted Snail image is used as the ground truth. DIP (PSNR) reports the maximum PSNR, and DIP (Iteration) shows the PSNR at 3000 iterations, the default value in the DIP code. The PSNR values of the images selected by OSC are listed in the 4th row, followed by the maximum PSNR obtained during the OSC iterations. Accuracy is the ratio of the 4th row to the 5th row. As shown in Table 1, OSC results are comparable to those of DIP, which requires human supervision. The maximum PSNR of OSC is close to DIP (PSNR), which means that the addition of the pseudo noise has little influence on the noisy-image reconstruction.
For super-resolution, we train OSC to minimize equation (2) using $E(x; x_0) = \|d(x) - x_0\|^2$, where $x$ is the reconstructed image, $x_0$ is a down-sampled observation, and $d(\cdot)$ is the Lanczos downsampling operator, as in the DIP experiments. We tested DIP and OSC on Set5 and Set14 with downscaling factors 4 and 8.
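The super-resolution data term can be sketched as below. Note two assumptions of ours: the function names are hypothetical, and average pooling stands in for the Lanczos filter to keep the sketch dependency-free (the paper uses Lanczos downsampling):

```python
import numpy as np

def downsample(x, f):
    """Stand-in for the downsampling operator d(.): average pooling by
    integer factor f on a 2-D image (the paper uses a Lanczos filter)."""
    h, w = (x.shape[0] // f) * f, (x.shape[1] // f) * f
    return x[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def sr_data_term(x, x0, f):
    """E(x; x0) = ||d(x) - x0||^2 : compare the downsampled
    reconstruction against the low-resolution observation."""
    return float(np.sum((downsample(x, f) - x0) ** 2))
```

Under equation (2), the low-resolution observation $x_0$ here is the pseudo-noise-augmented one.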
Figure 4 shows examples of 4x image super-resolution, where (c) is stopped at the PSNR-maximum point, (d) is stopped at the suggested 2000 iterations, and (e) is generated by OSC. The OSC results are very close to the PSNR-maximum version of DIP, which means that OSC has found a near-optimal solution to the super-resolution inverse problem automatically.
Table 2 shows the 4x super-resolution results on Set5, where the maximum PSNRs of DIP are in the 1st row, the PSNRs of DIP at 2000 iterations are in the 2nd row, and the OSC results are in the 3rd row, followed by the maximum PSNRs of OSC and the accuracy in the last row. Similarly, the results of 4x and 8x super-resolution on Set14 are shown in Table 3 and Table 4. DIP was stopped at 8000 iterations in the 8x super-resolution experiment. From Tables 2, 3, and 4, we believe that OSC is very good at finding the optimal stopping iteration for super-resolution problems, because the accuracy is higher than 95% for all test images.
For inpainting, we train OSC to minimize equation (2) using $E(x; x_0) = \|(x - x_0) \odot m\|^2$, where $x$ is the reconstructed image, $x_0$ is a corrupted observation, $\odot$ is the Hadamard product, $m \in \{0, 1\}^{H \times W}$ is a binary mask marking the missing pixels in $x_0$, $H$ is the height, and $W$ is the width of the image. The pseudo noise $n$ is generated by equation (6), where the indices run over channel, row, and column respectively and each value is drawn from a random uniform distribution $U(\cdot)$. The pseudo-noise component is calculated by equation (7), where $N$ is the number of elements in the image.
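A hedged sketch of the inpainting ingredients follows. The function names are ours; we assume the mask convention $m = 1$ at observed pixels and $m = 0$ at missing ones, and the uniform range below is illustrative, since the exact bounds in equation (6) are not reproduced here:

```python
import numpy as np

def inpaint_data_term(x, x0, m):
    """E(x; x0) = ||(x - x0) * m||^2 : penalize the mismatch only on
    observed pixels (m = 1 where pixels are known, 0 where missing)."""
    return float(np.sum(((x - x0) * m) ** 2))

def uniform_pseudo_noise(shape, lo=-0.04, hi=0.04, seed=0):
    """Per-element uniform pseudo noise for the inpainting task.
    The bounds lo/hi are placeholders, not the paper's values."""
    rng = np.random.default_rng(seed)
    return rng.uniform(lo, hi, size=shape)
```

The pseudo-noise component is then measured with the same normalized inner product as in the denoising case, dividing by the number of elements in the image.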
Figures 5 and 6 show the results of regional recovery. Figure 7 shows the results of two inpainting approaches. Table 5 lists the PSNRs of the experiments. The OSC results are very close to the DIP maximum PSNR, which means the OSC method is fully capable of finding the optimal stopping iteration automatically.
In this work, we have developed the Orthogonal Stopping Criterion (OSC), which endows Deep Image Prior (DIP) with the ability to stop automatically. An automatic stopping mechanism is essential for DIP in real-world applications, where the Peak Signal-to-Noise Ratio (PSNR) and human supervision are both unavailable or hard to obtain. By adding pseudo noise to the corrupted image, OSC automatically finds a near-optimal result that is very close to the maximum-PSNR one in our experiments. Additionally, the experiments verify that the pseudo noise has little influence on the maximum PSNR. The ratio of the OSC PSNR to the maximum is higher than 95% in 38 out of 40 experiments, and in many of them even higher than 99%. Although the results of DIP are comparable to those of OSC, they are selected based on PSNR or human inspection. In all, we believe that OSC is an indispensable part of DIP-based single-image inverse systems.
-  (2012) Low-complexity single-image super-resolution based on nonnegative neighbor embedding. Cited by: §3.2.
-  (2018) Nonlocality-reinforced convolutional neural networks for image denoising. IEEE Signal Processing Letters 25 (8), pp. 1216–1220. Cited by: §1.
-  (2019) Dynamic scene deblurring with parameter selective sharing and nested skip connections. In , pp. 3848–3856. Cited by: §1.
-  (2018) Fast and accurate single image super-resolution via information distillation network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 723–731. Cited by: §1.
-  (2018) Universal denoising networks: a novel CNN architecture for image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3204–3213. Cited by: §1.
-  (2018) Non-local recurrent network for image restoration. In Advances in Neural Information Processing Systems, pp. 1673–1682. Cited by: §1.
-  (2019) DeepRED: deep image prior powered by RED. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 0–0. Cited by: §1, §1, §1.
-  (2018) SRFeat: single image super-resolution with feature discrimination. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 439–455. Cited by: §1.
-  (2018) Neural nearest neighbors networks. In Advances in Neural Information Processing Systems, pp. 1087–1098. Cited by: §1.
-  (2018) Scale-recurrent network for deep image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8174–8182. Cited by: §1.
-  (2018) Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9446–9454. Cited by: §1, §1, §1, §1, §2, §3.
-  (2018) Deep image prior. In Submitted to IJCV, pp. 1–22. Cited by: §3.2, §3.
-  (2018) A fully progressive approach to single-image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 864–873. Cited by: §1.
-  (2019) Deep learning for single image super-resolution: a brief review. IEEE Transactions on Multimedia. Cited by: §1.
-  (2010) On single image scale-up using sparse-representations. In International conference on curves and surfaces, pp. 711–730. Cited by: §3.2.
-  (2018) FFDNet: toward a fast and flexible solution for CNN-based image denoising. IEEE Transactions on Image Processing 27 (9), pp. 4608–4622. Cited by: §1.
-  (2018) Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 286–301. Cited by: §1.