1 Introduction
Blind deconvolution is a fundamental problem in low-level vision and continues to attract research attention [22, 20, 15, 14, 21]. Given a blurry image $y$, blind deconvolution aims to recover a clear version $x$, for which it is crucial to first estimate the blur kernel $k$ successfully. Formally, the degradation of image blur is modeled as
(1) $y = k \otimes x + n$
where $x$ and $y$ have the same image size, $k$ has its own kernel size, $\otimes$ is the 2D convolution operator and $n$ is noise, usually assumed to be random Gaussian. Blind deconvolution needs to jointly estimate the blur kernel $k$ and recover the clear image $x$.
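The most successful blind deconvolution methods are based on the maximum a posteriori (MAP) framework. MAP jointly estimates $x$ and $k$ by maximizing the posterior $p(x, k \mid y)$, which can be reformulated as a regularized least-squares optimization [2],
(2) $(\hat{x}, \hat{k}) = \arg\min_{x,k} \|k \otimes x - y\|^2 + \lambda\, g(x) + \mu\, h(k)$
where $g$ and $h$ are prior functions designed to prefer a sharp image and an ideal kernel, respectively, and $\lambda$ and $\mu$ are their weights. Solving the optimization problem in Eqn. (2) directly is not trivial; instead, it is usually addressed by alternating two steps,
(3) $\hat{x} = \arg\min_{x} \|k \otimes x - y\|^2 + \lambda\, g(x)$
and
(4) $\hat{k} = \arg\min_{k} \|k \otimes x - y\|^2 + \mu\, h(k)$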
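The step from the posterior to a regularized least-squares problem can be sketched as follows. This is a minimal derivation, assuming i.i.d. Gaussian noise with variance $\sigma_n^2$ and independent priors $p(x)$ and $p(k)$; the symbols $\sigma_n$, $\lambda$, $\mu$, $g$ and $h$ are notation chosen here for illustration.

```latex
\begin{aligned}
(\hat{x}, \hat{k})
  &= \arg\max_{x,k}\; p(x, k \mid y)
   = \arg\max_{x,k}\; p(y \mid x, k)\, p(x)\, p(k) \\
  &= \arg\min_{x,k}\; \frac{1}{2\sigma_n^2}\,\|k \otimes x - y\|^2 - \log p(x) - \log p(k) \\
  &= \arg\min_{x,k}\; \|k \otimes x - y\|^2 + \lambda\, g(x) + \mu\, h(k),
\qquad \lambda\, g(x) = -2\sigma_n^2 \log p(x),\quad \mu\, h(k) = -2\sigma_n^2 \log p(k).
\end{aligned}
```

Holding $k$ fixed drops the kernel prior and gives the x-step; holding $x$ fixed drops the image prior and gives the k-step.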
In most blind deconvolution methods, the kernel size is a hyperparameter that must be set manually. The ideal choice is the ground truth size, which constrains the support domain; however, it is not available in practical applications, so hand tuning is required.
On one hand, a kernel size smaller than the ground truth cannot provide enough support domain for the estimated blur kernel. Therefore, existing methods usually predefine the kernel size as a large value to guarantee the support domain.
[Figure 1: panels (a)-(e). Panel labels: truth size = 23; size = 23, err = 1.9; size = 47, err = 5.9; size = 69, err = 69.9.]
On the other hand, as shown in Figure 1, oversized kernels are very likely to introduce estimation errors and hence lead to unreasonable results. We name this phenomenon the larger-kernel effect. It was first mentioned by Fergus et al. [9]. Cho and Lee [4] then showed a similar result: the residual cost of (2) increases with overestimated kernel size. However, the phenomenon has not yet been well analyzed. Because most MAP-based blind deconvolution algorithms tune the kernel size by trial and error, the larger-kernel effect is a very common problem.
In this paper, we first explore the mechanism of the larger-kernel effect and then propose a novel low rank-based regularization to relieve it. Theoretically, we analyze how an oversized kernel introduces kernel estimation error. Specifically, we reformulate the convolutions in (3) and (4) as affine transformations and analyze how their properties depend on kernel size. We show that for $x$ in sparse distributions, the larger-kernel effect remains with probability one. We also conduct simulation experiments showing that kernel error is expected to increase with kernel size even without noise $n$. Furthermore, we seek a proper regularization to suppress noise in large kernels. By exploiting the low-rank property of blur kernels, we propose a low-rank regularization that reduces noise in $k$ and thereby suppresses the larger-kernel effect. Experimental results on both synthetic and real blurry images validate the effectiveness of the proposed method and show its robustness to overestimated kernel size. Our contributions are two-fold:

We give a thorough analysis of the mechanism by which an overestimated kernel size yields inferior results in blind deconvolution, a phenomenon that has received little research attention.

We propose a low rank-based regularization, together with an efficient optimization algorithm, that effectively suppresses the larger-kernel effect and performs favorably against state-of-the-art methods on oversized blur kernels.
2 Larger-kernel effect
In this section, we describe the larger-kernel effect in detail and provide a mathematical explanation.
2.1 Phenomenon
Figure 1(b-c) shows that a larger kernel size leads to inferior deblurring results, since an estimated blur kernel with a larger support domain is very likely to introduce noise and estimation errors. Figure 1(a) shows that both the error ratio (err) [17] of restored images and the Summed Squared Difference (SSD) of estimated kernels are lowest at the truth size and increase afterwards.
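The two measures named above can be sketched in a few lines. This is an illustrative sketch, assuming the error ratio of [17] is the SSD of the image restored with the estimated kernel divided by the SSD of the image restored with the ground truth kernel; the function names and the toy arrays are hypothetical.

```python
import numpy as np

def ssd(a, b):
    """Summed squared difference between two equally shaped arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum((a - b) ** 2))

def error_ratio(restored_with_estimate, restored_with_truth, sharp):
    """Assumed definition: SSD with the estimated kernel over SSD with the true kernel."""
    return ssd(restored_with_estimate, sharp) / ssd(restored_with_truth, sharp)

# Toy data: the restoration with the estimated kernel is twice as far from the sharp image.
sharp = np.zeros((4, 4))
restored_with_truth = sharp + 0.1
restored_with_estimate = sharp + 0.2
ratio = error_ratio(restored_with_estimate, restored_with_truth, sharp)
```

A ratio close to one means the estimated kernel restores the image about as well as the true kernel does; larger values indicate a worse kernel estimate.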
2.2 Mechanism
To analyze the source of the larger-kernel effect, we first introduce an interesting fact that we call the inflating effect.
Claim 1.
(Inflating Effect) Let , where . Let , where and
. Given an mD random vector
whose elements are i.i.d. with the continuous probability density function p, for
Proof.
where .
For , we have . Hence, the Lebesgue measure of is zero, and the probability is zero. ∎
Claim 1 shows that padding linearly independent columns onto a thin matrix leads to a different least-squares solution with a lower squared residual.
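The statement can be checked numerically. The following is a minimal sketch with random Gaussian data; the matrix sizes, the names `A`, `B`, `b`, `y` and the helper `ls_residual` are illustrative choices, not the notation of Claim 1.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 5
A = rng.standard_normal((m, n))      # thin matrix, m > n
b = rng.standard_normal((m, 1))      # one extra column, linearly independent almost surely
B = np.hstack([A, b])                # padded matrix
y = rng.standard_normal(m)           # random target with a continuous density

def ls_residual(M, target):
    """Least-squares solution and its squared residual."""
    sol, *_ = np.linalg.lstsq(M, target, rcond=None)
    return sol, float(np.sum((M @ sol - target) ** 2))

u, cost_A = ls_residual(A, y)
v, cost_B = ls_residual(B, y)
```

Because the column space of `A` is contained in that of `B`, `cost_B` can never exceed `cost_A`; for a random `y` the inequality is strict, and the coefficient on the padded column is nonzero.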
The convolution part in (1) is equivalent to linear transforms:
(5) 
where italic letters , and represent column-wise expanded vectors of 2D , and , respectively; and are block banded Toeplitz matrices [1, 11]; and are required to be odd.
We attribute the larger-kernel effect to either substep (3) or (4). On one hand, enlarging the kernel size by wrapping a layer of zeros around $k$ leaves the convolution result unchanged, so the result of the x-step stays the same. Hence, the x-step should not be blamed as the source of the larger-kernel effect. On the other hand, when the kernel size is larger, the Toeplitz matrix built from $x$ becomes inflated for the same $x$. In 1D cases, where , assume , then
(6)  
During blind deconvolution iterations, for identical values of $x$, a larger kernel size introduces more columns onto both sides of the Toeplitz matrix and results in different solutions. To illustrate this point, we tested a 1D version of blind deconvolution without kernel regularization and took different kernel sizes (the truth size, double the truth size and four times the truth size) for the 50th k-step optimization after 49 truth-size iterations (see Figure 2). Figure 2(a-c) shows that the optimal solutions for different sizes differ slightly on the main body that lies within the ground truth size (colored in red), but greatly outside this range (colored in green), where zeros are expected. Figure 2(d-f) compares the ground truth with the estimated kernels in (a-c) after non-negativity and sum-to-one projections. Larger sizes yield more positive noise; hence, they lower the weight of the main body after projection and change the appearance of the estimated kernel.
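The column-padding behavior and the projections described above can be sketched for the 1D case. This is an illustrative sketch, not the experiment behind Figure 2: it assumes zero boundary handling, a random Gaussian signal, and hypothetical helper names `conv_matrix_1d` and `project_kernel`.

```python
import numpy as np

def conv_matrix_1d(x, size):
    """Return the N-by-size matrix X with X @ k == np.convolve(x, k, mode="same").

    Assumes an odd kernel size no larger than len(x) and zero boundary handling.
    """
    if size % 2 != 1:
        raise ValueError("kernel size must be odd")
    n, c = len(x), (size - 1) // 2
    X = np.zeros((n, size))
    for j in range(size):
        shift = j - c                        # column j holds x delayed by `shift` samples
        lo, hi = max(0, shift), min(n, n + shift)
        X[lo:hi, j] = x[lo - shift:hi - shift]
    return X

def project_kernel(k):
    """Clip negative entries and rescale so the kernel sums to one."""
    k = np.maximum(k, 0.0)
    total = k.sum()
    return k / total if total > 0 else k

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
k_true = np.array([0.2, 0.6, 0.2])           # ground truth kernel of size 3
X3, X7 = conv_matrix_1d(x, 3), conv_matrix_1d(x, 7)
y = X3 @ k_true + 0.01 * rng.standard_normal(64)

# Unregularized k-step at the truth size and at an oversized kernel size.
k3, *_ = np.linalg.lstsq(X3, y, rcond=None)
k7, *_ = np.linalg.lstsq(X7, y, rcond=None)
res3 = float(np.sum((X3 @ k3 - y) ** 2))
res7 = float(np.sum((X7 @ k7 - y) ** 2))
p7 = project_kernel(k7)                      # non-negativity and sum-to-one projections
```

The middle three columns of `X7` are exactly `X3`, so the larger size only adds columns on both sides. The oversized fit therefore reaches a residual no larger than the truth-size fit by placing small values outside the true support.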



[Figure 2: estimated kernels, panels (a)-(f); horizontal axis: index.]

2.3 Probability of larger-kernel effect
Even if the intermediate $x$ successfully iterates to the ground truth, Claim 1 implies that the larger-kernel effect remains in the presence of random noise $n$. We show
(7) 
under which the inflating effect holds with probability one in blind deconvolution.
Above all, we have
(8)  
Kaltofen and Lobo [13] proved that for an M-by-M Toeplitz matrix over a finite field,
(9) 
Here, clear images are statistically sparse in the derivative domain [19, 27], and their elements are modeled as continuous random variables following hyper-Laplacian distributions [14]:
(10) 
Then we get the following claim:
Claim 2.
Proof.
See supplementary file. ∎
So far, we have shown that for $x$ in a sparse distribution, the inflating effect happens almost surely.
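The almost-sure statement can be probed empirically. The sketch below is an illustration only: it draws Toeplitz matrices with Laplacian entries (a stand-in for the hyper-Laplacian model, chosen because NumPy samples it directly) and counts how often they have full column rank; the sizes and the helper `toeplitz_from` are hypothetical.

```python
import numpy as np

def toeplitz_from(values, rows, cols):
    """Toeplitz matrix T[i, j] = values[i - j + cols - 1]; needs rows + cols - 1 values."""
    values = np.asarray(values)
    i = np.arange(rows)[:, None]
    j = np.arange(cols)[None, :]
    return values[i - j + cols - 1]

rng = np.random.default_rng(0)
rows, cols, trials = 40, 9, 200
ranks = []
for _ in range(trials):
    v = rng.laplace(size=rows + cols - 1)    # continuous, heavy-tailed entries
    ranks.append(np.linalg.matrix_rank(toeplitz_from(v, rows, cols)))
full_rank_fraction = float(np.mean(np.array(ranks) == cols))
```

With continuous entries, rank deficiency requires the entries to satisfy an exact polynomial equation, so every sampled matrix is expected to have full column rank.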
2.4 Quantification of error increment
Assume the intermediate $x$ iterates to the ground truth during iterations. Then, for the estimated kernel, we have
(11) 
where represents the Moore-Penrose pseudoinverse. Then,
(12) 
Assume , then
(13) 
where and represents the smallest and the greatest singular values, respectively.
The inflating effect implies that a larger kernel size amplifies the error in $k$ due to noise $n$. To quantify this increment, we extracted a line from a clear image in Levin’s set [17], as shown in Figure 3(a), and plotted and with increasing kernel size . We also generated normalized random Gaussian vectors and compared to simulated boundaries of singular values (see Figure 3(b)). The error in $k$ increases superlinearly with kernel size.
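A toy version of this kind of measurement can be sketched as follows. It is not the experiment behind Figure 3: the signal is a synthetic random walk with Laplacian increments rather than an image line, the length 254 and the kernel sizes are arbitrary choices, and `conv_matrix_1d` is a hypothetical helper. The sketch tracks the extreme singular values of the 1D convolution matrix and the kernel error that a unit-norm noise vector induces through the pseudoinverse.

```python
import numpy as np

def conv_matrix_1d(x, size):
    """N-by-size matrix X with X @ k == np.convolve(x, k, mode="same") for odd size."""
    n, c = len(x), (size - 1) // 2
    X = np.zeros((n, size))
    for j in range(size):
        shift = j - c
        lo, hi = max(0, shift), min(n, n + shift)
        X[lo:hi, j] = x[lo - shift:hi - shift]
    return X

rng = np.random.default_rng(0)
x = np.cumsum(rng.laplace(size=254))         # toy signal with sparse-ish increments
noise = rng.standard_normal(254)
noise /= np.linalg.norm(noise)               # normalized Gaussian noise vector

sizes = list(range(3, 64, 4))
s_min, s_max, err = [], [], []
for size in sizes:
    X = conv_matrix_1d(x, size)
    s = np.linalg.svd(X, compute_uv=False)   # singular values, descending
    s_max.append(s[0])
    s_min.append(s[-1])
    err.append(np.linalg.norm(np.linalg.pinv(X) @ noise))  # kernel error caused by noise
```

Since each larger matrix contains the smaller one as a block of columns, the smallest singular value can only shrink and the largest can only grow as the size increases. The error is bounded above by the noise norm divided by the smallest singular value, so the worst case worsens with kernel size.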
In practice, differences are expected between the intermediate image and the ground truth. Cho and Lee [4] indicated that the intermediate image should be regarded as a sparse approximation to the clear image, not the ground truth itself. Hence,
(14) 
which yields implicit noise [25]. Assume , then,
(15) 
and
(16)  
Then,
(17) 
To quantify how the singular values of change with kernel size, we ran 100 simulations; in each, we generated a stochastic sparse signal of length 254 under the PDF in (10) with , and , and generated a random Gaussian vector where . Figure 3(c) shows one example of generated and . Figure 3(d) shows the means and standard deviations of , and , which is the average of singular values, of simulated on . The error of $k$ is expected to grow with kernel size even when $n = 0$.
3 Low-rank regularization
Blind deconvolution is an ill-posed problem because it lacks sufficient information. Without regularization, MAP degrades to Maximum Likelihood (ML), which admits infinitely many solutions [17]. As prior information, kernel regularization should be designed to compensate for the shortcomings of ML and to guide the optimization toward the expected results. A great number of studies focus on image regularization to describe natural images, e.g., Total Variation (TV) [16, 26, 23], hyper-Laplacian [14], dictionary sparsity [30, 12], patch-based low rank prior [24], non-local similarity [5] and deep discriminative prior [18].
Unfortunately, kernel optimization has attracted much less attention in the literature. Previous works adopted various kernel regularizations, e.g., the $\ell_2$-norm [28, 10, 3, 29, 21], the $\ell_1$-norm [15, 25, 20] and the $\ell_\alpha$-norm [31], but generally treated kernel regularization as an accessory and did not discuss it in detail.
The larger-kernel effect is caused by noise in oversized kernels. Figure 1 and Figure 2 show that without kernel regularization, the main bodies of estimated kernels can emerge clearly, but noise takes up a growing share as the kernel size increases. To keep $k$ clean, the regularization is expected to distinguish noise from ideal kernels efficiently.
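To suppress the noise in estimated kernels, we impose low-rank regularization on $k$ with weight $\mu$, so that the k-step (4) becomes
(18) $\hat{k} = \arg\min_{k} \|k \otimes x - y\|^2 + \mu\, \mathrm{rank}(k)$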
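Because direct rank minimization is an NP-hard problem, a continuous proxy function is required. Fazel et al. [8] proposed
(19) $\log\det(X + \delta I)$
as a heuristic proxy for $\mathrm{rank}(X)$ of a positive semidefinite matrix $X$, where $I$ is the N-by-N identity matrix and $\delta$ is a small positive number. To let this approximation apply to general matrices, the low-rank object is replaced by $(k k^{T})^{1/2}$ [6]. The regularization function then becomes
(20) $\log\det\big((k k^{T})^{1/2} + \delta I\big) = \sum_{j} \log(\sigma_j + \delta)$
where $\sigma_j$ is the $j$th singular value of $k$.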
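The log-det heuristic for a general square matrix can be evaluated directly from singular values. The sketch below assumes the surrogate has the form "sum of log(singular value + small constant)", following the log-det construction of [8] and [6]; the function names and the parameter name `delta` are illustrative.

```python
import numpy as np

def logdet_rank_surrogate(K, delta=1e-3):
    """Sum of log(sigma_j + delta) over the singular values of K."""
    s = np.linalg.svd(K, compute_uv=False)
    return float(np.sum(np.log(s + delta)))

def logdet_via_matrix(K, delta=1e-3):
    """Same quantity as log det((K K^T)^(1/2) + delta * I), for a square K."""
    w, U = np.linalg.eigh(K @ K.T)                       # eigenvalues are squared singular values
    root = (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T    # symmetric square root of K K^T
    _, logabsdet = np.linalg.slogdet(root + delta * np.eye(K.shape[0]))
    return float(logabsdet)

rng = np.random.default_rng(0)
K = rng.standard_normal((6, 6))
via_svd = logdet_rank_surrogate(K)
via_det = logdet_via_matrix(K)
```

The two routes agree up to rounding, which is why the surrogate can be handled entirely through the singular values. Each near-zero singular value contributes roughly `log(delta)`, so matrices with few significant singular values receive a much lower cost.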
Taking low-rank regularization on kernels is motivated by a generic phenomenon of noise matrices [1]. Figure 4(a-b) shows a non-negative Gaussian noise matrix and its singular values in decreasing order. For a noise matrix, where light and dark entries alternate irregularly, the singular values decay sharply at lower indices and then flatten into a relatively long tail. In contrast, ideal kernels incur a much lower regularization cost (see Figure 4(c)). Based on this fact, noise matrices are distinguished from real kernels by their high cost. Figure 4(d) shows that the singular values of a low-rank regularized kernel are distributed similarly to those of the ground truth, unlike those of the noisy one.
One intuitive explanation of the low-rank property of ideal kernels is the continuity of blur motions. The rank of a matrix equals the number of linearly independent rows or columns, so it inversely reflects how similar these rows or columns are. The speed of a camera motion is deemed to be continuous [7]. Hence, the local trajectory of a blur kernel looks similar across neighboring pixels, which the continuous proxy of rank measures as a low value.
Compared to previous norms, low-rank regularization responds more efficiently to noise. To illustrate this point, we generated a noisy kernel by adding a small percentage () of non-negative Gaussian noise and of the real kernel. Figure 5 shows that the low-rank cost responds sharply to the noise, while the norms fail to. That is because a norm only captures statistical information. An extreme example consists of breaking up a truth kernel and randomly rearranging its elements, which leaves the norm cost unchanged. In contrast, rank (singular values) corresponds to structural information.
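The extreme example above, rearranging the elements of a kernel at random, is easy to reproduce. The sketch below is illustrative: it uses a separable Gaussian kernel as a stand-in for a real blur kernel, element-wise l1 and l2 norms as representative norm costs, and a hypothetical `logdet_rank_surrogate` with an assumed constant `delta`.

```python
import numpy as np

def logdet_rank_surrogate(K, delta=1e-3):
    """Sum of log(sigma_j + delta) over the singular values of K."""
    s = np.linalg.svd(K, compute_uv=False)
    return float(np.sum(np.log(s + delta)))

# Smooth 9-by-9 kernel (rank one by construction), normalized to sum to one.
g = np.exp(-0.5 * (np.arange(-4, 5) / 1.5) ** 2)
kernel = np.outer(g, g)
kernel /= kernel.sum()

# Same elements, randomly rearranged: the structure is destroyed, the statistics are not.
rng = np.random.default_rng(0)
shuffled = rng.permutation(kernel.ravel()).reshape(kernel.shape)

l1 = (np.abs(kernel).sum(), np.abs(shuffled).sum())
l2 = (np.linalg.norm(kernel), np.linalg.norm(shuffled))
cost = (logdet_rank_surrogate(kernel), logdet_rank_surrogate(shuffled))
```

Both norms are identical for the two matrices because they depend only on the multiset of entries, whereas the singular-value cost rises sharply once the smooth structure is broken.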
4 Optimization
Function is nonconvex (it is actually concave on ). To solve the low-rank regularized least squares (4), we introduce an auxiliary variable and reformulate the optimization into
(21)  
s.t. 
Using the Lagrange method, (21) is solved by two alternating sub-optimizations
(22) 
where is the iteration number, while and are trade-off parameters.
The substep is convex and is solved using the Conjugate Gradient (CG) method. For the substep, low rank is enforced only to a limited degree; otherwise, the regularization may change the main body of the kernel (an extreme result is ). Thus, our strategy is to lower the rank at locally. Using the first-order Taylor expansion of at fixed matrix :
(23) 
where is the th eigenvalue of , the k-substep in (22) is transformed into an iterative optimization
(24) 
where is the inner iteration number. For convenience, we set as a flag (if , the k-substep will be skipped) and only tuned as the trade-off parameter.
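The role of the first-order expansion can be illustrated at the level of singular values. This is a sketch under assumed notation: $\sigma_j$ is the $j$th singular value of the kernel, $\sigma_j^{(t)}$ its value at inner iteration $t$, and $\delta$ the small positive constant of the log-det surrogate. Because $\log$ is concave, its tangent at the previous iterate is an upper bound:

```latex
\sum_{j} \log(\sigma_j + \delta)
\;\le\;
\sum_{j} \log\big(\sigma_j^{(t)} + \delta\big)
+ \sum_{j} \frac{\sigma_j - \sigma_j^{(t)}}{\sigma_j^{(t)} + \delta}.
```

Dropping the terms that do not depend on the current variable leaves a weighted sum $\sum_j w_j\,\sigma_j$ with $w_j = 1/(\sigma_j^{(t)} + \delta)$. Small singular values, which are more likely to be noise, receive large weights and are penalized more strongly, while the dominant ones are barely touched.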
Define the proximal mapping of function as follows:
(25) 
Dong et al. [6] proved that one solution to the proximal mapping of is
(26) 
where is the SVD of , and . Local low-rank optimization is implemented as iterations via the given parameter (see Algorithm 1). In our implementation, is designed to grow exponentially with to allow more freedom of for early iterations.
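A weighted singular value thresholding step of the kind used for log-det surrogates can be sketched as below. This is an assumption-laden illustration rather than Algorithm 1: it takes the proximal solution to be "shrink each singular value by `tau / (sigma_j + delta)` and clip at zero", in the spirit of [6]; `weighted_svt`, `tau`, `delta` and the toy kernel are hypothetical.

```python
import numpy as np

def weighted_svt(K, tau, delta=1e-3):
    """Shrink singular value sigma_j by tau / (sigma_j + delta), clipping at zero."""
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    weights = 1.0 / (s + delta)              # small singular values get large weights
    s_new = np.maximum(s - tau * weights, 0.0)
    return (U * s_new) @ Vt, s, s_new

# Toy kernel: smooth rank-one main body plus weak non-negative noise.
rng = np.random.default_rng(0)
g = np.exp(-0.5 * (np.arange(-4, 5) / 1.5) ** 2)
clean = np.outer(g, g)
clean /= clean.sum()
noisy = clean + 0.002 * np.abs(rng.standard_normal(clean.shape))

denoised, s_before, s_after = weighted_svt(noisy, tau=2e-5)
```

The dominant singular value, which carries the main body of the kernel, is almost unchanged, while the small noise-related singular values are shrunk strongly or set to zero.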
5 Experimental Results
In this section, we first discuss the effects of low rank-based regularization, then evaluate the proposed method on benchmark datasets, and finally demonstrate its effectiveness on real-world blurry images. The source code is available at https://github.com/lisiyaoATbnu/low_rank_kernel.
[Figure 6: panel labels: size = 23, err = 1.55; size = 47, err = 1.56; size = 69, err = 2.14.]
[Figure 7: columns: Blurry, Low rank, None; panels (a), (b).]
5.1 Effects of low rank-based regularization
Corresponding to the high error ratios of large kernels in Figure 1, we repeat the experiment using the same parameters except and . Figure 6 shows that low-rank regularized kernels are much more robust to kernel size. Noise in the kernels is efficiently reduced and the quality of the restored images is enhanced. We further verify this on real-world images by imposing different regularization terms. As shown in Figure 7, blur kernels with low-rank regularization have less noise, while the others suffer from strong noise, yielding artifacts in the deblurred images. Note that in the experiments of Figure 6 and Figure 7, we deliberately omitted the multi-scaling scheme to expose the effectiveness of low-rank regularization itself.
5.2 Evaluation on synthetic dataset
The proposed method is quantitatively evaluated on the dataset from [17]. Figure 8 shows the success rates of state-of-the-art methods versus our implementations with and without (set and zero) low-rank regularization. The average PSNRs in Figure 8 for different sizes are compared in Table 1. Parameters are fixed during the whole experiment: , , , , and ; a 7-layer multi-scaling pyramid is used. Kernel elements smaller than 1/20 of the maximum are cut to zero, as is also done in [3, 14]. Low-rank regularization works more effectively than the regularization-free implementation and the state of the art.
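The 1/20 thresholding mentioned above is a one-line operation. The following is a minimal sketch; the function name `prune_kernel`, the optional renormalization to unit sum and the toy kernel are assumptions added for illustration.

```python
import numpy as np

def prune_kernel(k, ratio=1 / 20, renormalize=True):
    """Zero out elements below ratio * max(k); optionally rescale to unit sum (assumed)."""
    k = np.asarray(k, dtype=float).copy()
    k[k < ratio * k.max()] = 0.0
    if renormalize and k.sum() > 0:
        k /= k.sum()
    return k

kernel = np.array([[0.00, 0.01, 0.00],
                   [0.02, 0.80, 0.10],
                   [0.00, 0.07, 0.00]])
pruned = prune_kernel(kernel)                # threshold is 0.8 / 20 = 0.04
```

Here the entries 0.01 and 0.02 fall below the threshold and are removed, while 0.07, 0.10 and 0.80 survive and are rescaled.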
Table 1: Average PSNR at truth size and double size.
Method                     prior      truth size  double size
[22]                       –          27.34       23.29
[3]                                   26.85       25.74
[28]                                  26.91       26.71
[25]                                  26.54       26.44
[15]                                  25.34       23.95
[31]                                  26.58       26.83
Ours (no regularization)   –          26.68       23.85
Ours                       low rank   27.36       27.47
5.3 Evaluation on real-world blurry images
We compared our implementation with state-of-the-art methods on real-world images to reveal the robustness of low-rank regularization to large kernel sizes. Specifically, [28] uses a heuristic iterative support domain detector based on the differences of elements of , which is regarded as more effective than the 1/20 threshold. Figure 9 shows that size yields strong noise in the estimated kernels of previous works [3, 28], and even changes the main bodies of the kernels [15, 31]. In contrast, low-rank regularization keeps the kernel relatively stable at the larger size. A further comparison of different regularizations and refinement methods at large kernel size is shown in Figure 10. As for computational efficiency, our method takes about 85s on a Lenovo ThinkCentre computer with a Core i7 processor to process images of size .
6 Conclusion
In this paper, we demonstrate that overestimated kernel sizes produce increased noise in the estimated kernel. We attribute the larger-kernel effect to the inflating effect. To reduce this effect, we propose a low rank-based regularization on the kernel, which suppresses noise while preserving the main body of the optimized kernel.
The success of blind deconvolution depends on many factors. In practical implementations, even for noise-free $y$, the intermediate $x$ is unlikely to iterate to the ground truth; hence some parts of $x$ will be treated as implicit noise, which may intensify the effect even more than expected and requires future research.
Acknowledgement
This work is supported by the grants from the National Natural Science Foundation of China (61472043) and the National Key R&D program of China (2017YFC1502505). We thank Ping Guo for constructive conversation. Qian Yin is the corresponding author.
References
 [1] H. C. Andrews and B. R. Hunt. Digital image restoration, chapter 5.2, pages 102–103. PrenticeHall, Englewood Cliffs, NJ, 1977.
 [2] T. F. Chan and C.K. Wong. Total variation blind deconvolution. IEEE Trans. Image Process., 7(3):370–375, 1998.
 [3] S. Cho and S. Lee. Fast motion deblurring. ACM Trans. Graph., 28(5):145, 2009.
 [4] S. Cho and S. Lee. Convergence analysis of map based blur kernel estimation. arXiv preprint arXiv:1611.07752, 2016.

 [5] W. Dong, G. Shi, and X. Li. Nonlocal image restoration with bilateral variance estimation: a low-rank approach. IEEE Trans. Image Process., 22(2):700–711, 2013.
 [6] W. Dong, G. Shi, X. Li, Y. Ma, and F. Huang. Compressive sensing via nonlocal low-rank regularization. IEEE Trans. Image Process., 23(8):3618–3632, 2014.
 [7] L. Fang, H. Liu, F. Wu, X. Sun, and H. Li. Separable kernel for image deblurring. In CVPR, pages 2885–2892. IEEE, 2014.
 [8] M. Fazel, H. Hindi, and S. P. Boyd. Logdet heuristic for matrix rank minimization with applications to hankel and euclidean distance matrices. In American Control Conf. (ACC), volume 3, pages 2156–2162, 2003.
 [9] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman. Removing camera shake from a single photograph. In ACM Trans. Graph., volume 25, pages 787–794, 2006.
 [10] D. Gong, M. Tan, Y. Zhang, A. Van den Hengel, and Q. Shi. Blind image deconvolution by automatic gradient activation. In CVPR, pages 1827–1836, 2016.
 [11] R. M. Gray. Toeplitz and circulant matrices: A review. Foundations and Trends in Communication and Information Theory, 2(3):155–239, 2006.
 [12] Z. Hu, J.B. Huang, and M.H. Yang. Single image deblurring with adaptive dictionary learning. In ICIP, pages 1169–1172. IEEE, 2010.
 [13] E. Kaltofen and A. Lobo. On rank properties of toeplitz matrices over finite fields. In Int. Symp. Symbolic and Algebraic Computation (ISSAC), pages 241–249, 1996.
 [14] D. Krishnan and R. Fergus. Fast image deconvolution using hyperlaplacian priors. In NIPS, pages 1033–1041, 2009.
 [15] D. Krishnan, T. Tay, and R. Fergus. Blind deconvolution using a normalized sparsity measure. In CVPR, pages 233–240, 2011.
 [16] A. Levin, R. Fergus, F. Durand, and W. T. Freeman. Image and depth from a conventional camera with a coded aperture. ACM Trans. Graph., 26(3):70, 2007.
 [17] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Understanding and evaluating blind deconvolution algorithms. In CVPR, pages 1964–1971, 2009.
 [18] L. Li, J. Pan, W.S. Lai, C. Gao, N. Sang, and M.H. Yang. Learning a discriminative prior for blind image deblurring. In CVPR, pages 6616–6625. IEEE, 2018.
 [19] B. A. Olshausen and D. J. Field. Emergence of simplecell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607, 1996.

 [20] J. Pan, Z. Lin, Z. Su, and M.H. Yang. Robust kernel estimation with outliers handling for image deblurring. In CVPR, pages 2800–2808, 2016.
 [21] J. Pan, D. Sun, H. Pfister, and M.H. Yang. Blind image deblurring using dark channel prior. In CVPR, pages 1628–1636, 2016.
 [22] D. Perrone and P. Favaro. Total variation blind deconvolution: The devil is in the details. In CVPR, pages 2909–2916, 2014.
 [23] D. Ren, H. Zhang, D. Zhang, and W. Zuo. Fast totalvariation based image restoration based on derivative augmented lagrangian method. Neurocomputing, 2015.
 [24] W. Ren, X. Cao, J. Pan, X. Guo, W. Zuo, and M.H. Yang. Image deblurring via enhanced lowrank prior. IEEE Transactions on Image Processing, 25(7):3426–3437, 2016.
 [25] Q. Shan, J. Jia, and A. Agarwala. Highquality motion deblurring from a single image. ACM Trans. Graph., 27(3):73, 2008.
 [26] Y. Wang, J. Yang, W. Yin, and Y. Zhang. A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences, 1(3):248–272, 2008.
 [27] Y. Weiss and W. T. Freeman. What makes a good model of natural images? In CVPR, pages 1–8, 2007.
 [28] L. Xu and J. Jia. Twophase kernel estimation for robust motion deblurring. In ECCV, pages 157–170, 2010.
 [29] L. Xu, S. Zheng, and J. Jia. Unnatural l0 sparse representation for natural image deblurring. In CVPR, pages 1107–1114, 2013.
 [30] X. Zhang, M. Burger, X. Bresson, and S. Osher. Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. SIAM Journal on Imaging Sciences, 3(3):253–276, 2010.
 [31] W. Zuo, D. Ren, S. Gu, L. Lin, and L. Zhang. Discriminative learning of iterationwise priors for blind deconvolution. In CVPR, pages 3232–3240, 2015.
Supplementary File: Proof of Theorem 2
Assume to be odd and (). Then,
(27)  
(28) 
For any by matrix A, i.f.f. . Thus,
(29) 
To our knowledge, an explicit formula for the determinant of a Toeplitz matrix in terms of its elements is not available in the current literature. Li [1] gives a concrete expression of by using LU factorization, but it fails to fit all situations ( when ). However, it can be shown that equals a multivariate polynomial function without manipulating the whole expression. By Laplace expansion on , the term of largest degree is with factor 1.
Lemma.
Let be a continuous r.v. in the finite support domain [a, b]. Let be a polynomial function
where is a finite polynomial function with the largest degree less than . Generate a new r.v.
Then, for , the Cumulative Distribution Function (CDF) is continuous at y.
Proof.
where .
For ,
and
where .
Based on Beppo Levi’s Theorem,
Because ( is a constant), for , zeros of are finite, hence the Lebesgue measure of is zero. We have
Thus
∎
Theorem 2.
Let be a continuous r.v. with PDF
For a sample of independent observations , generate a new r.v.
Then,
Proof.
is a polynomial function with the largest degree less than . Based on Lemma, we have
Hence,
∎
References
 [1] H. Li. On calculating the determinants of Toeplitz matrices. J. Appl. Math. Bioinformatics, 1(1):55, 2011.