I Introduction
Computer vision problems are often addressed by using mathematical models. The quality of the solutions to the problems are measured objectively in the mathematical models. With the valid mathematical models, we can elucidate the phenomenon of the natural computations, e.g. by human vision, that try to accomplish the tasks, and we can reproduce the result of the natural computations by computerized simulations.
An objective measure of piecewise smooth image segmentation is the MumfordShah functional energy [1]. By minimizing the energy, we expect to achieve high quality of segmentation for piecewise smooth images. The problem of minimization of the MumfordShah functional energy is the mathematical abstraction, i.e. the model, of piecewise smooth image segmentation. The model is now known as the MumfordShah model. This methodology is different from that of those in [2] [3] which target at the objective evaluation of the segmentation based on the groundtruth results from normal subjects.
In MumfordShah model, each image region is modeled as a smooth or constant function with parameters. By minimizing the energy of the MumfordShah model, we can restore the smooth image in each region, and we can also obtain the partition boundaries of the segmentation located at the discontinuities in the restored image. However, the image values to be restored may not be piecewise smooth or flat as a whole. The images may contain regions of nonsmooth structures. Such as the images in Fig. 1(a). Imposing the smoothness on these images leads to the destructive averaging of the image content. This poses problem to the segmentation with the conventional MumfordShah model. For example, it is possible that nonsmooth visual patterns different in structure may have similar average image values (See Fig. 1(a)). Consequently, the MumfordShah model cannot separate such patterns in the image space.

To cope with nonsmooth image data, the segmentation has been considered as a framework of two independent processes, i.e. the feature extraction and the segmentation of the feature image. There exist combinations of the MumfordShah model or other active contour model with predefined features [4] [5] [6]. The predefined features does not necessarily match the underlying nonsmooth image structures. For example, in Fig. 1
(b) the feature values obtained by image filtering with the Gabor filters that well match the texture pattern, or the local averages of the feature values, can have significant spatial variations. Unsupervised feature selection has also been used
[7] [8] [9]. The major problem concerns us is that these frameworks, formed by separate feature selection and segmentation, do not provide a unified optimization model for segmentation. These concerns motivate us to explore a unified and valid mathematical model for segmentation of nonsmooth structures.In this paper, we propose a novel unified energy minimization model of the segmentation of general nonsmooth image structures, motivated by the original MumfordShah model. We formulate the segmentation of general nonsmooth image structures as a single energy minimization problem of piecewise linear patch reconstruction. The formulation is not heuristic since we prove that the error of segmentation is bounded by the energy of piecewise linear patch reconstruction for a fixed number of bases. Thus, minimizing the energy of piecewise linear patch reconstruction can reduce the error of segmentation. Regarding the energy minimization, we prove that the eigenpatches constitute the global optimal solution to the minimization of the error of linear patch reconstruction. We also explore a more efficient alternative to the eigendecomposition of matrix for computing eigenvectors. We prove that the linear patch reconstruction based on gradient descent converges to the eigenpatches under some mild assumption on the initializations. The assumption can often be met, and the convergence is linear. We therefore propose the gradient descent algorithm for solving the linear patch reconstruction problem even though the problem is not convex. The segmentation algorithm is based on alternating piecewise linear patch reconstruction and curve evolution. The piecewise linear patch reconstruction can naturally be coupled with the conventional MumfordShah model for coping with both smooth and nonsmooth structures. A unique feature of the method is that it produces both the segmentation and a dictionary of optimized (orthonormal) descriptors for each segmented region upon completion in the same optimization framework.
The rest of the paper is organized as follows. We reviewed the related works in section II. We study the PC/PS model and propose the piecewise linear patch reconstruction model in sections IIIIV. We show our error bound of the segmentation in terms of reconstruction error in section IVB and we present our proved theoretical claim of the global optimality of the gradient descent for the nonconvex linear patch reconstruction problem in section IVC. The experimental results are presented in section V. We conclude the paper in section VI.
Ii Background and related works
Iia The MumfordShah model
The segmentation of piecewise smooth images has been formulated as an energy minimization problem in the MumfordShah model. For an image defined over the image domain , the twophase MumfordShah model can be formulated as follows.
(1) 
where is a Dirac delta function, is a Heaviside (step) function, is a penalty coefficient. and
are the reconstruction estimates,
is the signed distance function that partitions the image into two regions, i.e. for one region, for the other. and are the reconstruction errors of the two region models. assigns either of the two models to all the pixels, and the last term tries to minimize the complexity of the labeling. For piecewise smooth (PS) model, we have , , where is the image value at , is a constant penalty coefficient. Note that, , and are all functions defined on . We will omit behind these functions henceforth if there is no risk of confusion. The fundamental optimization technique for the reconstruction is the Green’s functions solution to the Eulerlagrange equation obtained by Calculus of Variations, which was presented in [10]. The numerical schemes for solving the Eulerlagrange equation can also be found in [11] [12]. The smoothness regularization guarantees the global optimality of the solution obtained by any of the optimization methods for a given partition.If we assume the reconstruction functions , to be constants, i.e. for all in , and , then the gradient terms in the PS model vanish, and the functional becomes the piecewise constant (PC) MumfordShah model, which is the prototype of the region competition [13] and the ChanVese model [14].
For minimizing the MumfordShah functional, the algorithm is often the alternating (or simultaneous) implementation of segmentation and reconstruction. The reconstruction is achieved by image smoothing within each region of segmentation. The level set method is often used in [14] for the segmentation. In recent years, the linear relaxation [15] and the convex relaxation [16] of the MumfordShah model have been proposed, which lead to alternatives to the level set method.
IiB Regionbased active contours for unsupervised texture segmentation
The original the MumfordShah model was formulated for segmentation of piecewise smooth images. Later the model was extended for image texture segmentation. In [17] [7] [8], the Gabor filtering was incorporated in the MumfordShah model. In [17], Lee et al. embedded the manually selected 24 Gabor filters into the PS MumfordShah model. In [7], Sandberg et al. adopted the ChanVese model as well as the maximum difference of feature means as the criterion for Gabor filter selection. Sagiv et al. [8] adopted the framework in [7] with manually selected filters. These frameworks require the textures to be piecewise smooth or flat in the feature space, but the filter selection for unsupervised segmentation cannot ensure this. Although the adhoc postprocessing of high dimensional anisotropic diffusion may be applied to ensure the piecewise smoothness, the reasons behind this remain obscure. Kokkinos et al. [9] proposed to apply the Region Competition model to a type of modulation features. This framework assumes that the textures are globally oscillating, as on zebras and tigers, and it still requires filter selection by dominant component analysis (DCA) for parametric texture modeling. However, it is actually unknown what kind of feature or the principle of feature selection can help the segmentation by MumfordShah model without supervision. Therefore, the existing combinations of the feature extraction and the MumfordShah model is heuristic and they do not provide a unified mathematical model for the segmentation.
There exist other frameworks of texture segmentation based on other similar active contour models with fixed texture feature, e.g. [6] [4] and [5]. For example, in [4] and [5]
the structure tensor has been chosen as the texture feature, and in
[6] the histogram of image values on overlapping patches is chosen as the feature. These frameworks are more likely to form a unified theory of the segmentation. However, the choice of the structure tensor feature or local histogram has not been validated for general textures.Iii The principle of segmentation behind the MumfordShah model
In this section, we study the segmentation by MumfordShah model to understand the rationale of this model for segmentation. We restricted ourselves to the twophase model. The generalization of the twophase framework to multiphase may follow [18], which is out of the scope here.
Let us consider the twophase MumfordShah model in a simplified form as follows.
(2) 
where and are the reconstruction errors corresponding to the subregions and , such that . Since we focus on the region model in this work, we omit discussing about the prior term of arclength for imposing smoothness. Note that, the smoothness term is important for dealing with noisy data. For more detailed discussions on the smoothness term, we refer the readers to [1] [13].
In the PC model, , , where is the image value, are the regional means of the image value. In PS model, the , , where are the smooth restoration of the two image regions (due to the smoothness constraint). The smoothness regularization terms, and are often omitted if and are solved by the normalized Gaussian convolution, such as in [19], which was interpreted as nonparametric regression [12]. Regarding the minimization of this simplified functional (2), we have the following interesting fact upon fixing the error functions and .
Proposition III.1
(3) 
where and is a constant independent of .
This fact tells that optimizing the global assignment of is equivalent to optimizing the assignment locally. The assignment rule for determining the optimal at each pixel location is therefore the following.
(4) 
The proof is included in the appendix in a separate report. As in the interpretation in [13], MumfordShah model tries to achieve clustering based on the measure of the membership defined by . Hence, the proper choice of the measure is essential to the segmentation by MumfordShah functional.
Iv Piecewise linear patch reconstruction
In what follows, we establish the mathematical model and the associated solution for the segmentation of nonsmooth image structures.
Iva The model of piecewise linear patch reconstruction
In the original MumfordShah model, each region is modeled as a piecewise smooth surface that minimizes the functional energy as follows.
(5) 
where ,
is a penalty coefficient. A major reason for this formulation is that the energy can be minimized by solving standard partial differential equations, i.e. the diffusion equation. Besides, the energy is convex. Hence, if the local optimal solution exists it is also global optimal. The problem is that this formulation imposes the smoothness.
To cope with nonsmooth image structure, we consider the image patches as linear combination of patch bases, such as in many image appearance models, as follows.
(6) 
with the varying weights for different patches over the image, where is a set of orthonormal bases that fully reconstruct the patch. where and is a square region centered at .
It is obvious that the exact reconstruction can happen regardless of smoothness of the image patch. To find the bases, we may formulate the reconstruction as energy minimization as follows.
(7) 
Replacing the image reconstruction error in simplified twophase MumfordShah functional (3) with the patch reconstruction error formulated above, we obtain the formulation of piecewise linear patch reconstruction as follows.
(8) 
where
(9) 
for , where is the size of the patch. From the above, we can note that the error can be computed by convolutions.
From the assignment rule in (4), we know that the patches will be assigned to a region if the set of bases of this region produces the patch reconstruction error smaller than that of the other regions.
IvB The error of segmentation by piecewise linear patch reconstruction
In the above, we have shown that the piecewise linear patch reconstruction is capable of determining the assignment of the patches according to the patch reconstruction error, but we did not answer whether the assignment is correct. In the following, we establish the theoretical foundation of the proposed formulation. Specifically, we show that the error of the assignment, i.e. the segmentation error, is bounded by the patch reconstruction error for a fixed number of, say , bases. Thus, by minimizing the error of reconstruction, we may reduce the error of segmentation.
The assignment rule of (4) enables us to assess the probabilistic analysis of the correctness of the segmentation. We ask whether the segmentation by (4) is consistent with the truth. Specifically, we wish to know if , whether ; and if , whether , where is the true partition. means the set of patches that have their centers in , where . This requires us to analyze the following segmentation error rate.
(10) 
where , are the sizes of the sets and .
The segmentation error rate can be represented by probability, i.e.
for sufficiently large population of and , where is the probability of certain events due to , is the probability of the events due to . We hope the error rate to be small. For our patch reconstruction model, we can deduce the following error bound for segmentation.Proposition IV.1
If the subregion , are sufficiently large, such that , , then the following holds.
(11) 
where is the patch norm, , , , are constants, and , . Besides, the denominator in the RHS of the bound is positive.
The proof is included in the appendix in a separate report. The above error bound guarantees in theory that the segmentation error rate due to the assignment rule (4) can be decreased by minimizing the reconstruction error with respect to a fixed number of bases. The number of bases will definitely smaller than the number of dimensions of the the patch, since the denominator in the RHS of the bound is positive.
To conclude, we showed that the segmentation error rate is upper bounded by the total patch reconstruction error for a fixed . The minimization of the patch reconstruction error can therefore minimize the segmentation error.
IvC Global optimal linear patch reconstruction
In what follows, we address the optimal linear patch reconstruction.
(12) 
where , , and , where is the width of a patch, and we consider square patch in this paper.
A useful identity regarding this formulation is the following.
(13) 
where
according to Eq. (9), and , are orthonormal. This identity can be verified by expanding the squared error.
Before we try to solve the reconstruction problem (12) (or (13) equivalently), we also note that the optimization problem defined in Eq. (12) is a problem of minimizing constrained concave function. The formal statements with their proofs are included in the appendix in a separate report. Such a problem is known to have local optimal solutions [20]. However, we are able to show that the global optimal solution to the linear patch reconstruction problem is attainable. The key result regarding the optimality is the following theorem.
Theorem IV.2
Given a set of functions defined as follows,
(14) 
where are all the eigenvectors of , is defined by the following.
(15) 
where , then the following bound is true.
(16) 
where is defined in Eq. (13), are the first eigenvalues of .
The proof is included in the appendix in a separate report. The above suggests the matrix eigendecomposition as our solution to the global optimal reconstruction. However, the eigendecomposition typically requires converting the image to patches, computing the covariance matrix followed by Singular Value Decomposition (SVD). In our segmentation framework, we require the optimal reconstruction iteratively during the segmentation. Thus, the eigendecomposition of the matrix obtained from the extracted and labeled patches can be timestorage consuming. Alternatively, we suggest the gradient descent as the solution. The gradient descent equation to minimize the error (
12) is the following.(17) 
The reasons for this choice are the following.
Theorem IV.3
Given the following eigenvalue problem.
(18) 
where is the eigenvalue, is the eigenvector and , and if there are finitely many, say , eigenvectors of , the following gradient descent procedure for solving (12) (or (13) equivalently) converges to the global optimal solution of (12), with the initial bases for any .
(19) 
where for are the solved patch bases.
The proof is included in the appendix in a separate report. Moreover, the convergence of the gradient descent is linear as claimed in the following corollary.
Corollary IV.4
If there are eigenvectors of , and the eigenvalues are ordered such that , the rate of convergence for the gradient descent iteration in each step of the greedy procedure defined by Eq. (19) does not exceed . In other words, the procedure converges linearly.
The proof is included in the appendix in a separate report. Besides, the gradient descent does not require computing the matrix or the matrix eigendecomposition but only the convolutions.
IvD Coupling with piecewise smooth model
It is easy to find two patches, which can be reconstructed equally well (yielding the same residue) by the same set of bases, while having significant difference in intensity. Hence, the proposed linear patch reconstruction model cannot be used for differentiating the image patches that are different only in their illumination and color but similar in their local structure. To cope with both the cases via the same model, we propose to combine our model of linear patch reconstruction with the original piecewisesmooth MumfordShah model to form an integrated functional model of segmentation as follows.
(20) 
where , for and is a predefined weight. The last term penalizes the complexity of the segmentation.
The numerical solution of to the energy minimization model can be derived from the level set method, in which a signed distance function is used to generate through Heaviside function, i.e. .
(21) 
By this equation, it is implied that the reconstruction errors and are fixed when implementing curve evolution. The energy in (20) is minimized by alternatively updating the image partition by solving and updating the reconstruction error in each region by solving and . This energy minimization procedure is in fact the gradient descent method for the separate variables. Hence, the convergence of the procedure is guaranteed.
V Experiments
Va Data preparation and implementation details
Natural textures are examples of the nonsmooth structured visual patterns. Evidences show that the texture patches can be reconstructed (modeled or represented) well by patch subspaces, i.e. the eigenfilters, as reported in [21] where the texture classification was investigated more carefully. Therefore, we mainly evaluate our method of segmentation by piecewise linear patch reconstruction on a subset of Brodatz textures [22]. We empirically choose the textures that appear relatively spatially regular with similar size of texture stimuli (textons) for evaluation. Therefore, we obtain a collection of textures: D3, D56, D1522, D24, D3436, D49, D5253, D55, D57, D65, D68, D7677, D79, D8185, D101106, which we call the set . Afterwards, we generate two sets of mosaic texture images by paring all different the textures from for evaluation our segmentation method. Each set contains 1260 images. One of the sets is made by the original textures, the other is by the textures with the mean intensity subtracted. The template for paring the textures is shown in Figure 2(a). We also use this template as the ground truth for evaluating the segmentation. Both the sets are challenging for segmentation, especially the second one.
Our segmentation algorithm for minimizing the functional energy (20) can be implemented by alternating an algorithm of image partitioning and the patch reconstruction via either the SVD or the gradient descent procedure in (19) . We adopt the curve evolution governed by Eq. (21) as the image partitioning algorithm in the implementation. All the methods in our experiments are based on the curve evolution for fair comparison. For piecewise smooth image we may choose a small penalty coefficient for the contour length, e.g. in our implementation, for nonsmooth image we require a large penalty of the contour length for coping with the randomness, e.g. in our implementation. We use the maximum number of iterations to detect the convergence of the curve evolution algorithms. The maximum iteration number is set to be 600, since it is observed that the curve evolutions in the experiment converge before this iteration number is reached. The convergence of the gradient descent method for patch reconstruction is fast. We set the maximum iteration number to be 5.
VB Evaluation of the gradient descent patch reconstruction
In this subsection, we evaluate the proposed gradient descent method for computing the optimal bases. The principle is to compare the error of reconstruction by gradient descent with the error of reconstruction by SVD. We evaluate the gradient descent method for linear patch reconstruction on the Set and Yale face database. We compare the reconstruction errors according to the first 120 optimal bases produced by the gradient descent method with the errors according to the bases produced by SVD. The reconstruction errors can be computed by evaluating (12) for the entire image domain. We also present the comparison of the reconstructions by orthogonalized Gabor filter and the SVD solution. We apply the principle of maximum filtering response to select the first Gabor filters from a filter bank of filters. We then orthogonalize the selected filters to compute the energy. We compare the averaged errors but not the total error. The total error is the sum of the reconstruction errors for all the patches in the image. The averaged error is the total error divided by the number of patches.
The results are shown in Figure 3. When visualizing the comparison of the errors, we divide the errors by the value of the maximum averaged errors to form a normalized error. The results by gradient descent are very close to that of the SVD, while the results of Gabor filters do not match the SVD. The initial bases for the gradient descent method are randomly generated.
VC Evaluation of the segmentation by linear patch reconstruction
We mainly evaluate the segmentation of our methods, i.e. the SVD and GD based methods, on the two subsets of mosaic textures introduced previously. We compare our methods with the PS MumfordShah model [19] [12], the RegionScalable Fitting (RSF), a.k.a. the Local Binary Fitting (LBF) [23], the Gabor filtering based method [7] which is also adopted in [8] and the local histogram based ChanVese model [6], which we denote as HistPC henceforth. The Gabor filtering based method could be viewed as a baseline approach for texture segmentation, and the local histogrambased method is the stateoftheart approach. We use 8 bases for each region in SVD and GD. We choose the 8 Gabor filters from a bank of 24 filters for each region according to the criterion of maximum filtering response. This filter selection criterion appears like model fitting [24]. The patch size is . The methods are all based on curve evolution. We used a common initial curve for the curve evolutions. The initial curve in the image domain is shown in Figure 2(b). Note that the initial contour crosses the true boundary of the two regions, and the converged contour is expected to outline the region of texture B on the right.
We adopt the pixelwise segmentation error rate to measure the quality of the segmentation by using the ground truth. The boxplot in Figure 4 shows the segmentation errors for all the methods. We can observe a clear lower error rate by our methods for differentiating the textures that differs only in their structures but not in their intensities. The results by GD and SVD are comparable, but the computational time for the segmentation based on SVD is 0.145 0.048 seconds per iteration for the two datasets, while the computational time for the segmentation based on GD is 0.137 0.006 seconds. This suggests us to use the GD based bases updating scheme for segmentation. The mean error by the local histogram based ChanVese active contour for the original textures is small, but the variation of its performance is large. Besides, this local histogram based method is still ineffective for differentiating the different textures having the same mean intensity. We can also observe from the results that Gabor features can deal with textures when the textures differ in their intensities. However, when there is little difference in the intensities of textures, the Gabor features are still powerless. We summarize the quantities in Table I. Besides, we visualize the curve evolutions, as well as the corresponding converged patch bases, for two textured images and a picture from the Berkeley dataset [25] in Figure 5. The picture is composed of different structures, and the piecewise linear patch reconstruction is capable of differentiating the nonsmooth structure from the piecewise smooth structure. We may observe that the descriptors are semantic. The results by other models are shown in Fig. 6.
Our methods  Others’  

Data  SVD  GD  LBF  PS  HistPC  Gabor  
Error(%)  Original  12.5610.10  11.6611.15  42.78 10.30  40.0010.52  9.9615.85  33.3812.94 
No mean  14.7610.44  13.3311.82  45.64 12.87  43.8113.50  24.1221.56  42.688.02 
VD Segmentation by the coupled model with parameter tuning
To cope with both smooth and nonsmooth contents, we may use the coupled model proposed previously. The quality of the result of the segmentation by the proposed coupled model depends on the value of the parameter . It is obvious that the coupled model tends to be the conventional MumfordShah model if , and the model tends to be the pure piecewise linear patch reconstruction if . Fig. 8 shows the results of applying the pure piecewise linear patch reconstruction to piecewise smooth images (with noise). We can observe that the optimal bases of the regions are similar. Hence, the model can not deal with such images. We hope to find the such that the coupled model can be used for segmenting piecewise smooth images while its ability for differentiating nonsmooth structures is preserved. In our implementation, we run the segmentation with different values on all the mosaic images composed of the original textures (without subtracting the mean off), obtained in the last subsection. We alter the value of from to to obtain the error of segmentation shown in Fig. 7. We observe the steady performance for and , and we observe the clear decay of the performance when changing from to . This means that by selecting , the performance of the coupled model for differentiating nonsmooth image structures is almost as good as that of the pure piecewise linear patch reconstruction. We also apply the model with to the piecewise smooth images under noise and the good results are shown in Fig. 9, which indicates that the is already sufficient for segmentation of piecewise smooth images. We therefore chose in the coupled model to optimally balance the two terms in the coupled model.
VE Oneagainstall segmentation
We have so far only considered the presence of two different groups of contents in the image in our formulation, which is known as the twophase model. Due to the capability of the level set method for handling topology changes, the twophase model is still capable of partitioning the image into multiple smooth or nonsmooth regions of two groups. However, the twophase model cannot cope with multiple regions of multiple groups. The problem of segmentation of image into multiple different regions can be addressed via the oneagainstall strategy. In other words, we may consider a problem of phase segmentation as subproblems of twophase segmentation. In each subproblem, the image is to be partitioned into the target region and the background region which is composed of all the other regions.
To evaluate the oneagainstall strategy for coping with multiple different groups of regions, we apply this strategy to the mosaic images containing five different textures. The input and output of the segmentation are shown in Fig. 10. The initial rectangular contours are shown in a unique color assigned to a region in one image. The converged contour curves are shown in the same colors in the other image. The optimal bases corresponding to different regions are also visualized. We can observe that these bases well capture the principal structure of the corresponding regions. The error rates of the segmentation are summarized in Table II.
Vi Discussions and conclusion
Via Discussions on the limitations
The size of the patches and the number of bases are predetermined, which is empirical. Both of the two properties affect the segmentation. For example, when increasing the patch size, the number of bases needed for small reconstruction error would increase due to the enlarged dimensionality of the patch. Therefore, to achieve a small error of segmentation the number of bases needed has to be large. Larger patch size and more bases would also provide a richer description of the nonsmooth structure, which may help segmentation in complex situations. With a small patch size, which means the reconstruction error can be small with few bases, the discrimination might not be clear, since the same set of bases may give similar reconstruction errors to different group of patches in such case.
This work did not really target at natural image segmentation. Specifically, this work explores the mathematical model for addressing an important aspect of the natural image segmentation, i.e. that of coping with nonsmooth structures in the segmentation of images which are 2D signals. For natural image segmentation, more sophisticated framework has to be adopted to mimic human vision.
ViB Conclusion
We propose a unified energy minimization model of nonsmooth image segmentation without requiring any separate process of feature extraction. The model is the energy minimization of the error of piecewise linear patch reconstruction. The segmentation error rate of the proposed model is proven to be bounded by the patch reconstruction error. The gradient descent method for solving the linear patch reconstruction is proven to be globally optimal under mild conditions on the initialization. The experiments validate our theoretical claim and show the clear supreme performance of our methods over other relevant methods. The linear patch reconstruction in our approach can be viewed as an unsupervised dictionary learning process. Segmentation with features depends largely on the prior knowledge, whilst our approach is mostly datadriven. Our approach is useful when prior knowledge is inapplicable.
Appendix

Since and , we have . Thus, for any position we can choose , which is equivalent to at . As a result, for any . Thus, , which completes our proof.
Note that the proof relies on the Axiom of Choice for the continuous domain. The choosing is feasible if we approximate the integral by discretization.

To prove this proposition, we require the following lemma.
Lemma F.1
Given the same condition in Proposition IV.1 then the following holds.
(A1) The proof of this lemma is due to Markov’s inequality. This bound connects the error of segmentation to the error of reconstruction.
Additionally, we require the following fundamental model of functions in the literature of signal analysis, such as wavelets [26] and especially compressive sensing [27] [28].

Given a (discrete) function , and any subset of fixed system of sorted orthogonal bases such that , then the following bound holds for every ,
(A2) where , and .
The worst case is when , i.e. .

Theorem F.2
The optimization problem defined in Eq. (12) is nonconvex.

First, we shall prove that the objective functional is nonconvex. Taking functional derivatives of the energy twice, we have the following.
(A4) If this functional is convex, will be positive semidefinite. However, we can show that the is actually negative semidefinite as follows.
(A5) Hence, the objective functional is concave. Further, we consider the constraints in terms of orthonormality. Let and both satisfy the constraints. We wonder whether the convex combination also satisfy the constraints. For example, let , where . Then we can verify that , unless , which is generally not true. Therefore, the constraints are also nonconvex. Either of the above two facts can complete our proof.
Theorem F.3
In the optimization problem defined in Eq. (12), if is the global optimal solution, then the transformation of
via an orthogonal matrix
is also the global optimal solution.
First, we define a linearly transformed patch basis
as follows.(A6) where is an mixing matrix. We may simply write the following.
(A7) where .
Hence, the energy in terms of could be rewritten as follows.
(A8) If the mixing matrix is orthogonal, we obtain the following.
(A9) Therefore, if is the global optimal solution, the is also the global optimal solution, which completes the proof.
To prove Theorem IV.3 we require some lemmas.
Lemma F.4
The integral operator is symmetric positive semidefinite.
The symmetry is straightforward. The proof of positive semidefiniteness is as in the proof of Theorem F.2.
Hence according to Mercer’s theorem of eigendecomposition of symmetric nonnegative definite bounded integral operator, we can write
in the form of infinite series of eigenfunctions as follows.
Theorem F.5 (Mercer’s representation)
Assuming , then
(A10) 
where are the eigenvalues, is the set of eigenfunctions, and the eigenfunctions form a system of orthogonal basis.
A proof of the above may be found in [29]. Now we are in a position to prove the Theorem IV.3.

First, we write the gradient descent differential equation for the STEP 1 as follows.
(A11) where . The corresponding update equation for the th iteration is the following.
(A12) Suppose , the update equation for the first iteration could be rewritten as follows.
(A13) where we applied Mercer’s representation. Hence, the update equation for the th iteration is the following.
(A14) The normalization constraint gives us the desired result.
(A15) where we may omit the sign. The proofs for the subsequent steps is hence straightforward, where we only need to replace with
. This setting is due to the GramSchmidt process in the constraints. The energy
is therefore the sum of eigenvalues due to Mercer’s representation.

According to the proof of Theorem IV.3, the th iteration for STEP is the following.
(A16) Let , . Note that the eigenvalues are ordered. Hence, . From these we deduce the following.
Comments
There are no comments yet.