Image matting is an important but still challenging problem in image processing and computer vision. It is usually formulated as extracting the foreground of an image by estimating, for each pixel, a proportional factor that measures the degree to which the pixel belongs to the foreground. It has been widely used in image compositing, video editing, film production, and so on. For example, image matting techniques can be used to create image compositions or support further editing tasks [1, 2, 3, 4, 5, 6, 7, 8, 9], and to extract moving objects from a video and re-composite them into desired scenes [10, 11, 12, 13, 14, 15, 16, 17].
Image matting is different from saliency detection, although the two are closely related. Saliency detection is a computational process that predicts the salient stimuli (regions) in images or videos for human observers, and it has seen many applications in content-aware image editing, adaptive image/video displaying, and advertisement. Image matting, in contrast, pays particular attention to the boundaries between the foreground and the background, and its main task is to make these boundaries accurate enough.
To properly extract semantically meaningful foreground objects, users usually label the foreground, background and unknown regions of an input image manually before matting. These three parts form the trimap, as shown in Figure 1. Given a trimap, the problem of image matting turns into estimating the alpha values of the pixels in the unknown regions based on the known foreground and background pixels. Besides trimaps, another type of prior information can be provided by strokes, which demand less user interaction. Stroke-based algorithms take the marked scribbles as input to extract the alpha matte. In terms of application purposes, the trimap approach is appropriate when high-quality matting is demanded, while the stroke approach is suitable when high accuracy is not required but free-style user interaction is preferred. With the development of computer science and digital imaging technologies, image matting is drawing more and more attention from both professionals and consumers, and a variety of image matting methods have been proposed in the past decades. The current typical image matting algorithms can be categorized into sample-based matting methods and affinity-based matting methods.
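As a minimal sketch of how a trimap constrains the problem, the snippet below splits a flattened trimap into pixels whose alpha is already known and pixels whose alpha must be estimated. The gray-level encoding (0 for background, 255 for foreground, any other value for unknown) and the function name `split_trimap` are assumptions made for illustration, not details taken from this paper.

```python
def split_trimap(trimap, fg_value=255, bg_value=0):
    """Split a flattened trimap into known alpha values and unknown pixel indices."""
    known_alpha = {}   # pixel index -> fixed alpha (1.0 foreground, 0.0 background)
    unknown = []       # pixel indices whose alpha has to be estimated
    for index, value in enumerate(trimap):
        if value == fg_value:
            known_alpha[index] = 1.0
        elif value == bg_value:
            known_alpha[index] = 0.0
        else:
            unknown.append(index)
    return known_alpha, unknown


# Toy 1-D "image": background, an unknown band, then foreground.
known_alpha, unknown = split_trimap([0, 0, 128, 128, 255])
print(known_alpha)  # {0: 0.0, 1: 0.0, 4: 1.0}
print(unknown)      # [2, 3]
```

Only the pixels listed in `unknown` remain as free variables for the matting algorithm; the others act as constraints.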
Sample-based matting methods [20, 21, 22, 23] usually assume that each pixel is surrounded by a local region, and perform image matting by sampling neighboring pixels according to certain rules. Among them, Bayesian matting and Shared matting are two typical methods. Bayesian matting builds on the algorithm of Ruzon and Tomasi and uses a continuously sliding small window to define neighborhoods, from which Gaussian mixtures for the foreground and background are modeled. Shared matting assumes that neighboring pixels share similar attributes within a small observation window, and it aims at real-time matting. Sample-based matting methods perform well for simple color patterns. However, they cannot process complex objects effectively, because the matting results rely heavily on the sampling strategy.
To avoid this disadvantage of sample-based matting, affinity-based matting methods [26, 27, 28, 29] perform image matting under the assumption of local smoothness, because the correlation of foreground pixels and background pixels is stronger within a small local window. For example, Poisson matting assumes that the foreground and background colors are locally smooth for unknown pixels, and then needs to solve a homogeneous Laplacian matrix; technically, it uses an approximate gradient field of the matte. The Closed-form matting approach introduces the matting Laplacian matrix and solves a quadratic cost function under the assumption that both the foreground and the background colors can be fit with a linear model in a local window. Spectral matting draws the important conclusion that the smallest eigenvectors of the matting Laplacian matrix span the individual matting components of the image, so the image mask can be recovered as a linear combination of these components. For images with complex textures, affinity-based matting methods perform more robustly than sample-based ones. However, their computational complexity is mostly high, because they usually need to solve a large affinity matrix. Moreover, these methods may blur definite boundaries.
|given dataset in a high-dimensional space||representation of -th patch|
|obtained dataset in a reduced low-dimensional space||selection matrix|
|color of -th pixel||local window for -th pixel|
|color of -th neighbor of -th pixel||reconstruction error of -th neighbor of -th color patch|
|patch in a high-dimensional space||reconstruction error of -th neighbor of -th alpha patch|
|patch in a reduced low-dimensional space||Moore-Penrose pseudo-inverse of|
|and||number of pixels in a local patch||weighted factor between and its neighbor|
|affine transformation between and||identity matrix|
|orthonormal columns||number of neighbors|
|matting feature of -th pixel||a known vector from the trimaps|
|alpha vector in -th patch||the sequence of approximate solutions|
|reconstruction error of -th patch in alpha space||the sequence of search points|
|a vector with dimensions||a coefficient that varies with the iterations|
To make good use of both sampling and affinity, some combined methods [33, 34, 35, 36, 37] have been proposed for the image matting problem. These methods are essentially a trade-off, since they are composed of sample-based and affinity-based matting methods, and they still cannot entirely alleviate the problems faced by the two types of methods. As such, many other methods have been proposed for image/video matting [38, 39, 40, 41, 42]. For instance, one work is based on the prior information of the light field, and another is based on defocus spectral information. To further capture the low-dimensional manifold structure of complex patterns, a few manifold matting methods have been proposed in recent years, including LTSA matting, LLE matting, and others [47, 48, 49, 50]. These manifold matting algorithms are designed by optimizing some intuitive energy functions. In fact, there is a multitude of manifold learning methods, including Laplacian Eigenmaps (LE), Maximum Variance Unfolding (MVU), ISOMAP, Locality Preserving Projections (LPP), and so on [55, 56]. Each of them has its own advantages in processing high-dimensional data. However, the aforementioned manifold matting algorithms cannot readily be combined with these other manifold learning methods.
Motivated by the above problems, in this paper we propose a unified manifold matting framework, named Patch Alignment Manifold Matting (PAMM), for image matting. Under the assumption that the alpha space shares a common subspace with the color space, we build a manifold model of local image patches in the color space and mine the intrinsic information of the alpha space to compute the alpha values by patch alignment manifold learning. In particular, PAMM mainly consists of part modeling and whole alignment. The part modeling produces the alpha reconstruction error of one local patch, while the whole alignment derives the alpha reconstruction error of the whole image. Moreover, prior information is supplied to the optimization problem through trimaps, because image matting methods need prior information to identify the definite foreground, the definite background, and the unknown region.
Furthermore, we use an efficient Nesterov's algorithm [57, 58, 59] to solve the optimization problem iteratively until the user's requirement is met. The resulting alpha mask is then used to extract the image foreground. Finally, we construct some concrete examples of the proposed PAMM framework. Among them, two new manifold learning matting algorithms, termed ISOMAP matting and its derived Cascade ISOMAP matting (CasISO matting), are the most effective. The experimental results demonstrate the effectiveness of the manifold matting framework and of these two example methods in comparison with several representative matting methods.
The contributions of this paper are as follows. First, we propose a unified manifold image matting framework, called PAMM, into which different manifold learning methods can be incorporated to produce the corresponding image matting techniques. Because this framework is constructed to adaptively mine the intrinsic information in the alpha space, it can process the complex patterns of an image better than the current representative methods. Second, we present two efficient implementations, ISOMAP matting and CasISO matting, to demonstrate the generality of the framework. These two matting methods can deal with nonlinear data distributions and preserve the discriminability of pixel classes well. Third, we perform extensive experiments comparing our proposed methods with eight current representative methods. The experimental results reveal the effectiveness and superiority of our proposed methods.
The remaining sections of this paper are organized as follows. In Section II, some related works are presented. The proposed matting framework PAMM and some representative manifold matting methods, including ISOMAP matting and CasISO matting, are described in Section III. Experimental results are shown in Section IV. Finally, we present the conclusions and future work.
II Related Work
In this section, we introduce some related works on dimensionality reduction, which is a key step in our proposed manifold matting scheme. Locally Linear Embedding (LLE) is a powerful eigenvector method for nonlinear dimensionality reduction. LLE represents the local geometry by the linear coefficients that reconstruct a given measurement from its neighbors, and then seeks a low-dimensional embedding in which these coefficients are still suitable for reconstruction. LLE is therefore an unsupervised learning algorithm based on the linear structure within a local window. Local Tangent Space Alignment (LTSA) uses the local tangent information as a representation of the local geometry, and this local tangent information is then aligned to provide a global coordinate. LTSA matting replaces the local smoothness assumption with an implicit manifold structure defined in local color spaces and formulates a new cost function. The LTSA algorithm first extracts local information and then constructs alignment matrices, followed by aligning the global coordinates. Laplacian Eigenmaps (LE) is a geometrically motivated approach to nonlinear dimensionality reduction that has locality-preserving properties and natural connections to graph embedding and clustering. This method constructs an undirected, weighted graph to describe the data manifold, and the low-dimensional data are found by solving the graph embedding. Maximum Variance Unfolding (MVU), also called Semidefinite Embedding (SDE), uses Semidefinite Programming (SDP) and Kernel Matrix Factorization (KMF) to model nonlinear dimensionality reduction problems. ISOMAP is also an excellent manifold learning method; it estimates the geodesic distance between faraway points and preserves the global geodesic distances of all pairs of measurements. For neighboring points, the input-space distance provides a good approximation to the geodesic distance. For faraway points, the geodesic distance is approximated by adding up a sequence of short hops between neighboring points. These approximations are computed efficiently by finding shortest paths in a graph with edges connecting neighboring data points.
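The shortest-path approximation of geodesic distances described above can be sketched in a few lines. This is an illustrative, unoptimized version (Floyd-Warshall, cubic in the number of points); the function name `geodesic_distances` and the toy data are assumptions made for the example, and the neighborhood graph is assumed to be connected.

```python
import math


def geodesic_distances(points, n_neighbors):
    """Approximate geodesic distances by shortest paths on a k-NN graph."""
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]
    graph = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: euclid[i][j])
        for j in others[:n_neighbors]:
            # Neighboring points: the input-space distance is used directly.
            graph[i][j] = graph[j][i] = euclid[i][j]
    # Faraway points: Floyd-Warshall adds up short hops between neighbors.
    for mid in range(n):
        for i in range(n):
            for j in range(n):
                hop = graph[i][mid] + graph[mid][j]
                if hop < graph[i][j]:
                    graph[i][j] = hop
    return graph


# Five points on a unit half circle; the two end points are 2.0 apart in input space.
arc = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(5)]
geo = geodesic_distances(arc, n_neighbors=2)
print(geo[0][4] > math.dist(arc[0], arc[4]))  # True: the graph path bends around the arc
```

The graph distance between the end points is longer than the straight-line distance because it is forced to travel through intermediate points on the curve, which is the property ISOMAP exploits.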
III Patch Alignment Manifold Matting Framework
For the problem of image matting, the compositing model in Eq. (1) is commonly assumed:

$$I_i = \alpha_i F_i + (1 - \alpha_i) B_i, \tag{1}$$

where $I_i$ denotes the color of pixel $i$. The image is regarded as a combination of two components, the foreground image $F$ and the background image $B$, and the term $\alpha_i$ is the opacity of pixel $i$, which balances the two components. This formulation is ill-posed, with many unknown terms, so the biggest challenge of image matting is how to find the optimal alpha solution. In fact, there is some dimensional information redundancy in the alpha space, which we exploit to solve the problem by using the patch alignment method.
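To make the compositing model concrete, the following sketch blends a single RGB pixel from a foreground and a background color. The function name `composite` and the example colors are illustrative assumptions. It also hints at why the inverse problem is under-determined: each pixel supplies three observed values, while the opacity and the two colors amount to seven unknowns.

```python
def composite(alpha, fg, bg):
    """Blend one RGB pixel: I = alpha * F + (1 - alpha) * B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


foreground = (1.0, 0.0, 0.0)   # pure red
background = (0.0, 0.0, 1.0)   # pure blue
mixed = composite(0.25, foreground, background)
print(mixed)  # (0.25, 0.0, 0.75)
```

Matting runs this model in reverse: given only `mixed`, it has to recover `alpha` (and possibly the two colors), which is why additional priors such as trimaps are needed.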
The idea of patch alignment was first introduced to reveal the intrinsic structure on which most nonlinear dimensionality reduction and manifold learning methods are based [65, 66, 67, 68, 69]. The framework consists of two parts: part modeling and whole alignment. For part modeling, different algorithms have different optimization criteria over patches, each of which is built from a measurement and its related points using a certain distance measure. The part modeling usually applies manifold learning algorithms, such as LTSA, LLE, MVU, ISOMAP, and so on. For whole alignment, all part models are integrated by an alignment strategy to form the final global coordinate for all of the independent patches. The whole alignment stage thus unifies the spectral-analysis-based dimensionality reduction algorithms. This framework shows that 1) algorithms are intrinsically different in the patch optimization stage, and 2) all algorithms share an almost identical whole alignment stage. Some important notations used in the paper are summarized in Table I.
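For the part modeling, different algorithms have different optimization criteria over patches. In the part modeling step, we consider an arbitrary measurement $x_i$ and its $k$ related nearest neighbors $x_{i_1}, \ldots, x_{i_k}$. The matrix $X_i = [x_i, x_{i_1}, \ldots, x_{i_k}]$ is formed to denote the patch. For $X_i$, we have a part mapping $f_i: X_i \mapsto Y_i$ and $Y_i = [y_i, y_{i_1}, \ldots, y_{i_k}]$. The part modeling is defined as

$$\arg\min_{Y_i} \operatorname{tr}\left(Y_i L_i Y_i^{T}\right),$$

where $\operatorname{tr}(\cdot)$ is the trace operator; $L_i$ varies with the different algorithms, encoding the objective function for the $i$-th patch.

In the whole alignment stage, all part models are integrated to form the final global coordinate for all independent patches. We denote by $Y_i$ the low-dimensional data representation of each patch $X_i$, and assume that the coordinate of each patch is selected from the global coordinate $Y = [y_1, \ldots, y_N]$, that is

$$Y_i = Y S_i,$$

where $S_i$ is the selection matrix and an entry is defined as

$$(S_i)_{pq} = \begin{cases} 1, & \text{if } p = F_i\{q\}, \\ 0, & \text{otherwise}, \end{cases}$$

where $F_i = \{i, i_1, \ldots, i_k\}$ denotes the index set for the $i$-th patch, which consists of the measurement $x_i$ (or $y_i$) and its $k$ related neighbors. Thus, the formulation can be rewritten as

$$\arg\min_{Y} \operatorname{tr}\left(Y S_i L_i S_i^{T} Y^{T}\right).$$

Gathering all of the patches yields a whole alignment matrix. The whole alignment can be derived as

$$\arg\min_{Y} \sum_{i=1}^{N} \operatorname{tr}\left(Y S_i L_i S_i^{T} Y^{T}\right) = \arg\min_{Y} \operatorname{tr}\left(Y L Y^{T}\right),$$

where $N$ denotes the number of image patches and $L = \sum_{i=1}^{N} S_i L_i S_i^{T}$ is the alignment matrix. This is the idea of patch alignment. In the following subsections, the framework of PAMM will be presented.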
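The whole alignment step amounts to scattering each small patch matrix into a large global matrix at the rows and columns of the patch members. The sketch below, which assumes NumPy is available, accumulates the global matrix directly and checks it against the explicit selection-matrix product. The names `selection_matrix` and `assemble_alignment`, and the centering matrix used as a stand-in patch objective, are hypothetical choices for illustration.

```python
import numpy as np


def selection_matrix(n_points, indices):
    """S_i with S_i[p, q] = 1 when global index p is the q-th member of the patch."""
    S = np.zeros((n_points, len(indices)))
    S[indices, np.arange(len(indices))] = 1.0
    return S


def assemble_alignment(n_points, patches):
    """Accumulate L = sum_i S_i L_i S_i^T without forming S_i explicitly."""
    L = np.zeros((n_points, n_points))
    for indices, L_i in patches:
        L[np.ix_(indices, indices)] += L_i   # indices within a patch are distinct
    return L


# Hypothetical patch objective: the centering matrix, which penalizes the
# deviation of the patch coordinates from their mean.
L_patch = np.eye(3) - np.ones((3, 3)) / 3.0
patches = [([0, 1, 2], L_patch), ([1, 2, 3], L_patch)]
L = assemble_alignment(4, patches)

# Same result from the explicit selection matrices.
L_explicit = sum(selection_matrix(4, idx) @ L_i @ selection_matrix(4, idx).T
                 for idx, L_i in patches)
print(np.allclose(L, L_explicit))  # True
```

Because patches overlap, entries shared by several patches (points 1 and 2 here) receive contributions from each of them, which is how local models are tied into one global objective.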
1) PAMM Part Modeling. For the image matting problem, manifold learning methods are applied to small image patches. They operate on the RGB color space to find a subspace, so that we can obtain the reconstruction error between the observed data and the assumed model; this reconstruction error is then minimized. In this paper, we explore the possibility of applying dimensionality reduction algorithms, particularly manifold learning techniques, to the image matting problem based on the alpha model. The geometric structure of the pixels must be taken into account first, so small local windows are defined from which the neighbors of a pixel are chosen. For a pixel and its RGB color vector, the neighborhood is defined as the set of color vectors of the pixels in a local window, usually a window centered on that pixel. For each pixel, define a subset
where is the pixel index in the local window.
For the purpose of image matting, the basic assumption is that each pixel of a natural color image has three alpha channels, which correspond to the RGB color channels. The affine subspace of the color space is carried over to the alpha space. That is to say, the color channels and the alpha channels share the same subspace.
The local affine transformation is computed from the RGB color vectors and their manifold subspace, and the data in a local window are assumed to lie on a manifold. Therefore, in a small local window, the color affine subspace can be used to derive the reconstruction error function of the alpha space. Since the matting problem aims to solve for the alpha values of the image pixels, we build a reconstruction error based on the color subspace obtained by reducing the dimension of the original color data. For most manifold learning algorithms, the affine subspace approximation between the high-dimensional patch and its low-dimensional representation can be obtained as
where the is the affine transformation which can find the low dimension subspace of the high dimension data .
Applying some manifold learning algorithm over color , there will be a of orthonormal columns  such that
where with the identity matrix is the reconstruction error, is the local coordinates over the subspace in the color space and is the mean color vector.
For the purpose of image matting, the global matting feature is reconstructed from the local coordinates. These local coordinates are based on the local information of the local manifold defined by the windows. Specifically, we wish the matting values to satisfy the following set of equations, according to the local structures determined by the ,
where is the mean of , and is the local affine transformation matrix in the alpha space. Denote , is a vector with dimensions, and . However, we assume that the color space and the alpha space share the same low dimensional subspace .
2) PAMM Whole Alignment. In the whole alignment, the is represented as the low dimensional data for each patch . Combining all of the unknown reconstruction errors of image patches, we can derive a total reconstruction error
Note that the components consist of patches and are overlapped, and thus this formula can be rewritten as
where , is the selection matrix and . The matrix is called the patch alignment matrix. All manifold learning methods share the proposed framework, and each has its own patch alignment matrix determined by its own subspace error.
Note that in the assumption model of Eq. (1), means the opacity of the pixel , which is used to balance the foreground image and background image. We construct the vector to represent the vector consisting of s of the pixel with its neighbors. In this way, image matting can be reduced to the problem of solving or . By using our proposed manifold matting framework, we can compute the alpha value corresponding to each pixel, and thus each pixel can be classified as a foreground pixel or a background pixel.
The proposed PAMM scheme is summarized in Figure 1. We assume that the color space (RGB) can be approximated by a manifold subspace in each local patch. Manifold learning methods such as LTSA, LLE, MVU, and ISOMAP are used to calculate the color subspace of the color space. The alpha space is assumed to share this subspace with the color space. Applying the color subspace, the reconstruction error of each alpha patch is obtained; aligning the reconstruction errors of all patches then yields the energy function to be optimized. The final alpha solution of the energy function is obtained using iterative shrinkage-thresholding algorithms. After solving the problem, the foreground and background are reconstructed using the estimated alpha values. The procedure is presented in Algorithm 1. In the following subsections, we present several manifold matting methods derived from the PAMM framework.
With a linear term, the problem of total error above will be derived as a function
where is a known vector from the trimaps, and it has the same dimension as . The resulting problem is still smooth and convex.
In order to determine the alpha solution, we apply Nesterov's algorithm [57, 58, 59], which has been proved to be an optimal first-order method for smooth convex optimization. Like the gradient method, Nesterov's algorithm requires no more than one gradient evaluation per iteration, plus one additional point that is smartly chosen and easy to compute. Besides, the convergence rate of this algorithm is $O(1/t^2)$ in the iteration count $t$, which corresponds to an iteration complexity of $O(1/\sqrt{\varepsilon})$ for accuracy $\varepsilon$. The key steps of this optimization algorithm are briefly introduced below.
where is a constant. In this step, Nesterov's method is based on two sequences: , the sequence of approximate solutions, and , the sequence of search points, such that
where is a coefficient that needs to be chosen; it varies with the iterations rather than being a constant.
Then the alpha solution can be computed by the following formula
where is determined by a line search rule, and the min and max operators act element-wise on vectors, as in Matlab.
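The iteration described above, with a sequence of approximate solutions, a sequence of search points, and an element-wise min/max projection, can be sketched as follows (NumPy assumed). This is a hedged illustration rather than the paper's exact procedure: the objective is a generic convex quadratic $\tfrac{1}{2} a^{T} Q a - b^{T} a$ with hypothetical `Q` and `b`, the step size is fixed to the inverse Lipschitz constant instead of being found by line search, and the momentum coefficient follows the common FISTA-type rule.

```python
import numpy as np


def nesterov_box_qp(Q, b, n_iter=250):
    """Minimize 0.5 * a^T Q a - b^T a subject to 0 <= a <= 1 (Q symmetric PSD)."""
    step = 1.0 / np.linalg.eigvalsh(Q)[-1]   # 1 / Lipschitz constant of the gradient
    x = np.zeros(len(b))                     # sequence of approximate solutions
    s = x.copy()                             # sequence of search points
    t = 1.0
    for _ in range(n_iter):
        grad = Q @ s - b                     # single gradient evaluation per iteration
        # Gradient step from the search point, then clip into [0, 1] element-wise.
        x_next = np.minimum(np.maximum(s - step * grad, 0.0), 1.0)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        # The momentum coefficient changes at every iteration.
        s = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x


Q = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.array([0.5, -1.0])
alpha = nesterov_box_qp(Q, b)
print(alpha)  # approaches [0.25, 0.0]; the second entry is held at the lower bound
```

In the example the unconstrained minimizer has a negative second component, so the projection keeps that entry at 0 while the first entry settles at the constrained optimum.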
III-A LLE Matting
For LLE matting of an image, the nearest neighbors are defined by spatial distance. For each pixel, we consider the subset of color vectors of the pixels in its local patch window; the color of the pixel itself is contained in this patch. Under the LLE assumption, the color vector at a pixel can be approximated by a linear combination of its nearest neighbors in the patch, with coefficients called reconstruction weights. LLE therefore fits a hyperplane through the pixel color and its nearest neighbors in the color manifold space defined over the image pixels.
To obtain reasonable alpha solutions, LLE matting assumes that the local structure of the color manifold is preserved in the alpha space. Once the reconstruction weights are determined, applying the manifold matting framework, the reconstruction error of LLE matting is obtained by minimizing the following objective function
where , is called LLE alignment matrix, and is the identity matrix.
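As a sketch of the LLE step on one patch (NumPy assumed), the code below computes the standard LLE reconstruction weights of a center color from its neighbors and builds the corresponding patch matrix, whose quadratic form is the squared reconstruction error of the alpha values under the same weights. The function names, the regularization constant, and the toy colors are assumptions for illustration; the paper's exact alignment matrix is not reproduced here.

```python
import numpy as np


def lle_weights(center, neighbors, reg=1e-3):
    """Weights summing to one that best reconstruct `center` from `neighbors`."""
    Z = neighbors - center                       # neighbors relative to the center
    G = Z @ Z.T                                  # local Gram matrix, shape (k, k)
    # Small ridge term keeps G invertible when neighbors are collinear.
    G = G + np.eye(len(G)) * (reg * np.trace(G) / len(G) + 1e-12)
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()


def lle_patch_matrix(weights):
    """Patch matrix whose quadratic form is the squared reconstruction error."""
    v = np.concatenate(([1.0], -weights))        # ordering: [center, neighbors...]
    return np.outer(v, v)


center = np.array([0.5, 0.5, 0.5])                         # gray pixel
neighbors = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])   # black and white neighbors
w = lle_weights(center, neighbors)
L_i = lle_patch_matrix(w)

alpha_patch = np.array([0.5, 0.0, 1.0])   # alpha of the center, then of the neighbors
error = alpha_patch @ L_i @ alpha_patch
print(error < 1e-12)  # True: these alpha values follow the color weights exactly
```

The gray pixel is the midpoint of its two neighbors in color space, so the weights are equal; an alpha value of 0.5 between alphas 0 and 1 is then reconstructed without error.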
III-B LTSA Matting
For LTSA matting, we define neighborhood points by incorporating the geometric structure of the pixels. For each pixel, the neighborhood is defined as the set of RGB vectors of its neighboring pixels in a local window. Applying classical PCA over the local window yields a matrix with orthonormal columns that spans the local subspace, from which we obtain the subspace data on the image patch. Therefore, applying the manifold matting framework, the whole alignment reconstruction error of LTSA matting is as follows
where is the alignment matrix of LTSA matting method.
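The local PCA step can be illustrated on a single patch (NumPy assumed). The sketch follows the standard LTSA local error, in which the patch matrix is the centering matrix minus the projector onto the leading principal coordinates; whether this matches the paper's alignment matrix term by term is an assumption, and the function name and toy patch are hypothetical.

```python
import numpy as np


def ltsa_patch_matrix(colors, dim):
    """Patch matrix H - V V^T for an (n_pixels x 3) array of patch colors."""
    n = len(colors)
    H = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    centered = H @ colors                            # subtract the mean color
    U, _, _ = np.linalg.svd(centered, full_matrices=False)
    V = U[:, :dim]                                   # orthonormal local coordinates
    return H - V @ V.T


# Toy 9-pixel patch whose colors lie on a line in RGB space.
t = np.arange(9.0)
colors = np.outer(t, [0.10, 0.05, 0.02]) + np.array([0.1, 0.2, 0.3])
L_i = ltsa_patch_matrix(colors, dim=1)

alpha_affine = 2.0 * colors[:, 0] + 0.1    # affine in the color: lies in the subspace
alpha_curved = (t / 8.0) ** 2              # not affine in the color
print(alpha_affine @ L_i @ alpha_affine < 1e-10)  # True
print(alpha_curved @ L_i @ alpha_curved > 1e-3)   # True
```

Alpha values that vary affinely with the patch colors incur no error, which expresses the assumption that the alpha space shares the local subspace of the color space.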
III-C ISOMAP Matting
ISOMAP is an excellent manifold learning method that estimates the geodesic distance between faraway points. We therefore propose the ISOMAP matting method, which applies ISOMAP within the PAMM framework. This matting method can deal with nonlinear data distributions and better preserves the discriminability of pixel classes. As in the general framework, the manifold learning method is applied to small image patches in the RGB color space to find the subspace, so that we can obtain the reconstruction error between the observed data and the assumed model, and this reconstruction error is then minimized.
ISOMAP does not need to compute the dimensionality reduction over the whole image, which would be very costly because the computation grows rapidly with the number of data points. For image matting, it is reasonable to apply ISOMAP to obtain the color subspace in local patches. Applying ISOMAP to a patch with three-dimensional RGB color channels yields a subspace Y of lower dimension. We can then obtain the affine subspace approximation between the color patch and its low-dimensional representation. The formulations can be derived as follows
where the is the affine approximation which can find the low dimension subspace of the high dimension data . Just like the approach above, the error function can be derived as follows
where the matrix is called the ISOMAP patch alignment matrix.
III-D CasISO Matting
Based on ISOMAP matting, we also propose Cascade ISOMAP matting (CasISO matting), with the aim of finding the best approximate subspace, that is, the one that yields the best foreground mask and the minimum reconstruction error. Compared with ISOMAP matting, CasISO matting uses two stages to find its approximate alpha subspace. In the first stage, we use ISOMAP to transform the color space of an image into a color subspace. In the second stage, we apply ISOMAP again to transform this color subspace into another color subspace, which is then shared with the alpha space. With this strategy, the structure of the data space is fully adjusted, and the resulting color subspace is better suited to the alpha subspace.
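The two-stage idea can be sketched on one patch with a compact ISOMAP implementation (NumPy assumed). This is a simplified illustration under stated assumptions: the functions `isomap` and `cascade_isomap` are hypothetical names, the neighborhood graph is assumed to be connected, and the way the paper turns the embedding into an alignment matrix is not reproduced.

```python
import numpy as np


def isomap(X, n_neighbors, n_components):
    """Minimal ISOMAP: k-NN graph, shortest paths, then classical MDS."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    order = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]   # skip the point itself
    for i in range(n):
        G[i, order[i]] = D[i, order[i]]
        G[order[i], i] = D[i, order[i]]
    for m in range(n):                                     # Floyd-Warshall
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    H = np.eye(n) - np.ones((n, n)) / n                    # centering matrix
    B = -0.5 * H @ (G ** 2) @ H
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def cascade_isomap(X, n_neighbors, dims):
    """Apply ISOMAP repeatedly, e.g. dims=(4, 2) for a 3-4-2 cascade."""
    Y = X
    for d in dims:
        Y = isomap(Y, n_neighbors, d)
    return Y


rng = np.random.default_rng(0)
patch = rng.random((9, 3))                  # stand-in for a 9-pixel RGB patch
Y = cascade_isomap(patch, n_neighbors=4, dims=(4, 2))
print(Y.shape)  # (9, 2)
```

Passing `dims=(3, 3)` keeps the dimension at every stage while still changing the geometry of the embedding, since each stage replaces Euclidean distances with graph distances.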
III-E Other Examples
Using the same strategy, the LE matting can obtain the reconstruction error function as below
where the is the patch alignment matrix of the LE matting method. And also for the MVU matting, we can get the whole alignment reconstruction error
where the matrix is called MVU patch alignment matrix.
These manifold learning methods are good at finding the shared subspace of the color space and the alpha space, which facilitates deriving the whole reconstruction error. Finally, the energy of the whole reconstruction error is minimized by Nesterov's algorithm. Hence, these manifold learning methods fit well into PAMM.
In this section, we demonstrate the effectiveness of the ISOMAP matting and CasISO matting methods derived from our proposed manifold matting framework PAMM by comparing them with current representative matting methods.
IV-A Experimental Settings
In order to demonstrate the effectiveness of the proposed algorithms, a well-known publicly available image dataset, the alphamatting dataset (http://www.alphamatting.com/), is used in the experiments. The alphamatting dataset provides a testing set and a training set, together with the ground-truth foreground colors for the images in the training set for those who need them. The foreground colors are provided as RGB files. All of the training images are used for the qualitative and quantitative experiments; these are the low-resolution training images. No existing image matting method can automatically define a semantic foreground object that fully matches the user's requirement, so for image matting we usually provide labels for some pixels through trimaps. In order to obtain accurate alpha matte results and a fair evaluation, the alphamatting dataset also provides trimaps for the matting algorithms or systems. The trimaps are composed of three parts: definite foreground, definite background and unknown regions.
IV-A2 Methods for comparisons
Within the proposed manifold matting framework, different image matting methods can be instantiated to obtain alpha masks. Several manifold learning matting methods, namely LE matting, LLE matting, MVU matting, LTSA matting, ISOMAP matting, and CasISO matting, are compared both qualitatively and quantitatively. In addition, some recent non-manifold matting methods are selected as competitors: Closed-Form matting, Shared matting, Com-Sampling matting and Com-Weighted matting, which are based on other, non-manifold theories. Because of the space limits of the tables and the figure, Closed-Form matting, Shared matting, Com-Sampling matting, Com-Weighted matting, LE matting, LLE matting, LTSA matting, MVU matting, ISOMAP matting and CasISO matting are abbreviated in Table III, Table IV and Figure 2 as Closed, Shared, ComSamp, ComWeight, LE, LLE, LTSA, MVU, ISOMAP and CasISO. All of the image matting methods share the same input images and the same trimaps. The manifold image matting methods require a large amount of memory and long CPU time to obtain the final results iteratively, so the experimental images in this paper are resized to a fixed size.
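Two metrics are used to measure the error between the extracted masks and their ground truth: the Sum of Absolute Differences (SAD) and the Mean Squared Error (MSE). The MSE is defined as

$$\mathrm{MSE} = \frac{1}{n} \sum_{p} \left(M_p - G_p\right)^2,$$

and the SAD is defined as

$$\mathrm{SAD} = \sum_{p} \left|M_p - G_p\right|,$$

where $M$ denotes the result mask, $G$ is the ground truth, $p$ runs over the evaluated pixels, and $n$ is their number.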
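The two criteria can be computed as in the following sketch, which operates on flattened masks with alpha values in [0, 1]. The function names and the toy masks are illustrative; any rescaling of the scores or restriction to the unknown region used in a particular benchmark is not modeled here and would be an additional assumption.

```python
def mse(mask, truth):
    """Mean squared error between a result mask and the ground truth."""
    squared = [(m - g) ** 2 for m, g in zip(mask, truth)]
    return sum(squared) / len(squared)


def sad(mask, truth):
    """Sum of absolute differences between a result mask and the ground truth."""
    return sum(abs(m - g) for m, g in zip(mask, truth))


result_mask = [0.0, 0.5, 1.0, 1.0]
ground_truth = [0.0, 1.0, 1.0, 0.5]
print(mse(result_mask, ground_truth))  # 0.125
print(sad(result_mask, ground_truth))  # 1.0
```

MSE is normalized by the number of pixels while SAD is not, so SAD grows with image size and the two criteria can rank methods differently.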
IV-B Matting Performances
We first investigate how sensitive the matting results are to the subspace dimension in CasISO matting. In order to find the best approximate subspace, several combinations of subspace dimensions are used in our experiments. The results are presented in Table II, where Dims denotes the sequence of dimensions. Taking 3-4-2 as an example, the proposed method first raises the dimension from 3 to 4 and then reduces it from 4 to 2, and both steps apply the ISOMAP method. The purpose of raising the dimension is to expand the structure of the data, while the dimension reduction seeks its intrinsic approximate subspace. According to Table II, the best combination for CasISO matting is 3-3-3, which attains the minimal MSE of 0.0022 and the minimal SAD of 156.70. Although the data dimension is unchanged with 3-3-3, the distribution structure of the data space is changed. With this combination, the color subspace benefits both the approximation of the original color space and the formation of the reconstruction error in the alpha space.
We further perform experiments to find a suitable number of iterations for the proposed methods. We only test the proposed ISOMAP matting and CasISO matting algorithms, because they use the same efficient Nesterov's algorithm as the other example manifold matting algorithms. It is hard to know in advance how many iterations the ISOMAP matting and CasISO matting algorithms need in order to obtain an optimal alpha mask. Therefore, we compute the MSE of the CasISO matting algorithm for different numbers of iterations K, as shown in Figure 5. Because different images converge after different numbers of iterations, we compute the average MSE over all images of the testing dataset. The red curve in Figure 5 represents the average MSE, and GT04, GT07 and GT13 are selected randomly to show the differences in their convergence. We can see that the average MSE of CasISO matting becomes stable when the number of iterations exceeds 250. Consequently, the number of iterations of Nesterov's algorithm for the proposed ISOMAP matting and CasISO matting algorithms is set to 250.
We also studied the overlap between patches. As shown in Figure 5, the MSE grows and the foreground mask becomes rougher as the distance between the centers of two patches increases. Both the physical meaning of the overlap and the experiments therefore indicate that overlapping patches are necessary. The local window usually has a fixed size. We also varied the number of pixels in a patch (Figure 5): the MSE is nearly stable when the number of pixels is less than 16, and it grows slowly when the number exceeds 16. However, a patch should not contain too many pixels, because this leads to high time and space complexity and to larger MSE results. Typically in our PAMM, the window size is $3 \times 3$; hence the number of pixels in a local patch is set to 9.
Alpha mask results for different numbers of iterations are also shown in Figure 6. The masks clearly become more and more accurate as the number of iterations grows, especially for the complicated hair along the foreground boundary. Compared with the ground truth image d), the alpha mask image c) obtained with 250 iterations is close to it in texture appearance.
In this subsection, the proposed manifold matting framework, the ISOMAP matting method, and the CasISO matting method are verified both qualitatively and quantitatively on the alphamatting dataset. Three image matting examples for all competing algorithms are shown qualitatively in Figure 2. Columns d) to g) show the non-manifold matting methods, while columns h) to m) show the manifold matting methods. Nearly all of these methods perform well visually, except for the LE matting method, which fails to obtain an accurate foreground in all of the examples. Compared with the ground truth, the alpha masks of the proposed ISOMAP matting and CasISO matting both perform well. In some cases, non-manifold matting methods perform better, such as the Com-Sampling and Com-Weighted methods on the example image in the first row. However, because of the complicated hair between the foreground and the background, further quantitative comparisons are indispensable.
To further show the effectiveness of the proposed methods, we perform more comparative experiments on the testing dataset. The MSE and SAD scores of all 10 matting algorithms on the testing dataset are shown in Table III and Table IV. The proposed ISOMAP and CasISO matting methods achieve the best average MSE, with 0.0024 and 0.0022 respectively. Although the Closed-Form matting method ranks first in average SAD with 143.26, CasISO matting ranks second with 156.70 and ISOMAP matting ranks third with 161.29, ahead of the remaining methods. The competing methods perform differently on different images in Table III and Table IV. On individual images, the proposed ISOMAP and CasISO matting obtain nearly all of the best results among the manifold matting methods, which also include LE matting, LLE matting, MVU matting, and LTSA matting. For image GT16 specifically, the proposed CasISO matting obtains the best SAD score, 954.18, and a nearly best MSE score, 0.0245. Some non-manifold matting methods, such as Closed-Form matting, Com-Sampling matting, and Com-Weighted matting, also obtain the best MSE or SAD results on some images. The data distributions of some examples are very complex, and certain cases are very hard to model completely. It is also very hard to distinguish the masks of CasISO matting and ISOMAP matting visually. In Table IV, ISOMAP matting has better SAD scores for some images, while CasISO matting is better for others. On the whole, however, CasISO matting outperforms ISOMAP matting and obtains most of the best MSE and SAD results, because its manifold data structure is fully adjusted and its alpha results are much closer to the ground truth.
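For reference, the two criteria can be computed as in the following minimal sketch. It assumes that the estimated matte and the ground truth are equally sized 2-D lists with values in [0, 1], and that errors may optionally be restricted to the unknown region of the trimap. The function names and the `unknown` mask are illustrative, and the scaling convention behind the reported table values (for example, whether SAD is accumulated on a 0 to 255 range) is not specified here.

```python
def sad(alpha, gt, unknown=None):
    """Sum of absolute differences between a matte and its ground truth."""
    total = 0.0
    for r, (row_a, row_g) in enumerate(zip(alpha, gt)):
        for c, (a, g) in enumerate(zip(row_a, row_g)):
            if unknown is None or unknown[r][c]:
                total += abs(a - g)
    return total


def mse(alpha, gt, unknown=None):
    """Mean squared error over the evaluated pixels."""
    total, count = 0.0, 0
    for r, (row_a, row_g) in enumerate(zip(alpha, gt)):
        for c, (a, g) in enumerate(zip(row_a, row_g)):
            if unknown is None or unknown[r][c]:
                total += (a - g) ** 2
                count += 1
    return total / count if count else 0.0


# Toy 2 x 3 example (values are illustrative, not taken from the paper).
alpha = [[0.0, 0.5, 1.0],
         [0.0, 0.25, 1.0]]
gt = [[0.0, 1.0, 1.0],
      [0.0, 0.75, 1.0]]
unknown = [[False, True, False],
           [False, True, False]]

print(sad(alpha, gt, unknown))
print(mse(alpha, gt, unknown))
```

Because SAD is a sum and MSE is a mean, the two criteria can rank methods differently: a method with a few large errors may have a competitive SAD but a worse MSE, which is consistent with Closed-Form matting leading in average SAD while the proposed methods lead in average MSE.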
However, the time complexity of CasISO matting is higher than that of ISOMAP matting. The E-time of all compared methods is shown in Table V. The manifold-based matting methods generally need more E-time, because they must construct patch alignment matrices and optimize iteratively with Nesterov’s algorithm. The other methods are non-manifold-based and need less E-time. In particular, Shared matting, which achieves real-time performance, is implemented in C++.
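The cost of the iterative optimization can be illustrated on a toy problem. The sketch below is not the PAMM objective: it replaces the patch alignment matrix with a simple 1-D smoothness energy and only shows the accelerated-gradient update pattern of Nesterov’s method. All function names, the energy, and the step size are assumptions made for illustration.

```python
import math


def smoothness_grad(alpha, known):
    """Gradient of sum_i (alpha[i] - alpha[i+1])**2 w.r.t. the unknown pixels.

    Entries listed in `known` are constraints, so their gradient is zero.
    """
    n = len(alpha)
    grad = [0.0] * n
    for i in range(n):
        if i in known:
            continue
        if i > 0:
            grad[i] += 2.0 * (alpha[i] - alpha[i - 1])
        if i < n - 1:
            grad[i] += 2.0 * (alpha[i] - alpha[i + 1])
    return grad


def nesterov_minimize(alpha0, known, step, iterations):
    """Nesterov's accelerated gradient method with the standard momentum sequence."""
    x = list(alpha0)   # current estimate
    y = list(alpha0)   # extrapolated (look-ahead) point
    t = 1.0
    for _ in range(iterations):
        g = smoothness_grad(y, known)
        x_next = [yi - step * gi for yi, gi in zip(y, g)]
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        momentum = (t - 1.0) / t_next
        y = [xn + momentum * (xn - xo) for xn, xo in zip(x_next, x)]
        x, t = x_next, t_next
    return x


# Toy 1-D "trimap": pixel 0 is background (alpha = 0), pixel 4 is foreground
# (alpha = 1), and the three pixels in between are unknown.
known = {0: 0.0, 4: 1.0}
alpha0 = [known.get(i, 0.5) for i in range(5)]

# The gradient is Lipschitz with constant at most 8 here, so 1/8 is a safe step.
alpha = nesterov_minimize(alpha0, known, step=1.0 / 8.0, iterations=500)
print([round(a, 3) for a in alpha])
```

For this toy energy the minimizer is the linear ramp between the two known pixels, so the unknown values should approach 0.25, 0.5, and 0.75. Each iteration needs one gradient evaluation over all pixels, which suggests why methods that iterate hundreds of times on a full image need more E-time than single-pass sampling methods.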
By using different manifold algorithms, the proposed manifold matting framework can obtain different foreground masks. The experiments above demonstrate that manifold matting algorithms share inherent data traits for the matting problem, so it is worthwhile to summarize the existing manifold matting algorithms in a unified manifold matting framework. Both the qualitative and quantitative comparisons indicate that the ISOMAP matting and CasISO matting methods fit the framework and perform well. Additionally, more matting results of the CasISO matting method on the dataset are provided in Figure 7. Besides the 3 images shown in Figure 2, this figure shows the alpha results for the other 24 images from the alphamatting dataset, so the CasISO results for all images are reported in this paper. For most of these images, the CasISO matting method gives excellent masks. The foreground mattes are smooth and continuous, for example in the complex hair of the Barbie doll images and the small inner holes of the flower and plant images. The CasISO matting method is thus robust, which also reveals the effectiveness of the proposed PAMM framework.
In summary, the above experiments demonstrate that the ISOMAP matting and CasISO matting methods derived from our proposed manifold matting framework PAMM are effective and feasible compared with eight representative matting methods.
V Conclusions and Future Work
In this paper, we investigate the image matting problem and propose a new patch alignment manifold matting (PAMM) framework together with two concrete algorithms, ISOMAP matting and its extension CasISO matting. The PAMM framework consists of part modeling and whole alignment, optimized by minimizing the reconstruction error with the efficient Nesterov’s algorithm. In addition, it is a unified manifold matting framework into which manifold learning methods can be incorporated as a manifold dimension reduction step. The experimental results show the effectiveness of the framework. Moreover, the proposed example algorithms, ISOMAP matting and CasISO matting, perform better than several representative methods in some respects. In future work, we will pursue real-time manifold matting, and we also plan to use deep learning methods for image matting.
-  Y. Chen, Y. Ma, D. H. Kim, and S.-K. Park, “Region-based object recognition by color segmentation using a simplified pcnn,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, pp. 1682–1697, Aug. 2015.
-  M. Ruzon and C. Tomasi, “Alpha estimation in natural images,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Hilton Head Island, SC, Jun. 2000, pp. 1018–1025.
-  M. Zhao, C.-W. Fu, J. Cai, and T.-J. Cham, “Real-time and temporal-coherent foreground extraction with commodity rgbd camera,” IEEE J. Sel. Topics Signal Process., vol. 9, pp. 449–461, Apr. 2015.
-  X. Lu and X. Li, “Group sparse reconstruction for image segmentation,” Neurocomputing, vol. 136, pp. 41–48, 2014.
-  M. Gong, Y. Qian, and L. Cheng, “Integrated foreground segmentation and boundary matting for live videos,” IEEE Trans. Image Process., vol. 24, pp. 1356–1370, Apr. 2015.
-  J. Shen, Y. Du, W. Wang, and X. Li, “Lazy random walks for superpixel segmentation,” IEEE Trans. Image Process., vol. 23, pp. 1451–1462, Apr. 2014.
-  Y.-M. Zhang, K. Huang, G.-G. Geng, and C.-L. Liu, “Mtc: A fast and robust graph-based transductive learning method,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, pp. 1979–1991, Sept. 2015.
-  X. Liu, M. Song, D. Tao, J. Bu, and C. Chen, “Random geometric prior forest for multiclass object segmentation,” IEEE Trans. Image Process., vol. 24, pp. 3060–3070, Oct. 2015.
-  X. Yang, X. Gao, D. Tao, X. Li, and J. Li, “An efficient MRF embedded level set method for image segmentation,” IEEE Trans. Image Process., vol. 24, pp. 9–21, Jan. 2015.
-  X. Shen, X. Tao, H. Gao, C. Zhou, and J. Jia, Deep Automatic Portrait Matting. Cham: Springer International Publishing, 2016, pp. 92–107.
-  X. Wang, E. Türetken, F. Fleuret, and P. Fua, “Tracking interacting objects using intertwined flows,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 11, pp. 2312–2326, Nov. 2016.
-  L. Zhang, R. Ji, Y. Xia, Y. Zhang, and X. Li, “Learning a probabilistic topology discovering model for scene categorization,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, pp. 1622–1634, Aug. 2015.
-  Q. Duan, J. Cai, and J. Zheng, “Compressive environment matting,” J. Vis. Comput., vol. 31, no. 12, pp. 1587–1600, 2015.
-  X. Wang, E. Türetken, F. Fleuret, and P. Fua, “Tracking interacting objects optimally using integer programming,” in Proc. European Conf. Comput. Vis., Zurich, Switzerland, Sep. 2014, pp. 17–32.
-  L. Zhang, R. Hong, Y. Gao, R. Ji, Q. Dai, and X. Li, “Image categorization by learning a propagated graphlet path,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, pp. 674–685, Mar. 2016.
-  Y. Fu, Z. Li, J. Yuan, Y. Wu, and T. S. Huang, “Locality versus globality: Query-driven localized linear models for facial image computing,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 12, pp. 1741–1752, Dec. 2008.
-  A. Maksai, X. Wang, and P. Fua, “What players do with the ball: a physically constrained interaction modeling,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Las Vegas, Nevada, USA, Jun. 2016, pp. 972–981.
-  C. Lang, J. Feng, S. Feng, J. Wang, and S. Yan, “Dual low-rank pursuit: Learning salient features for saliency detection,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, pp. 1190–1200, Jun. 2016.
-  Q. Zhu, P. A. Heng, L. Shao, and X. Li, “What’s the role of image matting in image segmentation?” in Proc. IEEE Int. Conf. Robot. and Biomimet., Shenzhen, Dec. 2013, pp. 1695–1698.
-  L. Karacan, A. Erdem, and E. Erdem, “Image matting with kl-divergence based sparse sampling,” in Proc. IEEE Int. Conf. Comput. Vis., Santiago, Dec. 2015, pp. 424–432.
-  J. Wang and M. Cohen, “Image and video matting: A survey,” Now Publishers Inc., vol. 3, no. 2, 2007.
-  Q. Zhu, L. Shao, X. Li, and L. Wang, “Targeting accurate object extraction from an image: A comprehensive study of natural image matting,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, pp. 185–207, Feb. 2015.
-  J. Wang and M. Cohen, “Optimized color sampling for robust matting,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Minneapolis, MN, Jun. 2007.
-  Y.-Y. Chuang, B. Curless, D. Salesin, and R. Szeliski, “A bayesian approach to digital matting,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2001, pp. 264–271.
-  E. Gastal and M. Oliveira, “Shared sampling for real-time alpha matting,” Comput. Graphics Forum, vol. 29, no. 2, pp. 575–584, 2010.
-  K. He, J. Sun, and X. Tang, “Fast matting using large kernel matting laplacian matrices,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., San Francisco, CA, Jun. 2010, pp. 2165–2172.
-  J. Shen, Y. Du, and X. Li, “Interactive segmentation using constrained laplacian optimization,” IEEE Trans. Circuits Syst. Video Technol., vol. 24, pp. 1088–1100, Jul. 2014.
-  S. Tierney and J. Gao, “Natural image matting with total variation regularisation,” in Proc. Int. Conf. Digit. Image Comput. Tech. and Appl., Fremantle, WA, Dec. 2012, pp. 1–8.
-  J. Gao, M., and J. Liu, “The image matting method with regularized matte,” in Proc. IEEE Int. Conf. Multimedia and Expo, Melbourne, VIC, Jul. 2012, pp. 550–555.
-  J. Sun, J. Jia, C. Tang, and H. Shum, “Poisson matting,” ACM Trans. Graphics, vol. 23, no. 3, pp. 315–321, 2004.
-  A. Levin, D. Lischinski, and Y. Weiss, “A closed-form solution to natural image matting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, pp. 228–242, Feb. 2008.
-  A. Levin, A. Rav-Acha, and D. Lischinski, “Spectral matting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, pp. 1699–1712, Oct. 2008.
-  C. Rother, V. Kolmogorov, and A. Blake, “Grabcut: Interactive foreground extraction using iterated graph cuts,” ACM Trans. Graphics, vol. 23, pp. 309–314, 2004.
-  L. Grady, T. Schiwietz, S. Aharon, and R. Westermann, “Random walks for interactive alpha-matting,” in Proc. Vis. Imag. Image Process., Benidorm, Spain, Sept. 2005, pp. 423–429.
-  Y. Guan, W. Chen, X. Liang, Z. Ding, and Q. Peng, “Easy matting-a stroke based approach for continuous image matting,” Comput. Graphics Forum, vol. 25, no. 3, pp. 567–576, 2006.
-  D. Liu, Y. Xiong, L. Shapiro, and K. Pulli, “Robust interactive image segmentation with automatic boundary refinement,” in Proc. Int. Conf. Image Process., Hong Kong, Sept. 2010, pp. 225–228.
-  X. Bai and G. Sapiro, “A geodesic framework for fast interactive image and video segmentation and matting,” in Proc. IEEE 11th Int. Conf. Comput. Vis., Rio de Janeiro, Oct. 2007, pp. 1–8.
-  J. Jubin, V. Ehsan, Shahrian, C. Hisham, and R. Deepu, “Sparse coding for alpha matting,” IEEE Trans. Image Process., vol. 25, pp. 3032–3043, Jul. 2016.
-  D. Zou, X. Chen, G. Cao, and X. Wang, “Video matting via sparse and low-rank representation,” in Proc. IEEE Int. Conf. Comput. Vis., Santiago, Chile, Dec. 2015, pp. 1564–1572.
-  B. Wang, X. Gao, D. Tao, and X. Li, “A nonlinear adaptive level set for image segmentation,” IEEE Trans. Cybern., vol. 44, pp. 418–428, Mar. 2014.
-  K. Liu, X. Li, and Y. Dong, “Superpixel fats for fast foreground extraction,” in Proc. IEEE China Summit Int. Conf. Signal and Inf. Process., Chengdu, Jul. 2015, pp. 132–136.
-  A. Alush and J. Goldberger, “Hierarchical image segmentation using correlation clustering,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, pp. 1358–1367, Jun. 2016.
-  J. Fiss, B. Curless, and R. Szeliski, “Light field layer matting,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Boston, Massachusetts, Jun. 2015, pp. 623–631.
-  H. Zhou and T. Ahonen, “Automatic defocus spectral matting,” in Proc. IEEE Int. Conf. Image Process., Paris, Oct. 2014, pp. 4328–4332.
-  J. Gao, “Image matting via local tangent space alignment,” in Proc. Int. Conf. Digit. Image Comput. Tech. and Appl., Noosa, QLD, Dec. 2011, pp. 614–619.
-  D. Tien, J. Gao, and J. Tulip, “Image matting via lle/ille manifold learning,” Inf. Technol. Ind., vol. 1, no. 1, pp. 6–12, 2013.
-  K. Weinberger and L. Saul, “Unsupervised learning of image manifolds by semidefinite programming,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., vol. 2, Washington, DC, USA, Jun. 2004, pp. 988–995.
-  D. Tao, X. Li, X. Wu, and S. Maybank, “Geometric mean for subspace selection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, pp. 260–274, Feb. 2009.
-  K. Weinberger and L. Saul, “Image manifolds by semidefinite programming,” J. Comput. Vis., vol. 70, no. 1, pp. 77–90, 2006.
-  X. Wang, Z. Li, and D. Tao, “Subspaces indexing model on grassmann manifold for image search,” IEEE Trans. Image Process., vol. 20, no. 9, pp. 2627–2635, Sep. 2011.
-  M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Comput., vol. 15, no. 6, pp. 1373–1396, 2003.
-  E. Castro and B. Pelletier, “On the convergence of maximum variance unfolding,” J. Mach. Learn. Res., vol. 14, no. 1, pp. 1747–1770, 2013.
-  J. Tenenbaum, V. Silva, and J. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Sci., vol. 290, pp. 2319–2323, 2000.
-  X. He and P. Niyogi, “Locality preserving projections,” in Proc. Conf. Neural Inf. Process. Syst., Whistler, British Columbia, CA, Dec. 2003, pp. 153–160.
-  L. Zhang, Q. Zhang, L. Zhang, D. Tao, X. Huang, and B. Du, “Ensemble manifold regularized sparse low-rank approximation for multiview feature embedding,” Pattern Recognit., vol. 48, no. 10, pp. 3102–3112, Dec. 2015.
-  Y. Nesterov, “A method for solving a convex programming problem with convergence rate O(1/k^2),” Dokl. Akad. Nauk SSSR, vol. 27, pp. 372–376, 1983.
-  A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci., vol. 2, no. 1, pp. 186–202, 2009.
-  Y. Nesterov, “Gradient methods for minimizing composite objective function,” CORE report, 2007.
-  J. Wang and M. Cohen, “An iterative optimization approach for unified image segmentation and matting,” in Proc. IEEE Int. Conf. Comput. Vis., Beijing, Oct. 2005, pp. 936–943.
-  S. Roweis and L. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Sci., vol. 290, pp. 2323–2326, 2000.
-  Z. Zhang and H. Zha, “Principal manifolds and nonlinear dimension reduction via local tangent space alignment,” J. Sci. Comput., vol. 26, pp. 313–338, 2004.
-  J. Wang, “Maximum variance unfolding,” Geometric Structure of High-Dimensional Data and Dimensionality Reduction, pp. 181–202, 2011.
-  T. Zhang, D. Tao, X. Li, and J. Yang, “Patch alignment for dimensionality reduction,” IEEE Trans. Knowl. Data Eng., vol. 21, pp. 1299–1313, Sept. 2009.
-  Y. Fu, Z. Li, T. S. Huang, and A. K. Katsaggelos, “Locally adaptive subspace and similarity metric learning for visual data clustering and retrieval,” Comput. Vis. Image Underst., vol. 110, no. 3, pp. 390–402, Jun. 2008.
-  X. Gao, B. Xiao, D. Tao, and X. Li, “A survey of graph edit distance,” Pattern Anal. and Appl., vol. 13, no. 1, pp. 113–129, Feb. 2010.
-  J. Yu, D. Tao, and M. Wang, “Adaptive hypergraph learning and its application in image classification,” IEEE Trans. Image Process., vol. 21, no. 7, pp. 3262–3272, Jul. 2012.
-  B. Du and L. Zhang, “Target detection based on a dynamic subspace,” Pattern Recognit., vol. 47, no. 1, pp. 344–358, Jul. 2014.
-  Z. Ding, M. Shao, and Y. Fu, Deep Robust Encoder Through Locality Preserving Low-Rank Dictionary. Cham: Springer International Publishing, 2016, pp. 567–582.
-  E. Shahrian, D. Rajan, B. Price, and S. Cohen, “Improving image matting using comprehensive sampling sets,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Portland, OR, Jun. 2013, pp. 636–643.
-  E. Varnousfaderani and D. Rajan, “Weighted color and texture sample selection for image matting,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Providence, RI, Nov. 2012, pp. 718–725.
-  C. Rhemann, C. Rother, J. Wang, M. Gelautz, P. Kohli, and P. Rott, “A perceptually motivated online benchmark for image matting,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Miami, FL, Jun. 2009, pp. 1826–1833.