Estimating the deformation parameters of a target (either an object or a texture) is a fundamental technique in computer vision, with applications such as registration (Gay-Bellile et al., 2010; Cao et al., 2018; Balakrishnan et al., 2019) and tracking (Tan et al., 2014; Wang et al., 2019, 2020). If the geometric deformation model is constrained to rotation and translation only, the deformation is rigid. Affine or projective transformations can express more complex deformation, while practical applications, such as medical image analysis (Rueckert et al., 1999; Oliveira and Tavares, 2014) and morphing (Alexa, 2002; Scherhag et al., 2019), often involve non-rigid deformation with more degrees of freedom. Furthermore, the deformation between two images can be global, local, and even space-invariant, which makes the problem more challenging because the movement vector of each pixel must be estimated independently while preserving smoothness. Deformation parameters can be estimated by deforming a template image such that the similarity between the deformed template and a target image is maximized. This similarity maximization can be cast as an optimization problem by treating the deformation parameters as decision variables and the similarity as the objective function.
As for the geometric deformation model, free-form deformation (FFD) (Sederberg and Parry, 1986) models the deformation by manipulating control points arranged in a regular lattice over the target. Each pixel moves according to the displacements of the surrounding control points, weighted by basis functions. B-spline basis functions are generally used as the weights (Hsu et al., 1992). The larger the number of control points, the more finely the deformation can be modeled. In FFD, the influence of a control point is limited to neighboring pixels, which is beneficial in terms of both modeling ability and computational cost (Crum et al., 2004). Owing to these characteristics, FFD can model highly flexible and subtle deformation. However, optimizing FFD parameters is challenging because an optimizer needs to treat the displacements of all control points as decision variables, and the number of these variables grows with the expressive power of the deformation model. Moreover, because each control point affects multiple regions, improving the similarity in one region may degrade the similarity in other regions, i.e., there exist conflicts between regions.
To alleviate the above issues, we introduce a new idea to estimate the FFD parameters by casting the deformation estimation problem as a multi-objective optimization problem (MOP), which can be effectively solved by multi-objective evolutionary algorithms (MOEAs). The overview of our algorithm is shown in Fig. 1. A template is spatially divided into several groups, and the similarity measure over each group is treated as a single-objective function and computed independently. Each group consists of patches, and the pixels in each patch are affected by the same control points. A MOP requires simultaneous optimization of two or more objectives that conflict with each other. In our problem setting, we aim to find Pareto optimal solutions, i.e., solutions for which no group can be improved without degrading some other group; such solutions can be found by off-the-shelf MOEAs. In addition, we adopt a coarse-to-fine strategy using image pyramids to improve the estimation capability, especially for large deformations. Specifically, the optimization starts at the top of the pyramid (i.e., the lowest resolution image) and is executed at each level of the pyramid. The number of control points is gradually increased as the resolution increases, and the interpolation of control points is realized by mesh subdivision to allow fine-grained deformation. Also, a post-processing method is proposed to integrate Pareto optimal solutions into a single output as the decision-making procedure. For each group in Fig. 1, the group-wise deformation parameters with the highest group-wise similarity are adopted. These group-wise deformation parameters are aggregated into a final solution. We perform comparative experiments using both synthetic and real-world data to show the effectiveness and usefulness of our method. In summary, our contributions are threefold.
- The deformation estimation problem is cast as a MOP by spatially dividing an image into multiple groups accompanied by independent similarity measures.
- The estimation capability is improved by a coarse-to-fine strategy, which is realized by building image pyramids and conducting mesh subdivisions at each level.
- A post-processing method is proposed to integrate Pareto optimal solutions into a final output.
The rest of this paper is organized as follows. We present related work on deformation estimation and MOEAs in Sec. 2, followed by a brief review of the FFD model in Sec. 3. An overview of three off-the-shelf genetic algorithms (GAs) is given in Sec. 4. In Sec. 5, we describe the details of the spatial multi-objective problem and the coarse-to-fine optimization strategy. The experimental results are shown in Sec. 6. The conclusion is given in Sec. 7.
2 Related Work
2.1 Deformation estimation between two images
Many methods have been proposed to deal with deformable surfaces, which can be roughly categorized as feature-based methods and pixel-based methods. The former category estimates deformation parameters by using feature correspondences commonly extracted from two images. The latter category maximizes the similarity calculated using dense pixels directly. Also, hybrid methods have been studied to incorporate the advantages of both approaches (Zhu et al., 2009; Pizarro and Bartoli, 2012; Wu et al., 2013).
Feature-based methods estimate deformation parameters from correspondences between feature points in the template image and the target image. The accuracy largely depends on the quality of the correspondences. Therefore, the elimination of outliers from the extracted feature set is an essential process. However, the large number of parameters in free-form deformation makes it difficult to apply standard methods such as RANSAC (Tran et al., 2012). Other limitations are: 1) In the case of feature-less images, feature points are hard to detect. Without inlier correspondences, the parameters cannot be appropriately estimated. Especially in the case of non-rigid transformations, more corresponding points are required (Bartoli and Zisserman, 2004). 2) Local features such as SIFT (Lowe, 2004) and ASIFT (Morel and Yu, 2009) are sensitive to complex transformations, which may largely degrade the confidence of correspondences when such transformations occur.
Pixel-based methods (Bartoli and Zisserman, 2004; Gay-Bellile et al., 2006; Malis, 2007; Hilsmann et al., 2010; Gay-Bellile et al., 2010; Tan et al., 2014; Zhang and Akashi, 2015, 2016) minimize a cost function consisting of a data term calculated from pixel intensities and restrictions such as a smoothness term. The data term is usually defined as the sum of the intensity differences between the pixels of the template image and the corresponding pixels in the deformed target image. Such methods are less dependent on image features than feature-based methods. In addition, their capability of dealing with self-occlusion is notable. Since only a few features typically exist near the self-occlusion boundary, pixel-based methods are more reasonable in such cases (Gay-Bellile et al., 2010; Pizarro and Bartoli, 2012). In (Gay-Bellile et al., 2010), a penalty term called the shrinker is incorporated into the cost function. The shrinker term acts to shrink the displacement in order to make self-occluded areas disappear. Pizarro and Bartoli (2012) employed a pixel-based approach to refine the deformation parameters given by a proposed feature-based method. When self-occlusion or strong deformations are involved, the hybrid method shows better results than the feature-based method alone. There also exists research on maximizing the similarity under the framework of evolutionary computation with a single objective (Zhang and Akashi, 2015, 2016). Other methods minimize the cost function with non-linear least squares solvers, such as the Gauss-Newton algorithm (Bartoli and Zisserman, 2004; Gay-Bellile et al., 2010; Pizarro and Bartoli, 2012) and the Levenberg–Marquardt algorithm (Gay-Bellile et al., 2006; Hilsmann et al., 2010), or with learning-based methods (Tan et al., 2014). To the best of our knowledge, exploiting evolutionary algorithms (Klein et al., 2007) or multi-objective optimization approaches (Alderliesten et al., 2012; Pirpinia et al., 2019) to deal with deformable surfaces has been sparsely treated so far. Our previous work appearing in GECCO 2019 addressed this problem by using a modified single-objective GA (Nakane et al., 2019). Different from the previous work, in this paper we adopt evolutionary algorithms for solving this problem by casting it as a multi-objective optimization problem.
2.2 Multi-objective evolutionary algorithms (MOEAs)
EAs are optimization algorithms inspired by Darwin’s evolutionary theory, such as the GA, evolution strategies, and evolutionary programming. These algorithms share a common framework in which many candidate solutions are handled simultaneously and stochastic operations are iteratively applied. Because of their powerful exploration capability, EAs have been applied in a variety of computer vision tasks. Interested readers can refer to the survey (Nakane et al., 2020). EAs are also effective tools for solving MOPs. The population-based search procedure provides the advantage of finding the Pareto optimal solutions in a single run. MOEAs use the dominance relation to rank solutions in an objective space consisting of conflicting objectives. In particular, representative MOEAs, such as non-dominated sorting GA-II (NSGA-II) (Deb et al., 2002), strength Pareto EA2 (SPEA2) (Zitzler et al., 2002), and Pareto envelope-based selection algorithm II (PESA-II) (Corne et al., 2001), include a mechanism that preserves non-dominated solutions in every generation, called elitism, and hence these algorithms can outperform non-elitist MOEAs by preventing the loss of good solutions (Vachhani et al., 2015). Since the goal of MOEAs is to provide solutions that are widely distributed on the Pareto front, MOEAs are also required to maintain the diversity of solutions. NSGA-II uses the crowding distance, which is the sum over all objectives of the distance between the two nearest neighboring solutions. SPEA2 uses the inverse of the distance to the k-th nearest solution as the density. PESA-II divides the objective space into several hyperboxes and assigns to each hyperbox a density equal to the number of solutions it contains. On the other hand, MOEAs are less effective for problems with four or more objectives, i.e., many-objective optimization problems (MaOPs). The main reason is that as the number of objectives increases, the dominance condition becomes harder to satisfy. More objectives lead to a greater proportion of non-dominated solutions, and hence the ability to converge toward the Pareto front decreases (Garza-Fabre et al., 2009). There are several strategies to adapt MOEAs to MaOPs (Li et al., 2015a), such as dimensionality reduction (Saxena et al., 2013; Wang and Yao, 2016) and the use of indicators (Bader and Zitzler, 2011; Li et al., 2016). Among them, one representative strategy is decomposition, e.g., MOEA based on decomposition (MOEA/D) (Zhang and Li, 2007), where a MaOP is decomposed into single-objective sub-problems using weighting vectors, and reference-point based many-objective NSGA-II (NSGA-III) (Deb and Jain, 2014), where the objective space is divided by reference vectors. There also exist various improved versions of MOEA/D (Xu et al., 2019; Zhang et al., 2020) and NSGA-III (Yuan et al., 2016; Cui et al., 2019). A combination of the merits of both is shown in (Li et al., 2015b).
3 Deformation Model
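Deformation estimation is achieved by registering the template image $I_{\mathrm{tmp}}$ to the target image $I_{\mathrm{tgt}}$. In order to deform $I_{\mathrm{tmp}}$, we employ FFD combined with cubic B-splines using control point meshes. For $I_{\mathrm{tmp}}$ of $W \times H$ pixels, control points are arranged on an $(m+3) \times (n+3)$ lattice with horizontal spacing $\delta_x$ and vertical spacing $\delta_y$ (i.e., the outmost control points are outside the region of $I_{\mathrm{tmp}}$), as illustrated in Fig. 2. Each control point is assigned a displacement vector $\boldsymbol{\phi}_{i,j}$ representing the distance and direction from the initial position, and the movement $\mathbf{d}(\mathbf{x})$ of a certain coordinate $\mathbf{x} = (x, y)$ on $I_{\mathrm{tmp}}$ is determined by the surrounding $4 \times 4$ control points, which is defined as:

$$\mathbf{d}(\mathbf{x}) = \sum_{k=0}^{3} \sum_{l=0}^{3} B_k(s)\, B_l(t)\, \boldsymbol{\phi}_{i+k,\, j+l}, \tag{1}$$

where $i = \lfloor x/\delta_x \rfloor - 1$, $j = \lfloor y/\delta_y \rfloor - 1$, $s = x/\delta_x - \lfloor x/\delta_x \rfloor$, $t = y/\delta_y - \lfloor y/\delta_y \rfloor$, and $B_k$ and $B_l$ are the cubic B-spline basis functions (Lee et al., 1997). Then, the pixel coordinate $\mathbf{x}'$ on $I_{\mathrm{tgt}}$ corresponding to $\mathbf{x}$ is given by the transformation function $T$:

$$\mathbf{x}' = T(\mathbf{x}) = \mathbf{x} + \mathbf{d}(\mathbf{x}). \tag{2}$$

Since the transformation using Eq. 2 is a forward warping procedure and $\mathbf{x}'$ consists of real numbers, a rounding operation is necessary to obtain pixel intensities. Such a process results in a large number of “holes” in the deformed template image, as illustrated in Fig. 3(a) and Fig. 3(b). An alternative is to employ backward warping, in which the coordinate $\mathbf{x}$ corresponding to $\mathbf{x}'$ is computed using the inverse transformation $T^{-1}$ and an interpolation scheme is used to obtain the pixel intensity. According to (Schwarz, 2007), $T^{-1}$ can be defined using an approximation:

$$\mathbf{x} = T^{-1}(\mathbf{x}') \approx \mathbf{x}' - \mathbf{d}(\mathbf{x}'). \tag{3}$$

The deformation obtained in Eq. 3 and its difference from Eq. 2 are illustrated in Fig. 3(c) and Fig. 3(d), respectively. In conclusion, Eq. 1 is employed for modeling geometric deformation and Eq. 3 is employed as the image transformation for registration in this paper.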
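As a rough illustration of the cubic B-spline FFD and the approximate backward warp described in this section, the following pure-Python sketch evaluates the displacement at one coordinate. The function names, the 0-based storage of the lattice (`phi[row][col]`, with the first row and column lying one spacing outside the image), and the use of the standard uniform cubic B-spline basis are assumptions made for illustration; this is not the authors' implementation.

```python
import math


def bspline_basis(t):
    """Uniform cubic B-spline basis values B_0..B_3 for t in [0, 1)."""
    return (
        (1 - t) ** 3 / 6.0,
        (3 * t**3 - 6 * t**2 + 4) / 6.0,
        (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0,
        t**3 / 6.0,
    )


def ffd_displacement(x, y, phi, dx, dy):
    """Displacement (u, v) at (x, y) from the 4x4 surrounding control points.

    phi[row][col] holds a (u, v) displacement. The lattice is stored 0-based,
    so the usual "-1" index offset of the outer control points is absorbed
    (assumed storage convention).
    """
    i = int(math.floor(x / dx))
    j = int(math.floor(y / dy))
    bs = bspline_basis(x / dx - i)
    bt = bspline_basis(y / dy - j)
    u = v = 0.0
    for l in range(4):
        for k in range(4):
            w = bs[k] * bt[l]
            pu, pv = phi[j + l][i + k]
            u += w * pu
            v += w * pv
    return u, v


def backward_warp(xp, yp, phi, dx, dy):
    """Approximate inverse transform: x = x' - d(x')."""
    u, v = ffd_displacement(xp, yp, phi, dx, dy)
    return xp - u, yp - v


# A 5x5 lattice (covering 2x2 cells of 10 px) with a constant displacement
# moves every pixel by that same displacement.
phi = [[(1.0, 2.0)] * 5 for _ in range(5)]
print(ffd_displacement(13.7, 4.2, phi, 10.0, 10.0))
```

Because the four basis values sum to one, a constant displacement field is reproduced exactly, which is a convenient sanity check for any implementation of the weighting.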
4 Review of Genetic Algorithms (GAs)
To further explain how we use the multi-objective optimizer in Fig. 1 to solve the optimization problem, we provide a short review of GAs in this section. The GA is one of the leading nature-inspired optimization algorithms. For a population $P$ consisting of a number of candidate solutions, the GA gradually optimizes $P$ by iteratively applying genetic operators. One of the unique features of the GA is the genotype representation of candidate solutions. Each candidate solution, called an individual $\mathbf{p}$, is encoded into an internal representation, such as a bit string or a real-valued vector, in order to apply genetic operators. The genotype representation allows genetic operators to adapt to different problems flexibly.
The genetic operators mainly consist of parents selection, crossover and mutation. The procedure of the simple GA (Holland, 1975) is briefly listed as follows:
- Step 1: Set $t = 0$ and generate $N$ individuals in $P_t$ randomly.
- Step 2: Select individuals as parents from $P_t$ with the parent selection operator.
- Step 3: Generate an offspring population $Q_t$ from the parents by crossover and mutation operators.
- Step 4: Evaluate all individuals in $Q_t$.
- Step 5: Select $N$ individuals from $P_t \cup Q_t$ as $P_{t+1}$.
- Step 6: Set $t = t + 1$ and return to Step 2 until the termination criterion is satisfied.
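The six steps above can be sketched in a few lines of Python. This is a minimal illustration for a minimization problem, not the configuration used in the paper: binary tournament selection, uniform crossover, Gaussian mutation, single-individual elitism, and all parameter values are assumptions chosen for brevity.

```python
import random


def simple_ga(objective, n_vars, bounds, pop_size=40, generations=60,
              cx_prob=0.9, mut_prob=0.1, seed=0):
    """Minimize `objective` over real vectors with a basic generational GA."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: random initial population, evaluated once.
    pop = [[rng.uniform(lo, hi) for _ in range(n_vars)] for _ in range(pop_size)]
    fit = [objective(ind) for ind in pop]

    for _ in range(generations):
        def pick():
            # Step 2: binary tournament (assumed parent selection operator).
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if fit[a] <= fit[b] else pop[b]

        # Step 3: crossover and mutation produce the offspring population.
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < cx_prob:
                child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            else:
                child = list(p1)
            for g in range(n_vars):
                if rng.random() < mut_prob:
                    child[g] += rng.gauss(0.0, 0.1 * (hi - lo))
                    child[g] = min(hi, max(lo, child[g]))
            offspring.append(child)

        # Step 4: evaluate the offspring.
        off_fit = [objective(ind) for ind in offspring]

        # Step 5: offspring replace the parents; the best parent survives
        # in place of the worst offspring if it is better (elitism).
        best = min(range(pop_size), key=fit.__getitem__)
        worst = max(range(pop_size), key=off_fit.__getitem__)
        if fit[best] < off_fit[worst]:
            offspring[worst], off_fit[worst] = pop[best], fit[best]
        # Step 6: next generation.
        pop, fit = offspring, off_fit

    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def sphere(x):
    return sum(v * v for v in x)


solution, value = simple_ga(sphere, n_vars=3, bounds=(-1.0, 1.0))
print(value)
```

With elitism, the best objective value in the population can never get worse from one generation to the next, which is the property the test relies on.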
Parent selection is typically implemented as probabilistic selection biased by evaluation values (i.e., individuals with better evaluation values are assigned higher probabilities). The crossover operation generates offspring through the genetic recombination of multiple parents. The mutation operation randomly changes genes of offspring with low probability. Step 5 is also known as survivor selection, that is, all individuals in $P_t$ and $Q_t$ compete to become members of $P_{t+1}$ according to their evaluation values. The simple GA directly adopts $Q_t$ as $P_{t+1}$. In addition, the elitism strategy, which preserves the best individual in the pool, is often adopted.
NSGA-II (Deb et al., 2002) is a representative GA-based algorithm for solving MOPs. The key idea of NSGA-II is to introduce a selection criterion using two sorting approaches. The first one, called fast non-dominated sorting, iteratively extracts a non-dominated set $F_i$ from the population and assigns a rank $i$ to each set according to the order in which they are extracted. The other one, called crowding distance sorting, determines the priority within $F_i$ by the crowding distance, which represents the density of neighboring individuals in the objective space. Finally, $N$ individuals are selected from $P_t \cup Q_t$: the new population consists of $F_1$ to $F_l$, where $\sum_{i=1}^{l} |F_i| \le N$, and the remaining individuals are taken from $F_{l+1}$ according to the crowding distance. Fast non-dominated sorting promotes convergence to the Pareto front, while crowding distance sorting maintains diversity on the Pareto front. Moreover, elitism is ensured by using $P_t \cup Q_t$ in survivor selection.
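The two sorting criteria can be illustrated with a short sketch for minimization problems. For readability it uses a straightforward front-by-front extraction rather than the bookkeeping of the original fast non-dominated sorting procedure, so it should be read as an illustration of the selection logic only; the function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs):
    """Return fronts F_1, F_2, ... as lists of indices into objs."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(objs, front):
    """Crowding distance of each index in one front (larger = less crowded)."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        span = objs[ordered[-1]][m] - objs[ordered[0]][m]
        # Boundary solutions are always kept.
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if span == 0:
            continue
        for k in range(1, len(ordered) - 1):
            gap = objs[ordered[k + 1]][m] - objs[ordered[k - 1]][m]
            dist[ordered[k]] += gap / span
    return dist


def select_survivors(objs, n):
    """Fill the next population front by front, then by crowding distance."""
    chosen = []
    for front in non_dominated_sort(objs):
        if len(chosen) + len(front) <= n:
            chosen.extend(front)
        else:
            d = crowding_distance(objs, front)
            chosen.extend(sorted(front, key=lambda i: -d[i])[: n - len(chosen)])
            break
    return chosen


objs = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
print(non_dominated_sort(objs))
print(select_survivors(objs, 2))
```

In the example, the first front contains three mutually non-dominated points; when only two survivors are allowed, the two extreme points are preferred over the middle one because their crowding distance is infinite.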
NSGA-III (Deb and Jain, 2014) is a variant of NSGA-II which focuses on solving MaOPs. Instead of crowding distance sorting, reference lines connecting the origin with reference points evenly distributed in the objective space are used to maintain the diversity of the population on the Pareto front. Each individual is associated with the reference line that is closest in perpendicular distance. After the last front to be partially accepted, denoted $F_{l+1}$, is determined by fast non-dominated sorting in survivor selection, the number of already accepted individuals (those in $F_1, \dots, F_l$) associated with each reference line, called the niche count, is recorded. NSGA-III then iteratively selects an individual in $F_{l+1}$ that is associated with the reference line of the lowest niche count. Reference points relieve the algorithm of having to maintain population diversity adaptively. In addition, users can obtain only a part of the Pareto front as required by manually distributing reference points.
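The association step of NSGA-III can be sketched as follows. The snippet assumes that objective vectors have already been normalized (the normalization step of NSGA-III is omitted) and only shows how each individual is attached to its nearest reference line and how niche counts are tallied; the function names are hypothetical.

```python
import math


def perpendicular_distance(f, w):
    """Distance from point f to the line through the origin and w."""
    dot = sum(a * b for a, b in zip(f, w))
    norm2 = sum(b * b for b in w)
    return math.sqrt(sum((a - dot / norm2 * b) ** 2 for a, b in zip(f, w)))


def associate(objs, refs):
    """Index of the closest reference line for every objective vector."""
    return [
        min(range(len(refs)), key=lambda r: perpendicular_distance(f, refs[r]))
        for f in objs
    ]


def niche_counts(assoc, n_refs):
    """Number of individuals associated with each reference line."""
    counts = [0] * n_refs
    for r in assoc:
        counts[r] += 1
    return counts


refs = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
objs = [(0.9, 0.1), (0.4, 0.5), (0.1, 0.8), (0.5, 0.45)]
assoc = associate(objs, refs)
print(assoc, niche_counts(assoc, len(refs)))
```

A reference line with a low niche count marks an under-represented direction of the Pareto front, so candidates from the last front that are associated with it are selected first.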
5 Spatial Multi-Objective Optimization
As described in Sec. 3, the deformation of the template image is determined by the displacement of each control point. Therefore, the purpose of the optimization procedure is to find the displacements for which the deformed template image best matches the target image. The principal contribution of this work is to cast this task as a MOP by spatially partitioning the template into groups. Each group is assigned a single-objective function of the similarity measure. The overview of the optimization procedure is shown in Fig. 4. The procedure starts by building pyramids for both the template image and the target image, and then the optimization is performed on the pyramids in a coarse-to-fine scheme. The key advantage of this framework is that the population optimized at each level can be inherited as the initial population of the next level. To ensure consistent parameter inheritance from a low-resolution level to a high-resolution level, a subdivision method for the control point mesh is employed to achieve a natural interpolation of additional displacements, which allows the optimized population to be directly inherited.
Final candidate solutions are obtained by optimizing the population at the bottom of the pyramid (i.e., the image in the original resolution). The subsequent effort lies in how to determine a final solution as output from multiple solutions on the Pareto front. In addition to a direct selection approach, a post-processing approach using multiple candidate solutions is also proposed.
5.1 Optimizing deformation parameters via MOEAs
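The three GAs described in Sec. 4 are employed and compared to optimize the displacements in the experiment. Since each control point can move within the 2D plane, the total number of decision variables is $2N$, where $N$ is the number of control points. To represent these variables as genes, we use real-valued coding because each displacement is a 2D vector of real values. For clarity, we denote the displacement of a single control point by $\boldsymbol{\phi}_k$, and an individual $\mathbf{p}$ is represented by the vector concatenating the displacements of all the points as follows,

$$\mathbf{p} = \left(\boldsymbol{\phi}_1^\top, \boldsymbol{\phi}_2^\top, \dots, \boldsymbol{\phi}_N^\top\right)^\top. \tag{4}$$

$\mathbf{p}$ can directly represent a candidate control point mesh. The initial population is iteratively optimized by genetic operators. As to the evaluation of each $\mathbf{p}$, for simplicity, a single-objective function is first introduced, which is combined with the simple GA and compared to the multi-objective GAs in the experiment. In the objective function, the mean absolute difference of intensities is used as the similarity measure. As introduced in Sec. 3, we use backward warping (Eq. 3) to find the correspondences for the calculation of the objective function, hence sampling is performed on the target image. Let $\Omega_{\mathrm{tmp}}$ and $\Omega_{\mathrm{tgt}}$ denote the entire region of the template image $I_{\mathrm{tmp}}$ and the target image $I_{\mathrm{tgt}}$, respectively, and we can further define the sampling region $\Omega_S$ as:

$$\Omega_S = \left\{ \mathbf{x}' \in \Omega_{\mathrm{tgt}} \;\middle|\; T^{-1}(\mathbf{x}') \in \Omega_{\mathrm{tmp}} \right\}. \tag{5}$$

The single-objective function for the simple GA is given by:

$$f(\mathbf{p}) = \frac{1}{|\Omega_S|} \sum_{\mathbf{x}' \in \Omega_S} \left| I_{\mathrm{tmp}}\!\left(T^{-1}(\mathbf{x}')\right) - I_{\mathrm{tgt}}(\mathbf{x}') \right|. \tag{6}$$

Based on Eq. 6, spatial multi-objective functions can be easily defined. We first define a patch as a region consisting of pixels affected by the same control points, i.e., $\Omega_{\mathrm{tmp}}$ consists of patches delimited by the control point lattice. The group partitioning is achieved by dividing all the patches into $G$ groups, where $G$ is the number of objective functions. We denote the region of the $g$-th group as $\Omega^{(g)}$. One objective function is assigned to each group, which can be written by modifying Eq. 5 and Eq. 6:

$$\Omega_S^{(g)} = \left\{ \mathbf{x}' \in \Omega_{\mathrm{tgt}} \;\middle|\; T^{-1}(\mathbf{x}') \in \Omega^{(g)} \right\}, \qquad f_g(\mathbf{p}) = \frac{1}{|\Omega_S^{(g)}|} \sum_{\mathbf{x}' \in \Omega_S^{(g)}} \left| I_{\mathrm{tmp}}\!\left(T^{-1}(\mathbf{x}')\right) - I_{\mathrm{tgt}}(\mathbf{x}') \right|, \tag{7}$$

where $g = 1, \dots, G$. Therefore, multi-objective GAs evaluate individuals based on the following vector function:

$$\mathbf{F}(\mathbf{p}) = \left(f_1(\mathbf{p}), f_2(\mathbf{p}), \dots, f_G(\mathbf{p})\right)^\top. \tag{8}$$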
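To make the relation between the single objective and the group-wise objectives concrete, the following sketch computes both from a list of per-pixel absolute intensity differences and a group label per sampled pixel. How the differences and labels are obtained (backward warping, interpolation, patch-to-group assignment) is left out, and the names `abs_diff` and `group_of_pixel` are hypothetical.

```python
def single_objective(abs_diff):
    """Mean absolute difference over all sampled pixels."""
    return sum(abs_diff) / len(abs_diff)


def group_objectives(abs_diff, group_of_pixel, n_groups):
    """One mean absolute difference per spatial group.

    abs_diff[k] is |template - target| for sampled pixel k, and
    group_of_pixel[k] is the index of the group that pixel falls into.
    A group without sampled pixels gets +inf (assumed convention).
    """
    sums = [0.0] * n_groups
    counts = [0] * n_groups
    for d, g in zip(abs_diff, group_of_pixel):
        sums[g] += d
        counts[g] += 1
    return [s / c if c else float("inf") for s, c in zip(sums, counts)]


abs_diff = [1.0, 3.0, 2.0, 6.0]
group_of_pixel = [0, 0, 1, 1]
print(single_objective(abs_diff))                      # 3.0
print(group_objectives(abs_diff, group_of_pixel, 2))   # [2.0, 4.0]
```

The single objective hides the fact that the second group matches much worse than the first; the vector form keeps this information available to the multi-objective optimizer.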
Because each control point affects multiple patches, the similarity functions of these patches can “conflict” with each other; the same holds for groups consisting of multiple patches. This is an important motivation for adopting MOEAs, because a single-objective optimizer can hardly solve such a conflicting problem efficiently (Deb et al., 2002). The number of groups is adjustable as a hyper-parameter (increasing it makes optimization significantly more difficult). The set of Pareto optimal solutions is expected to contain more appropriate solutions than the result of optimizing a single objective.
5.2 Coarse-to-fine strategy
An iterative framework using image pyramids can be employed to alleviate the difficulty of deformation estimation with large displacements (Hilsmann et al., 2010; Gay-Bellile et al., 2010). Estimation starts from the lowest resolution image for a rough estimation, and more accurate estimations are achieved as the image resolution increases. Specifically, we adopt the Gaussian pyramid, which iteratively generates low-resolution images through Gaussian smoothing. The assignment of pyramid level indices follows the order in which estimation is performed (i.e., the first and the $L$-th level images have the lowest and the highest resolution, respectively). Both the image width and height are halved at each step, and hence the $l$-th level image has $1/4$ of the resolution of the $(l+1)$-th level image.
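A Gaussian pyramid of this kind can be sketched in pure Python as below. The 5-tap binomial kernel, the clamped border handling, and the list-of-lists image representation are assumptions made for a compact illustration; a practical implementation would typically rely on an image processing library.

```python
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)  # binomial approximation


def blur_rows(img):
    """Smooth every row with the 5-tap kernel, clamping at the borders."""
    w = len(img[0])
    return [
        [
            sum(KERNEL[k] * row[min(w - 1, max(0, x + k - 2))] for k in range(5))
            for x in range(w)
        ]
        for row in img
    ]


def transpose(img):
    return [list(col) for col in zip(*img)]


def downsample(img):
    """Separable smoothing followed by dropping every other row and column."""
    blurred = transpose(blur_rows(transpose(blur_rows(img))))
    return [row[::2] for row in blurred[::2]]


def gaussian_pyramid(img, levels):
    """Return images ordered from lowest to highest resolution."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid[::-1]


image = [[5.0] * 8 for _ in range(8)]
pyr = gaussian_pyramid(image, 3)
print([(len(level), len(level[0])) for level in pyr])  # [(2, 2), (4, 4), (8, 8)]
```

The returned list follows the ordering used in the text: index 0 is the coarsest image, where estimation starts, and the last entry is the original image.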
As the image resolution increases, it is also necessary to increase the resolution of the control point mesh in order to inherit deformation parameters between different levels. To this end, for a certain level, new control points must be interpolated without destroying the mesh configuration of the previous level. In this work, we adopt the Catmull–Clark subdivision (Catmull and Clark, 1978) for the mesh subdivision. The purpose of subdivision is to update the control point mesh of each individual from the optimized $(m_l+3) \times (n_l+3)$ mesh to an $(m_{l+1}+3) \times (n_{l+1}+3)$ mesh, where $m_{l+1} = 2 m_l$ and $n_{l+1} = 2 n_l$ (i.e., the spacings $\delta_x$ and $\delta_y$ are fixed for all levels). The Catmull–Clark subdivision algorithm generates a subdivided mesh by inserting new control points and updating the existing control points. As illustrated in Fig. 5, the points on the subdivided mesh can be classified into three types: face points, edge points, and vertex points. A face point is inserted into a patch. Assuming that the vertices of a patch are $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{v}_4$, the face point $\mathbf{f}$ is computed as their centroid,

$$\mathbf{f} = \frac{1}{4}\left(\mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3 + \mathbf{v}_4\right). \tag{9}$$

An edge point is inserted on the edge shared by two patches. Assuming that the face points of the two patches are $\mathbf{f}_1$ and $\mathbf{f}_2$, and the two endpoints of the edge are $\mathbf{v}_1$ and $\mathbf{v}_2$, the edge point $\mathbf{e}$ is computed as follows:

$$\mathbf{e} = \frac{1}{4}\left(\mathbf{f}_1 + \mathbf{f}_2 + \mathbf{v}_1 + \mathbf{v}_2\right). \tag{10}$$

A vertex point is the updated position of a vertex $\mathbf{v}$ shared by four patches. Denoting the average of the four face points by $\bar{\mathbf{f}}$, and the average of the midpoints of the four edges that have $\mathbf{v}$ as one of their endpoints by $\bar{\mathbf{e}}$, the vertex point $\mathbf{v}'$ is computed as follows:

$$\mathbf{v}' = \frac{1}{4}\left(\bar{\mathbf{f}} + 2\,\bar{\mathbf{e}} + \mathbf{v}\right). \tag{11}$$
Note that the edge points and vertex points outside the dashed rectangle are not calculated, because their calculation would require surrounding control points that do not exist. After interpolation of control points, all the displacements are doubled to match the increase of resolution in the image pyramid. An example of the subdivision process is shown in Fig. 6. It can be observed that the subdivided mesh in Fig. 6(b) maintains the shape of the previous mesh in Fig. 6(a) well. Therefore, this subdivision step provides a good initialization for optimizing the image of the next level in the pyramid.
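The subdivision step can be sketched for a regular grid of displacement vectors as follows. The sketch applies the standard Catmull–Clark rules for quadrilateral meshes with valence-4 vertices (face point as centroid, edge point as the average of two face points and two endpoints, vertex point as (F + 2R + P)/4), skips the points on the outer boundary that cannot be computed, and doubles the result. The grid layout `grid[row][col]` and the function names are assumptions for illustration.

```python
def avg(*pts):
    """Component-wise average of equally sized tuples."""
    return tuple(sum(p[d] for p in pts) / len(pts) for d in range(len(pts[0])))


def subdivide(grid, scale=2.0):
    """Catmull-Clark subdivision of an R x C grid into (2R-3) x (2C-3)."""
    rows, cols = len(grid), len(grid[0])
    # One face point per patch.
    face = [
        [avg(grid[a][b], grid[a][b + 1], grid[a + 1][b], grid[a + 1][b + 1])
         for b in range(cols - 1)]
        for a in range(rows - 1)
    ]
    out = [[None] * (2 * cols - 3) for _ in range(2 * rows - 3)]
    for a in range(rows - 1):
        for b in range(cols - 1):
            out[2 * a][2 * b] = face[a][b]
            if b < cols - 2:
                # Edge shared by patches (a, b) and (a, b + 1).
                out[2 * a][2 * b + 1] = avg(
                    face[a][b], face[a][b + 1], grid[a][b + 1], grid[a + 1][b + 1])
            if a < rows - 2:
                # Edge shared by patches (a, b) and (a + 1, b).
                out[2 * a + 1][2 * b] = avg(
                    face[a][b], face[a + 1][b], grid[a + 1][b], grid[a + 1][b + 1])
            if a < rows - 2 and b < cols - 2:
                # Interior vertex shared by four patches.
                v = grid[a + 1][b + 1]
                f_bar = avg(face[a][b], face[a][b + 1],
                            face[a + 1][b], face[a + 1][b + 1])
                e_bar = avg(avg(v, grid[a][b + 1]), avg(v, grid[a + 2][b + 1]),
                            avg(v, grid[a + 1][b]), avg(v, grid[a + 1][b + 2]))
                out[2 * a + 1][2 * b + 1] = tuple(
                    (f + 2 * e + p) / 4 for f, e, p in zip(f_bar, e_bar, v))
    # Displacements are doubled to match the doubled image resolution.
    return [[tuple(scale * c for c in p) for p in row] for row in out]


coarse = [[(float(b), float(a)) for b in range(4)] for a in range(4)]
fine = subdivide(coarse)
print(len(fine), len(fine[0]))  # 5 5
```

Since every rule is an average with weights summing to one, constant and linear displacement fields are reproduced exactly (up to the final scaling), which explains why the subdivided mesh keeps the shape of the coarse one.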
5.3 Decision of the final output
We introduce a post-processing procedure to decide the final single solution as the output from the optimized population. A natural idea is to define the best solution as the individual with the smallest sum of the objective function values (i.e., maximum similarity). However, in MOEAs, such an approach can lose most of the valuable information contained in the other Pareto optimal solutions. We therefore propose a post-processed solution that exploits these solutions, as illustrated in Fig. 7. For each group, the solution with the smallest value of the corresponding objective function is extracted, and it provides the control points that affect the corresponding group. The final output is created by aggregating the control points provided by all the groups. For control points shared between groups, the average of their displacements is used.
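The aggregation described above can be sketched as follows. The data layout is an assumption: `solutions[s][p]` is the displacement of control point `p` in solution `s`, `objectives[s][g]` is the value of the `g`-th objective for solution `s`, and `groups_of_point[p]` lists the groups affected by control point `p`.

```python
def aggregate(solutions, objectives, groups_of_point):
    """Build one mesh from the per-group best Pareto solutions."""
    n_groups = len(objectives[0])
    # For each group, the solution with the smallest group-wise objective.
    best = [
        min(range(len(solutions)), key=lambda s: objectives[s][g])
        for g in range(n_groups)
    ]
    final = []
    for p, groups in enumerate(groups_of_point):
        picks = [solutions[best[g]][p] for g in groups]
        # Control points shared by several groups are averaged.
        final.append(tuple(sum(v[d] for v in picks) / len(picks) for d in range(2)))
    return final


solutions = [
    [(1, 1), (2, 2), (3, 3)],
    [(10, 10), (20, 20), (30, 30)],
]
objectives = [(0.1, 0.9), (0.8, 0.2)]
groups_of_point = [[0], [0, 1], [1]]
print(aggregate(solutions, objectives, groups_of_point))
```

In this toy case the first solution is best for group 0 and the second for group 1, so the outer control points are copied from the respective winners and the shared middle point is their average.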
6 Experimental Results
The effectiveness and usefulness of solving the deformation estimation problem with the multi-objective scheme are verified using both synthetic data (Sec. 6.1) and real-world images (Sec. 6.2). We compared the following four settings: the simple GA with a single objective, NSGA-II and NSGA-III each with two objectives, and NSGA-III with four objectives. In the two-objective setting, patches are divided equally and vertically into two groups (e.g., for a lattice including patches, each group consists of patches). Similarly, patches are divided vertically and horizontally into four groups in the four-objective setting (e.g., each group consists of patches for a lattice). The number of pyramid levels is fixed to three in all experiments. To reduce the computational cost, pixel sampling is performed for individual evaluation by scanning the target image at five-pixel intervals. The three GAs are implemented with the Platypus package (https://github.com/Project-Platypus/Platypus), an evolutionary computation framework in Python that includes many MOEAs. To ensure the correctness and fairness of the experiment, we manually set only several essential parameters and keep the other parameters at their default settings throughout the experiment. Specifically, the number of evaluations is set to 10000, and the number of reference points of NSGA-III is set to 100 for the two-objective setting and 120 for the four-objective setting. For fair comparisons, all experiments are executed five times with different random seeds to account for the probabilistic operations. The initial population for all settings is kept the same with respect to each random seed.
6.1 Comparison with synthetic data
To focus on verifying the estimation ability rather than robustness against noise, we prepare five images in pixels for generating template images and deformed target images, as shown in Fig. 8. The corresponding template images are obtained by cropping pixels regions from the center of each image. Eight types of deformations are used for generating deformed target images, defined by the following parameters:
- Deformation (2 types): an image is deformed into a wavy shape by moving the control points according to a sine curve. In addition to vertical-only displacements, a combination of both vertical and horizontal displacements is used.
- Lattice size (2 types): and lattices are used for the bottom image of the pyramid.
- Ranges of decision variables (2 types): we use and as the range of each decision variable. Ranges are limited to make sure that control points do not overlap with each other spatially.
As a result, there are 40 (i.e., 5 images × 8 types of parameter settings) types of deformed target images in total. These images are generated by using backward warping, and hence the ground truth of displacements can be obtained. In this section, we evaluate each result based on not only the root mean square error (RMSE) but also the mean Euclidean distance error (MEDE). RMSE is calculated from all the pixels between the deformed template image and the sampling region (Sec. 5.1). MEDE is calculated based on all the ground truth displacements.
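For reference, the two evaluation criteria can be written as follows, under the assumption that RMSE is taken over paired pixel intensities and MEDE over paired 2D control point displacements (estimated versus ground truth). The function names and data layout are illustrative.

```python
import math


def rmse(values_a, values_b):
    """Root mean square error between two equally long intensity lists."""
    n = len(values_a)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(values_a, values_b)) / n)


def mede(estimated, ground_truth):
    """Mean Euclidean distance between estimated and true 2D displacements."""
    n = len(estimated)
    return sum(
        math.hypot(e[0] - g[0], e[1] - g[1])
        for e, g in zip(estimated, ground_truth)
    ) / n


print(rmse([0.0, 0.0], [3.0, 4.0]))
print(mede([(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (1.0, 1.0)]))  # 2.5
```

RMSE measures how well the images match in appearance, while MEDE directly measures the error of the deformation parameters, so the two can rank solutions differently.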
6.1.1 Results of vertical wavy images
Comprehensive numerical results of the best solutions with respect to the vertical wavy images are summarized in Table 1. For each combination of image setting and algorithm, the minimum, maximum, and average values over five random trials are reported. Comparing the four algorithms in terms of RMSE and MEDE, the two-objective algorithms achieve better results in most cases. Examples of visual results are shown in Fig. 9, from which we can observe that these methods estimate deformation parameters correctly for all the test images. By contrast, the single-objective GA achieves the best result only once in terms of the average RMSE and never in terms of the average MEDE. The accuracy of the GA can degrade significantly, e.g., lattice with range as illustrated in Fig. 10(a). These observations verify the effectiveness of the multi-objective approaches. In addition, we can observe that four-objective NSGA-III gets poorer results than the two-objective algorithms. Focusing on the average values of both evaluation criteria, four-objective NSGA-III never outperforms the others in RMSE and does so only once in MEDE.
The final solutions after post-processing of the Pareto optimal solutions obtained by the multi-objective algorithms are compared in Table 2. As can be observed, the post-processing improves the estimation accuracy in many cases compared to Table 1. In particular, four-objective NSGA-III benefits most from the post-processing. Focusing on the number of improved results on the average value, four-objective NSGA-III performs the best 17 times on RMSE and 19 times on MEDE, while two-objective NSGA-II performs the best 15 and 14 times, respectively.
6.1.2 Results of vertical and horizontal wavy images
The results of the best solutions with respect to the vertical and horizontal wavy images are shown in Table 3. Despite more complex deformations, the multi-objective approaches still achieve good estimation. Several qualitative results are shown in Fig. 11. Compared with Table 1, the best results are irregularly distributed in terms of both evaluation criteria. Nevertheless, two-objective NSGA-II and NSGA-III are still better choices overall. In the case of lattice, GA can achieve the best result when the search space is small (e.g., the sea image). Four-objective NSGA-III shows good results in the case of lattice setting, where it achieves the best average MEDE six times out of ten. Although the number of groups is a hyper-parameter to be handled carefully as mentioned in Sec. 6.1.1, our results show the trend that a larger number of groups is more effective when dealing with complex and subtle deformations.
6.2 Comparison on real-world images
The usefulness of the proposed method under real-world scenarios is verified in this section. We use three different pairs of template and target images as shown in Fig. 12.
- Texture (Fig. 12(a)): the images capture a part of the undeformed/deformed texture printed on a piece of paper. The target image has two vertical bumps. We set the lattice as and the decision variable range as .
- Sign (Fig. 12(b)): the images capture a sign with undeformed/deformed text printed on a piece of wrapping paper. The lattice and decision variable range are set to lattice and , respectively.
- Face (Fig. 12(c)): the template image and the target image show a frontal face with serious expression and smile, respectively. The goal is to obtain deformation parameters that express smiling. We use the lattice and the decision variable range.
The sizes of the template images and target images are the same as in Sec. 6.1. Fig. 12(a) and Fig. 12(b) were captured by a web camera, and Fig. 12(c) was extracted from the FEI face database (Thomaz and Giraldi, 2010; https://fei.edu.br/~cet/facedatabase.html) and cropped. Because the ground truth of the real deformations is unknown, we evaluate the results using only RMSE.
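As a minimal illustration of this evaluation, the following sketch computes an RMSE between a deformed template and a target image. It assumes that the error is taken over pixel intensities of two equally sized grayscale images stored as nested lists; the function name `rmse` and this data layout are illustrative assumptions, not the implementation used in our experiments.

```python
import math


def rmse(deformed, target):
    """Root-mean-square intensity error between two equally sized
    grayscale images given as lists of rows (assumed layout)."""
    same_height = len(deformed) == len(target)
    same_width = all(len(r) == len(t) for r, t in zip(deformed, target))
    if not (same_height and same_width):
        raise ValueError("images must have the same shape")
    # Squared intensity difference of every pixel pair.
    squared = [
        (a - b) ** 2
        for row_d, row_t in zip(deformed, target)
        for a, b in zip(row_d, row_t)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy 2x2 example: squared errors are 9, 16, 0, 0, so the mean is 6.25.
print(rmse([[0, 0], [0, 0]], [[3, 4], [0, 0]]))  # 2.5
```

Because such a measure compares raw intensities, any brightness difference between the two images contributes to the error even when the geometry is estimated correctly.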
The results of the best solutions and the post-processed solutions are shown in Table 4. These results show that the multi-objective approaches can outperform the single-objective approach in real-world situations. However, the post-processing fails to improve the estimation accuracy in a number of cases (e.g., the face image). The estimation results for each image are shown in Fig. 13. It can be observed that the deformed texture image contains some highlight areas and that the eye area of the face image is shadowed. Hence, the deformation results for these areas are less accurate than those for the other areas, and the groups including these areas cannot contribute to the post-processing. As a limitation, the proposed method is sensitive to illumination changes because of the characteristics of the similarity measure.
In this paper, we proposed a novel deformation estimation method that uses MOEAs to tackle the conflicts arising from the fact that each control point of the deformation model affects a local region rather than a single pixel. Our method casts deformation estimation as a MOP by dividing a template image into several groups of patches and defining the group-wise similarities as objective functions, so that the problem can be solved by off-the-shelf MOEAs. To handle large deformations, the optimization is run hierarchically following a coarse-to-fine strategy based on an image pyramid and control point mesh subdivision. In addition, a post-processing procedure is proposed to integrate the Pareto optimal solutions into a single output, which can improve the estimation accuracy. The observations from the experimental results can be summarized in three points. First, our partitioning approach with two-objective algorithms obtains deformation parameters more accurately than GA with a single objective. Second, although the four-objective algorithm does not perform as well as expected because of the large number of objectives, it is effective in dealing with complex and subtle deformations. Third, the post-processing procedure improves the estimation accuracy in many cases. The experiments on real-world images also indicate the usefulness of the proposed method.
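The group-wise formulation summarized above can be sketched as follows. This is a simplified illustration under stated assumptions rather than our implementation: the dissimilarity is taken to be a sum of squared intensity differences, the patches are assigned to two groups in a checkerboard pattern, and the names `group_objectives`, `dominates`, and `non_dominated` are hypothetical.

```python
def group_objectives(deformed, target, patch_size, group_of_patch):
    """Return one dissimilarity value per group (lower is better).

    group_of_patch maps (patch_row, patch_col) to a group index;
    this mapping and the squared-difference measure are assumptions.
    """
    totals = [0.0] * (max(group_of_patch.values()) + 1)
    for y, (row_d, row_t) in enumerate(zip(deformed, target)):
        for x, (a, b) in enumerate(zip(row_d, row_t)):
            group = group_of_patch[(y // patch_size, x // patch_size)]
            totals[group] += (a - b) ** 2
    return totals


def dominates(f, g):
    """True if objective vector f Pareto-dominates g (minimization)."""
    no_worse = all(a <= b for a, b in zip(f, g))
    strictly_better = any(a < b for a, b in zip(f, g))
    return no_worse and strictly_better


def non_dominated(vectors):
    """Keep the objective vectors that no other vector dominates."""
    return [p for p in vectors if not any(dominates(q, p) for q in vectors)]


# 4x4 toy images split into 2x2 patches with a checkerboard grouping.
target = [[0] * 4 for _ in range(4)]
deformed = [row[:] for row in target]
deformed[0][0] = 1  # error inside patch (0, 0), which belongs to group 0
deformed[0][2] = 2  # error inside patch (0, 1), which belongs to group 1
groups = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(group_objectives(deformed, target, 2, groups))  # [1.0, 4.0]

candidates = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
print(non_dominated(candidates))  # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```

In the toy example, the candidate (3.0, 3.0) is discarded because (2.0, 2.0) is better in both groups, whereas the remaining three candidates trade accuracy in one group against the other and therefore form the Pareto front.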
The main limitation of our method is its high computational cost. As future work, we would like to address this issue by further tuning the hyper-parameters so as to reduce the computational cost without degrading the performance. We are also interested in the reference point distribution of NSGA-III. A user-supplied setting may be able to focus solutions on regions that are desirable for the post-processing procedure.
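For reference, NSGA-III by default places its reference points uniformly on the unit simplex following the systematic approach of Das and Dennis. The sketch below generates such points; the function name `das_dennis` and its parameters are illustrative, and how a user-supplied set should be chosen for the post-processing is left open here.

```python
from itertools import combinations
from math import comb


def das_dennis(n_obj, n_div):
    """Uniform reference points on the unit simplex.

    Each coordinate is a multiple of 1/n_div and the coordinates of
    every point sum to one (simplex-lattice design).
    """
    points = []
    n_slots = n_div + n_obj - 1
    # Stars and bars: choose where the n_obj - 1 separators go.
    for bars in combinations(range(n_slots), n_obj - 1):
        counts = []
        previous = -1
        for bar in bars:
            counts.append(bar - previous - 1)
            previous = bar
        counts.append(n_slots - 1 - previous)
        points.append([c / n_div for c in counts])
    return points


# Four objectives with three divisions per axis give C(6, 3) = 20 points.
reference_points = das_dennis(4, 3)
print(len(reference_points), comb(6, 3))
```

A user-supplied distribution could, for example, be obtained by keeping only a subset of these points or by replacing them with hand-picked directions, which is the kind of setting mentioned above.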
Funding This work is supported by JSPS KAKENHI Grant Number JP20K19568.
Conflicts of interest The authors declare that they have no competing interests.
- Multi-objective optimization for deformable image registration: proof of concept. In Medical Imaging 2012: Image Processing, Vol. 8314, pp. 594–600.
- Recent advances in mesh morphing. Computer Graphics Forum 21 (2), pp. 173–198.
- HypE: an algorithm for fast hypervolume-based many-objective optimization. Evolutionary Computation 19 (1), pp. 45–76.
- VoxelMorph: a learning framework for deformable medical image registration. IEEE Transactions on Medical Imaging 38 (8), pp. 1788–1800.
- Direct estimation of non-rigid registration. In Proceedings of the British Machine Vision Conference, pp. 92.1–92.10.
- Deformable image registration using a cue-aware deep regression network. IEEE Transactions on Biomedical Engineering 65 (9), pp. 1900–1911.
- Recursively generated B-spline surfaces on arbitrary topological meshes. Computer-Aided Design 10 (6), pp. 350–355.
- PESA-II: region-based selection in evolutionary multiobjective optimization. In Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation, pp. 283–290.
- Non-rigid image registration: theory and practice. The British Journal of Radiology 77 (suppl_2), pp. S140–S153.
- Improved NSGA-III with selection-and-elimination operator. Swarm and Evolutionary Computation 49, pp. 23–33.
- An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: solving problems with box constraints. IEEE Transactions on Evolutionary Computation 18 (4), pp. 577–601.
- A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6 (2), pp. 182–197.
- Ranking methods for many-objective optimization. In Mexican International Conference on Artificial Intelligence, pp. 633–645.
- Direct estimation of nonrigid registrations with image-based self-occlusion reasoning. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (1), pp. 87–104.
- Image registration by combining thin-plate splines with a 3D morphable model. In International Conference on Image Processing, pp. 1069–1072.
- Realistic cloth augmentation in single view video under occlusions. Computers & Graphics 34 (5), pp. 567–574.
- Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor, MI.
- Direct manipulation of free-form deformations. ACM SIGGRAPH Computer Graphics 26 (2), pp. 177–184.
- Evaluation of optimization methods for nonrigid medical image registration using mutual information and B-splines. IEEE Transactions on Image Processing 16 (12), pp. 2879–2890.
- Scattered data interpolation with multilevel B-splines. IEEE Transactions on Visualization and Computer Graphics 3 (3), pp. 228–244.
- Many-objective evolutionary algorithms: a survey. ACM Computing Surveys 48 (1), pp. 1–35.
- Stochastic ranking algorithm for many-objective optimization based on multiple indicators. IEEE Transactions on Evolutionary Computation 20 (6), pp. 924–938.
- An evolutionary many-objective optimization algorithm based on dominance and decomposition. IEEE Transactions on Evolutionary Computation 19 (5), pp. 694–716.
- Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, pp. 91–110.
- An efficient unified approach to direct visual tracking of rigid and deformable surfaces. In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2729–2734.
- ASIFT: a new framework for fully affine invariant image comparison. SIAM Journal on Imaging Sciences 2 (2), pp. 438–469.
- A probabilistic bitwise genetic algorithm for B-spline based image deformation estimation. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 300–301.
- Application of evolutionary and swarm optimization in computer vision: a literature survey. IPSJ Transactions on Computer Vision and Applications 12 (3).
- Template-based monocular 3D shape recovery using Laplacian meshes. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (1), pp. 172–187.
- Medical image registration: a review. Computer Methods in Biomechanics and Biomedical Engineering 17 (2), pp. 73–93.
- Fast non-rigid surface detection, registration and realistic augmentation. International Journal of Computer Vision 76, pp. 109–122.
- Evolutionary multi-objective meta-optimization of deformation and tissue removal parameters improves the performance of deformable image registration of pre- and post-surgery images. In Medical Imaging 2019: Image Processing, Vol. 10949, pp. 838–848.
- Feature-based deformable surface detection with self-occlusion reasoning. International Journal of Computer Vision 97, pp. 54–70.
- Nonrigid registration using free-form deformations: application to breast MR images. IEEE Transactions on Medical Imaging 18 (8), pp. 712–721.
- Linear local models for monocular reconstruction of deformable surfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (5), pp. 931–944.
- Objective reduction in many-objective optimization: linear and nonlinear algorithms. IEEE Transactions on Evolutionary Computation 17 (1), pp. 77–99.
- Face recognition systems under morphing attacks: a survey. IEEE Access 7, pp. 23012–23026.
- Non-rigid registration using free-form deformations. Ph.D. Thesis, Technische Universität München.
- Free-form deformation of solid geometric models. ACM SIGGRAPH Computer Graphics 20 (4), pp. 151–160.
- Deformable template tracking in 1ms. In Proceedings of the British Machine Vision Conference.
- A new ranking method for principal components analysis and its application to face image analysis. Image and Vision Computing 28 (6), pp. 902–913.
- In defence of RANSAC for outlier rejection in deformable registration. In European Conference on Computer Vision, pp. 274–287.
- Survey of multi objective evolutionary algorithms. In International Conference on Circuits, Power and Computing Technologies, pp. 1–9.
- Objective reduction based on nonlinear correlation information entropy. Soft Computing 20 (6), pp. 2393–2407.
- Deformable surface tracking by graph matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 901–910.
- Tracking partially-occluded deformable objects while enforcing geometric constraints. arXiv preprint arXiv:2011.00627.
- Multiple non-rigid surface detection and registration. In IEEE International Conference on Computer Vision, pp. 1992–1999.
- MOEA/HD: a multiobjective evolutionary algorithm based on hierarchical decomposition. IEEE Transactions on Cybernetics 49 (2), pp. 517–526.
- A new dominance relation-based evolutionary algorithm for many-objective optimization. IEEE Transactions on Evolutionary Computation 20 (1), pp. 16–37.
- Fast affine template matching over Galois field. In BMVC, pp. 121.1–121.11.
- Robust projective template matching. IEICE Transactions on Information and Systems 99 (9), pp. 2341–2350.
- MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation 11 (6), pp. 712–731.
- Enhancing MOEA/D with information feedback models for large-scale many-objective optimization. Information Sciences 522, pp. 1–16.
- A fast 2D shape recovery approach by fusing features and appearance. IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (7), pp. 1210–1224.
- SPEA2: improving the strength Pareto evolutionary algorithm for multiobjective optimization. In Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems, pp. 95–100.