Image stitching for imperfect image series is a challenging problem that has seen considerable progress in recent years [1, 2]. Generally, there are two ways to address it. One is to design an alignment technique (image warping) that aligns the images as accurately as possible [3, 4, 5, 6, 7]. The other, called seam-driven (or seam-guided) stitching, uses seam-cutting [8, 9] to find the least visible seam in the overlapping region from one or a finite set of alignment hypotheses [10, 11, 12, 13]. The first approach aims to generate a geometrically accurate result, but it may fail when the input images have parallax or other issues. Seam-driven stitching thus becomes critical for producing convincing results.
The seam-driven strategy for image stitching was first proposed by Gao et al. [10]. They applied seam-cutting to estimate multiple seams from a finite set of alignment hypotheses. A seam quality metric is then defined to evaluate these seams, and the final result is produced from the seam with the minimal measure. This pipeline is adopted in many other seam-driven methods [11, 12]. However, their quality metrics measure the average performance of the pixels on the seam without considering the relevance and variance among them. As a result, the seam with the minimal measure may not be optimal in human perception. Fig. 1 shows a comparison example where two seams are measured. It is worth noting that there are some inaccurate measurements (false positives) for the pixels on the seam. In fact, it is difficult to define a single quality metric that precisely evaluates stitching seams, since two seams can be equally convincing in human perception despite having distinct quality measures. This motivates us to develop a seam estimation method that can find the perceptually optimal seam given one alignment hypothesis.
In this paper, we propose a coarse-to-fine seam estimation method for image stitching. From the perspective of seam evaluation, we observe that a perceptually optimal seam should have a relatively small quality measure as well as a small variance among the pixels on the seam (see Fig. 3(d)). Our coarse-to-fine strategy has two main steps. In the first step, given aligned images, we estimate a stitching seam via conventional seam-cutting, where the energy function is calculated from the original difference map. In the second step, we introduce a patch-point evaluation algorithm to evaluate the pixels on the seam; the evaluations are then used to recalculate the difference map and reestimate the stitching seam. The two steps iterate until the current seam changes negligibly compared with the previous seams. Experiments show that our method outperforms conventional seam-cutting and other seam-driven methods.
II Related Work
In recent years, many efforts have been devoted to seam-cutting and seam-driven methods to address complex scenes and issues in image stitching. Seam-cutting was proposed to handle imperfect image series; it aims to estimate an invisible seam between overlapping images such that the images can be seamlessly blended together. Most seam-cutting methods formulate seam estimation as the energy minimization of a labeling problem and minimize the energy via graph cuts [14]. Various energy functions have been defined to handle specific issues [8, 9, 15, 16, 7, 13]. Our method takes conventional seam-cutting as the initial seam estimation method.
The seam-driven methods then incorporate seam-cutting approaches in their framework. Gao et al. [10] indicated that the perceptually best result does not necessarily come from the best global alignment. To find the best result from multiple seams, they defined a seam quality metric to measure the stitching seams. Zhang and Liu [11] improved this strategy by combining homography and content-preserving warps [17] to locally align images and generate better alignment hypotheses, where the seam cost is used as the quality metric. Lin et al. [12] proposed to generate the alignment hypotheses via superpixel-based feature grouping and a seam-guided structure-preserving warp, where the warp is iteratively improved by adaptive feature weighting. They also defined a quality metric based on the ZNCC (zero-mean normalized cross correlation) score, which was also used in [13]. All these quality metrics evaluate the average performance of the pixels on the seam, so the seam with the minimal measure may not be optimal in human perception.
Our method uses seam evaluation in a different way. Instead of defining a seam quality metric to select the best-performing seam from multiple seams, we propose an evaluation algorithm that concentrates on the correlation and variation of the pixels on the seam. The evaluation is then applied in our coarse-to-fine seam estimation strategy.
III Coarse-to-fine Seam Estimation
In this section, we first give a brief description of the conventional seam-cutting method and propose our patch-point evaluation algorithm, which consists of a patch evaluation and a point evaluation. Then, we develop an iterative evaluation-reestimation procedure and finally summarize our coarse-to-fine seam estimation framework.
III-A Conventional Seam-cutting
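For two-image stitching, we use $I_0$ and $I_1$ to denote the aligned reference and target images, $\mathcal{P}$ is their overlapping region and $\mathcal{L} = \{0, 1\}$ is a label set. A seam means assigning a label $l_p \in \mathcal{L}$ to each pixel $p \in \mathcal{P}$, where “0” corresponds to $I_0$ and “1” corresponds to $I_1$. Seam-cutting aims to find a labeling $l$ (i.e., a map from $\mathcal{P}$ to $\mathcal{L}$) that minimizes the energy function
$$E(l) = \sum_{p \in \mathcal{P}} D_p(l_p) + \sum_{(p,q) \in \mathcal{N}} S_{p,q}(l_p, l_q), \tag{1}$$
where $\mathcal{N} \subset \mathcal{P} \times \mathcal{P}$ is a neighborhood system of pixels. The smoothness term is defined as
$$S_{p,q}(l_p, l_q) = \frac{1}{2} |l_p - l_q| \left( I_*(p) + I_*(q) \right), \tag{2}$$
$$I_*(\cdot) = \| I_0(\cdot) - I_1(\cdot) \|_2, \tag{3}$$
where $I_*$ denotes the color difference map. The data term $D_p(l_p)$ measures the penalty of assigning pixel $p$ with label $l_p$; we refer to  for more details.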
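As an illustration of the smoothness term, the following minimal Python sketch evaluates the smoothness energy of a given labeling. It is a sketch under stated assumptions, not the implementation used in the paper: it assumes a 4-connected neighborhood, images stored as nested lists of RGB tuples, and a cut cost equal to the mean color difference of the two pixels separated by the seam. The function names `difference_map` and `smoothness_energy` are hypothetical, and the graph-cut minimization itself is not shown.

```python
import math


def difference_map(img0, img1):
    """Per-pixel Euclidean color difference between two aligned images."""
    return [[math.dist(c0, c1) for c0, c1 in zip(row0, row1)]
            for row0, row1 in zip(img0, img1)]


def smoothness_energy(labels, diff):
    """Sum the cut cost over 4-connected neighbor pairs with different labels.

    Assumption: cutting between p and q costs 0.5 * (diff(p) + diff(q)).
    """
    h, w = len(labels), len(labels[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            # Visit each undirected pair once: right and down neighbors.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and labels[y][x] != labels[ny][nx]:
                    total += 0.5 * (diff[y][x] + diff[ny][nx])
    return total
```

A labeling whose label boundary runs through well-aligned pixels (small color difference) yields a lower energy than one that crosses misaligned content, which is what the minimization exploits.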
III-B Patch-point Evaluation Algorithm
To evaluate the stitching seam, a ZNCC-based method was proposed by [12] and further used in [13]. For each pixel $p_i$ on the seam, they extract a local patch centered at $p_i$ and compute the ZNCC score between the local patch in the target image and that in the reference image. The seam quality is defined as
$$Q = \frac{1}{N} \sum_{i=1}^{N} \left( 1 - \frac{\mathrm{ZNCC}(p_i) + 1}{2} \right), \tag{4}$$
where $N$ is the total number of pixels on the seam. As shown in Fig. 1, this quality measures the average performance of these pixels without considering the relevance and variance among them. Consequently, the seam with the minimal measure may not be optimal in human perception.
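To make the averaging behavior concrete, the sketch below computes a ZNCC-based seam quality in plain Python. It assumes patches are given as flattened lists of grayscale intensities and that each ZNCC score in $[-1, 1]$ is mapped to a cost in $[0, 1]$ before averaging; the names `zncc` and `seam_quality` are hypothetical, and the handling of flat patches is our own choice.

```python
import math


def zncc(a, b):
    """Zero-mean normalized cross correlation of two equal-length patches."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        # A flat patch has no defined correlation; treat it as uncorrelated.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def seam_quality(patch_pairs):
    """Mean of 1 - (ZNCC + 1) / 2 over seam pixels (0 is best, 1 is worst)."""
    costs = [1.0 - (zncc(a, b) + 1.0) / 2.0 for a, b in patch_pairs]
    return sum(costs) / len(costs)
```

Because only the mean is reported, a seam with one badly misaligned pixel among many perfect ones can score the same as a seam that is mildly imperfect everywhere, which is the limitation discussed above.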
Despite the difficulties of defining a precise seam quality metric, we can still use this strategy to evaluate the pixels on the seam. Generally, the patch differences have a good “continuity” property while the point differences have a nice “diversity” property (see Fig. 2). Thus, we combine the patch and point together to evaluate the seam.
III-B1 Patch evaluation
As misalignment artifacts usually appear as structural inconsistency in the overlapping region, we use the SSIM (structural similarity) index [18] instead of ZNCC to compare the local patches in the two images. Experiments also show the superior robustness of SSIM. The patch evaluation on pixel $p_i$ is defined as
The SSIM index is a decimal value between $-1$ and $1$, and the value $1$ is only reachable if the two local patches are identical. Thus, a misaligned pixel on the seam usually possesses a relatively large value of patch evaluation.
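A patch evaluation of this kind can be sketched as follows. The `ssim` function implements the standard single-window SSIM formula with the usual constants $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$. The mapping $(1 - \mathrm{SSIM})/2$ in `patch_evaluation` is an assumption chosen so that identical patches score 0 and the score stays in $[0, 1]$; it is not taken from the text. Patches are assumed to be flattened grayscale lists, and both function names are hypothetical.

```python
def ssim(a, b, dynamic_range=255.0):
    """Single-window SSIM of two equal-length flattened grayscale patches."""
    n = len(a)
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((v - ma) * (v - ma) for v in a) / n   # variance of patch a
    vb = sum((v - mb) * (v - mb) for v in b) / n   # variance of patch b
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    num = (2 * ma * mb + c1) * (2 * cov + c2)
    den = (ma * ma + mb * mb + c1) * (va + vb + c2)
    return num / den


def patch_evaluation(a, b):
    """Assumed mapping of SSIM in [-1, 1] to a cost in [0, 1]; 0 means identical."""
    return (1.0 - ssim(a, b)) / 2.0
```

With this convention, a structurally inverted patch pair scores above 0.5, while an identical pair scores 0.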
III-B2 Point evaluation
For parallax issues in imperfect image series, a patch evaluation alone is not enough to provide a precise evaluation of the pixels on the seam. Sometimes it creates false positives, giving some well-aligned pixels a relatively large value of patch evaluation (see Fig. 2(a)). We therefore add a point evaluation for the pixels to improve the evaluation algorithm. The point evaluation on pixel $p_i$ is defined as
where $p_i$ and $q_i$ are adjacent in the overlapping region and $l_{p_i} \neq l_{q_i}$. The point evaluation measures the color difference between the pixels on the two sides of the seam. Thus, a plausible seam would have a relatively small value of point evaluation for (nearly) all pixels on the seam. This helps avoid the false positives of the patch evaluation.
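The point evaluation can be pictured as a one-dimensional signal along the seam. The sketch below is only one possible reading of the description: it assumes a vertical seam stored as one column index per row, and it scores each seam pixel by the mean of the color difference map at the two pixels on either side of the seam. The function name `point_signal` and the seam representation are hypothetical.

```python
def point_signal(diff, seam_cols):
    """Point evaluation per row for a vertical seam.

    diff is the color difference map of the overlap (nested lists).
    seam_cols[y] is the column of the last pixel taken from the reference
    image in row y; its right neighbor is taken from the target image.
    """
    signal = []
    for y, x in enumerate(seam_cols):
        left, right = diff[y][x], diff[y][x + 1]   # two sides of the seam
        signal.append(0.5 * (left + right))        # assumed scoring rule
    return signal
```

The resulting list is ordered along the seam, so it can be plotted or smoothed in the same way as the patch evaluations.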
III-B3 Evaluation algorithm
To investigate the correlation and variation between these pixels, we treat the evaluations as signals where the $x$-axis is the order of pixels along the seam (see Fig. 2). We smooth the signals with a wavelet denoising tool to eliminate the effect of invisible misalignments. An alternative is to smooth the original aligned images with a Gaussian filter, but we experimentally find that wavelet denoising is more effective.
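The text does not specify which wavelet or threshold is used. As a hedged illustration of the idea only, the sketch below applies a one-level Haar transform to a seam signal, soft-thresholds the detail coefficients, and inverts the transform. The function name `haar_denoise`, the choice of the Haar wavelet, the single decomposition level, and the `threshold` parameter are all assumptions.

```python
import math


def haar_denoise(signal, threshold):
    """One-level Haar transform, soft-threshold the details, then invert."""
    s = list(signal)
    padded = len(s) % 2 == 1
    if padded:
        s.append(s[-1])             # repeat the last sample to get even length
    r = math.sqrt(2.0)
    out = []
    for i in range(0, len(s), 2):
        a = (s[i] + s[i + 1]) / r   # approximation coefficient (local average)
        d = (s[i] - s[i + 1]) / r   # detail coefficient (local fluctuation)
        # Soft threshold: shrink the detail toward zero by `threshold`.
        d = math.copysign(max(abs(d) - threshold, 0.0), d)
        out.extend(((a + d) / r, (a - d) / r))
    return out[:-1] if padded else out
```

A threshold of zero reproduces the input up to rounding, while a large threshold replaces each pair of neighboring samples by their mean, suppressing small fluctuations that correspond to invisible misalignments.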
Generally, a misaligned pixel on the seam would simultaneously possess a large value of patch and point evaluation. We define the evaluation for $p_i$ as follows,
where a constant is added to maintain the scale of the evaluation. Fig. 2 shows an example of our patch-point evaluation algorithm on a stitching seam, where the evaluation for each pixel is shown as a heat map. We can see that the evaluations are nearly consistent with human perception.
III-C Seam Estimation Refinement
We then use the patch-point evaluation to iteratively refine our seam estimation. In general, a misaligned pixel on the seam possesses a large value of patch-point evaluation; conversely, a relatively large value of the patch-point evaluation usually indicates a potentially misaligned pixel. Thus, in the seam refinement, we increase the smoothness costs of the potentially misaligned pixels by modifying the difference map with
Then, the difference map of the overlapping region becomes
The banding area containing the seam is generated by expanding the seam by 5 pixels on each side. Each pixel in the banding area is assigned the patch-point evaluation of its nearest pixel on the seam. The difference map is recalculated only in the banding area for efficiency and robustness.
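The construction of the banding area can be sketched as below. This is an illustrative sketch, not the authors' code: it assumes seam pixels are given as a dictionary from `(row, col)` to their patch-point evaluation, that "expanding by 5 pixels" means a square neighborhood of radius 5, and that "nearest" is measured by Euclidean distance with ties resolved in favor of the seam pixel visited first. The function name `band_evaluations` is hypothetical.

```python
def band_evaluations(seam_eval, height, width, radius=5):
    """Map every pixel within `radius` of the seam to its nearest seam score.

    seam_eval: dict {(row, col): evaluation} for pixels on the seam.
    Returns a dict {(row, col): evaluation} covering the banding area.
    """
    band = {}  # pixel -> (squared distance to nearest seam pixel, its score)
    for (sy, sx), score in seam_eval.items():
        for y in range(max(0, sy - radius), min(height, sy + radius + 1)):
            for x in range(max(0, sx - radius), min(width, sx + radius + 1)):
                d2 = (y - sy) ** 2 + (x - sx) ** 2
                if (y, x) not in band or d2 < band[(y, x)][0]:
                    band[(y, x)] = (d2, score)
    return {pixel: score for pixel, (_, score) in band.items()}
```

Restricting the update to this band keeps the cost of each iteration proportional to the seam length rather than to the size of the overlapping region.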
We then recalculate the energy function with the new difference map and reestimate a stitching seam. The evaluation-reestimation procedure iterates until the current seam changes negligibly compared with the previous seams. Here “negligibly” means that the current seam is entirely contained in the previous banding areas. For a reasonable initial seam, this procedure usually terminates within 5 iterations. Finally, we obtain a stitching seam, and the final result is generated by applying gradient domain fusion [19] on the seam.
Fig. 3 shows a stitching example of the seam estimation refinement, where the seam in each iteration is shown as a heat map. The initial estimated seam suffers from artifacts of structural inconsistency as it passes through the misaligned regions. After several iterations, we obtain a perceptually convincing seam.
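The control flow of the evaluation-reestimation procedure can be sketched as a small loop. The sketch is deliberately abstract: `estimate_seam`, `reweight`, and `make_band` are hypothetical callables standing in for seam-cutting, the difference-map update, and the band construction, and the stopping rule is read here as "the new seam lies inside the band of the immediately preceding seam", which is an assumption about the exact criterion. The `max_iters` cap is also our addition.

```python
def refine(initial_diff, estimate_seam, reweight, make_band, max_iters=10):
    """Alternate seam estimation and difference-map updates until the seam settles."""
    diff = initial_diff
    seam = estimate_seam(diff)               # coarse seam from the original map
    for _ in range(max_iters):
        band = make_band(seam)               # pixels near the current seam
        diff = reweight(diff, seam, band)    # penalize poorly evaluated pixels
        new_seam = estimate_seam(diff)
        if set(new_seam) <= set(band):       # seam stayed inside the previous band
            return new_seam
        seam = new_seam
    return seam                              # safety cap reached
```

In a toy one-dimensional setting where the "seam" is the cheapest column, raising the cost of the current column moves the seam to a neighboring column inside the band, and the loop stops after one update.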
III-D Proposed Coarse-to-fine Framework
We summarize our coarse-to-fine seam estimation framework in Algorithm 1.
In our experiments, the patch size in the patch evaluation is set to be , in (7) equals , and in (8) are set to be and respectively. We use SIFT [20] and RANSAC [21] to find the feature correspondences between input images. A global homography or other available warps [5, 12] are then estimated to align the images. Finally, our coarse-to-fine seam estimation method is used to estimate the final seam, and the final result is generated by blending the aligned images via gradient domain fusion [19].
We compare our method with conventional seam-cutting and other seam-driven methods [11, 12]. The comparisons are done on publicly available datasets including Parallax [11] and SEAGULL [12]. All comparison results are provided in the supplementary material.
Fig. 4 shows some comparisons between different stitching methods. Input images are from . The conventional seam-cutting, SEAGULL and our method adopt the same image alignment provided by SEAGULL. Parallax, SEAGULL and conventional seam-cutting suffer from visual artifacts of structural inconsistency, as shown in the red rectangles. Our method produces results that are convincing in human perception.
In this paper, we propose a coarse-to-fine seam estimation method to handle the imperfect image series in image stitching. Comprehensive experiments demonstrate that our method can finally find a nearly perception-consistent stitching seam after several iterations, which outperforms the conventional seam-cutting and other seam-driven methods.
[1] R. Szeliski, “Image alignment and stitching: A tutorial,” Found. Trends Comput. Graph. Vis., vol. 2, no. 1, pp. 1–104, 2006.
[2] M. Brown and D. G. Lowe, “Automatic panoramic image stitching using invariant features,” Int. J. Comput. Vis., vol. 74, no. 1, pp. 59–73, 2007.
[3] J. Gao, S. J. Kim, and M. S. Brown, “Constructing image panoramas using dual-homography warping,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2011, pp. 49–56.
[4] W.-Y. Lin, S. Liu, Y. Matsushita, T.-T. Ng, and L.-F. Cheong, “Smoothly varying affine stitching,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2011, pp. 345–352.
[5] J. Zaragoza, T.-J. Chin, Q.-H. Tran, M. S. Brown, and D. Suter, “As-projective-as-possible image stitching with moving DLT,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 7, pp. 1285–1298, 2014.
[6] Y.-S. Chen and Y.-Y. Chuang, “Natural image stitching with the global similarity prior,” in Proc. 14th Eur. Conf. Comput. Vis., 2016, pp. 186–201.
[7] G. Zhang, Y. He, W. Chen, J. Jia, and H. Bao, “Multi-viewpoint panorama construction with wide-baseline images,” IEEE Trans. Image Process., vol. 25, no. 7, pp. 3099–3111, 2016.
[8] V. Kwatra, A. Schödl, I. Essa, G. Turk, and A. Bobick, “Graphcut textures: Image and video synthesis using graph cuts,” ACM Trans. Graph., vol. 22, no. 3, pp. 277–286, 2003.
[9] A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin, and M. Cohen, “Interactive digital photomontage,” ACM Trans. Graph., vol. 23, no. 3, pp. 294–302, 2004.
[10] J. Gao, Y. Li, T.-J. Chin, and M. S. Brown, “Seam-driven image stitching,” Eurographics, pp. 45–48, 2013.
[11] F. Zhang and F. Liu, “Parallax-tolerant image stitching,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 3262–3269.
[12] K. Lin, N. Jiang, L.-F. Cheong, M. Do, and J. Lu, “SEAGULL: Seam-guided local alignment for parallax-tolerant image stitching,” in Proc. 14th Eur. Conf. Comput. Vis., 2016, pp. 370–385.
[13] N. Li, T. Liao, and C. Wang, “Perception-based seam cutting for image stitching,” Signal, Image and Video Processing, pp. 1–8, 2018, doi: 10.1007/s11760-018-1241-9.
[14] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
[15] A. Eden, M. Uyttendaele, and R. Szeliski, “Seamless image stitching of scenes with large motions and exposure differences,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., vol. 2, 2006, pp. 2498–2505.
[16] J. Jia and C.-K. Tang, “Image stitching using structure deformation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 4, pp. 617–631, Apr. 2008.
[17] F. Liu, M. Gleicher, H. Jin, and A. Agarwala, “Content-preserving warps for 3D video stabilization,” ACM Trans. Graph., vol. 28, no. 3, p. 44, 2009.
[18] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
[19] P. Pérez, M. Gangnet, and A. Blake, “Poisson image editing,” ACM Trans. Graph., vol. 22, no. 3, pp. 313–318, 2003.
[20] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
[21] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981.