Image stitching is a well-studied topic in computer vision, which mainly consists of alignment [2, 3, 4, 5], composition [6, 7, 8, 9, 10] and blending [11, 12, 13]. In consumer-level photography, it is difficult to achieve perfect alignment due to unconstrained shooting environments, so image composition becomes the most crucial step for producing artifact-free results.
Seam-cutting [14, 15, 16, 17, 18] is a powerful composition method, which seeks an invisible seam in the overlapping region of the aligned images. Mainstream algorithms usually express the problem in terms of energy minimization and minimize it via graph-cut optimization [19, 20, 21]. Normally, for a given overlapping region of aligned images, different energy functions correspond to different seams, and hence to different composed results (see Fig. 1). Conversely, in order to obtain a plausible stitching result, we desire to define a perception-consistent energy function, such that the most invisible seam possesses the minimum energy.
Recently, many efforts have been devoted to seam-cutting by penalizing the photometric difference using various energy functions. A Euclidean-metric color difference is used in [14] to define the smoothness term in their energy function, and a gradient difference is taken into account in [15]. Eden et al. [16] proposed an energy function that allows for large motions and exposure differences, but the camera settings are required. Jia and Tang [17] associated the smoothness term with gradient smoothness and gradient similarity, to reduce structure complexity along the seam. Zhang et al. [18] combined alignment errors and a Gaussian-metric color difference in their energy function, to handle misaligned areas with similar colors. However, few existing methods consider human perception in their energy functions, so a seam with minimum energy is sometimes not the most invisible one in the overlapping region.
Seam-cutting has also been applied in image alignment. Gao et al. [22] proposed a seam-driven image stitching framework, which selects the best homography warp from several candidates with minimal seam cost instead of minimal alignment error. Zhang and Liu [23] combined homography and content-preserving warps to locally align images, where seam costs are used as a quality metric to predict how well a homography enables plausible stitching. Lin et al. [24] proposed a seam-guided local alignment, which iteratively improves warping by adaptively weighting features according to their distances to the current seam.
In this paper, we propose a novel seam-cutting method via a perception-based energy function, which takes the nonlinearity and the nonuniformity of human perception into account. Our proposed method consists of three stages (see Fig. 2). In the first stage, we calculate a sigmoid-metric color difference of the given overlapping region as the smoothness term, to characterize the perception of color discrimination. Then, we calculate an average pixel saliency of the given overlapping region as the saliency weight, to simulate the fact that human eyes tend to pay more attention to salient objects. Finally, we minimize the perception-based energy function by graph-cut optimization, to obtain the seam and the corresponding composed result. Experiments show that our method outperforms seam-cutting with the normal energy function, and a user study demonstrates that our composed results are more consistent with human perception.
Major contributions of the paper are summarized as follows.
We propose a novel perception-based energy function in the seam-cutting framework.
Our composition method can be easily integrated into other stitching pipelines.
In this section, we first review the normal seam-cutting framework in detail, then describe a novel perception-based energy function, and finally propose our seam-cutting framework.
II-A Normal Seam-cutting Framework
Given a pair of aligned images denoted by $I_0$ and $I_1$, let $\Omega$ be their overlapping region and $L = \{0, 1\}$ be a label set, where "$0$" corresponds to $I_0$ and "$1$" corresponds to $I_1$; then a seam means assigning a label $l_p \in L$ to each pixel $p \in \Omega$. The goal of seam-cutting is to find a labeling $l$ (i.e., a map from $\Omega$ to $L$) that minimizes the energy function

$$E(l) = \sum_{p \in \Omega} D_p(l_p) + \sum_{(p,q) \in \mathcal{N}} S_{p,q}(l_p, l_q), \tag{1}$$

where $\mathcal{N}$ is a neighborhood system of pixels. The data term $D_p(l_p)$ represents the cost of assigning the label $l_p$ to the pixel $p$, and the smoothness term $S_{p,q}(l_p, l_q)$ represents the cost of assigning the label pair $(l_p, l_q)$ to the pixel pair $(p, q)$.
The data term is defined as

$$D_p(l_p) = \begin{cases} \mu, & p \in \partial\Omega_{1-l_p}, \\ 0, & \text{otherwise}, \end{cases}$$

where $\mu$ is a very large penalty to avoid mislabeling, and $\partial\Omega_0$ ($\partial\Omega_1$) is the common border of $\Omega$ and $I_0$ ($I_1$) (marked in red and blue respectively in Fig. 1(a)). In fact, the data term fixes the endpoints of the seam at the intersections of the two colored polylines.
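As an illustrative sketch of the data term above (`MU`, `border0`, and `border1` are hypothetical names standing in for the penalty $\mu$ and the common borders of the overlap with each image):

```python
# Hypothetical sketch of the seam-cutting data term: a pixel on the border
# shared with one image is forced to take that image's label, because taking
# the other label costs the huge penalty MU.
MU = 1e6  # "very large penalty to avoid mislabeling"

def data_term(p, label, border0, border1):
    """Cost of assigning `label` (0 or 1) to pixel p.

    border0/border1 are the sets of pixels on the common border of the
    overlap with image I0 and I1 respectively.
    """
    if label == 0 and p in border1:   # border pixel of I1 mislabeled as I0
        return MU
    if label == 1 and p in border0:   # border pixel of I0 mislabeled as I1
        return MU
    return 0.0
```

This is what pins the seam's endpoints: any labeling that flips a border pixel pays `MU`, so the minimum-energy seam must start and end at the intersections of the two borders.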
The smoothness term is defined as

$$S_{p,q}(l_p, l_q) = |l_p - l_q| \cdot \big( \epsilon(p) + \epsilon(q) \big), \qquad \epsilon(p) = \left\| I_0(p) - I_1(p) \right\|_2,$$

where $\epsilon(\cdot)$ denotes the Euclidean-metric color difference (see Fig. 2(b)).
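A minimal sketch of the Euclidean-metric smoothness term, assuming the overlap is given as two aligned H x W x 3 arrays (function names are illustrative):

```python
import numpy as np

def euclidean_color_diff(I0, I1):
    """Per-pixel Euclidean color difference over the overlap (H x W x 3 arrays)."""
    return np.linalg.norm(I0.astype(float) - I1.astype(float), axis=-1)

def smoothness(eps, p, q, lp, lq):
    """Cost of assigning labels (lp, lq) to neighboring pixels p and q:
    zero when the labels agree (no cut), otherwise the summed color
    differences at both pixels."""
    return abs(lp - lq) * (eps[p] + eps[q])
```

Only pixel pairs actually cut by the seam (i.e., with differing labels) contribute, and they contribute little where the two images agree, which is exactly why the seam prefers well-aligned, similarly colored regions.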
II-B Perception-based Energy Function
In experiments, the seam $l^*$ that minimizes the normal energy function (1) is sometimes not the most invisible one in $\Omega$. In other words, there may exist a seam $\hat{l}$ that is more invisible but has greater energy than $l^*$ (see Fig. 2(e) and (f)). Therefore, we desire to define a perception-consistent energy function, such that the most invisible seam possesses the minimum energy.
II-B1 Sigmoid metric
Fig. 3 shows a toy example where $l^*$ is not the most invisible seam. In fact, the seam shown in (b) crosses the local misalignment area (marked in light blue in (a)), because the Euclidean-metric color difference does not penalize it heavily enough. In contrast, the seam shown in (d) avoids the local misalignment area (marked in red in (c)), because the sigmoid-metric color difference successfully distinguishes it from the aligned area.
In particular, the perception of colors is nonlinear, as it has a color discrimination threshold, which means human eyes cannot differentiate some colors from others even if they are different. Let $\tau$ denote the threshold; the perception of color discrimination can be characterized as
if $\epsilon(p) \ll \tau$, the color difference is invisible,
if $\epsilon(p) \approx \tau$, the sensitivity of discrimination rises rapidly,
if $\epsilon(p) \gg \tau$, the color difference is visible.
We want to define a quality metric to measure the visibility of a color difference, such that the cost of invisible terms approximates zero while the cost of visible terms approximates one. Fortunately, the sigmoid function

$$f_{\mathrm{sig}}(x) = \frac{1}{1 + e^{-(x - \tau)/\sigma}}$$

is a suitable quality metric for our purpose.
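The sigmoid metric is a one-liner; as a sketch (parameter names follow the reconstruction above):

```python
import math

def sigmoid_metric(x, tau, sigma):
    """Visibility of a color difference x: approximately 0 below the
    discrimination threshold tau, approximately 1 above it, with sigma
    controlling how rapidly sensitivity rises around tau."""
    return 1.0 / (1.0 + math.exp(-(x - tau) / sigma))
```

This matches the three perceptual regimes: far below $\tau$ the cost vanishes (invisible), around $\tau$ it rises steeply, and far above $\tau$ it saturates at one (visible).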
Next, we show how to determine the parameters $\tau$ and $\sigma$. Briefly, given an overlapping region $\Omega$ of aligned images, the threshold $\tau$ plays the role of roughly dividing $\Omega$ into an aligned area and a misaligned area by its color difference, which is similar to determining a threshold that binarizes an image into a background region and a foreground region. Thus, we employ the well-known Otsu's algorithm [25] to determine a suitable $\tau$ with the maximum between-class variance. On the other hand, $\sigma$ represents how rapidly the sensitivity of color discrimination rises around $\tau$. Normally, a $\sigma$ comparable to the width of the bins of the histogram used in Otsu's algorithm has good practical performance.
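Otsu's algorithm itself is a short histogram search; the sketch below (a generic implementation over the color-difference values, not the paper's exact code) picks the bin boundary maximizing the between-class variance:

```python
import numpy as np

def otsu_threshold(values, bins=64):
    """Return the threshold tau that best splits `values` (e.g. per-pixel
    color differences) into two classes, by maximizing the between-class
    variance over histogram bin boundaries."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist.astype(float) / hist.sum()           # bin probabilities
    centers = 0.5 * (edges[:-1] + edges[1:])      # bin centers
    best_t, best_var = centers[0], -1.0
    for k in range(1, bins):
        w0, w1 = p[:k].sum(), p[k:].sum()         # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:k] * centers[:k]).sum() / w0    # class means
        mu1 = (p[k:] * centers[k:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, centers[k]
    return best_t
```

On a clearly bimodal distribution of color differences (mostly near zero in aligned areas, large in misaligned ones), the returned $\tau$ falls between the two modes.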
Now, the smoothness term is modified as

$$S^{\mathrm{sig}}_{p,q}(l_p, l_q) = |l_p - l_q| \cdot \big( f_{\mathrm{sig}}(\epsilon(p)) + f_{\mathrm{sig}}(\epsilon(q)) \big),$$

where $f_{\mathrm{sig}}(\epsilon(\cdot))$ denotes the sigmoid-metric color difference. Fig. 2(c) shows that $f_{\mathrm{sig}}(\epsilon(\cdot))$ makes the misaligned area more distinguishable from the aligned area than $\epsilon(\cdot)$ does, which effectively helps the seam avoid crossing the misaligned area.
II-B2 Saliency weights
Fig. 4 shows another toy example where $l^*$ is not the most invisible seam. In fact, the seams $l^*$ and $\hat{l}$, shown in (b) and (d) respectively, both cross the local misalignment area. Though the energy of $\hat{l}$ is greater, it is more invisible than $l^*$ in terms of human perception, because the location where its artifact arises is less noticeable than that of $l^*$.
In particular, the perception of images is nonuniform, which means that human eyes tend to pay more attention to salient objects. Thus, artifacts in salient regions are more noticeable than artifacts in non-salient regions. In order to benefit from this observation, we define a saliency weight

$$w_{p,q} = \frac{s(p) + s(q)}{2},$$

where $w_{p,q}$ denotes the average pixel saliency of $p$ and $q$ (see Fig. 2(d)), and $s(\cdot)$ is the pixel saliency. We normalize $s(\cdot)$ into a fixed range to avoid over-penalizing saliency weights. As stitching results are usually cropped into rectangles in consumer-level photography, we assign $w_{p,q}$ the minimum weight if either $p$ or $q$ is located on the common border of the canvas and $\Omega$ (marked in green in Fig. 2(a)).
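A sketch of the saliency weighting, assuming (as one plausible choice, not confirmed by the text) that saliency is normalized into $[1, 2]$ so even non-salient pixels keep a nonzero smoothness weight:

```python
import numpy as np

def normalize_saliency(s_raw, lo=1.0, hi=2.0):
    """Map a raw saliency map into [lo, hi]. The [1, 2] default is an
    assumption for illustration: keeping the minimum weight at 1 avoids
    zeroing out the smoothness term in non-salient regions."""
    s = (s_raw - s_raw.min()) / (s_raw.max() - s_raw.min() + 1e-12)
    return lo + (hi - lo) * s

def saliency_weight(s, p, q):
    """Average pixel saliency of the neighboring pixels p and q."""
    return 0.5 * (s[p] + s[q])
```

The perception-based energy then simply multiplies each smoothness term by this weight, so cuts through salient regions cost proportionally more.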
II-C Proposed Seam-cutting Framework
Our seam-cutting framework is summarized in Algorithm 1.
In our experiments, we first use SIFT [27] to extract and match features, and use RANSAC [28] to determine a global homography and align the input images. Then, for the overlapping region, we use Otsu's algorithm [25] to estimate the threshold $\tau$, and use salient object detection [26] to calculate pixel saliency weights. Finally, we use graph-cut optimization [19] to obtain a seam, and blend the aligned images via gradient-domain fusion [12] to create a mosaic.
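The actual seam is found by graph-cut optimization as cited above; as a dependency-free toy stand-in, the same "cheapest path through a cost map" idea can be illustrated with dynamic programming over a small per-pixel cost grid (this is an illustration of the seam-search objective, not the paper's method):

```python
import numpy as np

def min_cost_vertical_seam(cost):
    """Toy stand-in for graph-cut seam search: find the top-to-bottom
    8-connected path of minimal accumulated cost through `cost` (H x W)."""
    h, w = cost.shape
    acc = cost.astype(float).copy()
    # Forward pass: accumulate the cheapest way to reach each pixel.
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            acc[i, j] += acc[i - 1, lo:hi].min()
    # Backtrack from the cheapest bottom pixel.
    seam = [int(np.argmin(acc[-1]))]
    for i in range(h - 2, -1, -1):
        j = seam[-1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam.append(lo + int(np.argmin(acc[i, lo:hi])))
    return seam[::-1]  # one column index per row, top to bottom
```

Feeding this a perception-weighted cost map (sigmoid-metric difference times saliency weight) makes the toy seam prefer low-saliency, well-aligned pixels, mirroring the behavior of the full graph-cut formulation.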
Fig. 5 shows some experimental comparisons between the two seam-cutting frameworks. The input images in the second group come from a published stitching dataset. Due to unconstrained shooting environments, there exists large parallax in these examples, such that a global homography can hardly align them. In such cases, the normal seam-cutting framework fails to produce artifact-free results, while our perception-based seam-cutting framework successfully creates plausible mosaics. More results and the original input images are available in the supplementary material.
In order to investigate whether our proposed method is more consistent with human perception, we conduct a user study comparing the two seam-cutting frameworks. We invite 15 participants to rank 15 unannotated groups of stitching results (choosing from 3 options: 1. A is better than B; 2. B is better than A; 3. A and B are even). Fig. 6 shows the result of the user study, which demonstrates that our stitching results win most users' favor.
In this paper, we propose a novel perception-based energy function in the seam-cutting framework, to handle image stitching challenges in consumer-level photography. Experiments show that our method outperforms seam-cutting with the normal energy function, and a user study demonstrates that our results are more consistent with human perception. In the future, we plan to generalize our method to the seam-driven framework to deal with image alignment.
-  R. Szeliski, “Image alignment and stitching: A tutorial,” Technical Report MSR-TR-2004-92, Microsoft Research, 2004.
-  R. Szeliski and H.-Y. Shum, “Creating full view panoramic image mosaics and environment maps,” in Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH ’97. ACM Press/Addison-Wesley Publishing Co., 1997, pp. 251–258.
-  M. Brown and D. G. Lowe, “Automatic panoramic image stitching using invariant features,” Int. J. Comput. Vision, vol. 74, no. 1, pp. 59–73, 2007.
-  J. Gao, S. J. Kim, and M. S. Brown, “Constructing image panoramas using dual-homography warping,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2011, pp. 49–56.
-  J. Zaragoza, T.-J. Chin, M. S. Brown, and D. Suter, “As-projective-as-possible image stitching with moving DLT,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2013, pp. 2339–2346.
-  S. Peleg, “Elimination of seams from photomosaics,” Computer Graphics and Image Processing, vol. 16, no. 1, pp. 90–94, 1981.
-  M.-L. Duplaquet, “Building large image mosaics with invisible seam lines,” in Aerospace/Defense Sensing and Controls. International Society for Optics and Photonics, 1998, pp. 369–377.
-  J. Davis, “Mosaics of scenes with moving objects,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 1998, pp. 354–360.
-  A. A. Efros and W. T. Freeman, “Image quilting for texture synthesis and transfer,” in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH ’01. ACM, 2001, pp. 341–346.
-  A. Mills and G. Dudek, “Image stitching with dynamic elements,” Image and Vision Computing, vol. 27, no. 10, pp. 1593–1602, 2009.
-  P. J. Burt and E. H. Adelson, “A multiresolution spline with application to image mosaics,” ACM Transactions on Graphics, vol. 2, no. 4, pp. 217–236, 1983.
-  P. Pérez, M. Gangnet, and A. Blake, “Poisson image editing,” ACM Transactions on Graphics, vol. 22, no. 3, pp. 313–318, 2003.
-  A. Levin, A. Zomet, S. Peleg, and Y. Weiss, “Seamless image stitching in the gradient domain,” in Proc. 8th Eur. Conf. Comput. Vision, 2004, pp. 377–389.
-  V. Kwatra, A. Schödl, I. Essa, G. Turk, and A. Bobick, “Graphcut textures: image and video synthesis using graph cuts,” ACM Transactions on Graphics, vol. 22, no. 3, pp. 277–286, 2003.
-  A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin, and M. Cohen, “Interactive digital photomontage,” ACM Transactions on Graphics, vol. 23, no. 3, pp. 294–302, 2004.
-  A. Eden, M. Uyttendaele, and R. Szeliski, “Seamless image stitching of scenes with large motions and exposure differences,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., vol. 2, 2006, pp. 2498–2505.
-  J. Jia and C.-K. Tang, “Image stitching using structure deformation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 4, pp. 617–631, Apr. 2008.
-  G. Zhang, Y. He, W. Chen, J. Jia, and H. Bao, “Multi-viewpoint panorama construction with wide-baseline images,” IEEE Transactions on Image Processing, vol. 25, no. 7, pp. 3099–3111, 2016.
-  Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
-  Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 9, pp. 1124–1137, Sept. 2004.
-  V. Kolmogorov and R. Zabin, “What energy functions can be minimized via graph cuts?” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 2, pp. 147–159, Feb. 2004.
-  J. Gao, Y. Li, T.-J. Chin, and M. S. Brown, “Seam-driven image stitching,” Eurographics, pp. 45–48, 2013.
-  F. Zhang and F. Liu, “Parallax-tolerant image stitching,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2014, pp. 3262–3269.
-  K. Lin, N. Jiang, L.-F. Cheong, M. Do, and J. Lu, “Seagull: Seam-guided local alignment for parallax-tolerant image stitching,” in Proc. 14th Eur. Conf. Comput. Vision, 2016, pp. 370–385.
-  N. Otsu, “A threshold selection method from gray-level histograms,” Automatica, vol. 11, no. 285-296, pp. 23–27, 1975.
-  J. Zhang, S. Sclaroff, Z. Lin, X. Shen, B. Price, and R. Mech, “Minimum barrier salient object detection at 80 fps,” in Proc. IEEE Int. Conf. on Comput. Vision, 2015, pp. 1404–1412.
-  D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vision, vol. 60, no. 2, pp. 91–110, 2004.
-  M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981.