Photo collages are created by placing a number of photo images on a canvas area of limited size. They are used to visually represent events of interest in an appealing and compact way. The images can be fitted on the canvas by simply scaling them, at the risk of losing important details contained in them and making the collage dull. In this paper we consider the problem of how to automatically create pleasing photo collages: given a set of photo images and a canvas area, we want to arrange the photos on the canvas in a pleasant manner, without supervision and without scaling them (see Figure 1). Assuming that the size of the canvas area is smaller than the sum of the sizes of the photos to be displayed, two main issues arise. The first is that photos may occlude each other; the second is that photos may lie partially outside the canvas area. These issues must be addressed by taking into account the pleasantness of the resulting collage, which is influenced by the order in which the photos are placed on the canvas and by their spatial arrangement. Usually, the most important photos are placed on top of the less important ones in order to minimize the risk of being severely occluded, and composition properties related to photo contents, geometric constraints and aesthetic considerations are taken into account to maximize the pleasantness of the resulting collage. The criteria that define what is important in a photo and what composition properties should be satisfied may vary from user to user. Moreover, the individual criteria may compete against each other. To be used in an automatic system for photo collage generation, the pleasantness criteria and their relative importance must be properly quantified using suitable algorithms. At the end of this process a fitness function can be defined whose value represents the overall degree of pleasantness of a photo collage.
To obtain the most pleasant collage, an automatic algorithm must search for the best arrangement of the photos by maximizing the value of the fitness function. For this purpose an optimization algorithm is usually exploited. Several formulations of some of the above criteria have been proposed in the literature, but none of the existing works performed a user study to determine which criteria make a photo important, which constraints must be satisfied for a collage to be balanced, or which cues users pay attention to when judging the pleasantness of a photo collage. We argue that if we could elicit the criteria by modeling the preferences of the users, we would be able to create more pleasant photo collages.
1.1 Related Work
Previous works on photo collage can be categorized into two main groups depending on the processing applied to the photos. These two groups are: content-preserving and non content-preserving.
The non content-preserving group comprises the photo collage methods that select relevant regions within the photos in order to maximize the information that is conveyed in the final collage. These methods ensure that the relevant regions are visible in the final collage, while the less relevant regions are either removed (cropping) or hidden by other, more relevant, ones (hiding). In addition to scaling and translation, these methods usually perform a layering of the photos to decide the order in which they are positioned on the canvas and/or rotate the photos to preserve their content as much as possible.
Among the methods that also apply a rotation operation on the photos, we find Picture Collage [37, 26], which is one of the first works that formalized photo collage as an optimization problem using different, competing collage criteria, namely image saliency, blank space, and saliency ratio balance. Inspired by this work is the collage strategy proposed in  which uses the same criteria, but images are first classified into three categories and then different relevant region detection strategies are adopted on the basis of the image category. Also inspired by the work of  are the improved collage strategies proposed by  and  where the collages can also be interactively modified by the user. A recent photo collage approach  uses a heuristic search process to ensure that salient information of each photo is displayed in the polygonal area resulting from a power-diagram-based circle packing algorithm. Most of the previous approaches use a saliency map, alone or coupled with other descriptors, as the informativeness criterion. In  instead, the informativeness criterion corresponds to foreground objects detected on depth maps. Finally, differently from all the aforementioned approaches, the method proposed in  creates Arcimboldo-like collages with multiple thematically-related cutouts from filtered Internet images.
The stained glass-like photo collage by  is one of the methods that preserve the orientation of the photos, that is, the photos are not rotated. The photos are cropped with respect to the face regions they contain. These cropped regions have straight edges that are used to arrange the photos on the canvas. Digital Tapestry  subdivides each photo into a set of sub-blocks from which the relevant regions of the photo are reconstructed and merged together. A pixel-based variant of this approach, named AutoCollage, is described in . Here the relevant regions, which have variable shapes, are merged with a seamless blending that ensures that no sharp boundaries are formed between them in the final collage. A similar approach is the Mobile Photo Collage presented in . The Puzzle-Like collage  instead cuts out from each photo an irregularly shaped region that follows the area surrounding a relevant object within the image. Finally, we can cite the Dynamic Media Assemblage , a photo collage approach that can be used to summarize video content as well as a photo collection in a stained glass-like collage.
The content-preserving group comprises those methods that arrange the photos according to the relevance of their content, defined in some way. The only operations performed on the photos are scaling and translation. Usually the most relevant photos are scaled to be bigger than the less relevant ones, and they are positioned on the most salient regions of the canvas. Moreover, the aspect ratio of the photos is preserved. These methods are also referred to as photo layout methods.
An example is the work of  where the photo layout is constructed using a larger topic photo and several small-size supportive photos. The photos are selected and sized according to their temporal and content coherence. A similar approach is exploited in  on video sequences, where key-frames in a visual summary are arranged on the canvas using the narrative grammar of comics. Also within this group we can cite the work of  where exclusion zones are used to lay out a set of photos on a canvas using different spatial criteria. This method was further improved in . In  spatial criteria are coupled with aesthetic principles to lay out photos in a pleasant composition. Recently, taking advantage of information usually found in social networking, and building on the previous PicWall work , FriendWall  uses social attributes (intrinsic labels) to create photo collages employing both image visual features and associated metadata. As a final example, we can cite the interactive approach  where pre-designed layout templates of annotated cells are used to arrange the photos according to their metadata, and focus areas can be selected by the user.
1.2 Paper Contribution and Organization
The focus of this work is to exploit subjective experiments to model user preferences in order to learn which criteria (and to what extent) need to be taken into account to automatically generate a pleasing photo collage. To this end we designed an experimental framework that incorporates the identification of the criteria via user preference modeling, the implementation of the corresponding computational algorithms, the learning of their relative importance, and the validation of the results. We applied our framework in the context of non content-preserving collages. We believe that this category allows us to investigate more criteria underlying the definition of pleasantness, as the associated problem has more degrees of freedom than the one associated with content-preserving collages. However, our proposed framework can be adapted to content-preserving methods as well. The different steps of our framework are depicted in Figure 2. A first subjective experiment is conducted to investigate how different criteria are involved in the users' subjective definition of pleasantness. For this experiment, we redefine the three basic criteria (image informativeness, canvas area coverage, and information ratio balance) exploited in most of the state-of-the-art works (e.g. [37, 2]). We evaluate three different representations of image informativeness: the first one, which is usually used in the state of the art, is based on saliency; the other two, introduced here, are based on quality and color harmony respectively. Collages are created by exploiting a Direct Search optimization algorithm. Since user image collections have very different contents, and different contents may lead to different pleasantness criteria, we considered five thematic image datasets. The results obtained from this experiment are used to identify new criteria at both the global and the local level.
The new global criteria are: face ratio, axis alignment, centrality, and convexity; the new local criteria are: color similarity, orientation diversity and minimum orientation difference. After having developed algorithms to compute these new criteria, their relative importance is learned by exploiting user rankings on the previously created collages. The identified criteria and their learned importance are then used to generate new sets of collages that are evaluated by a new panel of users. To further validate the proposed framework, we performed three additional experiments. In order to verify whether the identified criteria and their learned relative importance generalize well, that is, whether they can be used to create collages on unseen image sets, we performed a subjective experiment on six other image collections whose contents differ from the ones used in the previous experiments. We also tested the generalizability of the learned definition of pleasantness by creating collages while varying the number of images in the set and the canvas size. Moreover, we compared the performance of our proposal against two state-of-the-art algorithms. To the best of our knowledge this is the first work which extensively exploits subjective experiments within the collage generation process to learn user preferences, and that uses datasets of images of different contents to validate the proposed approach.
The rest of the paper is organized as follows. The problem formulation is mathematically described in Section 2 along with the description of the basic criteria. Section 3 illustrates the collage generation by describing the three different importance maps considered in our experiments, the photo datasets used, and the optimization algorithm responsible for the collage creation. The first subjective experiment and its outcomes are described in Section 4. The set of new criteria derived from the first experiment is described in Section 5, while the user preference modeling and learning strategy is detailed in Section 6. Results of the second subjective experiment performed on the newly created collages are illustrated in Section 7. The generalizability of the learned definition of pleasantness and the comparison with state-of-the-art methods on new datasets are reported in Section 8. Finally, Section 9 concludes the paper.
2 Problem Formulation and Basic Criteria Definition
Given input photo images and their corresponding importance maps (importance map representations will be discussed in the next section), a photo collage algorithm must arrange all the images on a canvas area . In a photo collage, each image is characterized by its state , where
is the 2D translation vector (w.r.t. the canvas origin), is the orientation angle (w.r.t. the x-axis), and is the layering index used to determine the placement order of the image. The state is used in a roto-translation transformation to position the image (and its importance map) on the canvas area:
The layering indexes can be manually or automatically assigned according to some heuristics. We compute the layering indexes on the basis of the 2D integrals of the importance maps : images whose importance maps have higher integrals are placed on the top layers, while images whose importance maps have lower integrals are placed on the bottom layers. An example of the procedure used for photo collage layering and compositing is reported in Figure 3.
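As a minimal sketch of the placement and layering steps described above, the following Python fragment applies a roto-translation to a point and derives layering indexes from the integrals of the importance maps. The function names, the use of degrees for the angle, and the convention that index 0 denotes the top-most layer are assumptions made for illustration; they are not taken from the original formulation.

```python
import math


def roto_translate(point, translation, angle_deg):
    """Rotate a 2D point about the origin, then translate it.

    Assumption: the angle is given in degrees and measured w.r.t. the x-axis.
    """
    x, y = point
    tx, ty = translation
    a = math.radians(angle_deg)
    return (x * math.cos(a) - y * math.sin(a) + tx,
            x * math.sin(a) + y * math.cos(a) + ty)


def layering_indexes(importance_maps):
    """Return one layer index per image (assumption: 0 is the top-most layer).

    Each importance map is a list of rows; its 2D integral is approximated
    by the sum of its values.
    """
    totals = [sum(sum(row) for row in m) for m in importance_maps]
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    layers = [0] * len(totals)
    for layer, image_idx in enumerate(order):
        layers[image_idx] = layer
    return layers
```

With three toy maps whose sums are 0.2, 1.8 and 0.7, the second image would be placed on top, followed by the third and then the first.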
The picture collage creation is formulated as an optimization problem in order to find the best configuration of states which optimizes all the criteria considered.
2.1 Basic Criteria Definition
Most of the existing photo collage methods (e.g. ) exploit the three “basic criteria” listed in Table 2.1. These criteria are quantified by the functions . The functions are parametrized by the configuration of states , and take as data the set of transformed images , the set of transformed importance maps , and the canvas . In the following we write the functions as dropping the dependencies for a more compact notation.
Visibility The overall collage visibility is the average of all information ratios (based on an importance map) computed on the visible regions of the images:
where is a function that computes the visible parts (taking into account clipping and overlapping) of the given map, and is a function that computes the 2D integrals of the map.
Canvas coverage The canvas coverage is defined as the ratio of canvas area covered by the arranged photos:
where is a function that computes the area corresponding to the given input.
Visibility ratio balance The visibility ratio balance is computed as the standard deviation of the information ratios:
where computes the standard deviation of the given values.
The values obtained are combined into a fitness function that must be maximized:
with , , a weight used to define the contribution of the -th criterion (usually fixed to ). This fitness function is at the basis of most of the photo collage algorithms in the state-of-the-art.
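The three basic criteria and their weighted combination can be illustrated with a short Python sketch. Several details are assumptions made only for this example: the inputs are pre-computed scalars (visible and total importance integrals per image, covered and total canvas area), the standard deviation is the population one, equal weights are used as a default, and the balance term enters the fitness as one minus the standard deviation so that a more balanced collage scores higher. The original combination may differ.

```python
from statistics import pstdev


def basic_fitness(visible_importance, total_importance,
                  covered_area, canvas_area,
                  weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of visibility, canvas coverage and ratio balance.

    visible_importance[i] / total_importance[i] is the information ratio
    of image i, i.e. the fraction of its importance that remains visible.
    """
    ratios = [v / t for v, t in zip(visible_importance, total_importance)]
    visibility = sum(ratios) / len(ratios)   # average information ratio
    coverage = covered_area / canvas_area    # fraction of canvas covered
    balance = pstdev(ratios)                 # spread of the ratios
    w_vis, w_cov, w_bal = weights
    # Assumption: the spread is turned into a reward via (1 - balance).
    return w_vis * visibility + w_cov * coverage + w_bal * (1 - balance)
```

For two images that each keep half of their importance visible on a canvas that is 80% covered, unit weights give 0.5 + 0.8 + 1.0.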
3 Collage Generation
In the following subsections, assuming that a proper dataset of images is available, we describe three different approaches to compute the image importance map: the first approach is inspired by ; the other two are here introduced. We also describe the algorithm used to place the images on the canvas area by searching the best configuration of states. The algorithm optimizes the fitness function defined in Equation 5.
3.1 Photo Datasets
A collage is usually created from a set of images sharing a common underlying theme. To create our dataset, we downloaded the images from the DPChallenge web site (http://www.dpchallenge.com/). The site collects photos of both amateur and professional photographers that participate in digital photography challenges. Each challenge has a main theme that the participants must follow. All the submitted photos are then judged by other participants, who assign a numerical score. We selected five photo challenges among the hundreds published and for each of them we collected the 14 best rated photos. The challenges have been chosen to include diverse subjects of generic themes. The chosen challenges are: Burst of Color III (Burst for brevity), Fashion II (Fashion), Landscape V (Landscape), Self Portrait VII (Self), and Zen Photography III (Zen). The Burst dataset is composed of images with a single subject; the Fashion dataset contains images of people and accessories; the Landscape dataset is composed of mostly horizontal images; on the contrary, the Self dataset contains mostly portrait images, both in color and in black and white; finally, the Zen dataset is composed of heterogeneous images and in most cases it is not easy to identify the subject. This diversity makes it possible to investigate whether people use different criteria in the creation of photo collages for different themes. Figure 4 shows the five sets of photos.
3.2 Importance Maps
Two of the basic criteria used in Equation 5 require the computation of importance maps to locate the most informative regions in an image. The underlying idea is that the most informative regions should not be hidden by other images, thus maximizing the information displayed. Since there is no unique definition of what is important in an image, in our investigation we tested different importance maps exploiting three different image properties: saliency, color harmony, and quality. Each importance map is plugged in turn into Equation 5, obtaining three different collages for each photo dataset.
Saliency The first importance map is based on saliency and is computed with an approach similar to the one presented in . We used this approach in a previous work on image thumbnailing  and the resulting saliency maps show that, overall, a compact set of salient regions is produced. We considered these results reasonable for our purposes. Other, more recent and precise saliency methods can be exploited. The recent paper  reports the performance of several algorithms on reference datasets; any of them can be used as an alternative. For surveys related to saliency see [11, 23, 1]. To compute the saliency map, the image is divided into small rectangular tiles. On each tile, a contrast score is computed by comparing its average color with the average colors of the neighboring tiles. The contrast score is assigned to each pixel in the tile. The basic algorithm has been extended by computing three different saliency maps in the LUV color space using neighborhoods of increasing size. Each map captures the saliency at a different scale. These saliency maps are then filtered and combined together into a single normalized map of values in the range . We denote the importance map of the i-th image computed using saliency as . Examples of saliency maps are shown in the second column of Figure 5.
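The tile-based contrast computation can be sketched for a single scale as follows. This is a simplified illustration under stated assumptions, not the implementation used in the paper: the image is a list of rows of 3-tuples that are assumed to be already expressed in a perceptual color space such as LUV, the contrast of a tile is taken as the mean Euclidean distance between its average color and those of the surrounding tiles, the result is normalized to [0, 1] by dividing by the maximum, and the filtering and multi-scale combination steps are omitted. The `tile` and `radius` parameters are hypothetical names.

```python
import math


def tile_contrast_saliency(image, tile=8, radius=1):
    """Single-scale tile contrast map with values in [0, 1]."""
    h, w = len(image), len(image[0])
    th, tw = math.ceil(h / tile), math.ceil(w / tile)

    # Average colour of every tile.
    means = [[None] * tw for _ in range(th)]
    for ty in range(th):
        for tx in range(tw):
            pixels = [image[y][x]
                      for y in range(ty * tile, min((ty + 1) * tile, h))
                      for x in range(tx * tile, min((tx + 1) * tile, w))]
            means[ty][tx] = tuple(sum(p[k] for p in pixels) / len(pixels)
                                  for k in range(3))

    # Contrast of a tile: mean colour distance to the neighbouring tiles.
    scores = [[0.0] * tw for _ in range(th)]
    for ty in range(th):
        for tx in range(tw):
            dists = [math.dist(means[ty][tx], means[ny][nx])
                     for ny in range(max(0, ty - radius), min(th, ty + radius + 1))
                     for nx in range(max(0, tx - radius), min(tw, tx + radius + 1))
                     if (ny, nx) != (ty, tx)]
            scores[ty][tx] = sum(dists) / len(dists) if dists else 0.0

    # Every pixel inherits the normalised score of its tile.
    peak = max(max(row) for row in scores) or 1.0
    return [[scores[y // tile][x // tile] / peak for x in range(w)]
            for y in range(h)]
```

Calling the function with increasing values of `radius` would give maps at coarser scales, which could then be combined into a single map.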
Harmony Since color combinations are related to the pleasantness of an image, for the second importance map we used the method proposed in  to evaluate the color harmony of the image locally by creating a color harmony map. We chose this approach because, in contrast to other approaches (e.g. ), it computes an image color harmony score by considering the distribution and spatial relationship between color regions found by the MeanShift segmentation algorithm. In order to obtain a color harmony map, we computed the harmony score on pixel neighborhoods (i.e. pixels in a square region surrounding a given pixel location) of different sizes. The harmony map is obtained by summing all the scores and by normalizing them in the range. We denote the importance map of the i-th image computed using color harmony as . The third column of Figure 5 shows some examples of color harmony maps.
Quality Image quality approaches model how an image is perceived when it is affected by different image distortions. We cannot predict what kind of image distortions are present, nor do we have a reference image against which to compare our photos, thus we must consider generic (or “universal”) no-reference image quality approaches. We exploited the BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator) computational model described in . The model uses different image features in order to quantify the image quality. Since BRISQUE computes a single quality index for an image, we implemented a neighborhood-based strategy in order to obtain a quality map. We considered the quality index computed on the whole image and on three pixel neighborhoods. The indexes are summed and normalized in the range. We denote the importance map of the i-th image computed using image quality as . The fourth column of Figure 5 shows some examples of image quality maps.
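The neighborhood-based strategy used for both the harmony and the quality maps turns a scorer that returns one number per image region into a per-pixel map. The following sketch shows the general idea only. The scoring function is passed in as a parameter and stands in for the harmony or BRISQUE models, which are not reproduced here; the neighborhood half-sizes and the normalization to [0, 1] are assumptions for illustration.

```python
def neighborhood_map(image, score_fn, half_sizes=(1, 2, 4)):
    """Per-pixel map obtained by summing a region score over several
    square neighbourhoods and rescaling the sums to [0, 1].

    image: list of rows of pixel values.
    score_fn: callable taking a patch (list of rows) and returning a float.
    half_sizes: hypothetical neighbourhood half-widths, in pixels.
    """
    h, w = len(image), len(image[0])
    acc = [[0.0] * w for _ in range(h)]
    for r in half_sizes:
        for y in range(h):
            for x in range(w):
                # Square patch centred on (x, y), clipped at the borders.
                patch = [row[max(0, x - r):x + r + 1]
                         for row in image[max(0, y - r):y + r + 1]]
                acc[y][x] += score_fn(patch)
    lo = min(min(row) for row in acc)
    hi = max(max(row) for row in acc)
    span = (hi - lo) or 1.0  # avoid division by zero on flat maps
    return [[(v - lo) / span for v in row] for row in acc]
```

Any function that maps a patch to a scalar can be plugged in as `score_fn`, for example a simple patch mean when experimenting.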
3.3 Optimization Algorithm
Let us consider, for now, a generic photo dataset and a generic importance map definition. Under these assumptions, the optimal collage is generated by finding the best configuration of states which maximizes Equation 5. The solution space of this maximization problem is of mixed type: in fact for each state we have , , and . In order to make the types of the state variables uniform, and since small variations of do not affect the final collage, the allowed orientations are uniformly quantized in the range .
The chosen optimization method is an extension of a Direct Search algorithm (DS) modified to deal with discrete solution spaces [4, 3]. DS is a derivative-free method for solving optimization problems [18, 24]. Since the focus of this paper is not on the optimization algorithm used, any non-gradient method could be used  as well as stochastic ones [17, 12].
The algorithm is initialized with a random configuration of states: the i-th image is placed on the canvas at a random position and with a random orientation . Its layering index instead is determined by the importance map as previously described. At each iteration of the implemented DS algorithm, the algorithm finds the best configuration of states by testing the current best configuration against all those obtained by varying the position and orientation in each image’s state. The position of each image is then updated and a new iteration is started. The algorithm terminates when the maximum number of iterations has been reached.
We used the modified Direct Search algorithm without any heuristic whose computational cost is with being the number of grid points on the canvas, being the number of allowed orientations on each grid point, and being the number of images to be placed. Other, more efficient, optimization algorithms can be used. Here we are interested in the effects of using different criteria in creating the photo collages, not in the most efficient way to create them.
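The search loop described above can be sketched as a plain exhaustive sweep over a discrete grid of positions and orientations. This is a hedged illustration rather than the authors' modified Direct Search: the function name, the `(position, angle)` state layout, the fixed random seed and the early exit when a sweep brings no improvement are assumptions, and the layering index is left out because it is determined separately from the importance maps. Each sweep of this sketch evaluates the fitness once for every combination of image, grid point and orientation.

```python
import random


def direct_search(n_images, grid, angles, fitness, max_iter=10, seed=0):
    """Maximise `fitness` over discrete image states.

    grid: list of allowed (x, y) positions.
    angles: list of allowed orientations.
    fitness: callable taking a list of (position, angle) states.
    """
    rng = random.Random(seed)
    # Random initial configuration.
    states = [(rng.choice(grid), rng.choice(angles)) for _ in range(n_images)]
    best = fitness(states)
    for _ in range(max_iter):
        improved = False
        for i in range(n_images):
            # Vary position and orientation of image i only.
            for pos in grid:
                for ang in angles:
                    candidate = states[:i] + [(pos, ang)] + states[i + 1:]
                    value = fitness(candidate)
                    if value > best:
                        states, best, improved = candidate, value, True
        if not improved:  # convenience added here, not in the description
            break
    return states, best
```

In a collage setting, `fitness` would render the candidate configuration and evaluate the weighted criteria on it.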
3.4 Experimental Setup
In our implementation the size of the canvas is set to 400×400 pixels and all the images have been resized such that pixels maintaining the same aspect ratio. With these constraints, the ratios between the sum of the areas of the images in a given dataset and the area of the canvas are for the Burst dataset, for the Fashion dataset, for the Landscape dataset, for the Self dataset, and for the Zen dataset. In practice this means that we need to hide about of the pixels of the images to fit them on the canvas in a pleasant manner. Or, conversely, we need to retain the most informative and pleasant of the pixels. The canvas and image dimensions have been chosen solely for the purpose of evaluating the performance of our framework under typical usage constraints. We are here more interested in the size ratio between images and canvas than in the absolute dimensions themselves, and we wanted the placement problem to be hard. A larger canvas and/or larger images can be used in an actual application. It should be pointed out that while the optimization has been done on a canvas of 400×400 pixels, the subjective tests have been done on the 1600×1600 versions of the collages.
Top left image corners were allowed to be placed on a regular grid from to 400 in both canvas directions, with a step of pixels. The set of allowed orientations is defined in the range in steps. For Experiment I, the values of , , and in Equation 5 have been set to , while for Experiment II, these have been learned from the users.
4 Subjective Experiment I
The above algorithm has been applied to each dataset using the three importance maps yielding a total of 15 photo collages as shown in Figure 6. Let us denote each collage with the corresponding configuration of states : denotes the photo dataset, and the importance map used.
In order to identify the criteria to be used to create pleasing photo collages, we performed a subjective test involving several users. Test subjects were selected taking into account age, gender and expertise in photography. Specifically, 16 subjects (Italian native speakers) were enlisted. Subjects are between 21 and 41 years old, three females and 13 males. Only one of the subjects can be considered an expert photographer (although not professional) while the others consider themselves amateurs. Half of the subjects stated that they shoot an average of 3,000-4,000 photos/year. The remaining subjects shoot an average of 100-300 photos/year. All of them have some knowledge of digital image processing. No relation exists between subjects and images in the photo datasets. In this first experiment, we showed each subject the five sets of three photo collages, one set at a time, and asked him/her to rank the three collages according to his/her liking without judging the semantics of the scenes depicted. The subjects were aware that the collages had been generated by different algorithms, but no technical information and no hints about the underlying criteria were given. This was done in order not to bias their choices. The sets of photo collages, as well as the collages within each set, were presented in a random order. The evaluation of all the collages and the related interviews took on average 30 minutes per subject.
After all the test sessions had been performed, we counted the number of times that each collage was ranked at the first (i.e. best), second, or third position in its photo dataset. During the counting, we checked for noisy user feedback that, in the pairwise experiment, manifests in the form of circular preferences (e.g. A > B, B > C, and C > A). We planned to remove subjects with such preferences from the analysis, but at the end of the experiment none of the subjects showed this behavior.
Table 4 shows the detailed results. As can be seen, in the case of the Zen photo dataset, the results are quite polarized. Almost all the subjects judged the collages in a similar manner, ranking first the collage created with the Saliency map, then the one using the Harmony map, and lastly the collage using the Quality map. The same ranking, although less polarized, can be observed for the Burst, Fashion and Landscape sets. In all three sets, the collage created with the Saliency map is clearly the preferred one. The one using the Harmony map is the second best since it has been selected second or third fewer times than the one using the Quality map. The only set displaying a different ranking is Self. In this case, the ranking is the opposite of the one obtained from the other four sets.
In Table 4 the final ranking of the importance maps for the five photo collage sets is reported. The ranking is determined by applying the Formula One World Championship points scoring system: each collage receives 25, 18, or 15 points each time that it is selected first, second, or third, respectively. The numbers in parentheses are the computed scores.
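The scoring rule can be illustrated with a few lines of Python. The rank counts used in the usage note are invented for the example (they are not the values of Table 4); the function and variable names are likewise hypothetical.

```python
POINTS = (25, 18, 15)  # points for first, second and third place


def score_maps(rank_counts):
    """Compute the total points of each importance map and rank them.

    rank_counts maps a name to a (first, second, third) tuple holding the
    number of subjects who placed the corresponding collage at that rank.
    """
    scores = {name: sum(p * c for p, c in zip(POINTS, counts))
              for name, counts in rank_counts.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return scores, ranking
```

For instance, with 16 subjects and made-up counts of (10, 4, 2), (4, 7, 5) and (2, 5, 9) for three maps, the totals would be 352, 301 and 275 points respectively.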
After each test, we also interviewed each subject about the reasons of his/her choices, what factors have influenced the selection of a photo collage over the others, and what criteria they used. In the following, for each photo dataset, we report a summary of the answers given by the users during the interviews.
4.1 Experiment I: Results
The Burst photo dataset is composed of images with bright colors. Many of these images are close-ups. It is not surprising that most subjects indicated color as a primary feature in collage evaluation. In particular, several subjects suggested that the images should have been positioned on the canvas by taking into account color similarity. Very dissimilar colors among neighboring images were considered disturbing. One subject suggested hiding very dark regions, preferring a collage with bright colors. Most subjects preferred images placed with randomized orientations. Collages containing images with their borders parallel to the canvas borders were penalized. Most of the images in this dataset contain a single object of interest. Collages where this object was fully visible were thus preferred, in particular in the case of faces.
The Fashion photo dataset is mainly composed of images of full-body women models. Only one image is a close-up. These images are less colorful than the Burst dataset but they contain high contrast regions. The main criterion used in evaluating the collages was the visibility of the models. Several subjects also indicated that having the top layer image in a central position makes the collage more pleasing. Secondary criteria include the visibility of a (impossible to model) favourite image, and loss of bright colored regions. No other criteria were suggested on this dataset.
The Landscape photo dataset contains images with mostly dull colors compared with the previous ones. No people are visible and the scenes depicted are mostly natural scenes. Several shots have a panoramic aspect ratio. For these reasons, according to the subjects, the collages created on this dataset were among the most difficult to evaluate. Images arranged in a regular way were considered disturbing. If an image was mostly covered by the others (as for example the violet sunset in the collage created with the Saliency map), it was considered acceptable by many users. Overall, the collages were often considered equivalent.
The Self photo dataset was the easiest to evaluate, probably because it contains self portraits. As expected, the criteria that emerged from the interviews referred mostly to the visibility of the faces. One interesting insight on this dataset is that, even though we encouraged the subjects to avoid judging the image content from the semantic point of view, many choices were made based on the appeal of the faces depicted. For example, some subjects considered the photo of the clown unpleasant and thus a photo that could be covered before others. On the contrary, others considered this photo very artistic. It seems that when humans are depicted, personal preferences are difficult to ignore. This is the only dataset containing both gray-scale and color images. Some test subjects did not appreciate collages with spatial clusters of images of the same type.
The Zen photo dataset should inspire peace and tranquillity. It contains photos with very few colors and details. They are mostly close-ups, and some of the photos show soft-focus effects. Most of the subjects found it difficult to judge the collages and express the rationale behind their choices. However, color composition and harmony were the most important criteria. The best collages were those where the relevant objects were visible. One interesting criterion that emerged on this dataset is that the shape of the visible image regions should not be jagged. Regular (i.e. convex) shapes are considered more appealing.
5 New Criteria Definition
From the results reported in the previous section, we can see that the users evaluated the collages using different criteria. These criteria are both local and global. Local criteria refer to properties either of single images or of their neighborhoods, while global criteria refer to properties of the collage seen as a whole. The three basic criteria reported in Table 2.1 and exploited in previous works are not enough to capture the different nuances of pleasantness expressed by the users. Thus, on the basis of the insights obtained from Experiment I, and taking into account that we need to model them with computational algorithms, we have selected the criteria in Table 5 to be used in the generation of pleasing collages. The first three criteria are extensions of the ones in Table 2.1, where now the importance map is computed by using a combination of the three importance maps described in Section 3.2. The other seven criteria have been defined following the results of Experiment I. Since the results showed that we also need to take into account the presence of faces within the images, it is necessary to consider, for each image, a binary mask containing the face regions. These masks undergo the same geometric transformations as the importance maps:
We indicate with the set of transformed masks which is passed along with the other data to the criteria functions. In the following we write the functions as dropping the dependencies for a more compact notation.
Visibility For each image we combined the three importance maps computed on saliency, quality and harmony, in order to obtain a global importance map:
where are found as described in the next section. Visibility is thus computed as in Equation 2 by substituting the set of transformed importance maps with the new one :
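As an illustration only, the combination of the three importance maps can be sketched as a pixel-wise weighted sum. The linear form, the function names, and the visibility ratio (visible importance over total importance) are assumptions made for this sketch; the paper's own definitions are those of Equation 7 and Equation 2.

```python
def combine_importance_maps(saliency, quality, harmony, weights):
    """Pixel-wise weighted sum of three equally sized 2D maps (lists of rows).

    Assumption: the global importance map is a linear combination of the
    saliency, quality and harmony maps with one weight per map.
    """
    w_s, w_q, w_h = weights
    return [
        [w_s * s + w_q * q + w_h * h for s, q, h in zip(row_s, row_q, row_h)]
        for row_s, row_q, row_h in zip(saliency, quality, harmony)
    ]


def visibility(importance, visible_mask):
    """Share of the total importance that lies on visible (mask != 0) pixels.

    Assumption: this ratio is one plausible reading of a visibility score,
    not necessarily the exact form of Equation 2.
    """
    total = sum(sum(row) for row in importance)
    seen = sum(
        value
        for row_i, row_m in zip(importance, visible_mask)
        for value, m in zip(row_i, row_m)
        if m
    )
    return seen / total if total > 0 else 0.0
```

For instance, with weights `(0.5, 0.25, 0.25)` a pixel that is salient and harmonious but of low quality receives importance 0.75.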
Canvas coverage The definition of the canvas coverage is identical to the definition of in Equation 3:
Visibility ratio balance The ratio balance is computed as in Equation 4:
Face ratio A face detector is run on each image to find the mask containing the face regions (i.e. face bounding boxes). Let the mask be
the face ratio feature is then defined as follows:
Axis alignment This feature measures the ratio of images with orientation parallel to the axis, i.e. given that .
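A minimal sketch of this ratio, assuming that an image counts as axis-aligned when its rotation angle is zero; the tolerance parameter `tol` is a hypothetical addition for floating-point angles.

```python
def axis_alignment(angles, tol=1e-9):
    """Fraction of images whose rotation angle is (numerically) zero."""
    if not angles:
        return 0.0
    aligned = sum(1 for a in angles if abs(a) <= tol)
    return aligned / len(angles)
```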
Centrality The centrality feature measures how central the image in the first layer (i.e. the top-most image) is. Let us call the centroid of the visible part of the image in the top layer and the centroid of the canvas . The centrality is defined as:
where is used to compute the half diagonal length.
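The description above suggests a score that decreases with the distance between the two centroids, normalized by the half diagonal of the canvas. The sketch below assumes the form "one minus normalized distance"; the exact expression in the paper may differ, and the function name is illustrative.

```python
import math


def centrality(visible_centroid, canvas_size):
    """1 at the canvas center, 0 at a canvas corner (assumed normalization)."""
    width, height = canvas_size
    center_x, center_y = width / 2.0, height / 2.0
    half_diagonal = math.hypot(width, height) / 2.0
    distance = math.hypot(visible_centroid[0] - center_x,
                          visible_centroid[1] - center_y)
    return 1.0 - distance / half_diagonal
```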
Convexity For each transformed image the convexity ratio is defined as the ratio between the area of the image's visible region and the area of its convex hull. The convexity feature is computed as the minimum convexity ratio over all the transformed images:
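The convexity ratio can be illustrated on binary visibility masks. The sketch below is one possible implementation under stated assumptions: the visible region is a pixel mask, the hull is computed on pixel corners (so that a fully visible rectangle scores exactly 1), and an image with no visible pixel scores 0. None of these choices is taken from the paper.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in boundary order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    n = len(vertices)
    twice_area = 0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def convexity_ratio(mask):
    """Visible pixel count divided by the area of the hull of pixel corners."""
    corners, count = [], 0
    for y, row in enumerate(mask):
        for x, visible in enumerate(row):
            if visible:
                count += 1
                corners += [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
    if count == 0:
        return 0.0  # assumption: a fully occluded image gets the worst score
    return count / polygon_area(convex_hull(corners))


def convexity(masks):
    """Minimum convexity ratio over all images, as described in the text."""
    return min(convexity_ratio(m) for m in masks)
```

An L-shaped visible region of three pixels, `[[1, 1], [1, 0]]`, has a hull of area 3.5 and therefore a ratio of about 0.86, whereas any fully visible rectangle has ratio 1.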
Color similarity This feature is computed by evaluating the color histogram similarity of each image on the canvas with respect to its neighbors. For each image we first compute:
where is the color histogram computed on the visible portion of , is the chi-squared distance, and represents the set of indexes of the neighbor images of . Color similarity is then computed as:
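To make the ingredients concrete, the following sketch computes a chi-squared distance between normalized histograms and averages it over each image's neighbors and then over all images. The symmetric form of the chi-squared distance and both averaging steps are assumptions of this sketch; the paper's aggregation may differ, and the function names are illustrative.

```python
def chi2_distance(h1, h2):
    """Symmetric chi-squared distance between two histograms of equal length."""
    return 0.5 * sum(
        (a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0
    )


def mean_neighbor_distance(i, histograms, neighbors):
    """Average chi-squared distance between image i and its neighbors."""
    neighbor_ids = neighbors[i]
    if not neighbor_ids:
        return 0.0
    total = sum(chi2_distance(histograms[i], histograms[j]) for j in neighbor_ids)
    return total / len(neighbor_ids)


def color_dissimilarity(histograms, neighbors):
    """Average of the per-image neighbor distances over the whole collage."""
    n = len(histograms)
    return sum(mean_neighbor_distance(i, histograms, neighbors) for i in range(n)) / n
```

Here `neighbors` is a hypothetical mapping from an image index to the list of indexes of its neighboring images; low values indicate clusters of similarly colored images.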
Orientation diversity This feature measures the average of the variance in orientation in each set of neighbor images:
where and is the maximum rotation angle allowed.
Minimum orientation difference This feature measures the average of the minimum orientation differences between each image and its neighboring set :
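The two orientation-based features (variance of the orientation within each neighbor set, and minimum orientation difference between an image and its neighbors) can be sketched as follows. Several details are assumptions: each neighbor set is taken to include the image itself, angles lie in the range from minus to plus the maximum rotation angle, and the normalizations (squared maximum angle, twice the maximum angle) are chosen only to keep both features in the unit interval.

```python
from statistics import pvariance


def orientation_diversity(angles, neighbors, max_angle):
    """Average normalized variance of the angles in each neighborhood."""
    values = []
    for i, neighbor_ids in neighbors.items():
        group = [angles[i]] + [angles[j] for j in neighbor_ids]
        values.append(pvariance(group) / max_angle ** 2)
    return sum(values) / len(values) if values else 0.0


def min_orientation_difference(angles, neighbors, max_angle):
    """Average, over images, of the smallest angle gap to any neighbor."""
    values = []
    for i, neighbor_ids in neighbors.items():
        if neighbor_ids:
            gap = min(abs(angles[i] - angles[j]) for j in neighbor_ids)
            values.append(gap / (2.0 * max_angle))
    return sum(values) / len(values) if values else 0.0
```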
The new fitness function to be optimized in the generation of pleasing photo collages, can be compactly written as:
where each weights the contribution of criterion , and are found as described in the next section. Please recall that the fitness function also depends on the three weights introduced in Equation 7, and that are used to compute the new importance maps.
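Since each weight scales the contribution of one criterion, the fitness can be read as a weighted sum. The sketch below assumes exactly that form; the function name and argument layout are illustrative, and weights may be negative for criteria that should be penalized.

```python
def fitness(criteria_values, weights):
    """Weighted sum of criterion values for one collage configuration."""
    if len(criteria_values) != len(weights):
        raise ValueError("one weight per criterion is required")
    return sum(w * f for w, f in zip(weights, criteria_values))
```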
6 User Preferences Modeling and Learning
Given as input the values , , we want to learn a single set of optimal weights to be plugged into Equation 20 so that they produce fitness values in accordance with user preferences emerged from Experiment I on all the datasets considered. To this end, for each dataset, the fitness values obtained for the collages created using the saliency, harmony, and quality importance maps must be in the same order reported in Table 4. Taking as example the Burst dataset, where the user preferences were Saliency Harmony Quality, we want that . Furthermore, the relative distances between the normalized scores obtained by the different maps and reported in Table 6 should be preserved as much as possible.
As an example, let us indicate with the function that computes the Formula One score. Taking again as example the Burst dataset, we want that the fitness satisfies
Similar constraints come from the other four datasets considered, giving a total of ten simultaneous constraints that Equation 20 has to satisfy.
The optimal weights are found by solving the following optimization problem:
where , and are respectively the per-dataset user rankings computed using and the rankings induced by :
is the Kendall tau rank correlation coefficient , is the norm, and is a weight term that balances the relative contributions of the two parts of which Equation 23 is made of. In this work, is heuristically set to .
The rationale behind the optimization is that we want to automatically find the best set of weights that, plugged into Equation 20, produces a fitness function in maximum accordance with user rankings on all the datasets used for training.
The Kendall tau rank correlation coefficient in the first term is used to measure if, and to what extent, a given set of weights produces a fitness in accordance with user rankings. This means that when the fitness function is evaluated on the set of collages, its outputs should be in the same order in which the users judged them. The second term is introduced to prevent the scores from being too close to each other, so that the resulting ranking is meaningful.
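The two-term structure described above can be sketched in a few lines. This is an illustration under explicit assumptions, not the paper's Equation 23: the rank agreement uses Kendall's tau without tie correction, scores are min-max normalized per dataset, the norm is taken to be Euclidean, both terms are averaged over datasets, and the balancing weight `lam` is left as a free parameter.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a between two equally long score lists (no tie correction)."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two items are needed")
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return s / (n * (n - 1) / 2)


def normalize(scores):
    """Min-max normalization to [0, 1]; constant lists map to zeros."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]


def objective(user_scores, fitness_scores, lam):
    """Mean rank agreement minus lam times mean score-distance mismatch.

    user_scores and fitness_scores hold one list of scores per dataset.
    """
    taus, mismatches = [], []
    for u, f in zip(user_scores, fitness_scores):
        taus.append(kendall_tau(u, f))
        nu, nf = normalize(u), normalize(f)
        mismatches.append(sum((a - b) ** 2 for a, b in zip(nu, nf)) ** 0.5)
    m = len(taus)
    return sum(taus) / m - lam * sum(mismatches) / m
```

With user scores `[3, 2, 1]` and fitness values `[0.9, 0.5, 0.1]` for one dataset, the ordering and the relative distances both match, so the objective reaches its assumed maximum of 1.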
Having such a fitness function makes it possible, given a new set of images for which we want to build a collage, to measure how good a certain configuration of image states is. Furthermore, by maximizing it we are confident that we are generating a collage that the users will judge good overall. The optimization needed to solve Equation 23 has to be performed just once and offline, so the computational time required is of secondary importance. In any case, it takes only a few seconds to run, since all the inputs are computed offline and the only operations involved are the computation of the Kendall tau rank correlation (first term of Equation 23) and of the relative distances between user scores and induced scores (second term of Equation 23).
The optimization problem in Equation 23 is solved using the continuous-space implementation of the DS algorithm. The signs of the criteria weights are reported in Table 6 together with a brief explanation of their effect in the creation of the collage.
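For readers unfamiliar with direct search methods, the sketch below shows a generic compass (coordinate pattern) search for maximizing a function of continuous weights without derivatives. It is not the DS implementation used in this work; the step-halving rule, the default parameters, and the function name are assumptions chosen for brevity.

```python
def compass_search(f, x0, step=0.5, tol=1e-6, max_iter=10000):
    """Maximize f by polling +/- step along each coordinate.

    The step is halved whenever a full poll brings no improvement, and the
    search stops when the step falls below tol or max_iter polls are done.
    """
    x = list(x0)
    best = f(x)
    iterations = 0
    while step > tol and iterations < max_iter:
        improved = False
        for k in range(len(x)):
            for delta in (step, -step):
                candidate = x[:]
                candidate[k] += delta
                value = f(candidate)
                if value > best:
                    x, best, improved = candidate, value, True
        if not improved:
            step /= 2.0
        iterations += 1
    return x, best
```

In the setting of this section, `f` would take a candidate weight vector and return the value of the objective to be maximized.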
Once the weights are learned, a new set of collages is generated by maximizing Equation 20. For each dataset, the layering order of the images is induced by the weights . In more detail, for each image a new importance map is created using Equation 7. The layering order is then obtained by sorting in decreasing order the 2D integrals of the importance maps . The collages are generated using the discrete version of the DS algorithm introduced in Section 3.3.
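The layering step can be sketched directly from this description: the 2D integral of each importance map is approximated by the sum of its pixel values, and images are sorted by decreasing total so that the most important image ends up on top. The list-of-rows map representation and the function name are assumptions of the sketch.

```python
def layering_order(importance_maps):
    """Image indexes sorted from top-most layer to bottom-most layer."""
    totals = [sum(sum(row) for row in m) for m in importance_maps]
    return sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
```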
7 Subjective Experiment II
[Figure 7. Column labels: Exp. I - Best coll.; Exp. II - New coll.]
The final collages obtained on each dataset using the procedure described above are reported in Figure 7. We denote each new collage with the corresponding configuration of states . For each dataset we also report the best ranked collage from Experiment I according to the scores in Table 6. In this experiment we wanted to understand whether the new collages were judged better than the previous ones. To this end, we performed a pairwise subjective test. For each dataset, users were presented with the two collages in Figure 7 and were asked to choose the preferred one. A total of 39 subjects participated in this experiment: 26 males and 13 females.
Results of Experiment II are reported in Table 7. In three datasets (Burst, Landscape, and Self) the new collages were preferred by over 64% of the subjects. In particular, the Self dataset exhibits the highest percentage of preferences, with about 72% of the subjects choosing the new collage. The Fashion dataset shows about 56% preference for the new collage; for this dataset, the new criteria seem to be only marginally effective. This is due to the artistic nature of the photos, which makes them good-looking regardless of their positioning. The Zen dataset continues to be the most problematic to evaluate: the subjects essentially split in half in judging the collages, due to the particular nature of the photos' content. On average, 62% of the subjects preferred the new photo collages.
8 Further experiments
In this section further experiments are carried out to verify the generalization ability of the identified criteria and their learned relative importance. Three different experiments are presented: i) the learned definition of pleasantness is used to create collages on unseen image sets; ii) results obtained by our proposal are compared against two state-of-the-art algorithms; iii) the behavior of the learned definition of pleasantness is also tested by varying the number of images in the dataset and the canvas size.
8.1 Generalization to new photo themes
In order to test how the learned definition of pleasantness generalizes to photo collages not seen during the training phase, a further experiment has been done. The optimal set of weights learned on the five training datasets in the previous section is used as-is to create photo collages on the new datasets. For this experiment, six new challenges have been selected from the DPChallenge web site. The chosen challenges are: Shallow DOF VI (Shallow for brevity), Red V (Red), Primary Colors II (Primary), Silhouettes VI (Silhouettes), Selfie! (Selfie), and 160 Pixels (Pixels).
The experiment performed is a pairwise subjective test similar to the one used in Experiment II. For each dataset, users were presented with the two collages in Figure 8 and were asked to choose the preferred one. Following the results of Experiment I, one of the collages was generated using a single importance map; the other one was generated by maximizing the learned fitness. A total of 42 subjects participated in this subjective experiment: 22 males and 20 females.
Results from this experiment showed that on average 61% of the subjects preferred the photo collages created using the learned definition of pleasantness. In particular, in four datasets (Shallow, Primary, Selfie and Pixels) these collages were preferred by over 66% of the subjects. For the remaining two datasets (Red and Silhouettes), the two collage versions tied.
8.2 Comparing collages
The different collage algorithms in the state of the art are based on different philosophies: keep images with the same size vs. allow image resize; allow image rotation vs. not; preserve image borders vs. blend image contents; allow image overlapping vs. not. We have run an experiment to compare our collage results with those of two algorithms belonging to the non-content preserving category (the same as ours) but using different philosophies: Shape Collage (http://www.shapecollage.com/, a commercial software) and Autocollage. The most relevant differences between the algorithms are that Shape Collage and our algorithm allow images to be rotated, while Autocollage does not. Moreover, Autocollage blends the images together to have a smooth transition between them, while Shape Collage and our algorithm do not.
For this comparison, we used the six image datasets used in Section 8.1. We set the parameters of the Autocollage and Shape Collage algorithms to generate collages of 14 images on a square canvas, with an image-to-canvas ratio as similar as possible to that of our set-up. The collages generated with the different methods are shown in Figure 9.
The same subjects that participated in the previous experiment participated in this one. We asked them to choose which of the three collages they preferred. Results from this experiment showed that on average 56% of the subjects preferred our photo collages, 42% preferred the Autocollage results, and just 2% preferred the Shape Collage results. In particular, in four datasets (Red, Primary, Selfie and Pixels) our collages were preferred by 65% of the subjects. For the remaining two datasets (Shallow and Silhouettes), instead, 64% of the subjects preferred the Autocollage results. This is due to the nature of the images used: in both categories the images contain a subject with an out-of-focus (Shallow) or almost uniform (Silhouettes) background. This makes it easier for Autocollage to nicely blend image contents.
8.3 Varying collage sizes
In this experiment we test how the learned definition of pleasantness generalizes to datasets with a different number of input images per collage and different canvas sizes. Two smaller and two larger variants of the Red dataset have been considered, containing 5, 10, 25, and 50 images respectively. Canvas sizes have been chosen so that they are almost half of the total area covered by the images, as in Section 3. Thus optimization has been performed on canvases with side length equal to 250, 350, 550, and 750. The results are reported in Figure 10 and compared with the results obtained by Autocollage, which was the best competing algorithm in the previous section. All the canvases have been resized to equal size for better visualization. The results of Autocollage in the case of 5 images are not available, since the minimum number of images it can handle is 7.
The collages were judged by 20 subjects, who were asked to choose which of the two collages they preferred. For the three collages with few images, the majority of the subjects chose our collage, with percentages of 100%, 70% and 65% for 5, 10 and 14 images respectively. In the case of 25 images, the difference between our collage and the Autocollage one is reduced, with 55% of the users choosing our collage and 45% choosing the Autocollage one. Finally, the gap between the two approaches further reduces in the case of 50 images to practically a tie (50% of preferences). The interviews with the users revealed that, when presented with the collages with 50 images, the limited canvas size made them pay less attention to the actual content of the images and favor the overall image distribution instead. In this case, the two collages were considered equally cluttered, but the smooth carving of Autocollage made its collage more pleasing.
[Figure 10. Panels: 5 images, 10 images, 14 images, 25 images, 50 images.]
In this work we have considered the problem of creating pleasing photo collages by exploiting subjective experiments to model and learn user preferences. We designed an experimental framework for the identification of the criteria that need to be taken into account to generate a pleasing photo collage. Starting from collages created using state-of-the-art criteria, namely photo informativeness, canvas area coverage, and information ratio balance, we performed a subjective experiment involving several subjects on different thematic photo datasets. This experiment showed that different and more complex criteria are involved in the subjective definition of pleasantness. Inspired by the responses of the subjects, we have redefined the basic criteria and we have identified and implemented new global and local ones: face ratio, axis alignment, centrality, convexity, color similarity, orientation diversity and minimum orientation difference. The relative importance of all these criteria has been learned by exploiting user rankings. Moreover, with the proposed experimental framework we learned a composite photo informativeness description from saliency, quality and harmony. A new set of collages has been generated using the identified criteria and evaluated in a pairwise comparison experiment against the previous best rated collages. The new collages were preferred by the majority of the subjects for all the photo datasets considered, showing that the proposed framework is able to identify and combine the criteria at the basis of user preference, and to learn a computational model which effectively encodes an inter-user definition of pleasantness. A further experiment has been run, showing that the learned definition of pleasantness generalizes well to new thematic photo datasets not used in the training phase.
Photo informativeness has been described in terms of saliency, quality, and harmony maps, but other maps taking into account different image properties can be incorporated in our framework as well (e.g. photo memorability by Isola et al. [20, 22]). Furthermore, by leveraging user preferences, the proposed framework makes it possible to quantify the contribution of different visual features in order to model new intrinsic properties of the images.
The proposed framework can benefit current collage generation algorithms in two different ways. The first is to use it to estimate the weights of the fitness function (also called energy function) in the different collage generation algorithms, e.g. the weights associated with region importance, transition cost, object sensitivity and face presence in Autocollage; representativeness, compactness and transition smoothness in Video collage; salience visibility, salience ratio balance, penalty of severe occlusions, blank space presence, canvas shape constraint, spatial uniformity and orientation diversity in Picture collage [37, 26]; image complexity and content distinctness in . All these algorithms heuristically set the weights associated with the different terms in their fitness functions. With our framework, these weights can be systematically set using user preferences. This requires generating training data in the form of multiple collages and collecting user judgments about them. This operation has to be done only once and does not impact collage generation time. The second way in which existing algorithms can leverage our work is by including the new criteria defined here in their fitness/energy functions. This will not dramatically slow down the collage generation process, since the new criteria are fast to compute.
As future work we plan to investigate if the learned definition of pleasantness changes when subjects and photos are linked. We plan also to expand the set of criteria by enlarging the number of subjects in the experiments, and by adding more thematic photo datasets.
- [1] Borji, A., Cheng, M.-M., Jiang, H. and Li, J. (2014). Salient Object Detection: A Survey. arXiv:1411.5878 [cs.CV].
- [2] Battiato, S., Ciocca, G., Gasparini, F., Puglisi, G. and Schettini, R. (2008). Smart photo sticking. In Adaptive Multimedia Retrieval: Retrieval, User, and Semantics. Lecture Notes in Computer Science 4918 211–223. Springer.
- [3] Bianco, S. and Schettini, R. (2012). Sampling Optimization for Printer Characterization by Direct Search. IEEE Transactions on Image Processing 21 4868–4873.
- [4] Bianco, S. and Tisato, F. (2012). Sensor Placement Optimization in Buildings. In Image Processing: Machine Vision Applications V 8300 830003. SPIE.
- [5] Calic, J., Gibson, D. P. and Campbell, N. W. (2007). Efficient Layout of Comic-Like Video Summaries. IEEE Transactions on Circuits and Systems for Video Technology 17 931–936.
- [6] Chao, H., Tretter, D. R., Zhang, X. and Atkins, C. B. (2010). Blocked recursive image composition with exclusion zones. In Proceedings of the 10th ACM Symposium on Document Engineering. DocEng '10 111–114. ACM.
- [7] Chen, J.-C., Chu, W.-T., Kuo, J.-H., Weng, C.-Y. and Wu, J.-L. (2006). Tiling Slideshow. In Proceedings of the 14th Annual ACM International Conference on Multimedia. MULTIMEDIA '06 25–34. ACM.
- [8] Cheng, M., Mitra, N. J., Huang, X., Torr, P. H. S. and Hu, S. (2015). Global Contrast Based Salient Region Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 37 569–582.
- [9] Ciocca, G. and Schettini, R. (2010). Multiple image thumbnailing. In Digital Photography VI 7537 75370S. SPIE.
- [10] Diakopoulos, N. and Essa, I. (2005). Mediating Photo Collage Authoring. In Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology. UIST '05 183–186. ACM, New York, NY, USA.
- [11] Duncan, K. and Sarkar, S. (2012). Saliency in images and video: a brief survey. IET Computer Vision 6 514–523.
- [12] Eberhart, R. C. and Kennedy, J. (1995). A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science 1 39–43. New York, NY.
- [13] Ekhtiyar, H., Sheida, M. and Amintoosi, M. (2011). Picture Collage with Genetic Algorithm and Stereo vision. International Journal of Computer Science Issues 8 165–169.
- [14] Fan, J. (2012). Photo Layout with a Fast Evaluation Method and Genetic Algorithm. In Multimedia and Expo Workshops (ICMEW), 2012 IEEE International Conference on 308–313. IEEE.
- [15] Girgensohn, A. and Chiu, P. (2003). Stained Glass Photo Collages. In IEEE International Conference on Image Processing 2 871–874.
- [16] Goferman, S., Tal, A. and Zelnik-Manor, L. (2010). Puzzle-like Collage. Computer Graphics Forum 29 459–468.
- [17] Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning, 1st ed. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
- [18] Hooke, R. and Jeeves, T. A. (1961). "Direct Search" Solution of Numerical and Statistical Problems. Journal of the ACM (JACM) 8 212–229.
- [19] Huang, H., Zhang, L. and Zhang, H.-C. (2011). Arcimboldo-like Collage Using Internet Images. ACM Transactions on Graphics 30 155:1–155:8.
- [20] Isola, P., Xiao, J., Torralba, A. and Oliva, A. (2011). What makes an image memorable? In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on 145–152. IEEE.
- [21] Kendall, M. G. (1938). A new measure of rank correlation. Biometrika 30 81–93.
- [22] Khosla, A., Xiao, J., Torralba, A. and Oliva, A. (2012). Memorability of Image Regions. In NIPS 2 4.
- [23] Kimura, A., Yonetani, R. and Hirayama, T. (2013). Computational models of human visual attention and their implementations: A survey. IEICE Transactions on Information and Systems 96 562–578.
- [24] Kolda, T. G., Lewis, R. M. and Torczon, V. (2003). Optimization by direct search: New perspectives on some classical and modern methods. SIAM Review 45 385–482.
- [25] Lee, M. H., Singhal, N., Cho, S. and Park, I. K. (2010). Mobile photo collage. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on 24–30.
- [26] Liu, T., Wang, J., Sun, J., Zheng, N., Tang, X. and Shum, H.-Y. (2009). Picture collage. IEEE Transactions on Multimedia 11 1225–1239.
- [27] Luo, S.-J., Tsai, C.-Y., Chen, W.-C. and Chen, B.-Y. (2013). Dynamic Media Assemblage. IEEE Transactions on Circuits and Systems for Video Technology 23 2044–2053.
- [28] Ma, Y.-F. and Zhang, H.-J. (2003). Contrast-based image attention analysis by using fuzzy growing. In Proceedings of the Eleventh ACM International Conference on Multimedia. MULTIMEDIA '03 374–381. ACM.
- [29] Mei, T., Yang, B., Yang, S.-Q. and Hua, X.-S. (2009). Video collage: presenting a video sequence using a single image. The Visual Computer 25 39–51.
- [30] Mittal, A., Moorthy, A. K. and Bovik, A. C. (2012). No-Reference Image Quality Assessment in the Spatial Domain. IEEE Transactions on Image Processing 21 4695–4708.
- [31] Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal 7 308–313.
- [32] Ou, L.-C. and Luo, M. R. (2006). A colour harmony model for two-colour combinations. Color Research & Application 31 191–204.
- [33] Rother, C., Kumar, S., Kolmogorov, V. and Blake, A. (2005). Digital tapestry [automatic image synthesis]. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on 1 589–596. IEEE.
- [34] Rother, C., Bordeaux, L., Hamadi, Y. and Blake, A. (2006). Autocollage. In ACM Transactions on Graphics (TOG) 25 847–852. ACM.
- [35] Sandhaus, P., Rabbath, M. and Boll, S. (2011). Employing Aesthetic Principles for Automatic Photo Book Layout. In Advances in Multimedia Modeling. Lecture Notes in Computer Science 6523 84–95. Springer Berlin Heidelberg.
- [36] Solli, M. and Lenz, R. (2009). Color harmony for image indexing. In Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on 1885–1892.
- [37] Wang, J., Quan, L., Sun, J., Tang, X. and Shum, H.-Y. (2006). Picture collage. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on 1 347–354. IEEE.
- [38] Wei, Y., Matsushita, Y. and Yang, Y. (2009). Efficient optimization of photo collage. Technical Report No. MSRTR-2009-59, Microsoft Research.
- [39] Wu, Z. and Aizawa, K. (2013). PicWall: Photo collage on-the-fly. In Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific 1–10. IEEE.
- [40] Wu, Z. and Aizawa, K. (2014). Building Friend Wall for Local Photo Repository by Using Social Attribute Annotation. Journal of Multimedia 9 4–13.
- [41] Yang, Y., Wei, Y., Liu, C., Peng, Q. and Matsushita, Y. (2009). An improved belief propagation method for dynamic collage. The Visual Computer 25 431–439.
- [42] Yu, Z., Lu, L., Guo, Y., Fan, R., Liu, M. and Wang, W. (2014). Content-Aware Photo Collage Using Circle Packing. IEEE Transactions on Visualization and Computer Graphics 20 182–195.