On the Distribution of Salient Objects in Web Images and its Influence on Salient Object Detection

01/10/2015 · Boris Schauerte et al. · Karlsruhe Institute of Technology (KIT)

It has become apparent that a Gaussian center bias can serve as an important prior for visual saliency detection, which has been demonstrated for predicting human eye fixations and salient object detection. Tseng et al. have shown that the photographer's tendency to place interesting objects in the center is a likely cause for the center bias of eye fixations. We investigate the influence of the photographer's center bias on salient object detection, extending our previous work. We show that the centroid locations of salient objects in photographs of Achanta and Liu's data set in fact correlate strongly with a Gaussian model. This is an important insight, because it provides an empirical motivation and justification for the integration of such a center bias in salient object detection algorithms and helps to understand why Gaussian models are so effective. To assess the influence of the center bias on salient object detection, we integrate an explicit Gaussian center bias model into two state-of-the-art salient object detection algorithms. This way, first, we quantify the influence of the Gaussian center bias on pixel- and segment-based salient object detection. Second, we improve the performance in terms of F1 score, Fβ score, area under the recall-precision curve, area under the receiver operating characteristic curve, and hit-rate on the well-known data set by Achanta and Liu. Third, by debiasing Cheng et al.'s region contrast model, we exemplarily demonstrate that implicit center biases are partially responsible for the outstanding performance of state-of-the-art algorithms. Last but not least, as a result of debiasing Cheng et al.'s algorithm, we introduce a non-biased salient object detection method, which is of interest for applications in which the image data is not likely to have a photographer's center bias (e.g., image data of surveillance cameras or autonomous robots).


1 Introduction

(a) Example images
(b) Centroid scatter plot
(c) Example segmentation masks
(d) Mean segment mask
Figure 1: Illustration of the Achanta/Liu data set: example images 1(a), the corresponding segmentation masks 1(c), the mean over all segmentation masks 1(d), and the scatter plot of the centroid locations across all images 1(b).

Among other influences such as task-specific factors, human attention is attracted to salient stimuli. In this context, saliency describes the subjective, perceptual quality that lets some items in the world stand out from their neighbors and immediately grab our attention. Accordingly, the goal of visual saliency detection is to determine what parts of an image are likely to grab the human attention. The task of "traditional" visual saliency detection is to predict where human observers look when presented with a scene, which can be recorded using eye tracking equipment (e.g., Einhäuser et al. (2008); Yang et al. (2010); Judd et al. (2009); Schauerte and Stiefelhagen (2012)). Liu et al. adapted the traditional definition of visual saliency by incorporating the high-level concept of a salient object into the process of visual attention computation Liu et al. (2007). Here, a salient object is defined as the object in an image that attracts most of the user's interest such as, for example, the man, the cross, the baseball players, and the flowers in Fig. 1(a) (left-to-right, resp.). Accordingly, Liu et al. Liu et al. (2007) defined the task of salient object detection as the binary labeling problem of separating the salient object from the background. Thus, in contrast to traditional visual saliency detection, salient object detection does not just comprise calculating the saliency of image regions; it also incorporates determining and segmenting the most salient object in the image. Here, it is important to note that the selection of a salient object is a conscious decision by the user, whereas the gaze trajectories that are recorded using eye trackers are the result of mostly unconscious processes. Consequently, also taking into account that salient objects attract the human gaze (see, e.g., Einhäuser et al. (2008)), salient object detection and predicting where people look are very closely related yet substantially different tasks.

The photographer's center bias, i.e., the natural tendency of photographers to place the objects of interest near the center of their composition in order to enhance their focus and size relative to the background (see Tseng et al. (2009)), has been identified as one cause of the center bias that is often reported in eye-tracking studies Reinagel and Zador (1999); Parkhurst and Niebur (2003); Tatler (2007). (Here, it is important to note that Tseng et al. – due to their methodology – did not investigate the exact spatial distribution of the objects that attract the gaze. They hired five persons who provided subjective scores from 1 to 5 in terms of how interesting things were biased toward the image center Tseng et al. (2009).) As a consequence, the integration of a center bias has become an increasingly important aspect in visual saliency models that focus on gaze prediction (e.g., Yang et al. (2010); Judd et al. (2009); Borji et al. (2012)). In contrast, most recently proposed salient object detection algorithms do not incorporate an explicit model of the photographer's center bias (see, e.g., Achanta et al. (2009); Achanta and Süsstrunk (2010); Klein and Frintrop (2011); Cheng et al. (2011)). A notable exception that is closely related to our work is the work by Jiang et al. Jiang et al. (2011), in which one of the three main criteria that characterize a salient object is that

“it is most probably placed near the center of the image”

Jiang et al. (2011). The authors justify this characterization with the "rule of thirds", which is one of the most well-known principles of photographic composition (see, e.g., Luo and Tang (2008)), and use a Gaussian distance metric as a model. We go beyond following the rule of thirds and show that the distribution of the objects' centroids correlates strongly and positively with a 2-dimensional Gaussian distribution. This provides a strong empirical justification for integrating Gaussian center bias models into salient object detection algorithms. To demonstrate the importance, we adapt two state-of-the-art salient object detection methods and quantify the influence of the photographer's center bias on salient object detection.

The contribution of this paper is twofold: First, we use the salient object data set by Achanta et al. Achanta et al. (2009) to investigate the spatial distribution of salient objects in images. This way, in Sec. 3, we show that it is likely that salient objects in photographs are distributed around the image center in such a way that the radii are half-Gaussian distributed and the angles are uniformly distributed. Second, in Sec. 4, we explicitly integrate Gaussian center bias models into two recently proposed salient object detection methods: the pixel-based maximum symmetric surround salient object detection by Achanta et al. Achanta and Süsstrunk (2010) and the segment-based region contrast method by Cheng et al. Cheng et al. (2011). In order to measure the influence, we use the following evaluation measures: the maximum F1 score, the maximum Fβ score as in Achanta et al. (2009), the area under the precision-recall curve, the area under the receiver operating characteristic curve (ROC AUC), and the hit-rate. In summary, the integration of the center bias model increases the ROC AUC as well as the performance with respect to all remaining measures (see Sec. 4.2.4). Thus, we further advance the state-of-the-art of pixel-based as well as segment-based salient object detection. By modifying Cheng et al.'s region contrast model Cheng et al. (2011), first, we obtain a non-biased salient object detection algorithm that is based on region contrast and, second, we exemplarily demonstrate that implicit center biases can already be found in well-performing, state-of-the-art salient object detection algorithms and substantially influence the performance. This is important to consider when comparing and selecting algorithms for applications in which the data is not necessarily biased towards the center.

The remainder of this paper is organized as follows: In Sec. 2, we provide an overview of related work. Subsequently, in Sec. 3, we introduce and investigate our hypotheses about the spatial distribution of salient objects. Then, in Sec. 4, we integrate our hypotheses into two recently proposed salient object detection methods and evaluate the influence on the salient object detection performance. We conclude with a short summary and discussion in Sec. 5. Furthermore, please feel free to check the supplemental material for additional information such as, e.g., further evaluation results.

2 Related Work

We focus on the most recent related work that addresses bottom-up saliency detection with an emphasis on salient object detection (see, e.g., Tsotsos (2011) for a more general overview of computational attention models). Such methods may be biologically motivated, purely computational, or involve both aspects. In 2009, Achanta et al. Achanta et al. (2009); Achanta and Süsstrunk (2010) introduced a salient object detection approach that basically relies on the difference of pixels to the average color and intensity value. In order to evaluate their approach, they selected a subset of 1000 images of the image data set that was collected from the web by Liu et al. Liu et al. (2007) and calculated segmentation masks of the salient objects that had been marked by 9 participants using (rough) rectangle annotations Liu et al. (2007). Please note that this procedure also means that during the manual data set annotation the selection of the salient object is mostly conscious, whereas gaze trajectories that are recorded using eye trackers are the result of mostly unconscious processes. Since its creation, the salient object data set by Achanta et al. has served as a reference data set to evaluate methods for salient object detection (see, e.g., Achanta et al. (2009); Achanta and Süsstrunk (2010); Klein and Frintrop (2011); Cheng et al. (2011)). Liu et al. Liu et al. (2007) and Alexe et al. Alexe et al. (2010) approach salient object detection using machine learning. To this end, Liu et al. Liu et al. (2007) combine multi-scale contrast, center-surround histograms, and color spatial-distributions with conditional random fields. Similarly, Alexe et al. Alexe et al. (2010) combine multi-scale saliency, color contrast, edge density, and superpixels in a Bayesian framework. Closely related to Bayesian surprise Itti and Baldi (2006), Klein et al. Klein and Frintrop (2011) use the Kullback-Leibler divergence of the center and surround image patch histograms to calculate the saliency. Cheng et al. Cheng et al. (2011) use segmentation to define a regional contrast-based method, which simultaneously evaluates global contrast differences and spatial coherence. Here, we can differentiate between algorithms that rely on segmentation-based (e.g., Cheng et al. (2011); Alexe et al. (2010)) and pixel-based contrast measures (e.g., Achanta et al. (2009); Achanta and Süsstrunk (2010); Klein and Frintrop (2011)). Closely related to our work on the quantitative influence of the center bias on salient object detection is the work by Jiang et al. Jiang et al. (2011) and, most recently, Borji et al. Borji et al. (2012). In Jiang et al.'s work Jiang et al. (2011) one of the main criteria that characterize a salient object is that "it is most probably placed near the center of the image", which is justified with the "rule of thirds". Most recently, Borji et al. Borji et al. (2012) evaluated several salient object detection models, also performed tests with an additive Gaussian center bias, and concluded that the resulting "change in accuracy is not significant and does not alter model rankings". However, this neglects the possibility that well-performing models already have an integrated, implicit center bias, which – as one part of our work – we demonstrate exemplarily to be the case for Cheng et al.'s region contrast algorithm Cheng et al. (2011). Furthermore, there exist several approaches that explicitly integrate a center bias, but provide neither a quantitative evaluation of its influence nor an empirical justification of the chosen model (e.g., Scharfenberger et al. (2013)). In this paper, we adapt the pixel-based method by Achanta et al. Achanta and Süsstrunk (2010) and the segmentation-based method by Cheng et al. Cheng et al. (2011) to incorporate a model of the photographer-related center bias and quantify the influence of the center bias on the performance. Furthermore, Borji et al. Borji et al. (2012) do not provide an empirical justification why a Gaussian distribution is an appropriate center bias model, which is another part of the work described in this paper.

It has been observed in several studies that the visual attention of human participants in natural scenes is biased toward the center of static images and videos (see, e.g., Busswell (1935); Tatler (2007); Parkhurst and Niebur (2003)). One possible bottom-up cause of the bias is intrinsic bottom-up visual saliency as predicted by computational saliency models. One possible top-down cause of the center bias is known as photographer bias (see, e.g., Reinagel and Zador (1999); Parkhurst and Niebur (2003); Tatler (2007)), which describes the natural tendency of photographers to place objects of interest near the center of their composition. In fact, what the photographer considers interesting may also be highly bottom-up salient. Additionally, the photographer bias may lead to a viewing strategy bias Parkhurst et al. (2002), which means that viewers may orient their attention more often toward the center of the scene, because they expect salient or interesting objects to be placed there. Thus, since in natural images and videos the distribution of objects of interest and thus saliency is usually biased toward the center, it is often unclear how much the saliency actually contributes in guiding attention. It is possible that people look at the center for reasons other than saliency, but their gaze happens to fall on salient locations. Therefore, this center bias may result in overestimating the influence of saliency computed by the model and contaminate the evaluation of how visual saliency may guide orienting behavior. Recently, Tseng et al. Tseng et al. (2009) were able to demonstrate quantitatively that center bias is correlated strongly with photographer bias and is influenced by viewing strategy at scene onset. Furthermore, e.g., they were able to show that motor bias had no effect. However, they did not evaluate and computationally model how specifically the objects that attract the gaze are distributed spatially in the image. Instead, Tseng et al. hired five naive participants to provide subjective scores from 1 to 5 in terms of how interesting things were biased toward the image center Tseng et al. (2009). In this paper, we use the data set by Achanta et al. Achanta et al. (2009) to investigate the distribution of salient objects in photographs and then evaluate the influence on two state-of-the-art salient object detection models.

3 Center Bias Model

Figure 2: Quantile-Quantile (Q-Q) plots of the angles versus a uniform distribution (left), the radii versus a half-Gaussian distribution (middle), and the transformed radii (see Sec. 3.3) versus a normal distribution (right).

To investigate the spatial distribution of salient objects in photographs collected from the web, we use the manually annotated segmentation masks by Achanta et al. Achanta et al. (2009); Achanta and Süsstrunk (2010) that mark the salient objects in 1000 images of the salient object data set by Liu et al. Liu et al. (2007). More specifically, we use the segmentation masks to determine the centroids of all salient objects in the data set and analyze the centroids' spatial distribution. The images in the data set by Liu et al. Liu et al. (2007) have been collected from a variety of sources, mostly from image forums and image search engines. Liu et al. collected more than 60,000 images and subsequently selected an image subset in which all images contain a salient object or a distinctive foreground object Liu et al. (2007). 9 users marked the salient objects using (rough) bounding boxes and the salient objects in the image database have been defined based on the "majority agreement". However, as a consequence of the selection process, the data set does not include images without distinct salient objects. This is an important aspect to consider when trying to generalize the results reported on Achanta et al.'s and Liu et al.'s data set to other data sets or application areas.

In order to statistically analyze the 2-dimensional spatial distribution of the salient objects’ centroids, we first identify the center of the spatial distribution. Then, given the distribution’s center, we can use a polar coordinate system to independently analyze the distribution of the angles and distances between the center and the salient objects.

3.1 The Center

Our model is based on a polar coordinate system that has its pole at the image center. Since the images in Achanta's data set have varying widths and heights, we use normalized Cartesian image coordinates in the range [0, 1]² in the following. We compute the mean salient object centroid location, which lies very close to the image center (see Fig. 1(b) and 1(d)), and the corresponding covariance matrix. Thus, we can motivate the use of a polar coordinate system that has its pole at the image center to represent all locations relative to the expected distribution's mode.
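As a concrete illustration of this preprocessing, the following sketch computes the normalized centroid of each binary segmentation mask and expresses it in polar coordinates with the pole at the image center; it is a minimal sketch in Python/NumPy, and the function name and interface are illustrative rather than taken from the paper.

```python
import numpy as np

def centroid_polar(masks):
    """Centroid of each binary mask in normalized [0, 1] coordinates,
    expressed in polar coordinates relative to the image center (0.5, 0.5)."""
    radii, angles = [], []
    for m in masks:                       # m: 2D boolean / 0-1 array
        ys, xs = np.nonzero(m)
        h, w = m.shape
        cx, cy = xs.mean() / (w - 1), ys.mean() / (h - 1)  # normalized centroid
        dx, dy = cx - 0.5, cy - 0.5
        radii.append(np.hypot(dx, dy))
        angles.append(np.arctan2(dy, dx) % (2 * np.pi))    # angle in [0, 2*pi)
    return np.asarray(radii), np.asarray(angles)
```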

3.2 The Angles are Distributed Uniformly

Our first model hypothesis is that the centroids' angles in the specified polar coordinate system are uniformly distributed in [0, 2π).

In order to investigate the hypothesis, we use a Quantile-Quantile (Q-Q) plot as a graphical method to compare probability distributions (see NIST/SEMATECH (2012)). In Q-Q plots the quantiles of the samples of two distributions are plotted against each other. Thus, the more similar the two distributions are, the better the points in the Q-Q plot will approximate the line y = x. We calculate the Q-Q plot of the salient object location angles in our polar coordinate system versus uniformly drawn samples in [0, 2π), see Fig. 2 (left). The apparent linearity of the plotted Q-Q points supports the hypothesis that the angles are distributed uniformly.

We can quantify the observed linearity, see Fig. 2 (left), to analyze the correlation between the model distribution and the data samples using probability plot correlation coefficients (PPCC) NIST/SEMATECH (2012). The PPCC is the correlation coefficient between the paired quantiles and measures the agreement of the fitted distribution with the observed data (i.e., goodness-of-fit). The closer the correlation coefficient is to one, the higher the positive correlation and the more likely the distributions are shifted and/or scaled versions of each other. Furthermore, by comparing against critical values of the PPCC (see Vogel and Kroll (1989) and NIST/SEMATECH (2012)), we can use the PPCC as a statistical test, which is closely related to the Shapiro-Wilk test Shapiro and Wilk (1965) and can reject the hypothesis that the data samples match the assumed model distribution. Furthermore, we can use the correlation to test the hypothesis of no correlation by transforming the correlation to create a t-statistic.

The obvious linearity of the Q-Q plot, see Fig. 2 (left), is reflected by a high PPCC (mean of several runs with uniformly drawn random samples), which is substantially higher than the corresponding critical value (see Vogel and Kroll (1989)); thus, the hypothesis of identical distributions cannot be rejected. Furthermore, the hypothesis of no correlation is rejected.
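The PPCC computation and the associated no-correlation test can be sketched as follows; this is a minimal illustration in Python/NumPy and SciPy, where the function names and the number of reference runs are illustrative assumptions rather than values from the paper.

```python
import numpy as np
from scipy import stats

def ppcc_uniform(angles, n_runs=100, seed=0):
    """PPCC of the centroid angles against uniform reference samples on
    [0, 2*pi), averaged over several randomly drawn reference sets."""
    rng = np.random.default_rng(seed)
    q = np.sort(np.asarray(angles, float))
    coeffs = [np.corrcoef(q, np.sort(rng.uniform(0.0, 2.0 * np.pi, q.size)))[0, 1]
              for _ in range(n_runs)]
    return float(np.mean(coeffs))

def no_correlation_test(r, n):
    """Transform a correlation coefficient r (computed from n paired
    quantiles) into a t-statistic and a two-sided p-value."""
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return t, 2.0 * stats.t.sf(abs(t), df=n - 2)
```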

3.3 The Radii follow a Half-Gaussian Distribution

Our second model hypothesis is that the radii of the salient object locations follow a half-Gaussian distribution. We have to consider a half-Gaussian distribution on the interval [0, ∞), because the radius – as a length – is by definition non-negative. If we consider the image borders, we could assume a two-sided truncated distribution, but we have three reasons to work with a one-sided model: the variance of the radii seems sufficiently small, the "true" centroid of the salient object may be outside the image borders (i.e., parts of the salient object can be truncated by the image borders), and it facilitates the use of various, well-known statistical tests (see Schauerte and Stiefelhagen (2013)).

We can use a Q-Q plot against a half-Gaussian distribution to graphically assess the hypothesis, see Fig. 2 (middle). The linearity of the points suggests that the radii are distributed according to a half-Gaussian distribution. The visible outliers in the upper-right are caused by less than 30 centroids that are highly likely to be disturbed by the image borders. Please be aware of the fact that it is not necessary to know the exact distribution parameters when working with Q-Q plots as long as the distributions are linearly related (see NIST/SEMATECH (2012)). Furthermore, we transform the polar coordinates in such a way that they represent the same point with a combination of positive angles in [0, π) and signed radii in (−∞, ∞). This way, we can compare the distribution of the transformed radii against a normal distribution with its mode and mean at 0, see Fig. 2 (right).

The obvious correlation that is visible in the Q-Q plots, see Fig. 2 (middle and right), is reflected by a PPCC that is above the corresponding critical value (see NIST/SEMATECH (2012)). The hypothesis of no correlation is rejected.
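The mirroring transformation and the comparison of the transformed radii against a normal distribution can be sketched as follows (Python/SciPy; scipy.stats.probplot returns the ordered quantile pairs together with the correlation of the probability plot). The function name is an illustrative assumption.

```python
import numpy as np
from scipy import stats

def fold_polar(radii, angles):
    """Map (r, theta) with theta in [0, 2*pi) to an equivalent representation
    with theta in [0, pi) and a signed radius, so that the signed radii can be
    compared against a full normal distribution centered at zero."""
    r = np.asarray(radii, float).copy()
    a = np.asarray(angles, float).copy()
    flip = a >= np.pi
    r[flip] *= -1.0
    a[flip] -= np.pi
    return r, a

# Q-Q data of the signed radii against a normal distribution; the third value
# of the second returned tuple is the correlation of the probability plot.
signed_r, _ = fold_polar(radii, angles)  # radii, angles as computed above
(osm, osr), (slope, intercept, corr) = stats.probplot(signed_r, dist="norm")
```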

4 Quantifying the Influence on Salient Object Detection

To assess the influence of the center bias on pixel- and segment-based salient object detection, we integrate a Gaussian center bias into the algorithms by Achanta et al. Achanta and Süsstrunk (2010) and Cheng et al. Cheng et al. (2011).

4.1 Center Biased Saliency Models

4.1.1 Pixel-based

As a pixel-based model, we use maximum symmetric surround saliency detection by Achanta et al. Achanta and Süsstrunk (2010) in combination with a Gaussian center bias map (cf., e.g., Judd et al. (2009); Borji et al. (2012)). To this end, we define the center bias saliency map

    S_CB(p) = G(p − c),    (1)

with the (unnormalized) anisotropic Gaussian

    G(d) = exp(−(d_x² / (2σ_x²) + d_y² / (2σ_y²))),    (2)

where p is the pixel coordinate, c is the image center's coordinate, and σ_x and σ_y are the standard deviations in x- and y-direction depending on the image width and height, respectively.
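A minimal sketch of such a center bias map, assuming that the standard deviations are set proportional to the image dimensions (the proportionality factor below is an illustrative choice, not a value from the paper):

```python
import numpy as np

def center_bias_map(height, width, sigma_rel=0.25):
    """Unnormalized anisotropic Gaussian centered on the image center, with
    standard deviations proportional to the image height and width."""
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    sy, sx = sigma_rel * height, sigma_rel * width
    return np.exp(-((xs - cx) ** 2 / (2 * sx ** 2) + (ys - cy) ** 2 / (2 * sy ** 2)))
```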

In order to investigate the influence of the center bias, we consider several plausible strategies for combining the bottom-up saliency map S_BU and the center bias saliency map S_CB:

    S(p) = f(S_BU(p), S_CB(p)),    (3)

where f is the chosen center bias integration scheme.

We consider the following schemes, cf. Schauerte and Stiefelhagen (2012): First, a convex, linear integration, i.e., f(S_BU, S_CB) = (1 − α) S_BU + α S_CB with α ∈ [0, 1]. Second, multiplicative integration as a supra-linear combination method, i.e., f(S_BU, S_CB) = S_BU ∘ S_CB, where ∘ denotes the Hadamard (element-wise) product. Third, the minimum as a further, alternative supra-linear combination, i.e., f(S_BU, S_CB) = min(S_BU, S_CB). Fourth, the maximum to realize a late, sub-linear combination scheme, i.e., f(S_BU, S_CB) = max(S_BU, S_CB). All these schemes are also related to different fuzzy logic interpretations, which might provide a common theoretical framework and interpretation throughout later applications (e.g., Schauerte et al. (2009)). To improve the readability, we refer to the convex linear combination whenever we denote explicit center bias integration in the following, unless stated otherwise.
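The four integration schemes can be sketched as follows for saliency maps that are normalized to [0, 1]; the function and parameter names as well as the default weight are illustrative assumptions.

```python
import numpy as np

def combine(s_bu, s_cb, scheme="linear", alpha=0.5):
    """Combine a bottom-up saliency map s_bu with a center bias map s_cb."""
    if scheme == "linear":    # convex, linear integration
        return (1.0 - alpha) * s_bu + alpha * s_cb
    if scheme == "product":   # supra-linear (Hadamard product)
        return s_bu * s_cb
    if scheme == "min":       # alternative supra-linear combination
        return np.minimum(s_bu, s_cb)
    if scheme == "max":       # late, sub-linear combination
        return np.maximum(s_bu, s_cb)
    raise ValueError("unknown scheme: " + scheme)
```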

4.1.2 Segmentation-based

Figure 3: An example illustrating the influence of the implicit center bias in the region contrast method by Cheng et al. Cheng et al. (2011). Left-to-right: Image, region contrast (w/o explicit center bias), and locally debiased region contrast (w/o explicit center bias).
Figure 4: Examples of the influence of the implicit and explicit center bias on segmentation-based salient object detection. Left-to-right: Image, region contrast without and with center bias (RC and RC+CB, resp.), and locally debiased region contrast without and with center bias (LDRC and LDRC+CB, resp.).

As a segmentation-based model, we adapt Cheng et al.'s region contrast model Cheng et al. (2011). This model is particularly interesting, because it already provides state-of-the-art performance, which is partially caused by an implicit center bias as we will show in the following. This way, we can observe how the model behaves if we remove the implicit center bias – which was neither motivated nor explained by the authors – and add an explicit Gaussian center bias. The spatially weighted region contrast saliency is defined as

    S_RC(r_k) = Σ_{r_i ≠ r_k} exp(−D_s(r_k, r_i) / σ_s²) w(r_i) D_r(r_k, r_i),    (4)
    w(r_i) = |r_i|,    (5)

where w(r_i) is the weight of region r_i, which equals the number of pixels in r_i – i.e., |r_i| – to emphasize color contrast to bigger regions. D_r(·, ·) is the color distance metric between two regions,

    D_r(r_1, r_2) = Σ_i Σ_j f(c_{1,i}) f(c_{2,j}) D(c_{1,i}, c_{2,j}),    (6)

where f(c_{k,i}) is the (frequentist) probability of the i-th color c_{k,i} among all colors in the k-th region r_k, which is determined using a color histogram. The probability of the color inside the regions is used as weight to emphasize color differences between dominant colors. D(c_{1,i}, c_{2,j}) measures the distance between the colors c_{1,i} and c_{2,j}, and in the following it is defined as the Euclidean distance in the CIE Lab color space. Finally, D_s(r_k, r_i) is the spatial distance between regions r_k and r_i, where σ_s controls the spatial weighting. The spatial distance between two regions is defined as the Euclidean distance between the centroids of the respective regions using pixel coordinates that are normalized to the range [0, 1]. Smaller values of σ_s influence the spatial weighting in such a way that the contrast to regions that are farther away contributes less to the saliency of the current region.
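As an illustration of the color distance in Eq. 6, the following sketch computes it from two regions' color histograms in CIE Lab space; the interface is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

def color_distance(freq1, lab1, freq2, lab2):
    """Histogram-weighted color distance D_r between two regions.
    freq_k: color probabilities f(c_k,i); lab_k: corresponding Lab colors."""
    # Pairwise Euclidean distances between the regions' Lab colors.
    d = np.linalg.norm(lab1[:, None, :] - lab2[None, :, :], axis=-1)
    # Weight each color pair by the product of its occurrence probabilities.
    return float((freq1[:, None] * freq2[None, :] * d).sum())
```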

It is this unnormalized Gaussian weighted Euclidean distance that causes an implicit Gaussian-like center bias (see Fig. 3 and 5), because it favors regions whose distances to the other neighbors are smaller, which is – in general – the case for segments at the center of the image. Although this biased distance function has a significant impact on the performance, its choice has not been clearly motivated, discussed, or evaluated by Cheng et al. To remove this implicit bias, we introduce a normalized, i.e., locally debiased, distance weighting that still weights close-by regions higher than regions that are farther away, but does not lead to an implicit center bias:

    S_LDRC(r_k) = Σ_{r_i ≠ r_k} ŵ_s(r_k, r_i) w(r_i) D_r(r_k, r_i),    (7)
    ŵ_s(r_k, r_i) = exp(−D_s(r_k, r_i) / σ_s²) / Σ_{r_j ≠ r_k} exp(−D_s(r_k, r_j) / σ_s²).    (8)
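The difference between the original (implicitly biased) and the locally debiased spatial weighting can be sketched as follows; the names and the value of sigma_s are illustrative, and the sketch assumes precomputed region centroids, sizes, and pairwise color distances.

```python
import numpy as np

def spatial_weights(centroids, sigma_s=0.4, debias=True):
    """Gaussian spatial weights between region centroids (normalized image
    coordinates). With debias=True, each region's weights are normalized to
    sum to one, which removes the implicit center bias."""
    c = np.asarray(centroids, float)
    d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)  # Euclidean distances
    w = np.exp(-d / sigma_s ** 2)
    np.fill_diagonal(w, 0.0)                    # exclude self-contrast
    if debias:
        w /= w.sum(axis=1, keepdims=True)       # locally debiased weighting
    return w

def region_saliency(w_spatial, region_sizes, color_dist):
    """Spatially weighted region contrast (cf. Eqs. 4 and 7): for each region,
    sum over the other regions of spatial weight * region size * color distance."""
    return (w_spatial * np.asarray(region_sizes, float)[None, :] * color_dist).sum(axis=1)
```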

Similar to the pixel-based model (see Sec. 4.1.1), we can now integrate an explicit center bias into the segmentation-based model:

    S_LDRC+CB(r_k) = f(S_LDRC(r_k), S_CB(μ_k)).    (9)

Here, f is the chosen center bias integration function as in Eq. 3. Furthermore, μ_k denotes the centroid of region r_k and S_CB is the Gaussian center bias map as defined in Eqs. 1 and 2.

4.2 Quantitative Evaluation

4.2.1 Dataset

As for the graphical investigation of our hypotheses using Q-Q plots (see Fig. 2), we use the manually annotated segmentation masks by Achanta et al. Achanta et al. (2009); Achanta and Süsstrunk (2010), see Sec. 3, to quantify the influence of the Gaussian center bias on salient object detection.

4.2.2 Baseline Algorithms

Figure 5: Illustration of the implicit center bias in the method by Cheng et al. Cheng et al. (2011). Left: Each pixel p_i shows the distance weight sum, i.e., Σ_{j ≠ i} exp(−D_s(p_i, p_j) / σ_s²), over all other pixels in a regular grid. Right: The average weight sum depending on the centroid location calculated on the Achanta/Liu data set using Felzenszwalb's segmentation method Felzenszwalb and Huttenlocher (2004).

In order to compare our results, we use a set of saliency detection algorithms that we group into two coarse categories: first, algorithms that were specifically proposed for salient object detection and, second, algorithms that have been proposed and evaluated in other contexts. From the second category, we use: the well-known saliency model by Itti and Koch Itti et al. (1998), Graph-Based Visual Saliency (GBVS) by Harel et al. Harel et al. (2007), Context-Aware Saliency (CAS) by Goferman et al. Goferman et al. (2010, 2012), and the FFT spectral residuals (FFT) and DCT image signatures (DCT) by Hou et al. Hou and Zhang (2007); Hou et al. (2012). For FFT and DCT, we optimized the resolution at which the saliency maps are calculated, which is the most important algorithm parameter and has a significant influence on the performance. (We were surprised by the fact that the spectral approaches, i.e., FFT and DCT, performed so well, because the previously reported results for FFT stated otherwise; see, e.g., Achanta et al. (2009); Klein and Frintrop (2011); Cheng et al. (2011). However, this can probably be explained by the fact that we analyzed the influence of the saliency map resolution on these approaches, which is their most important parameter and has a considerable influence on the results.) As baseline for salient object detection algorithms (first category), we use: the Frequency-Tuned model (FT) by Achanta et al. Achanta et al. (2009) (when comparing with the results in Achanta et al. (2009), please read the erratum that has been published at http://ivrg.epfl.ch/supplementary_material/RK_CVPR09), the Bonn Information-Theoretic Saliency model (BITS) by Klein et al. Klein and Frintrop (2011), the Maximum Symmetric Surround Saliency (MSSS) model by Achanta et al. Achanta and Süsstrunk (2010), and the Region Contrast (RC) model by Cheng et al. Cheng et al. (2011) that uses Felzenszwalb's image segmentation method Felzenszwalb and Huttenlocher (2004). The latter two are the original algorithms we adapted.

Of course, we evaluate our adapted, center biased models: the maximum symmetric surround saliency with center bias (MSSS+CB; see Sec. 4.1.1) and the region contrast model with explicit center bias (RC+CB; see Sec. 4.1.2). In order to investigate the influence of the implicit center bias in the region contrast model (see Sec. 4.1.2), we calculate the performance of the locally debiased region contrast model without and with explicit center bias (LDRC and LDRC+CB, respectively; see Sec. 4.1.2). Additionally, as a reference, we provide the results for the standalone segment-based and pixel-based center bias models (both denoted CB).

Figure 6: Precision-recall curves for all evaluated models with full (top) and limited range of the precision (bottom). This graphic is best viewed in color.
Implementation notes

If available, we used the reference implementations that have been provided by the authors. For MSSS we use the C++ implementation by Achanta, because it provides a better performance than the basic Matlab implementation. For Itti we use the iLab Neuromorphic Vision Toolkit (iNVT). We integrated the methods directly into Matlab (mex) in order to avoid quantization and/or compression artifacts that may occur due to saving and loading them as images. For DCT and FFT, we used the implementations in our publicly available Matlab toolbox Schauerte (2011). All calculations have been made using double precision arithmetic. To make our results as reproducible as possible (we have observed that the precision-recall curves of different authors vary), we will make our implementations and evaluation scripts open source. We would like to note that our evaluation measure implementations follow the implementations of Weka and LingPipe. The corresponding precision-recall curves and results of further baseline algorithms can be seen in Fig. 6.

4.2.3 Measures

We can use the binary segmentation masks for saliency evaluation by treating the saliency maps as binary classifiers. At a specific threshold θ, we regard all pixels that have a saliency value above the threshold as positives and all pixels with values below the threshold as negatives. By sweeping over all thresholds θ, we can evaluate the performance using common binary classifier evaluation measures.

Most commonly, precision-recall curves are used – e.g., by Achanta et al. Achanta and Süsstrunk (2010); Achanta et al. (2009), Cheng et al. Cheng et al. (2011), and Klein et al. Klein and Frintrop (2011) – to evaluate the salient object detection performance. We use five evaluation measures to quantify the performance of the algorithms. We calculate the area under curve (AUC) of the (interpolated) precision-recall (PR) curve and of the receiver operating characteristic (ROC) curve Davis and Goadrich (2006). Complementary to the PR AUC, we calculate the maximum F1 and Fβ scores with

    Fβ = (1 + β²) · (precision · recall) / (β² · precision + recall),    (10)

where β² = 0.3 has been proposed by Achanta et al. to weight precision more than recall for salient object detection Achanta et al. (2009). Additionally, we calculate the hit-rate (HR) that measures how often the pixel with the maximum saliency belongs to the salient object.
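A minimal sketch of this threshold-sweep evaluation for a single image, assuming scikit-learn is available (the function name and interface are illustrative):

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve, roc_auc_score

def evaluate(saliency, mask, beta2=0.3):
    """Max F1, max F_beta, PR AUC, ROC AUC, and hit for one saliency map
    (float array) and one binary ground-truth mask."""
    s = saliency.ravel().astype(float)
    y = mask.ravel().astype(bool)
    precision, recall, _ = precision_recall_curve(y, s)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    fb = (1 + beta2) * precision * recall / np.maximum(beta2 * precision + recall, 1e-12)
    return {
        "max_f1": float(f1.max()),
        "max_fbeta": float(fb.max()),
        "pr_auc": float(auc(recall, precision)),
        "roc_auc": float(roc_auc_score(y, s)),
        "hit": bool(y[np.argmax(s)]),     # most salient pixel lies on the object
    }
```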

4.2.4 Results

Explicit center bias integration type

How does the performance depend on the chosen center bias integration? To investigate this question, we tested the minimum, maximum, and product as alternative combinations. To account for the influence of different value distributions within the normalized value range, we also weighted the input of the min and max operations. The results of the algorithms using different combination types are shown in Tab. 1. The presented results are those that we achieve with the best-performing center bias weight.

In Tab. 1, we can see that the linear combination is the best choice for LDRC+CB. However, for MSSS+CB and RC+CB the product seems to be the combination that provides the best performance. Apparently MSSS+CB benefits more from using the product as combination type than RC+CB. It is also interesting to note that LDRC+CB with the product as combination achieves results similar to RC. However, LDRC+CB remains the algorithm that provides the best performance in terms of F1 score and Fβ score, whereas RC+CB provides the best performance in terms of PR AUC and HR. Interestingly, LDRC+CB and RC+CB achieve a nearly identical ROC AUC.

Method Combination F1 Fβ PR ROC HR
LDRC+CB Linear/Convex 0.8034 0.8183 0.8800 0.9624 0.9240
LDRC+CB Max 0.7504 0.7561 0.8108 0.9422 0.8630
LDRC+CB Min 0.7897 0.8049 0.8584 0.9535 0.8880
LDRC+CB Product 0.7883 0.8024 0.8704 0.9578 0.9130
RC+CB Linear/Convex 0.7973 0.8120 0.8833 0.9620 0.9340
RC+CB Max 0.7855 0.7993 0.8710 0.9568 0.9140
RC+CB Min 0.7962 0.8150 0.8807 0.9603 0.9180
RC+CB Product 0.7974 0.8136 0.8878 0.9623 0.9460
MSSS+CB Linear/Convex 0.7490 0.7678 0.8265 0.9495 0.8900
MSSS+CB Max 0.7165 0.7337 0.7849 0.9270 0.8420
MSSS+CB Min 0.7373 0.7606 0.8211 0.9339 0.9140
MSSS+CB Product 0.7523 0.7748 0.8398 0.9445 0.9350
LDRC 0.7574 0.7675 0.8302 0.9430 0.8680
RC 0.7855 0.7993 0.8710 0.9568 0.9140
MSSS 0.7165 0.7337 0.7849 0.9270 0.8420
CB 0.5793 0.5764 0.5920 0.8623 0.6980
CB 0.5604 0.5452 0.5638 0.8673 0.7120
Table 1: The maximum F1 score, maximum Fβ score, PR AUC (PR), ROC AUC (ROC), and hit-rate (HR) that we obtain using different combination types.
Convex center bias weight

How does the weight of the center bias influence the performance? To answer this question, we calculated the performance of LDRC+CB, RC+CB, and MSSS+CB for varying center bias weights α. The resulting curves of the F1 score, Fβ score, PR AUC, ROC AUC, and hit-rate are shown in Fig. 7(a), 7(b), and 7(c), respectively.

For each of the three algorithms, the values of α that lead to the optimal F1 score, Fβ score, PR AUC, and ROC AUC lie within a small interval. In contrast, for all algorithms the value of α that achieves the highest hit-rate is outside these intervals and substantially higher. Furthermore, the best weight for each measure depends on the algorithm and varies substantially. It is interesting to see that small weights only have a minor (yet positive) influence on RC+CB until a point is reached where the performance begins to drop significantly. This becomes especially apparent when comparing the curves of RC+CB, see Fig. 7(b), with the curves of LDRC+CB, see Fig. 7(a).

(a) RC
(b) MSSS
(c) LDRC
Figure 7: Illustration of the influence of the center bias weight α on the performance of RC+CB, LDRC+CB, and MSSS+CB (convex combination).
Quantitative comparison

The center bias itself already has a considerable predictive power, see Tab. 2, and is relatively close to the performance of FT. However, there is a substantial performance gap between the standalone center bias models (CB and CB) and good non-biased methods such as, e.g., MSSS and LDRC.

Method F1 Fβ PR ROC HR
LDRC+CB 0.8034 0.8183 0.8800 0.9624 0.9240
RC+CB 0.7973 0.8120 0.8833 0.9620 0.9340
RC 0.7855 0.7993 0.8710 0.9568 0.9140
MSSS+CB 0.7490 0.7678 0.8265 0.9495 0.8900
LDRC 0.7574 0.7675 0.8302 0.9430 0.8680
BITS 0.7342 0.7582 0.7589 0.9316 0.7540
MSSS 0.7165 0.7337 0.7849 0.9270 0.8420
FFT 0.6455 0.6375 0.6593 0.8926 0.8080
DCT 0.6472 0.6368 0.6612 0.8962 0.8270
GBVS 0.6403 0.6242 0.6970 0.9088 0.8480
FT 0.5995 0.6009 0.6261 0.8392 0.7100
CB 0.5793 0.5764 0.5920 0.8623 0.6980
CAS 0.5857 0.5615 0.5888 0.8741 0.6920
CB 0.5604 0.5452 0.5638 0.8673 0.7120
iNVT 0.3383 0.4012 0.4396 0.5768 0.6870
Table 2: The maximum F1 score, maximum Fβ score, PR AUC (PR), ROC AUC (ROC), and hit-rate (HR) of the evaluated algorithms (sorted in descending order of the maximum Fβ score).
Method Baseline F1 Fβ PR ROC HR
LDRC RC 96.4 96.0 95.3 98.6 95.0
RC+CB RC 101.5 101.6 101.4 100.5 102.2
LDRC+CB RC 102.3 102.4 101.0 100.6 101.1
LDRC+CB LDRC 106.1 106.6 106.0 102.1 106.5
MSSS+CB MSSS 104.5 104.7 105.3 102.4 105.7
Table 3: Relative performance (in %) of our adapted algorithms with respect to their baseline.

As could be expected, the performance of RC drops substantially if we remove the implicit center bias as is done by LDRC (see Sec. 4.1.2), which can best be seen in Tab. 3. What happens if we add our explicit center bias model to unbiased models? As can be seen from the performance difference between MSSS and MSSS+CB as well as between LDRC and LDRC+CB, the performance is substantially increased with respect to all evaluation measures, see Tab. 2 and 3. Interestingly, the relative performance improvement from pixel-based MSSS to MSSS+CB and from segment-based LDRC to LDRC+CB is comparable, see Tab. 3. Furthermore, with the exception of HR, the performance of LDRC+CB and RC+CB is nearly identical with a slight advantage for LDRC+CB (see Tab. 2 and Tab. 3). This indicates that we did not lose important information by debiasing the distance metric (LDRC+CB vs. RC+CB) and that the explicit Gaussian center bias model is advantageous compared to the implicit weight bias (LDRC+CB and RC+CB vs. RC).

In summary, MSSS+CB provides a substantially higher performance than MSSS and outperforms, e.g., FT and BITS. RC+CB and LDRC+CB provide a better performance than their unbiased counterparts RC and LDRC, respectively. Furthermore, their performance is very similar and both outperform all other models. Interestingly, LDRC is the best model without center bias in our evaluation on Achanta’s data set. This makes LDRC an interesting candidate for applications in which the image data can not be expected to have a photographer’s center bias (e.g., image data of surveillance cameras, autonomous robots, or human-robot interaction Schauerte and Stiefelhagen (2014)).

Statistical significance

One question remains: Does the integration of an explicit center bias result in a statistically significant performance improvement? To address this question, we test the performance (i.e., F1, Fβ, PR AUC, and ROC AUC) of LDRC and MSSS with and without an explicit center bias. For this purpose, we rely on two pairwise, two-sample t-tests: First, we perform a two-tailed test to check whether the compared performances with and without an integrated center bias come from distributions with equal means (i.e., H0: "the means are equal"). Second, we perform a one-tailed test to check whether the performance with an integrated center bias is worse than without an integrated center bias, i.e., whether the center biased performance distribution's mean is lower (i.e., H0: "the mean is lower"). If we can reject both hypotheses, then it is clear that the performance of the algorithm has significantly improved due to the integrated center bias. All tests are performed at the same, fixed significance level.

For MSSS, we can reject the hypothesis of equal means for F1, Fβ, PR AUC, and ROC AUC. Additionally, we can reject the hypothesis that an integrated center bias has a negative influence on the performance.

Similarly, we can reject the hypothesis that the performance of LDRC with and without center bias has equal means for F1, Fβ, PR AUC, and ROC AUC. And we can reject the hypothesis that an integrated center bias has a negative influence on the performance.

Consequently, it is apparent that the integration of a center bias can lead to statistically significant performance improvements for pixel-based as well as segmentation-based algorithms.

5 Conclusion

We formulated and investigated two hypotheses about the location of salient objects in photographs: First, the angular distribution of the salient object centroids around the image center is uniform. Second, the distances between the centroids and the image center follow a (half-)Gaussian distribution. We investigated these hypotheses using graphical methods, which indicate that our hypotheses hold. This is an important insight, because it provides a strong empirical motivation and justification for the widely applied Gaussian center bias models. To investigate the influence of the center bias on salient object detection, we explicitly integrated the center bias model into two state-of-the-art salient object detection algorithms. We have shown that the explicitly modeled center bias has a significant, positive influence on the performance (in terms of hit-rate, the area under the precision-recall curve, the area under the receiver operating characteristic curve, the F1 score, and the Fβ score). Last but not least, by debiasing Cheng et al.'s region contrast model, we have exemplarily shown that implicit center biases might at least partially be responsible for the performance of state-of-the-art salient object detection algorithms and, as a consequence, we introduced an adapted, non-biased salient object detection algorithm.

References

  • Judd et al. (2009) T. Judd, K. Ehinger, F. Durand, A. Torralba, Learning to predict where humans look, in: Proc. Int. Conf. Comp. Vis., 2009.
  • Yang et al. (2010) Y. Yang, M. Song, N. Li, J. Bu, C. Chen, What is the chance of happening: a new way to predict where people look, in: Proc. European Conf. Comp. Vis., 2010.
  • Borji et al. (2012) A. Borji, D. N. Sihite, L. Itti, Probabilistic learning of task-specific visual attention, in: Proc. Int. Conf. Comp. Vis. Pat. Rec., 2012.
  • Jiang et al. (2011) H. Jiang, J. Wang, Z. Yuan, T. Liu, N. Zheng, Automatic salient object segmentation based on context and shape prior, in: Proc. British Mach. Vis. Conf., 2011.
  • Tseng et al. (2009) P.-H. Tseng, R. Carmi, I. G. M. Cameron, D. P. Munoz, L. Itti, Quantifying center bias of observers in free viewing of dynamic natural scenes, Journal of Vision 9 (2009).
  • Schauerte and Stiefelhagen (2013) B. Schauerte, R. Stiefelhagen, How the distribution of salient objects in images influences salient object detection, in: Proc. Int. Conf. Image Process., 2013.
  • Liu et al. (2007) T. Liu, J. Sun, et al., Learning to detect a salient object, in: Proc. Int. Conf. Comp. Vis. Pat. Rec., 2007.
  • Achanta et al. (2009) R. Achanta, S. Hemami, F. Estrada, S. Süsstrunk, Frequency-tuned Salient Region Detection, in: Proc. Int. Conf. Comp. Vis. Pat. Rec., 2009.
  • Cheng et al. (2011) M.-M. Cheng, G.-X. Zhang, N. J. Mitra, X. Huang, S.-M. Hu, Global contrast based salient region detection, in: Proc. Int. Conf. Comp. Vis. Pat. Rec., 2011.
  • Achanta and Süsstrunk (2010) R. Achanta, S. Süsstrunk, Saliency detection using maximum symmetric surround, in: Proc. Int. Conf. Image Process., 2010.
  • Einhäuser et al. (2008) W. Einhäuser, M. Spain, P. Perona, Objects predict fixations better than early saliency, Journal of Vision 8 (2008).
  • Schauerte and Stiefelhagen (2012) B. Schauerte, R. Stiefelhagen, Quaternion-based spectral saliency detection for eye fixation prediction, in: Proc. European Conf. Comp. Vis., 2012.
  • Reinagel and Zador (1999) P. Reinagel, A. M. Zador, Natural scene statistics at the centre of gaze, in: Network: Computation in Neural Systems, 1999, pp. 341–350.
  • Parkhurst and Niebur (2003) D. Parkhurst, E. Niebur, Scene content selected by active vision, Spatial Vision 16 (2003) 125–154.
  • Tatler (2007) B. W. Tatler, The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions, Journal of Vision 7 (2007).
  • Klein and Frintrop (2011) D. A. Klein, S. Frintrop, Center-surround divergence of feature statistics for salient object detection, in: Proc. Int. Conf. Comp. Vis., 2011.
  • Luo and Tang (2008) Y. Luo, X. Tang, Photo and video quality evaluation: Focusing on the subject, in: Proc. European Conf. Comp. Vis., 2008.
  • Tsotsos (2011) J. K. Tsotsos, A Computational Perspective on Visual Attention, The MIT Press, 2011.
  • Alexe et al. (2010) B. Alexe, T. Deselaers, V. Ferrari, "What is an object?", in: Proc. Int. Conf. Comp. Vis. Pat. Rec., 2010.
  • Itti and Baldi (2006) L. Itti, P. F. Baldi, Bayesian surprise attracts human attention, in: Advances in Neural Information Processing Systems, 2006.
  • Borji et al. (2012) A. Borji, D. N. Sihite, L. Itti, Salient object detection: A benchmark, in: Proc. European Conf. Comp. Vis., 2012.
  • Scharfenberger et al. (2013) C. Scharfenberger, A. Wong, K. Fergani, J. S. Zelek, D. A. Clausi, Statistical textural distinctiveness for salient region detection in natural images, in: Proc. Int. Conf. Comp. Vis. Pat. Rec., 2013.
  • Busswell (1935) G. T. Busswell, How people look at pictures: A study of the psychology of perception in art, University of Chicago Press, 1935.
  • Parkhurst et al. (2002) D. Parkhurst, K. Law, E. Niebur, Modeling the role of salience in the allocation of overt visual attention, Vision Research 42 (2002) 107–123.
  • NIST/SEMATECH (2012) NIST/SEMATECH, Engineering Statistics Handbook, 2012.
  • Vogel and Kroll (1989) R. M. Vogel, C. N. Kroll, Low-flow frequency analysis using probability-plot correlation coefficients, Journal of Water Resources Planning and Management 115 (1989) 338–357.
  • Shapiro and Wilk (1965) S. S. Shapiro, M. B. Wilk, An analysis of variance test for normality (complete samples), Biometrika 52 (1965) 591–611.
  • Schauerte and Stiefelhagen (2012) B. Schauerte, R. Stiefelhagen, Predicting human gaze using quaternion DCT image signature saliency and face detection, in: Proc. Workshop on the Applications of Computer Vision, 2012.

  • Schauerte et al. (2009) B. Schauerte, J. Richarz, T. Plötz, C. Thurau, G. A. Fink, Multi-modal and multi-camera attention in smart environments, in: Proc. Int. Conf. Multimodal Interfaces, 2009.
  • Felzenszwalb and Huttenlocher (2004) P. F. Felzenszwalb, D. P. Huttenlocher, Efficient graph-based image segmentation, Int. J. Comput. Vision 59 (2004) 167–181.
  • Itti et al. (1998) L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 1254–1259.
  • Harel et al. (2007) J. Harel, C. Koch, P. Perona, Graph-based visual saliency, in: Advances in Neural Information Processing Systems, 2007.
  • Goferman et al. (2010) S. Goferman, L. Zelnik-Manor, A. Tal, Context-aware saliency detection, in: Proc. Int. Conf. Comp. Vis. Pat. Rec., 2010.
  • Goferman et al. (2012) S. Goferman, L. Zelnik-Manor, A. Tal, Context-aware saliency detection, IEEE Trans. Pattern Anal. Mach. Intell. (2012).
  • Hou and Zhang (2007) X. Hou, L. Zhang, Saliency detection: A spectral residual approach, in: Proc. Int. Conf. Comp. Vis. Pat. Rec., 2007. doi:10.1109/CVPR.2007.383267.
  • Hou et al. (2012) X. Hou, J. Harel, C. Koch, Image signature: Highlighting sparse salient regions, IEEE Trans. Pattern Anal. Mach. Intell. 34 (2012) 194–201.
  • Schauerte (2011) B. Schauerte, Spectral visual saliency toolbox (SViST), 2011. URL: http://bit.ly/RAPmMk.
  • Davis and Goadrich (2006) J. Davis, M. Goadrich, The relationship between precision-recall and roc curves, in: Proc. Int. Conf. Machine Learning, 2006.
  • Schauerte and Stiefelhagen (2014) B. Schauerte, R. Stiefelhagen, Look at this! Learning to guide visual saliency in human-robot interaction, in: Proc. Int. Conf. Intell. Robots Syst., 2014.