A Replica Inference Approach to Unsupervised Multi-Scale Image Segmentation

06/28/2011 ∙ by Dandan Hu, et al. ∙ 0

We apply a replica inference based Potts model method to unsupervised image segmentation on multiple scales. This approach was inspired by the statistical mechanics problem of "community detection" and its phase diagram. Specifically, the problem is cast as identifying tightly bound clusters ("communities" or "solutes") against a background or "solvent". Within our multiresolution approach, we compute information theory based correlations among multiple solutions ("replicas") of the same graph over a range of resolutions. Significant multiresolution structures are identified by replica correlations as manifest in information theory overlaps. With the aid of these correlations as well as thermodynamic measures, the phase diagram of the corresponding Potts model is analyzed both at zero and finite temperatures. Optimal parameters corresponding to a sensible unsupervised segmentation correspond to the "easy phase" of the Potts model. Our algorithm is fast and shown to be at least as accurate as the best algorithms to date and to be especially suited to the detection of camouflaged images.



There are no comments yet.


page 11

page 12

page 13

page 14

page 15

page 16

page 17

page 23

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

“Image segmentation” refers to the process of partitioning a digital image into multiple segments based on certain visual characteristics book1 ; book2 ; book3 . Image segmentation is typically used to locate objects and boundaries in images. The result of image segmentation is a set of segments that collectively cover the entire image or a set of extracted contours of the image. This problem is challenging (see, e.g., Fig. (1)) and important in many fields. Examples of its omnipresent use include, amongst many others, medical imaging medical

(e.g., locating tumors and anatomical structure), face recognition

face , fingerprint recognition fingerprint , and machine vision mechine . Numerous algorithms and methods have been developed for image segmentation. These include thresholding thresholding , clustering clustering , compression compression and histogram based histogram approaches, edge detection edge , region growing region , split and merge split

, gradient flows and partial differential equation based approaches

kim ; levelset , graph partitioning methods and normalized cuts graphpartition ; normalizedcuts , Markov random fields and mean field theories markov1 ; markov2 ; markov3 ; markov4 , watershed transformation watershed , random walks randomwalks , isoperimetric methods isoperimetric

, neural networks

neural , and a variety of other approaches, e.g., model ; multi ; semi .

Figure 1: Examples of currently challenging problems in image segmentation. Left: The left image is that of zebra (courtesy of Ref.berkeley-zebra ) with the a similar“stripe” background. Right: The image on the right is that of a dalmatian dog dalmatian . Most people do not initially recognize the dog before given clues as to its presence. Once the dog is seen it is nearly impossible to perceive the image in a meaningless way. dalmatian

In this work, we will apply a “community detection” algorithm to image segmentation. This method belongs to the graph partitioning category. Community detection newman ; fortunato ; newman2 ; gmnn seeks to identify groups of nodes densely connected within their own group (“community”) and more weakly connected to other groups. A solution enables the partition of a large physically interacting system into optimally decoupled communities. The image is then divided into different regions (“communities”) based on a certain criterion, and each resulting region corresponds to an object in the original image.

It is notable that by virtue of its graph theoretical nature, community detection is suited for the study of arbitrary dimensional data. However, unlike general high dimensional graphs, images are two (or three) dimensional. Thus, real images are far simpler than higher dimensional data sets as, e.g., evinced by the four color theorem stating that four colors suffice to mark different neighboring regions in a segmentation of any two dimensional image. Thus, geometrical (and topological) constraints can be used to further improve the efficiency of the bare graph theoretical method. In glass1 ; glass2 , in the context of analyzing structures of complex physical systems such as glasses, we used geometry dependent physical potentials to set the graph weights in various two and three dimensional systems. In the case of image segmentation, in the absence of underlying physics, we will invoke geometrical cut-off scales.

In this work, we will discuss “unsupervised” image segmentation. By this term, we allude to a general multi-purpose segmentation method based on a general physical intuition. The current method does not take into account initial “training” of the algorithm- i.e., provide the system with known examples in which specific patterns are told to correspond to specific objects. We leave the study of supervised image segmentation and more sophisticated extensions of our inference procedure to a future work. One possible avenue which can be explored is the use of inference beyond that relating to different “replicas” in the simple form discussed in this manuscript that is built on prior knowledge (and prior probabilities in a Bayesian type analysis) of expected patterns in the studied images.

We will, specifically, apply the multiresolution community detection method, first introduced in peter1 , to investigate the overall structure at different resolutions in the test images. Similar to peter1 , we will employ information based measures (e.g., the normalized mutual information and the variation of information) to determine the significant structures at which the “replicas” (independent solutions of the same community detection algorithm) are strongly correlated. With the aid of these information theory correlations, we illustrate how we may discern structures at different pertinent resolutions (or spatial scales). An image may be segmented at different levels of detail and scales by setting the resolution parameters to these pertinent values. We demonstrate in a detailed study of various test cases, how our method works in practice and the resulting high accuracy of our image segmentation method.

Ii Outline

The outline of our work is as follows. In Section III, we introduce the Potts model representation of image segmentation and Potts model Hamiltonians that we will use. These Hamiltonians were earlier derived for graph theory applications. In Section IV, we discuss how we represent images as graphs. In Section V, we briefly define the key concepts of trials and replicas which are of great importance in our approach. In Sec. VI, we present our community detection algorithm. In Sec. VII, we discuss the multiresolution method and the information based measures. In Section VIII, we illustrate how replica correlations may be used to set graph weights. For the benefit of the reader, we compile the list of parameters in Section IX. We discuss the computational complexity of our method in Section X. In Sec. XI, we provide in silico “experimental results” of our image segmentation method when applied to many different examples. These examples include, amongst others, the Berkeley image segmentation and the Microsoft Research Benchmarks. We conclude in Sec. XII with a summary of our results. Specific aspects are further detailed in the appendices.

Iii Potts Models

In what follows, we will briefly elaborate on our particular Potts model representations of images and the corresponding Hamiltonians (energy or cost functions).

iii.1 Representation

As is well appreciated, different objects in an image or more general communities in complex graph theoretical problems are ultimately denoted by a “Potts type” potts variable . That is, if node lies in a community number then . If there are communities in the graph then can assume values . A state corresponds to a particular partition (or segmentation) of the system into communities (or objects). In the context of image segmentation, Potts model representations can, e.g., also be found in pottsmodel1 ; pottsmodel2 ; pottsmodel3 ; pottsmodel4 .

iii.2 Potts model Hamiltonian for unweighted graphs

In peter2 , a particular Potts model Hamiltonian was introduced for community detection. The ground states of this Hamiltonian (or lowest energy states) correspond to optimal partition of the nodes into communities. This Hamiltonian does not involve a comparison relative to random graphs (“null models”) fortunato and as such was free of the “resolution limit” problems fortunato ; rlimit1 ; rlimit2 wherein the number of found communities or objects scaled with the system size in a way that was independent of the actual system studied. In what follows below, there are elementary nodes in a graph (or pixels in an image), we consider general unweighted graphs in which any pair of nodes may be either linked with a uniform weight or not linked at all. Specifically, a link between sites and is associated with edge weights and . In these unweighted graphs, is an element of the adjacency matrix. That is, if nodes and are connected by an edge and otherwise. The weights .

The goal of the general (or “absolute”) Potts model Hamiltonian peter2 was to energetically favor any pair of linked nodes to be in the same community, to penalize for a pair of unlinked nodes to be in the same community and conversely for nodes in different communities (penalize for having two linked nodes be in different communities and favor disjoint nodes being in different communities). Putting all of these bare energetic considerations together (sans any comparisons to random graphs), the resulting Potts model Hamiltonian (or energy function) for a system of nodes simplifies to peter2 ; explain_solute


In Eq.(1), we emphasize the dependence of the Hamiltonian on the different variables at each lattice site (each of which can assume values). In what follows, the dependence of the Hamiltonian on will always be understood.

The Kronecker delta if and if . In this Hamiltonian, by virtue of the term, each spin interacts only with other spins in its own community. As such, the resulting model is local– a feature that enables high accuracy along with rapid convergence peter2 .

As noted above, minimizing this Hamiltonian corresponds to identifying strongly connected clusters of nodes. The parameter is the so called “resolution parameter” which adjusts the relative weights of the linked and unlinked edges, as in Eq. (1). This is easily seen by inspecting Eq.(1). A high value of leads to forbidding energy penalties unless all intra-community nodes “attract” one another and lie in the same community. By contrast, does not penalize the inclusion of any additional nodes in a given community and the lowest energy solution generally corresponds to the entire system.

iii.3 Potts Model Hamiltonian for Weighted Graphs

In weighted graphs, we assign edges between nodes with the respective weights based on the interaction strength (e.g., the (dis-) similarity of the intensity or color defines the edge weight in image segmentation problem). Specifically, in image segmentation problems, we determine (based on, e.g., color or intensity differences) the weight between each member of a node pair. We then shift each such value by an amount set by a background , i.e., . The subtraction relative to the background of allows for our community detection algorithm to better partition the network of pixels. In principle, this background can be set to be spatially non-uniform. However, in this work we set to be a constant. Thus, we generalize the earlier model of peter1 in Eq. (1) by the inclusion of a background and by allowing for continuous weights instead of discrete weights that are prevalent in graph theory. The resulting Hamiltonian glass1 ; glass2 reads


The form of this Hamiltonian and that of Eq. (1) was inspired by positive and negative energy terms that favor the formation of tightly bound clusters (or “solutes”) that are more weakly coupled to their surroundings explain_solute . Similar to the important effects of the solute found in physical systems mossa , the Hamiltonian of Eq.(2) captures all interactions in the system explain_solute . Earlier glass1 ; glass2 ), we invoked the Hamiltonian of Eq. (2) to analyze static and dynamic structures in glasses.

In Eq. (2), the number of communities may be specified from the input or left arbitrary and have the algorithm decide by steadily increasing the number of communities for which we have low energy solutions. The Heavyside functions “turns on” or “off” the edge designation [ and ] relative to the aforementioned background . As before, minimizing the Hamiltonian of Eq. (2) corresponds to identifying strongly connected clusters of nodes.

While in Eq. (1 (or Eq. 2),) the input concerns two-point () edge weights (or ) , it is, of course, possible to extend these Hamiltonian to allow for more general motifs (such as node triangles) and include point weights (and extensions thereof). These correspond to spin interactions. In the current study, however, we limit ourselves to node weights.

Iv Casting Images as Networks

We will now detail how we translate images into networks with general edge weights that appear in Eqs.(1, 2). We will represent pixels as the nodes in a graph. Edge weights define the (dis-) similarity between the neighborhood pixels.

Images may be broadly divided into two types: (a) those with the uniform and (b) those with varying intensity. “Uniform intensity” means that the entire object or each component is colored by one intensity or color. By the designation of “varying intensity”, we allude to objects or components that exhibit alternating intensities or colors, e.g., the stripes and spots seen in Fig. (1).

Regarding the above two types of images, two different methods may be employed to define the edge weights: (i) The intensity/color difference between nodes is defined as the edge weight in images with uniform intensity. (ii) The overlap between discrete Fourier transforms of blocks is defined as the edge weight in images with varying intensity. The second method is designed to distinguish the target and the background by their specific frequencies. We will detail both methods below in Sec. 

IV.1 (where we discuss images with uniform intensities) and Sec. IV.2 (spatially varying intensities).

iv.1 Edge definition for images with uniform intensity

For images of uniform intensity, we will define edges based on the color (dis-) similarity. For the unweighted Potts model of Eq. (1), we will assign an edge between two pixels ( and ) if the “color” difference () between them is less than some threshold (). That is,


For weighted Potts model in Eq. (2), we will, as we will elaborate on momentarily, set the weights to be the “color” difference () between nodes and , i.e.,


As seen from the energy functions of Eqs. (1, 2), a large dis-similarity favors nodes and being in different clusters.

A grey scale image is an image that in which the value of each pixel carries only intensity information. Images of this sort are composed exclusively of shades of gray, varying from black at the weakest intensity () to white at the strongest (). For a grey-scale image, the “color” difference is the absolute value of the intensity difference, i.e.,


A “color image” is an image that includes color information for each pixel. Each pixel contains three color components: red, green and blue (or RGB). The value of the intensity of each of these three components may attain any of values (any integer in the interval ). For a color-scale image, we define the “color” difference as the average of the differences between the color components red, green and blue. That is, with and respectively denoting the strengths of the red, green, and blue color components at site , we set


We do not store edges between every pair of nodes. Rather, edges connect nodes whose distance is less than a tunable value .

iv.2 Edge definition for images with varying intensities

Typically, images with varying intensities contain different patterns. To separate these patterns, we construct a “block-structure” containing the quintessential pattern information. We next introduce a method to divide blocks and then elaborate on two different ways to connect edges between blocks.

General contending pertinent scales may be determined by, e.g., examining the peaks of the Fourier transform of an entire image (whose location yields the inverse wave-length and whose width reveals the corresponding spatial extent of these modulations). While such simple transforms may aid optimization in determining candidate parameter scales, our algorithm goes far beyond such simple global measures.

iv.2.1 Overlapping blocks

Figure 2: [Color Online.] An example of overlapping blocks. The block size is . The nearest neighbor of the block enclosed by “purple” in x-direction is the one enclosed by “red”, and its nearest neighbor in y-direction is the one enclosed by “yellow”. They are connected due to the nearest neighbor condition.

We will divide an entire image of size into overlapping blocks. These blocks are centered about each (of the ) pixels and are of size . The dimensions of the individual blocks are, generally, far smaller than that of the entire system, . General scales can be gleaned from a Fourier transform of the entire image.

To construct the connection matrix between the blocks, we connect edges between each pair of blocks and set the distance between the nearest block pair to be . This choice has the benefit of overlapping the nearest neighbor blocks, which share more commons. Fig. 2 gives a schematic plot of the “overlapping block” structure.

iv.2.2 Average intensity difference between blocks

Following the construction of the overlap blocks structure, we next compute the average intensity of each block and connect the edges between blocks based on the difference. In this case, each block can be treated as a “super-node” which contains the pattern information of the studied image.

To further incorporate geometrical scales, we multiply the edge weights by (where

is a tunable length scale and the vectors

and denote the spatial locations of points and ). We remind the reader that there are basic blocks and thus possible values of (and possible values of ). We will set in Eq. (2), the weights between block and to be




As seen in Eq. (8), is the sum of the absolute values of the intensity differences between blocks and with each of these blocks being of size . In Eq. (7), is the physical distance between block and (i.e., the distance between the central nodes of each block).

The geometrical factor of () in Eq. (7) with a tunable length scale can be set to prefer (and, as we will illustrate also to detect) certain scales in the image. This enables the algorithm to detect clusters of varying sizes that contain rich textures.

iv.2.3 Fourier amplitude derived weights

As it is applied to image segmentation, the utility of Fourier transformations is well appreciated. We next discuss how to invoke these in our Potts model Hamiltonian. To highlight their well known and obvious use, we note that, e.g., the stripes of the zebra in Fig. 1 contain wave-vectors which are different from those of the more uniformly modulated background. Thus, a spatially local Fourier transform of this image may distinguish the zebra from the background. We will now invoke Fourier transforms in a general way in order to determine the edge weights in our network representation of the image.

With the preliminaries of setting up the block structure in tow, we now apply a discrete Fourier transform inside each block. Rather explicitly, excluding the spatial origin, the local discrete Fourier transform of a general quantity within block with internal Cartesian coordinates is


The wave-vector components and . In applications, we set, for grey-scale images, to be the intensity at site in block (a whose location relative to the origin of the entire image we denote by ). That is, . In color images, we set to be the average over the intensity of the red, green and blue components: . We fix the couplings between blocks and to be


We connect blocks whose spatial separation is less than the aforementioned tunable cutoff distance by links having edge weights . In practice, we fixed . With Eq. (10) in hand, we set


in Eq. (2). In this case, the background would be negative.

When inverting the sign of the left hand side of Eq.(11) (shown in Appendix C), our algorithm will be also suited for the detection of changing objects against a more uniform background.

We now briefly comment on the relation between the Fourier space overlaps and weights in Eqs.(10,11) and the real space overlaps and weights in Eqs.(7,8). It is notable that in Eq.(10), we sum over the modulus of the products of the Fourier amplitudes. By Parseval’s theorem, sans the modulus in the summation in Eq.(10), would be identical to the overlap in real space between and . Such real space overlaps directly relate to the real-space overlaps in Eq.(8) [following a replacement of the absolute value in Eq.(8) by its square and an overall innocuous multiplicative scale factor]. Thus, without the modulus in Eq.(10), the Fourier space calculation outlined above affords no benefit over its real space counter-part. Physically, the removal of the phase factors when performing the summation in Eq.(10) avoids knowledge of the relative location of the origins between different blocks. This allows different regions of a periodic pattern to be strongly correlated and clumped together. By contrast, for a periodic wave of a particular wave-vector, the real space overlaps between blocks and may vanish when the origins of blocks and are displaced by a real space distance that is equal to half of the wave-length of the periodic wave along the modulation direction. Thus, the real space weights as derived from Eqs.(7,8) may vanish when the corresponding Fourier space derived weights (Eqs.(10,11)) are sizable.

It is possible to improve on the simple Fourier space derived weights by a general wavelet analysis.

V Definitions: Trials and Replicas

In the following sections, we will discuss our specific algorithms for (i) community detection and (ii) multi-scale community detection. Before giving the specifics of our algorithms, we wish to introduce two concepts on which our algorithms are based. Both pertain to the use of multiple identical copies of the same system (image) which differ from one another by a permutation of the site indices. Thus, whenever the time evolution may depend on sequentially ordered searches for energy lowering moves (as it will in our greedy algorithm), these copies may generally reach different final candidate solutions. By the use of an ensemble of such identical copies, we can attain accurate result as well as determine information theory correlations between candidate solutions and infer from these a detailed picture of the system.

In the definitions of “trials” and “replicas” given below, we build on the existence of a given algorithm (any algorithm) that may minimize a given energy or cost function. In our particular case, we minimize the Hamiltonian of Eqs. (1, 2).

Trials. We use trials alone in our bare community detection algorithm peter1 ; peter2 . We run the algorithm on the same problem independent times. This may generally lead to different contending states that minimize Eqs. (1, 2). Out of these trials, we will pick the lowest energy state and use that state as the solution.

Replicas. We use both trials and replicas in our multi-scale community detection algorithm peter2 . Each sequence of the above described trials is termed a replica. When using “replicas” in the current context, we run the aforementioned trials (and pick the lowest solution) independent times. By examining information theory correlations between the replicas

we infer which features of the contending solutions are well agreed on (and thus are likely to be correct) and on which features there is a large variance between the disparate contending solutions that may generally mark important physical boundaries. We will compute the information theory correlations within the ensemble of

replicas. Specifically, information theory extrema as a function of the scale parameters, generally correspond to more pertinent solutions that are locally stable to a continuous change of scale. It is in this way that we will detect the important physical scales in the system.

These definitions might seem fairly abstract for the moment. We will flesh these out and re-iterate their definition anew when detailing our specific algorithms to which we turn next.

Vi The community detection algorithm

Our community detection algorithm for minimizing Eqs. (1, 2) follows four steps peter2 .

(1) We partition the nodes based on a “symmetric” or “fixed q” initialization ( is the number of community).

“Symmetric” initialization alludes to an initialization wherein each node forms its own community (and thus, initially, there are communities).

“Fixed q” initialization corresponds to a random initial distribution of all nodes into q communities.

For the application of image segmentation, “symmetric” initialization is used for the “unsupervised” case. In this case, the algorithm does not know what to look for, thus the “symmetric” initialization provides the advantage of no bias towards a particular community. The algorithm will decide the number of community by merging nodes for which we have lower energy solution.

“Fixed q” initialization may be used in a “supervised” image segmentation. The community membership of individual node will be changed to lower the solution energy. One has to decide how much information is needed by observing the original image and enter the number of communities as an input. Different levels of information correspond to different number of communities . For instance, if only one target needs to be identified, is enough. The communities include the target and background.

In the following sections, we will use the “unsupervised” image segmentation and let the algorithm decide the community number .

(2) Next, we sequentially “pick up” each node and place it in the community that best lowers the energy of Eqs. (1, 2) based on the current state of the system.

(3) We repeat this process for all nodes and continue iterating until no energy lowering moves are found after one full cycle through all nodes.

(4) We repeat these processes “t” times (trials) and select the lowest energy result as the best solution.

Vii Multi-scale networks

After determining for the adjacency matrix in Sec. IV.1 and Sec. IV.2, we now turn to the-so called “resolution parameter” () in Eq. (1)/Eq. (2). In peter1 , we introduced the multiresolution algorithm to select the best resolution. Our multi-scale community detection was inspired by the use of overlap between replicas in spin-glass physics. In the current context, we employ information-theory measures, to examine contending partitions for each system scale. Decreasing , the minima of Eqs. (1, 2) lead to solutions progressively lower intra-community edge densities, effectively “zooming out” toward larger structures. We determine all natural graph scales by identifying the values of for which the earlier mentioned “replicas” exhibit extrema in the average of information theory overlaps such as the normalized mutual information () and the variation of information () when expressed as functions of , . The extrema and plateau of the average information theory overlaps as a function of , over all replica pairs indicate the natural network scales peter1 . The replicas can be chosen to be identical copies of the same system for the detection of static structures, e.g., the image segmentation.

We will briefly introduce the information theory measures in the following section.

vii.1 Information theory measures

The normalized mutual information and the variation of information are the accuracy parameters which are employed to calculate the similarity (or overlap) between replicas.

We begin with a list of definitions of the information theory overlaps as they pertain to community detection.

Shannon Entropy: If there are communities in a partition , then the Shannon entropy is


where is the probability for a randomly selected node to be in a community , is the number of nodes in community and is the total number of nodes.

Mutual Information:

The mutual information between partitions found by two replicas and is


where is the number of nodes of community of partition that are shared with community of partition , (or ) is the number of communities in partition (or ) and (or ) is defined the same as before, i.e., the number of nodes in community (or community ).

Variation of information:

The variation of information between two partitions and is given by


Normalized Mutual Information:

The normalized mutual information is


Now, here is a key idea employed in peter1 which will be of great use in our image segmentation analysis: when taken over an entire ensemble of replicas, the average or indicates how strongly a given structure dominates the energy landscape. High values of (or low values of ) corresponds to more dominate and thus more significant structure. From a local point of view, at resolutions where the system has well-defined structure, a set of independent replicas should be highly correlated because the individual nodes have strongly preferred community memberships. Conversely, for resolutions “in-between” two strongly defined configurations, one might expect that independent replicas will be less correlated due to “mixing” between competing divisions of the graph.

vii.2 The application of the multiresolution algorithm for a hierarchal network example

We will shortly illustrate how the multiresolution algorithm peter1 works in practice by presenting an example of the multiresolution algorithm as it is applied to a hierarchal test system of nodes.

To begin the multiresolution algorithm, we need to specify the number of replicas at each test resolution, the number of trials per replica , and the starting and ending resolution . Usually, the number of replicas is , the number of trials is . As detailed in Section V, we select the lowest energy solution among the trials for each replica. The initial states within each of the replicas and trials are generated by reordering the node labels in the “symmetric” initialized state of one node per community. These permutations simply reorder the node numbers (with the image of under a permutation) and thus lead to a different initial state.

(1) The algorithm starts from the initialization of the system described in item (1) of Section VI.

(2) We then minimize Eq. (1) independently for all replicas at a resolution as described in Section VI. Initially (i.e., ).

(3) The algorithm then calculates the average inter-replica information measures like and at that value of .

(4) The algorithm then proceeds to the next resolution point (with ).

(5) We then return to step number (3).

(6) After examining the case of , the algorithm outputs the inter-replica information theory overlaps for entire the range of the resolutions studied (i.e., on the interval ).

(7) We examine those values of corresponding to extrema in the average inter-replica information theory overlaps. Physically, for these values the resulting image segmentation is locally insensitive to the change of scale (i.e., the change in ) and generally highlights prominent features of the image.

With and denoting graph partitions in two different replicas and their overlap, these average inter-replica overlaps for a general quantity peter1 are explicitly


Similarly, for a single replica quantity (such as the Shannon entropy for partitions in different replicas), the average is, trivially,

. (Averages for higher order inter-replica cumulants may be similarly written down with a replica symmetric probability distribution function

peter1 .)

Figure 3: Heterogeneous hierarchical system corresponding to the plots in Fig. 4. In this figure, the node system is divided into a three-level hierarchy. Level has communities with sizes from to nodes. Level has communities with sizes from to nodes. Level is the completely merged system of nodes. The average edge density is . This system has edges.
(a) Plot of information measures , and the community number vs the Potts model weight in Eq. (1) for the three-level heterogeneous hierarchy depicted in Fig. 3.
(b) Plot of information measures , and the community number vs Potts model weight in Eq. (1) for the three-level heterogeneous hierarchy depicted in Fig. 3. .
Figure 4: Plot of information measures , , and vs the Potts model weight in Fig. 3. In panel(a), the peak (plateau) denoted by the arrows correspond to levels 2 and 3 of the hierarchy depicted in Fig. 3. Similarly in panel (b), the minimal values, indicated by arrows, accurately correspond to levels 2 and 3 of the hierarchy. The number of communities is and in disparate plateau regions (denoted by the arrows) in both panels. These communities assignments (and, obviously, also their numbers) are exactly the same as those of the communities in levels 2 and 3 of the original hierarchical graph of Fig. 3. In panels (a) and (b), both the mutual information and the Shannon entropy display a plateau behavior corresponding to the correct solutions.

Fig. 4 shows the result of multiresolution algorithm applied to the three-level hierarchy system in Fig. 3. The system investigated is that of a standard simple graph with unweighted links (i.e., in Eq.(1), if nodes and share an edge and is zero otherwise). In Fig. 3, “level ” communities exhibit a density of links (i.e., a fraction of the intra-community node pairs are connected by a link ()). The individual communities in level have sizes that range between to nodes. The less dense communities in level harbor a density of links ; the nodes in this case, are divided into five groups with sizes that vary from to . Highest up in the hierarchy is the trivial level “partition”- that of a completely merged system of nodes. Thus, as a function of , this easily solvable system exhibits “transitions” between different stable solutions corresponding to different regions (or basins) of . In Sections(X, XI.7), we will further discuss additional transitions between easy solvable regions and regions of parameter space which are very “hard” or impossible (“unsolvable”).

Fig. 4(a) depicts the averages of (on the left axis) and (right axis) over all replica pairs. (We further provide in this figure the number of communities .) A “cascade” composed of three plateaus is evident in these information theory measures. Similarly, Fig. 4(b) shows the in left axis and in right axis average over all replica pairs. The extrema denoted by the arrows in both panel (a) and (b) are the correctly identified levels and respectively of the hierarchy depicted in Fig. 3. The two plateaus with the peak values in panel(a) correspond to a normalized mutual information of size (the highest theoretically possible) and similarly the corresponding minima in panel(b) have a variation of information (the smallest value possible) for the same range of values. These extreme values of and indicate perfect correlations among the replicas for both levels of the hierarchy. The “plateaus” in , and are also important indicators of system structure. These plateau (and more general extrema elsewhere) illustrate when the system is insensitive to parameter changes and approaches stable solutions. In Section X (and in Eq.(21) in particular), we will discuss this more generally in the context of the phase diagram of the community detection problem.

Viii Replica correlations as weights in a graph

Within the multiresolution method, significant structures are identified by strongly-correlated replicas (multiple copies of the studied system). Thus, if a node pair is always in the same community in all replicas, the two nodes must have strong preference to be connected or have a large edge weight. Similarly, if a node pair is not always in the same community in all replicas, they must have preference not to be connected or have a small edge weight. We re-assign edge weights based on the correlations between replicas.

Specifically, we first generate replicas by permuting the “symmetric” initialized state of one node per cluster of the studied system, then apply our community detection algorithm to each replica and record the community membership for each node. We then calculate the probability of each node pair based on the statistics of replicas. The probability is defined as follows




In Eq. (18), when node and belongs to the same community in replica , i.e., , . When node and are not in the same community in replica , i.e., (we use and to represent these two different communities, where and . and denote the size of cluster and in replica ), . As throughout, is the distance between node and in replica . In Eq. (17), we sum the probability in each replica to define the edge weight. The assigned weights given by Eq. (17) are based on a frequency type inference. Although we will not report on it in this work, it is possible to perform Bayesian analysis with weights (“priors”) that are derived from a variant of Eq. (18); this enables an inference of the correlations from the sequence of results concerning the correlations between nodes and in a sequence of different replicas.

In unweighted graphs, we connect nodes if the edge weight between the node pair is larger than some threshold value in Eq. (1), i.e.,


In weighted graph, the analog of Eq. (2) is the Hamiltonian given by


That is, when there is a high probability , relative to a background threshold , that nodes and are linked, we assign a positive edge weight to the link of size . Similarly, if the probability of a link is low, we assign a negative weight of size .

Armed with Eq. (20), we then minimize in an identical fashion to the minimization of Eq. (2) that we discussed earlier. Specifically, we follow the 4 steps outlined in Section VI for non multi-scale images and the 7 steps of Section VII.2 in the analysis of general multi-scale systems.

Ix Summary of parameters

We now very briefly collect and list anew the parameters that define our Hamiltonians and appear in our methods.

The resolution parameter in Eqs.(1, 2, 20). This parameter sets the graph scale over which we search for communities. This parameter is held fixed (typically with a value of ) in the community detection method and varies within our multi-scale analysis. We determine the optimal value of by determining the local extrema of the average information theory overlaps between replicas.

The spatial scale in Eq.(7). Similar to the more general graph scale set by , we may determine optimal by examining extrema in the average inter-replica information theory correlations. In practice, in all but the hardest cases (i.e., the case of the dalmatian dog in Fig.(1)), we ignored this scale and fixed to be infinite. Fixing led to good results in the analysis of the dalmatian dog.

The spatial cutoff scale for defining link weights- see the brief discussions after Eqs.(6, 10). Whenever the spatial distance between two sites or blocks exceeded a threshold distance we set the link weight to be zero. We did not tune this parameter in any of the calculations. It was fixed to the value of .

The scale of the block size introduced in Section IV.2.1. This parameter is far smaller than the image size , yet large enough to cover the image features. We usually set to be for an image size of around .

The background intensities in Eq.(2) and in Eq.(20). Similar to the graph scale set by and , we may determine the optimal and by observing the local extrema of the average information correlations between replicas.

As we will elaborate on briefly, all optimal parameters can be found by determining the local extrema of the information theory correlations that signify no change in structure over variations of scale. In reality, we may fix some parameters and vary others–usually, fixed as , in the range of to , and , and been changed.

As an aside, in this brief paragraph, we briefly note for readers inclined towards spin glass physics and optimization theory that, in principle, in the large limit (images with a large number of pixels) the effective optimal values for the likes of the parameters listed above may be derived by solving the so-called “cavity” equations mezard2 that capture the maximal inference possible (in their application without the aid of replicas that we introduced here) infer ; lenka . In the current context, in applying these equations anew to image segmentation, we arrive at the maximal inference possible of objects in an image. While these equations are tractable for simple cases, solving these equations is relatively forbidding for general cases. In practice, we thus efficiently directly examine our Potts model Hamiltonians of Eqs. (1, 2, 20) and, when needed, directly infer optimal values of the parameters by examining inter-replica correlations as described in the earlier sections. This will be expanded on in the next section (specifically, in Eq.(21)). [Detailed applications of this method are provided in Section XI.7.]

X Computational Complexity, the Phase Diagram, and the determination of optimal parameters

Our community detection algorithm is very rapid. For a system with links, the typical convergence time scales as peter2 . In an image with pixels, with all of the constants (i.e., not scaling with the system size), the number of links .

Our general multi-scale community detection algorithm (that with varying ) has a convergence time peter1 . Thus, generally, for an image of size , the convergence time . Rapid convergence occurs in all but the “hard phase” of the community detection problem.

Specifically, we numerically investigated the phase diagram as a function of noise and temperature (i.e., when different configurations are weighted with a Boltzmann factor with at a temperature for general graphs with an arbitrary number of clusters in dandan .) Related analytic calculations were done for sparse graphs in lenka . In particular, in these and earlier works peter2 ; peter1

it was found that there is a phase transition between the detectable and undetectable clusters. The detectable phase further splinters into an “easy” and a “hard” phase. These three phases in the community detection problem constitute analogs of three related phases in the “SAT-hard-unSAT” in k-SAT problem

mezard1 ; mezard2 . The found phase diagram dandan exhibits universal features. Increasing the temperature can aid the detection of clusters dandan . The universal features of the phase diagram and the known cascade of transitions that appear on introducing temperature enable better confidence in the results of the community detection algorithm. One of the central results of Ref.dandan is that the “easy” solvable phase(s) of the community detection problem which leads to correct relevant solutions (i.e., not noisy partitions of a structureless system) universally appear in a “flat” peter1 ; dandan phase(s) [see also the flat information theoretic curves in Fig. (4) and related discussion in Section(VII.2)] as ascertained by the inter-replica averages of all thermodynamic and information theoretic quantities . These may correspond to the internal energy , average Shannon entropy (), average inter-replica normalized mutual information and variation of information (), the complexity mezard2 or an associated “susceptibility” peter1 ; dandan that monitors the onset of large complexity. [This susceptibility will be defined with the aid in the change in the average normalized mutual information as a function of the number of trials . It is defined as .] That is, with denoting a set of generalized parameters (e.g., artificially added additional noise in networks dandan , temperature dandan , or resolution parameter peter1 ), pertinent partitions appear for those values of the parameters for which


As alluded to above, a particular realization of Eq.(21) appears in the hierarchal system discussed in Section VII.2 wherein and . In that case, Eq.(21) was satisfied in well defined plateaus.

When present, crisp solutions are furthermore generally characterized by relatively high values of , and these correspond to the “easy phase” of the image segmentation problem. In Sec. XI.7, we will provide explicit analysis of the phase diagram and optimal parameters as they pertain to several example images.

All of the results (except the ones in Sec. XI.7) presented below in the current manuscript were attained at zero temperature and may be improved by the incorporation of thermal annealing as the results of dandan illustrate for general systems.

Xi Results

xi.1 Brain Image

xi.1.1 Unweighted graphs

(a) The result of the “multiresolution” algorithm applied to the unweighted brain image shown in (b): , and in terms of the resolution .
(b) The unweighted result of the brain images with different , which correspond to , , .
Figure 5: [Color Online.] The plot of the normalized mutual information , variation of information and the number of communities as a function of for the brain image. This image is reproduced with permission from the Iowa Neuroradiology Library. The axis for is on a logarithmic scale. There are three prominent peaks in the curve. We apply our community detection algorithm to the grey-scale brain image at these three values of s. The corresponding results are shown in panel (b). Note that the results show three-level hierarchy as varies.
(a) The result of “multiresolution” algorithm for the weighted brain image shown in (b): , and in terms of the resolution .
(b) The weighted result of the brain images with different , which correspond to , , .
Figure 6: [Color Online.] The result of “multiresolution” for the weighted brain image shown in panel (b). In panel (a), the “multiresolution” result here behaves different from Fig. 5 but keeps the same trend. The structure is only stable in the resolution range of , compared to the wider range of in Fig. 5. This illustrates that the weighted graph is more sensitive to the change of resolution.

We start the review of the results of our methods by analyzing an unweighted graph (Eq. (1)) for the grey-scale brain image as shown in Fig. 5. We will assign edges between pixels only if the intensity difference is less than the threshold as denoted in panel (a) of Fig. 5. The algorithm uses Eq. (1) to solve for a range of resolution parameters in the interval . In the particular case in Fig. 5, and . There are two more input parameters that are needed in our algorithm: the number of independent replicas that will be solved at each tested resolution and the number of trials per replica . We use and in Fig. 5 respectively.

As noted earlier (see Section V), for each replica, we select the lowest energy solution among the trials. The replicas are generated by reordering the “symmetric” initialized state of one node per community. We then use the information based measures (i.e., or ) to determine the multiresolution structure.

The plots of , and as a function of in Fig. 5 exhibit non-trivial behaviors. Extrema in and correspond to jumps in the number of communities . In the low region, i.e., , the number of communities is stable. However, when , the number of communities sharply increase. This indicates that the structure changes rapidly as the resolution varies. There are three prominent peaks in the (variation of information) curve. We show the corresponding images at these resolutions, that is in panel(b) in Fig. 5. These corresponding segmented images show more and more sophisticated structures. The lower right image at a resolution of shows the information in detail. Different colors in the image correspond to different clusters. There are, at least, five contours surrounding the tumor, that denote the degree by which the tissue was pushed by the tumor. The lower left image at the resolution is less detailed than the one on the right. Nevertheless, it retains the details surrounding the tumor. If we further decrease , the upper right image at the resolution will not keep the details of the tumor boundary, only the rough location of the tumor. Thus, neither too large nor too small resolutions are appropriate for tumor detection in this image. The resolution around is the most suitable in this case. This is in accord with our general found maxim in Section IX concerning a value of . We re-iterate that, in general, the optimal value of is found by Eq.(21) (an example of which is manifest in the information theory plateaus discussed in Section VII.2). In Section XI.7, we will discuss, in depth, how the optimal values of may be determined in (weighted) example systems.

xi.1.2 Weighted graphs

(a) The result of “multiresolution” algorithm for the weighted brain image shown in (b): , and in terms of the threshold .
(b) The weighted result of the images with different , which correspond to , , .
Figure 7: [Color Online.] The “multiresolution” result also shows the hierarchy structure as the threshold varies, as in Fig. 6. Higher “” corresponds to the lower ””, which means pixels intend to merge in higher “”. The structure is stable in the range of , below which the structure is sensitive.

In Figs. (6, 7), we provide the “multiresolution” results for the weighted graph (Eq. (2)) of and for the brain image. Both the resolution and the threshold control the hierarchy structures: the peaks in the normalized mutual information and variation of information always correspond to the jumps in the number of communities . The jumps in correlate with the changes in hierarchal structures on different scales. We can combine both parameters to obtain the desirable results in the test images. See, e.g., the plot of .

The results of our method with weighted edges are more sensitive to the changes of parameters (as seen from a comparison of Fig. 6 with Fig. 5). According to Eq. (2), edges with small (or large) difference will decrease (or increase) the energy by (or ). However, if the unweighted graphs and the Potts model with discrete weights (Eq. (1)) are applied, the edges with small or large “color” difference will decrease or increase the energy by the amount of or . Thus, considerable information (e.g., the “color” of each pixel) is omitted when using an unweighted graph approach.

xi.2 A painting by Salvador Dali

(a) The variation of information as a function of resolution for the image shown in Panel(b) .
(b) The original image and the corresponding segmentation for the specific resolution marked (I) in panel(a).
(c) The corresponding images in the specific resolution marked (II),(III),(IV) and (V) in panel(a).
Figure 8: [Color Online.] The specific image is from lincoln . At close distance, this is “Gala contemplating the Mediterranean sea” while at larger distance is “a portrait of Abraham Lincoln”. Panel(a) shows the variation of information as a function of resolution. We pick the resolution at each “peak” position and apply our algorithm at these particular resolutions. Panels (b) and (c) show the resulted images at the corresponding resolutions marked in panel (a). Note that at low resolution, the resulting segmentation clearly depicts “the portrait of Abraham Lincoln” as shown in panel (b) on the right. In particular, notwithstanding noise, as increases, the segmentation results show more details and we could detect the lady in the middle in (II)-(V) of Panel(c).

We next apply our multiresolution community detection algorithm to the images that are by construction truly multi-scale. The results at different resolutions are shown in Fig. 8. The original image is that of Salvador Dali’s famous painting “Gala contemplating the Mediterranean sea which at twenty meters becomes a portrait of Abraham Lincoln”. Our algorithm perfectly detected the portrait of Lincoln at low resolution as shown in Fig. 8 in the segmentation result appearing in panel (I) of (b). As the resolution parameter increases, the algorithm is able to detect more details. However, due to the non-uniform color and the similarity of the surrounding colors to those of the targets, the results are very noisy. At the threshold of , the algorithm has difficulty in merging pixels to reproduce the lady in the image. For example, in image (II) in Panel (c), the lady’s legs are merged into the background. In image (III), only one leg is detected. In images (IV) and (V), both legs can be detected but belong to different clusters.

xi.3 Benchmarks

In order to assess the success of our method and ascertain general features, we applied it to standard benchmarks. In particular, we examined two known benchmarks: (i) The Berkeley image segmentation benchmark and (ii) that of Microsoft Research.

xi.3.1 Berkeley Image Segmentation Benchmark

We were able to accurately detect the targets in test images, as in Figs.(9, 10). The original images in Fig. 9 were downloaded from the Berkeley image segmentation benchmark BSDS300 berkeley-paper , and those of Fig. 10 are downloaded from the Microsoft Research microsoft . We will now compare our results with the results by other algorithms. The Berkeley image segmentation benchmark provides the platform to compare the boundary detection algorithms by an “F-measure”. This quantity is defined as


“Recall” is computed as the fraction of correct instances among all instances that actually belong to the relevant subset, while “Precision” is the fraction of correct instances among those that the algorithm believes to belong to the relevant subset. Thus, we have to draw the boundaries in our results and compute the F-measure. We use the tool “EdgeDetect” of Mathematica software to draw the boundaries within our region detection results, as shown in the right column in Fig. 9. The comparison of the “F-measure” of our algorithm (“F-Absolute Potts Model”) and the best algorithm in the benchmark (“F-Global Probability of boundary”)gPb1 ; gPb2 is shown in Table. 1. On the whole, our results are better than the algorithm of the Berkeley group.

Figure 9: [Color Online.] Image segmentation results of our algorithm when tested with examples from the Berkeley BSDS300 benchmark. Shown, in the left column, are the original images. The central column contains the results of our method. The right column provides the boundaries of the images in the middle by running “EdgeDetect” of Mathematica on the results of our run in the central column. The parameters of community detection algorithm used for these images are: in (a), , . In panel (b), , . In (c), , . In (d), , . In (e), , . We performed the boundary detection on the results of our community detection algorithm (i.e., the central column) and employed the “F-measure” accuracy parameter in order to compare the results of our algorithm with earlier results reported for the Berkeley image segmentation benchmark (shown in Table. 1).
F-Absolute Potts Model F-Global Probability of boundary
a 0.79 0.78
b 0.94 0.91
c 0.82 0.74
d 0.79 0.83
e 0.75 0.60
Table 1: The comparison of “F measure” for Fig. 9 by our community detection algorithm (“F-Absolute Potts Model”) with the algorithm “Global Probability of boundary” (“gPb”) gPb1 ; gPb2 which has the highest score in the Berkeley image segmentation benchmark (“F-Global Probability of boundary”). The higher F-value corresponds to the better detection. Note that our algorithm is performing better than the “gPb algorithm” in almost all images except the fourth one. Our fourth (d) image gets lower score is mostly because there are dots in the lower grass place. These small dots will lead to small high accuracy features. These features are unexpected in the ground truth and thus lower the F-value.

xi.3.2 Microsoft Research Benchmarks

In Fig. 10, we compare our results (in the rightmost column) with the ground truths provided by Microsoft Research (the central column). By adjusting the and values, we can merge the background pixels and highlight the target. In the segmentation of the image of the flower in the first row, and . For both the picnic table in the middle row and that of the two sheep in the bottom row, we set and .

Figure 10: [Color Online.] The results of some image segmentations by our Potts model (Eq. (2)) and community detection algorithm (peter1 ). The images are downloaded from the website of Microsoft Research (microsoft ). The left column are the original images. The central column are the ground truths defined by the website of Microsoft Research (microsoft ), which are the desirable image segmentation results. The right column are the segmentation results by our algorithm. The parameters used for each image are: (1), for the flower image. (2) , for the image of the picnic table. (3) Similarly, , for the image of the two sheep. Note that our algorithm works very well for this kind of images in which the color is nearly uniform within each object.

xi.4 Detection of quasi-periodic structure in quasicrystals

Quasicrystals quasicrystal are ordered but not periodic (hence the name “quasi”). In Fig. 11, the image in row (a) is such a quasi-crystal formed by “Penrose tiling”. We applied the Fourier transform method to reveal the corresponding underlying structures. In row (a), the image marked by (I) is the original image (downloaded from quasi1 ), the one with notation (II) is the result of our algorithm, and (III) is the image of (II) with the connections of the nearest neighbor nodes. The images marked by (II) and (III) show the first Penrose tiling (tiling P1). Penrose’s first tiling employs a five-pointed pentagram, 3/5 pentagram shape and a thin rhombus. Similarly, the result images of panels (II) and (III) in row (b) reveal the underlying structure of the superlattice with stoichiometry and the structural motif of the () “Archimedean tiling” of the original image (I) (from quasi2 ). The Archimedean tiling displayed in image (III) of row (b) of Fig. 11 employs squares and triangles. It is straightforward to analyze the quasi-periodic structure by applying our image segmentation algorithm as shown in Fig. 11. By iterating the scheme outlined herein, structure on larger and larger scales was revealed.

Figure 11: [Color Online.] Quasicrystal images are displayed in panels (I). The corresponding image segmentation results by our algorithm are shown in (II). In (III), we connect the basic object by line, resulting in large basic blocks. This process can be repeated recursively leading to larger and larger scale structures. Note that we are able to reveal the underlying quasi-periodic structures in both row (a) (the original image in (I) is from quasi1 ) and (b) (the original image in (I) is from quasi2 ). We show the first Penrose tiling (tiling “P1”) in (a), and the structural motif of the () Archimedean tiling in (b).

xi.5 Images with spatially varying intensities

If the target is similar to the background (as in, e.g., animal camouflage), then the simplest initialization of edges with linear weights will, generally, not suffice. For example, in Fig. 12 the zebra appears with black and white stripes. It is hard to directly detect the stripes of the zebra because of the large “color” difference between the black and white stripes of the zebra. Fig. 14 has the similar stripe-shaped background which is very difficult to distinguish from the zebra itself by using the weights of Eqs.(7, 8) for the edges. Towards this end, we will next employ the Fourier transform method of Sec. IV.2.3.

Figure 12: [Color Online.] The image segmentation results by the community detection algorithm with Fourier weights as described in Section IV.2.3. Some of the images are downloaded from the Microsoft Research (microsoft ) and some of them are download from the Berkeley image segmentation benchmark (berkeley-paper ). The left column contains the original images. The central column (apart from the last two rows) provides the “ground truths”. The right images on the right are our results. The parameters used in each image are: (1) , for the tree image. (2) and for the car image. (3) and for the bench image. (4) , for the image of corn. (5) , for the zebra image. Even though the color is not uniform inside the targets, we can nevertheless easily detect the targets by this method.

As seen in Fig. 12, the original images are not uniform. Rather, these images are composed of different basic components such as stripes or spots, etc. With the aid of Fourier transform within each block, as discussed in Section IV.2.3, we are able to detect the target. For some of the images such as the second one in Fig. 12, when the target is composed of more than one uniform color or style, our community detection algorithm is able to detect the boundaries, but the regions inside the boundary are hard to merge. This is because the block size is smaller than that needed to cover both the target and the background. That is, block size of is much smaller than the image size of in the car image in the second row, so most of the blocks are within one color of the target (car) or the background (ground). However, the dominant Fourier wave-vector of the region within one color component of the car is similar to that of the ground. Therefore, the algorithm always treats them as the same cluster, rather than merging the region inside the car with the boundary.

In other instances (e.g., all the other rows except the second in Fig. 12), the targets are markedly different from the backgrounds. Following the scheme discussed in Section X (that will be fleshed out in Section XI.7), we may always optimize parameters such as the resolution, threshold, or the block size to obtain better segmentation.

xi.6 Detection of camouflaged objects

Figure 13: [Color Online.] The results of the image segmentation for a “camouflaged image”. The image of the leopard is from (wikipedia ), the lizard is provided in the Berkeley image segmentation benchmark(berkeley-paper ), and the last image is from the website of the EECS department of Berkeley (berkeley-zebra ). The parameters for the shown segmentations are: (1) , for the image of the leopard, (2) , in the image of the lizard, and (3) , for the zebra image.
(a) The variation of information as a function of the negative threshold for the zebra image in panel (b).
(b) The weighted results of the zebra images at the corresponding thresholds: (I), (II), and (III).
Figure 14: [Color Online.] The “multiresolution” result of zebra with fixed community number and resolution . In panel(a), we plot the variation of information as a function of negative threshold . The peaks in correspond to the changes of structures. We choose three peaks and run the algorithm at these three particular thresholds, and the result images are shown in panel (b). As increases, less regions in the zebra merge to the background, and the boundary becomes more clear. If we increase the threshold further, the result is more noisy as the last image of (III) shows.

In the images of Fig. 12, the target objects are very different from their background. However, there are images wherein (camouflaged) objects are similar to their background. In what follows, we will report on the results of our community detection algorithm when these challenging images were analyzed. In all of the cases below in Section XI.6.1, the edge weights were initialized by the Fourier amplitudes discussed earlier (Section IV.2.3). In the case of the dalmatian dog image in Section XI.6.2, we further employed the method of average intensity difference between blocks discussed in Section IV.2.2. In all cases but this last one of the dalmatian dog, we fixed the length scale parameter of Section IV.2.3 to be infinite.

xi.6.1 Images of a leopard, a lizard, and a zebra

(a) The variation of information as a function of length at
(b) The normalized mutual information as a function of length at
(c) The corresponding image segmentation result at the extremum of / in panel (a)/(b)
(d) The variation of information as a function of length at
(e) The normalized mutual information as a function of length at
(f) The corresponding image segmentation results at the extremum, and at a point close to the peak of / in panel (d)/(e).
Figure 15: [Color Online.] Results of our algorithm as a function of the length scale in Eq. (7) for the dalmatian dog image. Plots of the variation of information and the normalized mutual information (, ) as a function of the length scale appear in panels (a, b)(at resolution of ) and (d, e) ( ). Panel (c) shows the original image. As seen in panels (a,b), a coincident local maximum of and local minimum of appears (for ) at . Similarly, panel (f) shows the images corresponding to the peak of (coincident with a local minimum of in panel (d) (and (e)) at (and ). We examine the results for in panel (f). We are able to detect the body and the back two legs of the dog, even though with some “bleeding” in panel (f). In (c), we are detecting well except for the inclusion of some “shade” noise under the body.

“Camouflage” refers to a method of hiding. It allows for an otherwise visible organism or object to remain unnoticed by blending with its environment. The leopard in the first row of Fig. 13 is color camouflaged. With our algorithm, we are able to detect most parts of the leopard except the head. The lizard in the second row uses not only the color camouflage but also the style camouflage, both the lizard and the ground are composed of grey spots. We can detect the lizard. The zebra in the bottom row uses the camouflage– both the background and the zebra have black-and-white stripes. Our result is very accurate, even though the algorithm treats the middle portion of the zebra (the position of the “hole”) as the background by mistake. This is because, in this region, the stripes within the zebra are very hard to distinguish from the stripes in the background, they are both regular and vertical.

We applied the “multiresolution” algorithm to the zebra image in the last row of Fig. 13 as shown in Fig. 14. The number of communities is , the resolution parameter and the threshold was varied from to . In the low area, some regions inside the zebra tend to merge into the background (the image with the threshold ). As the background threshold increases in magnitude, the boundary of the zebra becomes sharper (the shown segmentation corresponds to a threshold of ). For yet larger values of , the results are noisy (the image with the threshold ). Thus, in the range , we obtain the clear detection seen in the last row of Fig. 13.

(a) A 3d plot of the normalized mutual information as a function of and .
(b) The 3d plot of the variation of information as the function of and
(c) Plots of the susceptibility as the function of and
(d) The Shannon entropy as a function of and .
(e) The energy as a function of and .
Figure 16: Plots of , , , and energy as the function of and for the “dalmatian dog” image in Fig. 15.

xi.6.2 Dalmatian dog

The camouflaged dalmatian dog in panel (c) of Fig. 15 (and Fig. 1) is a particularly challenging image. We invoke the method detailed in Sec. IV.2.2 to assign edge weights. We then apply the multiresolution algorithm to ascertain the length scale in Eq. (7). The inter-replica averages of the variation of information and the normalized mutual information are, respectively, shown in panels (a,b) and panels (d,e) of Fig. 15. These information theory overlaps indicate that, as a function of , there are, broadly, two different regimes separated by a transition at . We determine the value of at the local information theory extremum that is proximate to this transition and determine the edge weights set by this value of . (See Eq. (7).) In Section XI.7.1, we will illustrate how we may determine an optimal value of .

We segment the original image of the dalmatian dog via our community detection algorithm as shown in panels (c) and (f) in Fig. 15. The result in panel (c) corresponds to a resolution of . The image on the right in panel (c) is the superimposed image of our result and the original image (on the left) at the particular length . The “green” color corresponds to the dalmatian dog. The method is able to detect almost all the parts of the dog except the inclusion of “shade” noise under the body. The results in panel (f) correspond to a resolution . The image on the left in panel (f) is the superimposed image of our immediate running result and the original one at the length , which is close to the maximum of (and the local minimum of ). On the right, we provide the result for (a value of corresponding to a maximum of and a minimum of ). The “purple” color in the segmented image corresponds to the dalmatian dog. We are able to detect the body and the two legs in the back, even though with some “bleeding”. As we will discuss in the next subsection, it is possible to relate the contending solutions found in Fig.(15) for different values of and to the character of the phase diagram.

xi.7 Phase Diagram

As previously alluded to in Sec. X, we investigated numerically the phase diagram and the character of the transitions of the community detection problem for general graphs in dandan . From this, we were able to distinguish between the “easy”, “hard” and “unsolvable” phases as well as additional transitions within contending solutions within these phases (e.g., our discussion in Section VII.2). Strictly speaking, of course, different phases appear only in the thermodynamic limit of a large number of nodes (i.e., ). Nevertheless, for large enough systems (), different phases are, essentially, manifest. As we will now illustrate, the analysis of the phase diagram enables the determination of the optimal parameters for the image segmentation problem. To make this connection lucid, we will, in this section, detail the phase diagrams of several of the images that we analyzed thus far.

xi.7.1 Phase diagram of the Potts model corresponding to the dalmatian dog image

We will now analyze the thermodynamic and information theory measures as they pertain to the dalmatian dog image (Fig. 15) for a range of parameters. In a disparate analysis, in subsection XI.7.2, we will extend this approach also to finite temperature (i.e., ) where a heat bath algorithm was employed. Here, we will content ourselves with the study of the zero temperature case that we have focused on thus far.

Plots of the normalized mutual information , variation of information , susceptibility , entropy , and the energy are displayed in Fig. 16. We set the background intensity to . The block size is . We then varied the resolution and the spatial scale within a domain given by and . In Fig. 16, all logarithms are in the common basis (i.e., ).

Several local extrema are manifest in Fig. 16. In the context of the data to be presented below, the quantity of Eq. (21) can be , , , or , and may be or . Examining the squares of the gradients of these quantities, as depicted in Fig. 17, aids the identification of more sharply defined extrema and broad regions of the parameter space that correspond to different phases.

In Fig. 17, we compute the squares of the gradients of , , , and in panels (a) through (e). Panel (f) shows the sum of the squares of the gradients of , and . A red dot denotes parameters for a “good” image segmentation with the parameter pair being (or corresponding to the left hand segmentation in panel (f) of Fig. 15). Clearly, the red dot is located at the local minimum in each panel. This establishes the correspondence between the optimal parameters and the general structure of the information theoretic and thermodynamic quantities.

As evinced in Fig. 17, there is a local single minimum which is surrounded by several peaks in the 3D plots of the squares of the gradients of , (panel(a),(b)) and their sum (panel (f)). For the dalmatian dog image (Fig. 15, setting in Eq. (21) to be the square of the gradients efficiently locates optimal parameters. Note that the other contending solutions in Fig. (15) relate naturally to the one at and . The (i.e., ) solution on the right hand side of panel (f) appears in the same “basin” as that of the solution. Indeed, both segmentations of panel (f) of Fig. (15) share similar features. By contrast, the and (i.e., ()) segmentation result of panel (c) in Fig.(15) relates to a different region.

(a) The square of the gradient of (panel (a) of Fig. 16) as a function of and .
(b) The square of the gradient of (panel (b) of Fig. 16) as a function of and .
(c) The square of the gradient of (panel(c) of Fig. 16) as a function of and .
(d) The square of the gradient of (panel (d) of Fig. 16) as the function of and .
(e) The square of the gradient of (panel (e) of Fig. 16) as the function of and .
(f) The sum of the squares of the gradients of , and (panel (a),(b) and (c) of Fig. 16) as the function of and .
Figure 17: Information theory and thermodynamic measures relating to the dalmatian dog image of Fig. 15. The squares of the gradient of , , , , (panel (a)-(e)) and the sum of the squares of the gradients of , and (panel (f)) as the function of and . The red dot in each panel denotes the location of the parameters (, (i.e., ) of the results in Fig. 15. This good segmentation found for these parameters correlates with a local minimum within each panel.

xi.7.2 A finite temperature phase diagram

(a) The normalized mutual information as a function of the resolution and temperature .
(b) The variation of information as the function of the resolution and temperature .
(c) The susceptibility as the function of the resolution and temperature .
(d) The energy as a function of the resolution and temperature .
(e) The Shannon entropy as the function of the resolution and temperature .
Figure 18: The normalized mutual information , variation of information , susceptibility , energy and Shannon entropy as the function of the resolution and temperature for the “bird” image in Fig. 19. In panel (a), we mark (i) the “easy” phase (where is almost ) as “A”, (ii) the “hard” phase (where decreases) by “B”, and (iii) denote the “unsolvable” phase (where forms a plateau whose value is less than ) by “C”. The physical character of the “easy”, “hard”, and “unsolvable” phases is further evinced by the corresponding image segmentation results in Fig. 19. We can determine the signatures of the three phases in all panels apart from panel(c)-the 3d plot of the susceptibility .
Figure 19: [Color Online.] The image segmentation results of the “bird” image. The original image is on the upper left. The segmentations denoted by “A”, “B” and “C” correspond to results with different parameter pairs (, ) that are marked in panel (a) of Fig. 18. Both results “A” and “B” are able to distinguish the “bird” from the “background”. However, in panel (b), the “bird” is composed of numerous of small clusters. The segmentation “C” does not detect the “bird”. The results shown here at points A, B, C correlate with the corresponding “easy-hard-unsolvable” phases in the phase diagram in Fig. 18.

Fig. 18 depicts the finite temperature () phase diagram of the image of the bird of Fig. 19. We will find that for this easy image, the phase boundaries between the easy, hard, and unsolvable phases of the image are relatively sharply defined.

In the context of the data to be presented, we fixed the background intensity , set the block size to be and took the spatial scale . The varying parameters are the resolution and temperature . Instead of applying our community detection algorithm at zero temperature, we will incorporate the finite temperature dandan in this section. The ranges of the and values are and respectively. In the panels of Fig. 18, we show the normalized mutual information , variation of information , susceptibility , energy and Shannon entropy as the function of the temperature and the logarithm of the resolution .

We can clearly distinguish the“easy”, “hard” and “unsolvable” phases from the 3D plots of (panel (a)), (panel (b)) and (panel (e)). The label“A” in panel (a) marks the “easy” phase, where for and for . The “easy” phase becomes narrower as temperature increases. The corresponding image segmentation result shown in Fig. 19 validates the label of the “easy” phase. The “A” image in Fig. 19 is obtained by running our community detection algorithm with the parameter pairs located in the area labeled by “A” in Fig. 18. The image segmentation denoted by “A” can perfectly detect the bird and the background. The bird is essentially composed of two clusters and the background forms one contiguous cluster. This reflects the true composition of the original image on the upper left. Thus, the bird image can be perfectly segmented in an unsupervised way when choosing parameters to be in the “A” region (corresponding to the computationally “easy” phase ).

The region surrounding point “B” in panel (b) in Fig. 18 denotes the “hard” phase, where is in the range of and in the range of . Within the “hard” phase, as the corresponding image labeled by “B” in Fig. 19 illustrates, the bird is composed of numerous small clusters with the background still forming one cluster. In this phase, the image segmentation becomes harder and some more complicated objects cannot be detected.

The label “C” in panel (c) in Fig. 18 denotes the “unsolvable” phase, where the range for and is about and respectively. The corresponding image in Fig. 19 labeled by “C” is composed of numerous small clusters for which it is virtually impossible to distinguish the bird from the background. In this phase, the normalized mutual information is far less than (indicating, as expected, the low quality of segmentations).

Other 3D plots in Fig. 18 generally show similar phase transitions. Especially, the 3D entropy plot (panel (e)) vividly depicts accurate three phases and their clear boundaries.

Xii Conclusions

In summary, we applied a multi-scale replica inference based community detection algorithm to address unsupervised image segmentation. The resolution parameters can be adjusted to reveal the targets in different levels of details determined by extrema and transitions. In the images with uniform targets, we distributed edge weights based on the color difference. For images with non-uniform targets, we applied a Fourier transformation within blocks and assigned the edge weights based on an overlap. Our image segmentation results were shown to be, at least, as accurate as some of the best to date (see, e.g., Table. 1) for images with both uniform and non-uniform targets. The images analyzed in this work cover a wide range of categories: animals, trees, flowers, cars, brain MRI images, etc. Our algorithm is specially suited for the detection of camouflage images. We illustrated the existence of the analogs of three computational phases (“easy-hard-unsolvable”) found in the satisfiability (SAT) problem mezard1 ; mezard2 in the image segmentation problem as it was formulated in our work. When the system exhibits a hierarchal or general multi-scale structure, transitions further appear between different contending solutions. With the aid of the structure of the general phase diagram, optimal parameters for the image segmentation analysis may be discerned. This general approach of relating the thermodynamic phase diagram to parameters to be used in an image segmentation analysis is not limited to the particular Potts model formulation for unsupervised image segmentation that was introduced in this work. In an upcoming work, we will illustrate how supervised image segmentation with edge weights that are inferred from a Bayesian analysis with prior probabilities for various known patterns (or training sets), can be addressed along similar lines prep . We conclude with a speculation. It may well be that, in real biological neural networks, parameters are adjusted such that the system is solvable for a generic expected input and critically poised next to the boundaries between different contending solutions caudill .

The software package for the “multi-resolution community detection” algorithm peter1 that was used in this work is available at http://www.physics.wustl.edu/zohar/communitydetection/.

We wish to thank S. Chakrabarty, R. Darst, P. Johnson, B. Leonard, A. Middleton, D. Reichman, V. Tran, and L. Zdeborova for discussions and ongoing work and are especially grateful to Xiaobai Sun (Duke) for a very careful reading of the manuscript, comments, and encouragement.

Appendix A: Improved F-value by removing small high precision features

As seen in Sec. XI.3.1, our results in the first three images except the last one are better than the corresponding ones by the best algorithm in the Berkeley Image Segmentation Benchmark. One possible reason to cause the worse result in the last image is that our algorithm is too accurate. For example, the top image in Fig. 20, our result could detect the small white spray, which becomes the dots in the background. These small dots will form small circles in the boundary image shown in the right column, which are unexpected from the groundtruth, thus will reduce the value of precision and . (In this case, .)

Merging these high precision small dots with the background as, e.g., fleshed out in the second row in Fig. 20, leads to results that are equivalent to or better than those determined by the algorithm of global probability of boundary (gPb). A summary is presented in Table. 2.

Figure 20: [Color Online.] The image segmentation results by our algorithm. The original images in the left most column are downloaded from Berkeley Image segmentation benchmark. The central image in the first row/ the third row is the result of our algorithm at and . The right image in the first/the third row is the boundary detection result of the corresponding central image by the software Mathematica. There are many dots/circles which denote the white spray in original image in the first row. The small dots/circles in the third row denote the shadow in the original image. We merge these small dots in the first and third row into the background and the results shown in the second and fourth row are more smooth and close to the groundtruth. This is confirmed by the larger F value shown in Table. 2.

Appendix B: The image segmentation corresponding to the mutual information () peak

(a) The curve of as a function of negative threshold for the zebra image in panel (b).
(b) The weighted result of the zebra images at the corresponding thresholds: , , and .
Figure 21: [Color Online] The “multiresolution” result of zebra with fixed community number and resolution . In panel(a), we plot the normalized mutual information as a function of negative threshold . The peaks in also correspond to the changes of structures. We choose three peaks and run the algorithm at these three particular thresholds, and the result images are shown in panel (b). As increases, less regions in the zebra merge to the background, and the boundary becomes more clear.

As emphasized throughout this work, we focus on inter-replica information theory overlap extrema. In some of the earlier examples, we discussed the results pertaining to variation of information maxima (often correlating with normalized mutual information minima). We now briefly discuss sample results for the normalized mutual information maxima. We provide one such example in Fig. 21. Herein, we plot as a function of and provide the corresponding segmented images at the peaks of . As shown before, in panels I-III of Fig. 14, we provide the image segmentation that correspond to the values of for which the variation of information exhibits a local maximum. In Fig. 21, we do the same for the normalized mutual information .

Appendix C: The image segmentation with negative and positive Fourier weight

F-Our algorithm F-Our algorithm without noise F-gPb
a 0.56 0.85 0.82
b 0.65 0.73 0.74
Table 2: The F-measure of the images shown in Fig. 20. We provide the comparison with the results by algorithm Global Probability of Boundary (gPb). Note that after removing the small dots/noise in both images, the value of F increase significantly. After this merger, our results become equivalent to (or even better than) the best results to date.

In this brief appendix, we wish to compare results obtained with the weights given by those of Eq. (11) to those obtained when is set to be of the same magnitude as in Eq.(11) but of opposite sign (referred to below as its “negative counterpart”). In the latter case, a large weight corresponds to a large overlap between patterns in blocks. Thus, minimizing the Hamiltonian will tend to fragment a nearly uniform background (for which the overlap between different blocks within is large) and will tend to group together regions that change. The results of the application of Eq. (11) and that of its negative counterpart are shown side by side in Fig. 22 II and III. In both cases, the zebra is successfully detected from the similar stripe-shaped background, as long as using the right parameters. In (II), the parameters used are as follows: the background , the resolution parameter is and the block size is . In (III), we use a positive background but with a negative Eq. (11), resolution and block size . The difference shown in Fig. 22 between result (II) and (III) due to different fourier weights is that: In (II), the background forms a large cluster and the zebra is composed of lots of small clusters. In (III), the zebra forms a large single community while the background is composed of many small communities. For the images in Fig. 12, we substitute in Eq. (2), the weights of Eq. (11) along with a negative background .

Figure 22: [Color Online.] The image segmentation results (II and III) of the original camouflaged zebra in (I). In panel (II), we used the Fourier based edge weights of Eq. (11) and with a negative background (Other parameters are , block size ). (III) The resulting segmentation when the sign on the right hand side of Eq. (11) is flipped. Here, we applied a positive background (Other parameters are , block size ). Both of the results shown here (i.e., II and III) are able to detect the zebra.


  • (1) Y-J Zhang, Advances in Image and Video Segmentation, Editor IRM Press, (2006).
  • (2) J. Bigun,

    Vision with Direction: A Systematic Introduction to Image Processing and Computer Vision

    , Springer, (2010).
  • (3) L. O’Gorman, M. J. Sammon, and M. Seul, Practical Algorithms for Image Analysis, Cambridge University Press, (2008).
  • (4) D. L. Pham, C. Xu, and J. L. Prince, Current Methods in Medical Image Segmentation, Annual Review of Biomedical Engineering, 2, 315-337 (2000).
  • (5) R. Brunelli and T. Poggio, Face Recognition: Features versus Templates, IEEE transaction on pattern analysis and machine intelligence, 15, 10 (1993).
  • (6) B. Bhanu, X. Tan, Computational algorithms for fingerprint recognition, Boston : Kluwer Academic Publishers (2004).
  • (7) S. Carsten, M. Ulrich, and C. Wiedemann, Machine Vision Algorithms and Applications. Weinheim: Wiley-VCH. p. 1. ISBN 9783527407347 (2008).
  • (8) M. Sezgin and B. Sankur, Survey over image thresholding techniques and quantitative performance evaluation, Journal of Electronic Imaging 13 (1): 146-165 (2003).
  • (9) T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, A. Y. Wu,

    An efficient k-means clustering algorithm: Analysis and implementation

    . IEEE Trans. Pattern Analysis and Machine Intelligence 24: 881-892 (2002).
  • (10) H. Mobahi, S. Rao, A. Yang, S. Sastry and Y. Ma, Segmentation of Natural Images by Texture and Boundary Compression, http://arxiv.org/abs/1006.3679 (2010).
  • (11) O. Chapelle, P. Haffner, and V. N. Vapnik, Support Vector Machines for Histogram-Based Image Classification, IEEE Transactions on neural networks, 10, 5 (1999).
  • (12) P. Meer, S. Wang, H. Wechsler, Edge detection by associative mapping

    , Pattern Recognition,

    22, 491-504 (1989).
  • (13) S. A. Hojjatoleslami and J. Kittler, Region Growing: A New Approach, IEEE Transaction on image processing, 7, No. 7, 1057-7149, p. 1079 (1998).
  • (14) J. Kim, J. W. Fisher, A. Yezzi, M. Cetin, and A. S. Willsky, A Nonparameteric Statistical Method for Image Segmentation Using Information Theory and Curve Evolution, IEEE Transitions on Image Processing, 14, No. 10, 1057-7149, p. 1486 (2005).
  • (15) C. Wojtan, N. Thürey, M. Gross, G. Turk, Deforming Meshes that Split and Merge. ACM Trans. Graph. 28, 3, Article 76 (2009).
  • (16) S. Osher and N. Paragios, Geometric Level Set Methods in Imaging Vision and Graphics, Springer Verlag, ISBN 0387954880 (2003).
  • (17) S. Fortunato, Community detection in graphs, Physics Reports 486, 75-174 (2010).
  • (18) J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, Issue 8, 888-905 (2000).
  • (19) S. German and D. German, Stochastic Relaxation, Gibbs distributions and the Bayesian restoration of images, IEEE Transactions on Patterns, Anal. Mach. Intell. PAMI-6: 721 (1984).
  • (20) J. L. Marroquin,

    Deterministic Bayesian Estimation of Markovian random fields with applications to computer vision

    , Proc. 1st Inter. Conf. Comput. Vision, London (1987).
  • (21) E. B. Gamble and T. Poggio, Visual integration and detection of discontinuities: The key role of intensity edges

    , A. I. Memo No. 970, Artificial Intelligence Laboratory, MIT (1987).

  • (22) D. Geiger and F. Girosi, Parallel and deterministic algorithms for MRFs: surface reconstruction and integration, PAMI-13(5) (1991).
  • (23) S. Beucher, The Watershed Transformation Applied To Image Segmentation, Scanning Microscopy International, supp. 6, 299-314 (1992).
  • (24) L. Grady, Random Walks from Image segmentation, IEEE Transaction on Pattern Analysis and Machine Intelligence, 28, No. 11 (2006).
  • (25)