Universal Joint Image Clustering and Registration using Partition Information

01/10/2017 ∙ by Ravi Kiran Raman, et al. ∙ 0

We consider the problem of universal joint clustering and registration of images and define algorithms using multivariate information functionals. We first study registering two images using maximum mutual information and prove its asymptotic optimality. We then show the shortcomings of pairwise registration in multi-image registration, and design an asymptotically optimal algorithm based on multiinformation. Further, we define a novel multivariate information functional to perform joint clustering and registration of images, and prove consistency of the algorithm. Finally, we consider registration and clustering of numerous limited-resolution images, defining algorithms that are order-optimal in scaling of number of pixels in each image with the number of images.



There are no comments yet.


page 3

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Suppose you have an unlabeled repository of MRI, CT, and PET scans of brain regions corresponding to different patients from different stages of the diagnostic process. You wish to sort them into clusters corresponding to individual patients and align the images within each cluster. In this work, we address this exact problem of joint clustering and registration of images using novel multivariate information functionals.

Image registration is the task of geometrically aligning two or more images of the same scene taken at different points in time, from different viewpoints, or by different imaging devices. It is a crucial step in most image processing tasks where a comparative study of the different images is required such as medical diagnosis, target detection, image fusion, change detection, and multimodal image restoration. In such applications it is also essential to classify images of different scenes prior to registering images of the same kind. Thus clustering images according to the scene is also critical to computer vision problems such as object tracking, face tagging, cryo-electron microscopy, and remote sensing.

Different digital images of the same scene can appear significantly different from each other, e.g., consider images of a scene that are negatives of each other. Such factors tend to make clustering and registering images harder. Further, such meta-data about the digital images is often not available a priori. This emphasizes the need for universality, i.e., the design of reliable clustering and registration algorithms that work without the specific knowledge of the priors or channel models that govern the image generation.

Image clustering and registration have often been dealt with separately. However, it is easy to see that clustering registered images and registering images within clusters are both relatively easier. Here we emphasize that the two problems are not separate and define universal, reliable, joint clustering and registration algorithms.

I-a Prior Work

There is rich literature on clustering and registration; we describe a non-exhaustive listing of relevant prior work.

Supervised learning for classification has recently gained prominence through deep convolutional neural networks and other machine learning algorithms. However, these methods require vast amounts of costly labeled training data. Thus, unsupervised image classification is of interest.

Unsupervised clustering of objects has been studied under numerous optimality and similarity criteria [2]. The -means algorithm and its generalization to Bregman divergences [3] are some popular distance-based methods. Popular techniques for unsupervised image clustering include affinity propagation [4]

, expectation maximization


, independent component analysis

[6], and orthogonal subspace projection [7]. We focus on information-based clustering algorithms [8, 9, 10], owing to the ubiquitous nature of information functionals in universal information processing. Universal clustering has been studied in the communication and crowdsourcing [11, 12].

Separate from clustering, multi-image registration has been studied extensively [13]. Prominent region-based registration methods include maximum likelihood (ML) [14], minimum KL divergence [15], correlation detection [16], and maximum mutual information (MMI) [17, 18]. Feature-based techniques have also been explored [19].

Lower bounds on mean squared error for image registration in the presence of additive noise using Ziv-Zakai and Cramer-Rao bounds have been explored recently [20],[21]. The MMI decoder was originally developed in universal communication [22]. Deterministic reasons for its effectiveness in image registration have been identified [23]. Correctness has been established through information-theoretic arguments [24].

A problem closely related to image registration is multireference alignment. There the aim is to denoise a signal from noisy, circularly-translated versions of itself, under Gaussian or binary noise models [25, 26]. Versions of this problem have been considered for image denoising under Gaussian noise [27]. Unlike denoising, we consider the task of registration alone but for a wider class of noise models under a universal setting.

I-B Our Contributions

While MMI has been found to perform well in numerous empirical studies, concrete theoretical guarantees are still lacking. In this work, we extend the framework of universal delay estimation

[28] to derive universal asymptotic optimality guarantees for MMI in registering two images under the Hamming loss, under mild assumptions on the image models.

Even though the MMI method is universally asymptotically optimal for registering two images, we show that the method is strictly suboptimal in multi-image registration. We define the max multiinformation (MM) image registration algorithm that uses the multiinformation functional in place of pairwise MMI. We prove that the method is universal and asymptotically optimal using type counting arguments.

Then, we consider the task of joint clustering and registration. We define novel multivariate information functionals to characterize dependence in a collection of images. Under a variety of clustering criteria, we define algorithms using these functionals to perform joint clustering and registration and prove consistency of the methods.

Applications such as cryo-electron microscopy handle a large number of images of several molecular conformations. The task of clustering and registration is critical. With such motivation, we revisit joint clustering and registration under the constraint of limited resolution of images. We define blockwise clustering and registration algorithms, further showing they are order-optimal in the scaling of resolution with number of pixels in the system.

Ii Model

We now formulate the joint image clustering and registration problem and define the model of images we work with.

Ii-a Image and Noise

Consider a simple model of images, where each image is a collection of pixels drawn independently and identically from an unknown prior distribution defined on finite space of pixel values . Since the pixels are i.i.d. , we represent the original scene of the image by an

-dimensional random vector,

. More specifically, the scene is drawn according to .

Consider a finite collection of distinct scenes (drawn i.i.d. according to the prior) . This set can be interpreted as a collection of different scenes. Each image may be viewed as a noisy depiction of an underlying scene.

Consider a collection, , of scenes drawn from , with each scene chosen independently and identically according to the pmf , so image corresponds to a depiction of the scene , for .

We model images corresponding to this collection of scenes as noisy versions of the underlying scenes drawn as follows


where is the inclusion-wise maximal subset such that for all . That is, images corresponding to the same scene are jointly corrupted by a discrete memoryless channel, while the images corresponding to different scenes are independent conditioned on the scene. Here we assume . The system is depicted in Fig. 1.

Fig. 1: Model of the joint image clustering and registration problem

Consider any two images generated as above. Let be the conditional distribution that a pixel in the second image is , given the corresponding pixel in the first image is . Let

where are i.i.d. copies of pixel values generated according to the marginal distribution of the first image. Note that if , then with sufficient samples of the image, one can easily identify corresponding pixels in the copy. Hence to avoid the trivial case, we presume there exist finite positive constants such that, for any two images,


Ii-B Image Transformations

Corrupted images are also subject to independent rigid-body transformations such as rotation and translation. Since images are vectors of length , transformations are represented by permutations of . Let be the set of all allowable transformations. We assume is known.

Let be the transformation of image . Then, the final image is , for all . Image transformed by is depicted interchangeably as .

We assume forms a commutative algebra over the composition operator . More specifically,

  1. for , ;

  2. there exists unique s.t. , for all ;

  3. for any , there exists a unique inverse , s.t. .

The number of distinct rigid transformations of images with pixels on the -lattice is polynomial in , i.e., for some [29].

Definition 1

A permutation cycle, , is a subset of permutation , such that , for all and .

It is clear from the pigeonhole principle that any permutation is composed of at least one permutation cycle. Let the number of permutation cycles of a permutation be .

Definition 2

Identity block of permutation is the inclusion-wise maximal subset such that , for all .

Definition 3

A permutation is simple if , .

Definition 4

Any two permutations are said to be non-overlapping if for all .

Lemma 1

Let be chosen uniformly at random from the set of all permutations of . Then for any constants ,


First, we observe that the number of permutations that have an identity block of size at least is given by

Thus, from Stirling’s approximation,

Lengths and number of cycles in a random permutation may be analyzed as detailed in [30]. In particular, we note that for a random permutation , . Using Markov’s inequality, the result follows.

Following Lem. 1, we assume that for any , , i.e., the number of permutation cycles does not grow very fast. Further, let for any .

Ii-C Performance Metrics

We now introduce formal metrics to quantify performance of the joint clustering and registration algorithms.

Definition 5

A clustering of images is a partition of . The sets of a partition are referred to as clusters. The clustering is said to be correct if

Let be the set of all partitions of . For a given collection of scenes, we represent the correct clustering by .

Definition 6

A partition is finer than , , if . Similarly, a partition is denser than , , if .

Definition 7

The correct registration of an image transformed by is .

Definition 8

A universal clustering and registration algorithm is a sequence of functions that are designed in the absence of knowledge of , , and . Here the index corresponds to the number of pixels in each image.

We focus on the -loss function to quantify performance.

Definition 9


error probability

of an algorithm that outputs , is

Definition 10

Alg.  is asymptotically consistent if , and is exponentially consistent if .

Definition 11

The error exponent of an algorithm is


We use to denote when clear from context.

Iii Registration of two images

We first consider the problem of registering two images, i.e., , . Thus the problem reduces to registering an image obtained as a result of transforming the output of an equivalent discrete memoryless channel , given input image (reference) . The channel model is depicted in Fig. 2.

Fig. 2: Model of the two image registration problem: Image is to be registered with source image .

This problem has been well-studied in practice and a heuristic method used is the MMI method defined as


where is the mutual information of the empirical distribution where is the indicator function. Note that the MMI method is universal.

Since transformations are chosen uniformly at random, the maximum likelihood (ML) estimate is Bayes optimal:


We first show that the MMI method, and consequently the ML method, are exponentially consistent. We then show that the error exponent of MMI matches that of ML.

Iii-a Exponential Consistency

The empirical mutual information of i.i.d. samples is asymptotically exponentially consistent.

Theorem 1


be random samples drawn i.i.d. according to the joint distribution

defined on the finite space . For fixed alphabet sizes, , the ML estimate of entropy and mutual information are asymptotically exponentially consistent and satisfy


where , .


The proof is given in [31, Lem. 7].

Theorem 2

MMI and ML are exponentially consistent.


Let and let the correct registration be . Then,


where (10), (11), and (12) follow from the union bound, the triangle inequality, and (9), respectively. Here is a constant.

Thus MMI is exponentially consistent as . Finally, and thus, the ML estimate is also exponentially consistent.

Thm. 2 implies there exists such that

Iii-B Whittle’s Law and Markov Types

We now summarize a few results on the number of types and Markov types which are eventually used to analyze the error exponent of image registration.

Consider a sequence . The empirical distribution of is the type of the sequence. Let

be a dummy random variable. Let

be the set of all sequences of length , of type . The number of possible types of sequences of length is polynomial in , i.e., [32].

The number of sequences of length , of type , is

From bounds on multinomial coefficients, the number of sequences of length and type is bounded as [32]


Consider a Markov chain defined on the space

. Given a sequence of samples from we can compute the matrix of transition counts, where corresponds to the number of transitions from state to state . By Whittle’s formula [33], the number of sequences with , with and is


where corresponds to the th cofactor of the matrix with

The first-order Markov type of a sequence is defined as the empirical distribution , given by

Here we assume that the sequence is cyclic with period , i.e., for any . Let . Then, from (14), the set of sequences of type , , satisfies

From the definition of , we can bound the trace of the cofactor matrix of as

Again using the bounds on multinomial coefficients, we have


The joint first-order Markov type of a pair of sequences , is the empirical distribution

Then given , the set of conditional first-order Markov type sequences, satisfies [28]

Lemma 2

Let be any two non-overlapping permutations and let be the corresponding permutations of . Let be a simple permutation. Then, for every , there exists , such that


Since permutations are non-overlapping, there is a bijection from to , where . Specifically, consider the permutation defined iteratively as , with . Then, for any , the sequence . Further, this map is invertible and so the sets are of equal size.

Result for conditional types follows mutatis mutandis.

Lem. 2 implies and satisfy (15) and (16) respectively. We now show that the result of Lem. 2 can be extended to any two permutations .

Lemma 3

Let . For any ,


Let and . For let the length of permutation cycle of be for . Further, . Let be the identity block of and let . Then we have the decomposition


for all . Here, is the first-order Markov type defined on the th permutation cycle of and is the zeroth-order Markov type corresponding to the identity block of .

From (17), we see that given a valid decomposition of types , the number of sequences can be computed as a product of the number of subsequences of each type, i.e.,

where the sum is over all valid decompositions in (17).

Additionally, from Lem. 2 we know the number of valid subsequences of each type. Let be the marginal corresponding to the first-order Markov type .

Thus, we upper bound the size of the set as



the maximum taken over all valid decompositions in (17). Here, (18) follows from (13) and (15), and (19) follows since the total number of possible types is polynomial in the length of the sequence. Since ,

Now, let , for all , and . And let . Since , using Jensen’s inequality, we have


To obtain the lower bound, we note that the total number of sequences is at least the number of sequences obtained from any single valid decomposition. Thus from (13) and (15),

Now, for large , consider for some . Any other cycle of smaller length contributes to the exponent due to Lem. 1. One valid decomposition of (17) is to have , for all . However, the lengths of the subsequences are different and may not be a valid type of the corresponding length. Nevertheless, for each , there exists a type such that , where is the total variational distance. Further, entropy is continuous in the distribution [34] and satisfies

This in turn indicates that

Similar decomposition follows for conditional types as well.

Iii-C Error Analysis

We are interested in the error exponent of MMI-based image registration, in comparison to ML. We first note that the error exponent of the problem is characterized by the pair of transformations that are the hardest to compare.

Define as the binary hypothesis testing problem corresponding to image registration when the allowed transformations are only . Let be the corresponding error probability and error exponent.

Lemma 4

Let be an asymptotically exponentially consistent estimator. Then,


Let be the estimate output by and be the correct registration. We first upper bound the error probability as


where (21) follows from the union bound.

Additionally, we have

Finally, since , the result follows.

Thus, it suffices to consider the binary hypothesis tests to study the error exponent of image registration.

Theorem 3

Let . Then,


Probabilities of i.i.d. sequences are defined by their joint type. Thus, we have


where the summation in (23) is over the set of all joint types of sequences of length and is the number of sequences of length with joint type such that the MMI algorithm makes a decision error.

If a sequence is in error, then all sequences in are in error as is drawn from an i.i.d. source. Now, we can decompose the number of sequences under error as follows

where the sum is taken over such that there is a decision error. The final line follows from the fact that given the joint first-order Markov type, the size of the conditional type is independent of the exact sequence .

Finally, we have


The result then follows from the forthcoming Lem. 5.

Lemma 5

Observe that for images with i.i.d. pixels, MMI is the same as minimizing the joint entropy, i.e.,

Further we know that there is a bijective mapping between sequences corresponding to the permutations and the sequences of the corresponding first-order Markov type from Lem. 2 and 3. Thus, the result follows from [28, Lem. 1].

Theorem 4

This follows from Thm. 3 and Lem. 4.

Thus, we can see that using MMI for image registration is not only universal, but also asymptotically optimal. We next study the problem of image registration for multiple images.

Iv Multi-image Registration

Having universally registered two images, we now consider aligning multiple copies of the same image. For simplicity, let us consider aligning three images; ; results can directly be extended to any finite number of images. Let be the source image and be the noisy, transformed versions to be aligned as shown in Fig. 3.

Fig. 3: Model of the three-image registration problem: Images are to be registered with source image .

Here, the ML estimates are


Iv-a MMI is not optimal

We know MMI is asymptotically optimal at aligning two images. Is pairwise MMI, i.e.


optimal for multi-image registration? We show pairwise MMI is suboptimal even though individual transformations are chosen independently and uniformly from .

Theorem 5

There exists channel and prior, such that pairwise MMI is suboptimal for multi-image registration.


Let . Consider physically degraded images obtained as outputs of the channel . This is depicted in Fig. 4.

Fig. 4: Three-image registration problem with image a physically-degraded version of and to be registered with source image .

Naturally, the ML estimate of the transformations is obtained by registering image to and subsequently registering to , instead of registering each of the images pairwise to . That is, from (27)

It is evident that is estimated based on the estimate of .

Let be the error exponent of for the physically-degraded channel. Since the ML estimate involves registration of to and to , the error exponent is .

Let be the error exponent of MMI. Then, error exponent of pairwise MMI is . We know MMI is asymptotically optimal for two image registration and so