I Introduction
This paper considers estimation of parameters of distributions whose domain is a particular non-Euclidean geometry: a topological space divided into equivalence classes by actions of a finite spherical symmetry group. A well known example of a finite spherical symmetry group is the point group in 3 dimensions describing the soccer ball, or football, with truncated icosahedral symmetry that also corresponds to the symmetry of the Carbon-60 molecule. This paper formulates a general approach to parameter estimation in distributions defined over such domains. We use a restricted finite mixture representation introduced in [1]
for probability distributions that are invariant to actions of any topological group. This representation has the property that the number of mixture components is equal to the order of the group, the distributions in the mixture are all parameterized by the same parameters, and the mixture coefficients are all equal. This is practically significant since many reliable algorithms have been developed for parameter estimation when samples come from finite mixture distributions
[2, 3].We apply the representation to an important problem in materials science: analysis of mean orientation in polycrystals. Crystal orientation characterizes properties of materials including electrical conductivity and thermal conductivity. Polycrystalline materials are composed of grains of varying size and orientation, where each grain contains crystal forms with similar orientations. The quality of the material is mainly determined by the grain structure i.e. the arrangement of the grains, their orientations, as well as the distribution of the precipitates. Thus accurate estimation of crystal orientation of the grains is useful for predicting how materials fail and what modes of failure are more likely to occur [4].
The mean orientation of the grain, characterized for example by its Euler angles, can only be specified modulo a set of angular rotations determined by the symmetry group associated with the specific type of crystal, e.g. hexagonal, cubic. This multiplicity of equivalent Euler angles complicates the development of reliable mean orientation estimators. The problem becomes even harder when the orientations are sampled from a region encompassing more than one grain such that the orientations cluster over different mean directions. In such a case, we would like to identify whether the orientations are multi-modally distributed and also estimate the mean direction for each cluster.
In our previous work [1], we introduced the finite mixture of Von Mises-Fisher (VMF) distribution for observations that are invariant to actions of a spherical symmetry group. We applied the expectation maximization (EM) maximum likelihood (ML) algorithm, called EM-VMF, to estimate the group-invariant parameters of this distribution. In this paper, we develop a hyperbolic representation simplification of the EM-VMF algorithm that reduces the computation time by a factor of . We also introduce a new group invariant distribution for spherical symmetry groups, called the -invariant Watson distribution, which like VMF is a density parameterized by location (angle mean) and scale (angle concentration) over the -dimensional sphere. An EM algorithm is presented for estimation of the parameters, called the EM-Watson algorithm. Furthermore, mixture-of--invariant Watson (mGIW) and von Mises-Fisher (mGIV) distributions are introduced to perform clustering on the -invariant sphere. An EM algorithm is presented for estimation of the parameters of the mGIW and mGIV distributions. We illustrate how the Generalized Likelihood Ratio Test (GLRT) can be used to detect the presence of multiple modes in a sample and how it can be combined with the EM algorithm for mGIW and mGIV distributions to cluster multiple orientations on the sphere.
The performance of the proposed EM orientation estimators is evaluated by simulation and compared to other estimators. The EM orientation estimators are then illustrated on Electron Backscatter Diffraction EBSD data collected from a Nickel alloy whose crystal form induces the [5]
cubic point symmetry group. We establish that the EM orientation estimators result in significantly improved estimates of the mean direction in addition to providing an accurate estimate of concentration about the mean. Furthermore, with the extended mixture models, we are able to identify and cluster multi-modally distributed samples more accurately than the K-means algorithm.
The paper is organized as follows. Section II
describes group invariant random variables and gives the mixture representation for their densities. Section
III specializes to random variables invariant relative to actions of the spherical symmetry group and develops the -invariant VMF and Watson distributions along with EM-ML parameter estimator. The clustering methods based on the -invariant distributions along with the GLRT are elaborated in Section IV. The crystallography application and data simulation are presented in Section V and the experiment results are shown in Section VI. Section VII has concluding remarks.Ii Group-invariant random variables
Consider a finite topological group of distinct actions on a topological space , and a binary operation ”*” defining the action composition , denoted . has the properties that composition of multiple actions is associative, for every action there exists an inverse action, and there exists an identity action [6]. A real valued function on is said to be invariant under if: for . Let be a random variable defined on . We have the following theorem for the probability density of .
Theorem II.1
The density function is invariant under if and only if
(1) |
This theorem is a slight generalization of [1, Thm. 2.1] in that the density is not necessarily the same as . The proof is analogous to that of [1, Thm. 2.1].
Theorem II.1 says that any density that is invariant under group can be represented as a finite mixture of a function and its translates under the group’s actions . As pointed out in [1], Thm. II.1 has important implications on -invariant density estimation and parameter estimation. In particular it can be used to construct maximum likelihood estimators for parametric densities. Let be a density on that is parameterized by a parameter in a parameter space . We extend to a -invariant density by using Thm. II.1, obtaining:
(2) |
where . This density is of the form of a finite mixture of densities of known parametric form where the mixture coefficients are all identical and equal to . Maximum likelihood (ML) estimation of the parameter from an i.i.d. sample from any -invariant density can now be performed using finite mixture model methods [7] such as the Expectation-Maximization (EM) algorithm [2]
or the restricted Boltzman machine (RBM)
[3].Iii ML within a Spherical Symmetry Group
As in [1] we specialize Thm. II.1 to estimation of parameters for the case that the probability density is on a sphere and is invariant to actions in a spherical symmetry group. In Section V this will be applied to a crystallography example under spherical distribution likelihood models for the mean crystal orientation. In general, the measured and mean orientations can be represented by Euler angles [8]
, Rodrigues Vectors
[9], or Quaternions [10]. As in [1], we use the quaternion representation to enable orientations to be modeled by spherical distributions since the quaternion representation is a D vector on the -sphere , i.e. such that .Any of the aforementioned orientation representations have inherent ambiguity due to crystal symmetries. For example, if the crystal has cubic symmetry, its orientation is only uniquely defined up to a 24-fold set of proper rotations of the cube about its symmetry axes. These actions form a point symmetry group, called , a sub-group of . In quaternion space, since each orientation corresponds to two quaternions with different sign , these rotations reflections, and inversions can be represented as a spherical symmetry group of quaternionic matrices , with sign symmetry such that , where for cubic symmetry.
Based on the symmetry group , we can define the distance between two quaternions under as:
(3) |
Two quaternions are called symmetry-equivalent to each other if they are mapped to an equivalent orientation under , i.e. . A fundamental zone (FZ), also called the fundamental domain, is a conic solid subset of the sphere that can be specified to disambiguate any particular orientation . However, as will be seen in Sec. V, reduction of the entire data sample to a FZ destroys information necessary for maximum likelihood estimation: the entire -invariant density (2) must be used. In the following two subsections, we introduce two -invariant spherical distributions: von Mises-Fisher and Watson distributions [11] to model orientations in quaternion space.
Iii-a Hyperbolic -invariant von Mises-Fisher Distribution
First we review the -invariant von Mises-Fisher distribution presented in [1]. The von Mises-Fisher (VMF) distribution arises in directional statistics [11]
as an analogue of the multivariate Gaussian distribution on the
-dimensional sphere , where . The VMF distribution is parameterized by the mean direction and the concentration parameter :(4) |
where and is the modified Bessel function of the first kind of order . Given an i.i.d. sample from the VMF distribution, the ML estimator has the closed-form expressions [11]
(5) |
where and .
Let be a group of symmetric actions acting on the quaternionic representation of orientation on the -dimensional sphere . We extend the VMF distribution (4) using the mixture representation in Thm II.1:
(6) | |||||
(7) |
where in going from (6) to (7) we used the inner product form in (4) and the symmetry of . The expression (7) for the extended VMF distribution is in the form of a finite mixture of standard VMF distributions on the same random variable having different mean parameters but having the same concentration parameter .
The finite mixture (7) for the -invariant density is in a form for which an EM algorithm [2] can be implemented to compute the ML estimates of and . Denoting the parameter pair as , the EM algorithm generates a sequence of estimates that monotonically increase the likelihood. These estimates are given by , where is a latent variable assigning to a particular mixture component in (7) and is the likelihood function of given the complete data . Specifically,
where . The EM algorithm takes the form:
E-step:
(9) |
M-step:
(10) | ||||
(11) |
Iii-B -invariant Watson Distribution
As described at the beginning of this section, each orientation corresponds to two quaternions with different sign, which is equivalent to an axis of the sphere. For axial data it is more natural to use the Watson distribution [12], which models the probability distribution of axially symmetric vectors on the -dimensional unit sphere, i.e. are equivalent. Similar to VMF, the distribution is parametrized by a mean direction , and a concentration parameter
that is no longer necessarily non-negative. Its probability density function is
(14) |
where is the Kummer confluent hypergeometric function defined in [13]. According to (14), the positive-negative pair of group actions contribute the same value in the density function. The set of the group action pairs is the quotient group , where and
is the identity matrix of dimension
. Therefore, is also a group and we can use Thm II.1 to extend the Watson distribution to the mixture representation under :(15) |
where . The ML estimates of and can also be calculated by the EM algorithm. The E-step for the Watson mixture distribution is
(16) |
For the M-step, we take a similar approach as [11] as follows:
(17) |
where is the scatter matrix of . Let
be the eigenvalues of
with(18) |
and let
be the corresponding unit eigenvectors. Since we want to find
which maximizes (17) such that , the estimator of for fixed has the following form:(19) |
Iv Clustering with a Spherical Symmetry Group
In this section we extend the parameter estimation problem to the situation where there are multiple group-invariant distributions with different parameters that govern the samples. This problem arises, for example, in poly-crystaline materials when estimating the mean crystal orientation over a region containing more than one grain (perhaps undetected). This problem can be solved by first applying some standard clustering methods, e.g. K-means[14], and then estimating the parameters for each cluster. However, clustering methods based on the distance relation between the samples are complicated by the presence of spherical symmetry because it is necessary to distinguish modes that are due only to symmetry from those that distinguish different clusters. Therefore, we propose a model-based clustering algorithm which accommodates symmetry to handle this problem.
Consider the situation where the samples follow a mixture of -invariant density functions. For the VMF distribution, the mixture density has the following form:
(21) |
where is the number of clusters assumed to be fixed a priori, are the parameters for the -th cluster and are the mixing coefficients where and for all . The parameters of (21) can be estimated by the EM algorithm:
E-step:
(22) |
M-step:
(23) | ||||
(24) |
where is the probability of sample belonging to the -th cluster and the -th symmetric component.
For the Watson distribution, the mixture of -invariant Watson density is
(25) |
The E-step is similar to (22) with replaced by function. The M-step can be computed with a similar approach as in Section III-B with the following modifications:
(26) | ||||
(27) |
where .
Iv-a Multi-modality Tests on -invariant Spherical Distributions
Given sample set on , the objective is to determine whether the samples are drawn from one single distribution or a mixture of distributions. For polycrystalline materials, the result of this determination can be used to discover undetected grains within a region. We propose to use a multi-modal hypothesis test based on the -invariant distributions to solve this problem. The two hypotheses are : The samples are from a single -invariant distribution ; and : The samples are from a mixture of distributions . The Generalized Likelihood Ratio Test (GLRT) [15] has the following form:
(28) | ||||
where are the parameter spaces for the two hypotheses. The and functions for VMF and Watson distributions are defined in (7), (15) and (21), (25
) respectively and the test statistic
can be calculated by the proposed EM algorithm. According to Wilks’s theorem [16] as approaches , the test statistic will be asymptotically-distributed with degrees of freedom equal to
, which is the difference in dimensionality of and . Therefore, the threshold in (28) can be determined by a given significance level .V Application to Crystallographic Orientation
Crystal orientation and the grain distribution in polycrystalline materials determine the mechanical properties of the material, such as, stiffness, elasticity, and deformability. Locating the grain regions and estimating their orientation and dispersion play an essential role in detecting anomalies and vulnerable parts of materials.
Electron backscatter diffraction (EBSD) microscopy acquires crystal orientation at multiple locations within a grain by capturing the Kikuchi diffraction patterns of the backscatter electrons [17]. A Kikuchi pattern can be translated to crystal orientation through Hough Transformation analysis [18] or Dictionary-Based indexing [19]. The process of assigning mean orientation values to each grain is known as indexing. Crystal forms possess point symmetries, e.g. triclinic, tetragonal, or cubic, leading to a probability density of measured orientations that is invariant over an associated spherical symmetry group . Therefore, when the type of material has known symmetries, e.g., cubic-type symmetry for nickel or gold, the -invariant VMF and Watson models introduced in Section III can be applied to estimate the mean orientation and the concentration associated with each grain. Furthermore, the clustering method along with the multi-sample hypothesis test in Section IV can be used to detect the underlying grains within a region.
V-a Simulation of Crystallographic Orientation
To simulate the crystallographic orientations, we first draw random samples from VMF and Watson distributions with . The random variable in a spherical distribution can be decomposed [11]:
(29) |
where and . Let be the p.d.f. of the distribution where is the mean direction. According to the normal-tangent decomposition property, for any rotationally symmetric distribution,
is uniformly distributed on
, the -dimensional sphere normal to , and the density of is given by:(30) |
For VMF distribution, substituting (29) into (4) and combining with (30), we have the density of the tangent component as:
(31) |
Similarly, the density of the tangent component of Watson distribution is:
(32) |
Random samples from the density functions (31) and (32) can be easily generated by rejection sampling.
The generated quaternions from VMF and Watson distributions are then mapped into the Fundamental Zone (FZ) with the symmetric group actions to simulate the wrap-around problem we observe in real data, i.e. observations are restricted to a single FZ. For cubic symmetry, the FZ in quaternion space is defined in the following set of equations:
Vi Experimental Results
Vi-a -invariant EM-ML Parameter Estimation on Simulated Data
Sets of i.i.d. samples were simulated from the VMF or Watson distributions using the method described in Sec.V-A with given for the point symmetry group associated with the symmetries of cubic crystal lattice planes. The number of samples for each simulation was set to and was swept from to while, for each simulation run, was selected uniformly at random. The experiment was repeated times and the average values of and the inner product are shown in Fig. 1 and 2. In the figures we compare performance for the following methods: (1) the naive ML estimator for the standard VMF or Watson model that does not account for the point group structure (labeled ”ML Estimator”). (2) Mapping each of the samples toward a reference direction (randomly selected from ), i.e. , where , to remove possible ambiguity. Then performing ML for the standard VMF or Watson distribution (labeled ”Modified ML”). (3) Applying our proposed EM algorithm directly to the samples using the mixture of VMF distribution (9)-(11) (labeled ”EM-VMF”) (4) Applying our proposed EM algorithm to the mixture of Watson distribution (16)-(20) (labeled ”EM-Watson”).
Figure 1 shows the inner product values . The proposed EM-VMF and EM-Watson estimators have similar performance in that they achieve perfect recovery of the mean orientation () much faster than the other methods as the concentration parameter increases (lower dispersion of the samples about the mean) no matter whether the data is generated from VMF (Fig. 1(a)) or Watson distribution (Fig. 1(b)), indicating the robustness of the proposed approaches under model mismatch. Notice that when is small ( for VMF data and for Watson data), none of the methods can accurately estimate the mean orientation. The reason is that when is small the samples become nearly uniformly distributed over the sphere. The threshold value at which performance starts to degrade depends on the choice of point symmetry group and the distribution used to simulate the data. In Fig. 2 it is seen that the biases of the proposed EM-VMF [1] and EM-Watson estimators are significantly lower than that of the other methods compared. While the modified ML performs better than the naive ML estimator, its bias is significantly worse than the proposed EM-VMF and EM-Watson approaches.
Figure 3 shows the computation time of the estimation algorithms presented in Fig. 1 and Fig. 2. The computation time for all methods decreases as becomes larger. When is small ( for VMF data and for Watson data), because the samples are almost uniformly distributed around the sphere, it is difficult for the EM algorithms to converge to the optimal solution and they therefore require maximum number of iterations to stop, forming the plateaus in Fig. 3. Notice that EM-Watson requires less time than EM-VMF even though it has more complicated E and M-steps. The reason is that EM-Watson uses only half of the symmetry operators, which corresponds to the size of the quotient group as described in Section III-B. By applying the hyperbolic sinusoidal simplification in Section III-A (labeled ”EM-VMF-Hyper”), we can further reduce the computation time by more than a factor of compared to the original EM-VMF.
Vi-B -invariant Clustering on Simulated Data
In this section, we demonstrate the performance of our proposed EM approaches for clustering. Sets of i.i.d. samples were simulated from the VMF or Watson distributions with and one of two mean directions () to generate two clusters of samples. The spherical symmetry group is as before. The number of samples for each set was set to and was swept from to while, for each set, was selected uniformly at random. The experiment was repeated times and the average values of the inner product are shown in Fig. 4. In the figure we compare performances of the following methods: (1) Cluster the samples by standard K-means algorithm with the distance defined by the arc-cosine of the inner product and then use the naive ML within each cluster to estimate the mean directions (labeled ”K-means”). (2) Cluster the samples by K-means with the distance defined as (3) and then use the aforementioned modified ML estimator (labeled ”Modified K-means”). (3) Apply our proposed multi-cluster EM-VMF algorithm to the samples directly (22)-(24) (labeled ”EM-VMF”) (4) Apply our multi-cluster EM-Watson algorithm to the samples directly (26)-(27) (labeled ”EM-Watson”).
Figure 4 shows the average inner product values from the mean direction estimation. The proposed EM-VMF and EM-Watson are able to correctly cluster the samples and achieve perfect recovery of the two mean orientations much faster than the other K-means approaches. Notice that the region where all the methods fail is larger than the single cluster case since multiple clusters increase the difficulty of parameter estimation. Again, no matter whether the samples are simulated from VMF or Watson distribution, our proposed approaches perform equally well under both cases.
To further test the ability to detect multiple clusters given a set of samples, we generate sets of samples. Each set has samples and is assigned randomly to label or . If the set is labeled , the samples are generated from a single distribution; If the set is labeled , then the samples in the set are randomly generated from two distributions with different means. The GLRT is used with the four aforementioned clustering methods to test whether the samples in each set are uni-modal or multi-modal. The Receiver Operating Characteristic (ROC) curves of the test results are shown in Fig. 5. The naive K-means with ML estimator which does not consider the symmetry group actions fails to distinguish whether the multiple modes are from actual multiple distributions or due to the wrap-around effect from the fundamental zone mapping. Therefore, this approach tends to over-estimate the goodness of fit of the model for true negative cases and under-estimate it for true positive cases, resulting in a result that is even worse than random guessing. The modified K-means performs better than K-means but worse than our proposed EM-VMF and EM-Watson algorithms.
Vi-C EM-ML orientation estimation for IN100 Nickel Sample
We next illustrate the proposed EM-VMF and EM-Watson orientation estimators on a real IN100 sample acquired from US Air Force Research Laboratory (AFRL) [19]. The IN100 sample is a polycrystalline Ni superalloy which has cubic symmetry in the point symmetry group. EBSD orientation measurements were acquired on a pixel grid, corresponding to spatial resolution of nm. The Kikuchi diffraction patterns were recorded on a photosensitive detector for each of the pixels.
Figure 6 (a) shows a sub-region of the full EBSD sample where the orientations are shown in the inverse pole figure (IPF) coloring obtained from the OEM EBSD imaging software and (b) is the back-scattered electron (BSE) image. Note that the OEM-estimated orientations in some grain regions of the IPF image are very inhomogeneous, having a mottled appearance, which is likely due to a fundamental zone wrap-around problem. As an alternative, we apply a combination of the proposed EM estimators (EM-VMF or EM-Watson) and the GLRT (28) with and significance level to detect multi-modal distributions within each OEM-segmented region. Figure 6 (c)(e) show the estimates of the mean orientations of the regions/sub-regions, where the sub-regions surrounded by white boundaries indicate those that have been detected as deviating from the distribution of the majority of samples from the same region. The multi-modally distributed regions may be due to undetected grains, inaccurate segmentation, or noisy orientation observations. To distinguish the latter situations from the first in which the region really consists of two grains, the misalignment/noise test introduced in [20] can be used. Figures 6 (d)(f) show the estimated concentration parameter for the regions/sub-regions. Note that the estimated are large for most of the regions/sub-regions because those regions which have multi-modally distributed samples are detected and their concentration parameters are estimated separately for each sub-region.
Vii Conclusion
A hyperbolic -invariant von Mises-Fisher distribution was shown to be equivalent to the distribution proposed in [1]. The advantage of the hyperbolic form is parameter estimation can be performed with substantially fewer computations. A different group invariant orientation distribution was introduced, called the -invariant Watson distribution, and an EM algorithm was presented that iteratively estimates its orientation and concentration parameters. We introduced multi-modal generalizations of these -invariant distributions using mixture models and showed that these can be used to effectively cluster populations of orientations that have spherical symmetry group invariances. The mixture of VMF and Watson models were applied to the problem of estimation of mean grain orientation parameters in polycrystalline materials whose orientations lie in the point symmetry group. Application of the finite mixture representation to other types of groups would be worthwhile future work.
Acknowledgment
The authors are grateful for inputs from Megna Shah, Mike Jackson and Mike Groeber. AOH would like to acknowledge financial support from USAF/AFMC grant FA8650-9-D-5037/04 and AFOSR grant FA9550-13-1-0043. MDG would like to acknowledge financial support from AFOSR MURI grant FA9550-12-1-0458.
References
- [1] Y. Chen, D. Wei, G. Newstadt, M. De Graef, J. Simmons, and A. Hero, “Parameter Estimation in Spherical Symmetry Groups,” Signal Processing Letters, IEEE, vol. 22, no. 8, pp. 1152–1155, Aug. 2015.
- [2] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 1–38, 1977.
- [3] K. Sohn, D. Y. Jung, H. Lee, and A. O. Hero, “Efficient learning of sparse, distributed, convolutional feature representations for object recognition,” in Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011, pp. 2643–2650.
- [4] M. De Graef and M. E. McHenry, Structure of materials: an introduction to crystallography, diffraction and symmetry. Cambridge University Press, 2007.
- [5]
- [6] G. Birkhoff and S. Mac Lane, A brief survey of modern algebra. Macmillan, 1963.
- [7] G. McLachlan and D. Peel, “Finite Mixture Models,” 2004. [Online]. Available: http://cds.cern.ch/record/997567
- [8] D. Eberly, “Euler angle formulas,” Geometric Tools, LLC, Technical Report, 2008.
- [9] O. Rodrigues, Des lois géométriques qui régissent les déplacements d’un système solide dans l’espace: et de la variation des cordonnées provenant de ces déplacements considérés indépendamment des causes qui peuvent les produire. publisher not identified, 1840.
- [10] S. L. Altmann, “Rotations, quaternions, and double groups,” 2005. [Online]. Available: http://cds.cern.ch/record/1436474
- [11] K. V. Mardia and P. E. Jupp, “Directional statistics,” 1999. [Online]. Available: http://cds.cern.ch/record/1254260
- [12] G. Watson, “Equatorial distributions on a sphere,” Biometrika, pp. 193–201, 1965.
- [13] H. Bateman, A. Erdélyi, W. Magnus, F. Oberhettinger, and F. G. Tricomi, Higher transcendental functions. McGraw-Hill New York, 1955, vol. 3.
- [14] J. A. Hartigan and M. A. Wong, “Algorithm AS 136: A k-means clustering algorithm,” Applied statistics, pp. 100–108, 1979.
- [15] A. O. Hero, “Statistical methods for signal processing,” MATRIX, vol. 2, p. 6, 2000.
- [16] S. S. Wilks, “The large-sample distribution of the likelihood ratio for testing composite hypotheses,” The Annals of Mathematical Statistics, vol. 9, no. 1, pp. 60–62, 1938.
- [17] K. Saruwatari, J. Akai, Y. Fukumori, N. Ozaki, H. Nagasawa, and T. Kogure, “Crystal orientation analyses of biominerals using Kikuchi patterns in TEM,” Journal of Mineralogical and Petrological Sciences, vol. 103, no. 1, pp. 16–22, 2007.
- [18] N. C. K. Lassen, Automated determination of crystal orientations from electron backscattering patterns. Institut for Matematisk Modellering, Danmarks Tekniske Universitet, 1994.
- [19] S. U. Park, D. Wei, M. De Graef, M. Shah, J. Simmons, and A. O. Hero, “EBSD image segmentation using a physics-based forward model.” in ICIP, 2013, pp. 3780–3784.
- [20] Y.-H. Chen, D. Wei, G. Newstadt, J. Simmons, and others, “Coercive Region-level Registration for Multi-modal Images,” arXiv preprint arXiv:1502.07432, 2015.
Comments
There are no comments yet.