As the result of a proton-proton collision at a hadron collider, hundreds of particles are created and detected Aad:2010ac ; Khachatryan:2010nk . While some particles can be identified by their type, such as electrons Khachatryan:2015hwa ; Aad:2014nim and muons Chatrchyan:2012xi ; Aad:2014rra , most of the detected particles are light hadrons produced in collimated sprays called jets. Jets are the consequence of high energy quarks or gluons fragmenting into colorless hadrons. Experimentally, jets are defined by clustering schemes which group together measured calorimeter energy deposits or reconstructed charged particle tracks. A jet algorithm is a clustering scheme that connects the measured objects with theoretical quantities that can be calculated and simulated. At a hadron collider, the natural coordinates for describing particles are , , and , where is the magnitude of the momentum transverse to the proton beam, is the rapidity, and is the azimuthal angle. Particles or calorimeter energy deposits are clustered using jet algorithms based on distance metrics on their coordinates in . In order for a jet algorithm to be useful to experimentalists and theorists, the collection of jets should be IRC safe in the following sense:
Infrared safe (IR): if a particle is added with , the jets are unaffected.
Collinear safe (C): if a particle with momentum is replaced with two particles and with momenta such that , then the jets are unaffected.
The jet algorithms most widely used at hadron colliders fall into a class of schemes known as sequential recombination Ellis:1993tq . These IRC safe schemes require metrics on momenta and proceed as follows:
Assign each particle as a proto-jet.
Repeat until there are no proto-jets left: Let and without loss of generality, . If , declare proto-jet a jet and remove it from the list. Otherwise, combine proto-jets and into a new proto-jet with momentum .
One common prescription is called the Cambridge-Aachen (C/A) algorithm Dokshitzer:1997in ; Wobisch:1998wt , which uses and . The fixed quantity is roughly the size of the jet in . By far, the most ubiquitous jet algorithm used at the Large Hadron Collider (LHC) is the anti- algorithm Cacciari:2008gp with and .
The purpose of this paper is to introduce a new paradigm for jet clustering, called fuzzy jets, based on probabilistic mixture modeling. Section 2 introduces the statistical concept of a mixture model and describes the necessary modification to make the procedure IRC safe. Section 3
gives one efficient method for clustering fuzzy jets based on the Expectation-Maximization (EM) algorithm. Section4.4 contains several examples comparing fuzzy jets with sequential recombination and Sec. 5 describes how one might mitigate the impact of overlapping proton-proton collisions (pileup). We conclude in Sec. 6 with some summary remarks and outlook for the future.
2 Mixture Model Jets
Mixture models opac-b1097397
are a statistical tool for clustering which postulate a particular class of probability densities for the data to be clustered. Generically, for grouping-dimensional data points into clusters, the mixture model density is
where is the unknown weight of cluster such that and is a probability density on -dimensions with unknown parameters to be learned from the data. A common choice for is the normal density with for the -dimensional mean and the covariance matrix. In the mixture model paradigm, the are the cluster properties; in the Gaussian case, is the location of cluster and describes its shape in the -dimensional space. When clustering with a finite mixture, the number of clusters must be specified ahead of time111There is a wealth of literature on the subject of choosing , for a survey of methods, see determiningk . The likelihood monotonically increases with ; as alternatives to maximum likelihood, one can for instance look for kinks in the likelihood as a function of gap ., which is dual to the usual use of sequential recombination222It is similar to the exclusive form of the sequential recombination scheme Catani:1993hr . The exclusive nature of the algorithm (and the minimization procedure used to find the jets) is similar to the XCone algorithm Stewart:2015waa ; Thaler:2015xaa that became public as this manuscript was in its final preparation. in which is learned and the size of jets is specified ahead of time. The standard objective in (frequentist) mixture modeling is to select the parameters which maximize the likelihood (Eq. 1) of the observed dataset. Figure 1 illustrates what the learned event density might look like for and Gaussian in dimensions.
An equivalent way of approaching mixture modeling is to view Eq. (1) as the density used to generate the data. We view the data as having been drawn randomly from the density specified in Eq. (1), with the following setup:
Throw independent and identical -sided dice with probability to land on side and label the outcomes .
Independent of the others, data point is drawn randomly from .
Once and are learned by minimizing Eq. (1), we can compute
, the posterior probability thatwas generated by or, intuitively, the posterior probability that belongs to cluster . The are the soft assignments of particles to jet and will play an important role in Sec. 3 when we show how to maximize the likelihood in Eq. (1). Jets produced with mixture modeling are called fuzzy jets because of the soft memberships - every particle can belong to every jet with some probability333Soft assignments for jets during clustering was studied in the context of the “optimal jet finder” Grigoriev:2003tn which maximizes a function of the soft assignments.. This can be seen explicitly in Fig. 1 where the densities of all three clusters are everywhere nonzero, so for all . The idea of probabilistic membership was recently studied in the context of the Q-jets algorithm Ellis:2012sn in which the same event is interpreted many times by injecting randomness into the clustering procedure. Unlike Q-jets, fuzzy jets allocates the soft membership functions deterministically throughout the clustering procedure. However, like Q-jets, there is an ambiguity in how to assign kinematic properties to the clustered jets. Fuzzy jets are defined by their shape (and location), not their constituents. This is in contrast to anti- jets, which are defined by their constituents without an explicit shape determined from the clustering procedure. One simple assignment scheme is to define the momentum of a fuzzy jet as
In other words, this procedure assigns every particle to its most probable associated jet. This scheme will be known as the hard maximum likelihood (HML) scheme, but is not the only possible assignment algorithm. The dual problem in sequential recombination is the jet area, which must be defined areas , whereas the jet kinematics are the ‘natural’ coordinates.
We now specialize the likelihood in Eq. (1) to the case of clustering particles into jets at a collider like the LHC. Consider a mixture model in two dimensions444One must take care in selecting a class of densities appropriate for the angular quantity . For more details on the wrapped Gaussian distribution and motivation for its use in this context, see Appendix
. For more details on the wrapped Gaussian distribution and motivation for its use in this context, see AppendixA. with . The resulting mixture model (MM) jets are inherently not IR safe: particle does not appear in the likelihood and therefore arbitrarily low energy particles can influence the clustering procedure. Therefore, we add a modification to the log likelihood:
where is a weighting factor. Equation (3) is the log of Eq. (1) with the term inserted in the outer sum. For , the resulting modified mixture model (mMM) jets are IR safe, and when , the jets are C safe. Therefore, for , the jets are IRC safe. Different choices of component densities in Eq. (3) give rise to different IRC safe MM jet algorithms. We have studied several possibilities for , but for the remainder of this paper will specialize to (wrappedfootnote 4) Gaussian
. The resulting fuzzy jets are called modified Gaussian Mixture Model jets (mGMM) and are parameterized by the locations, the covariance matrices , and the cluster weights . We initialize and .
Since practical procedures for maximizing the modified likelihood in Eq. (3) may converge to stationary points that are not globally optimal, the output of a fuzzy jet algorithm will depend on an initial setting of the cluster parameters and . One simple procedure, used exclusively for the rest of the paper, is to seed fuzzy jets based on the output of a sequential recombination jet algorithm. This guarantees an IRC safe initial condition and therefore the entire procedure is IRC safe. We now discuss practically how one can find the maximum of the fuzzy jets likelihood.
3 Clustering Fuzzy Jets: the EM Algorithm
One iterative procedure for maximizing the mixture model likelihood in Eq. (1) is the Expectation-Maximization (EM) algorithm em1 ; em2 ; em3 . After initializing the cluster locations and prior density , the following two steps are repeated:
Given the current values of , compute the fuzzy membership probabilities .
Given , maximize the expected modified complete log likelihood over the parameters .
The expected modified complete log likelihood has the form
Note that the expected modified complete log likelihood is not the same as the expected modified log likelihood, shown in Eq. (3). They differ in that the complete log likelihood has the second sum outside the logarithm while Eq. (3) has the sum inside the logarithm. The power of the EM algorithm is that maximizing the complete log likelihood results in fixed point iteration to monotonically improve the original log likelihood. This desirable property of the EM algortihm is still true when ; for a proof, see Appendix B. Many choices for have closed form maxima for the M step; in the Gaussian case outlined above, the updates are given by
where . The well-known -means clustering algorithm macqueen1967 can be recovered as the limit of expectation-maximization in a Gaussian mixture model with . Figure 2 illustrates GMM clustering using the EM algorithm with clusters. The EM algorithm readily accommodates constraints on the model parameters. One constraint we will consider throughout the rest of the paper is for all , which requires the curves of constant likelihood in to be circular. We will see in the next section that the learned value of is useful for distinguishing jets originating from different physics processes.
4 Comparisons with Sequential Recombination and Jet Tagging
This section describes some numerical comparisons between sequential recombination and fuzzy jets. Section 4.1 summarizes the simulation details with some first event displays showing both fuzzy and sequential recombination jets. These two approaches to jet clustering are studied over an ensemble of events in Sec. 4.2. A third subsection, Sec. 4.3, illustrates that fuzzy jets captures new information about the hadronic final state, and in the fourth section, Sec 4.4
, it is demonstrated that this new information can be used to classify the jet type.
4.1 Details of the Simulation
In order to study fuzzy jets in a realistic scenario, we run Monte Carlo (MC) simulations. Three physics processes are generated using Pythia 8.170 Pythia8 ; Pythia at TeV. Hadronic boson and top quarks are used for studying hard 2- and 3-prong type jets, respectively. To simulate high hadronic decays, bosons are generated to decay exclusively into a and boson which subsequently decay into quarks and leptons, respectively. The scale of the hadronically decaying is set by the mass of the which is tuned to 800 GeV for this study so that the GeV. In this range, the decay products are expected to merge within a cone of where . A sample enriched in 3-prong type jets is generated with , where the mass sets the energy scale of the hadronically decaying top quarks. In this analysis, we use TeV, which sets GeV. To study the impact on signal versus background, QCD dijets are generated with a range of that is approximately in the same range as the relevant signal process. In all distributions, the QCD spectrum is weighted to exactly match that of the signal to control for differences between signal and background due only to the spectrum differences. Pileup is simulated by overlaying additional independently generated minimum-bias interactions with each signal event. For the rest of this section, the number of pileup interactions . See Sec. 5 for studies of .
For a comparison to fuzzy jets, anti- jets are clustered using FastJet fastjet 3.0.3. The signal processes are chosen such that jets with radius parameter are most appropriate in capturing the decay products of the heavy particles. The anti- jets are trimmed trimming by re-clustering the constituents into subjets and dropping those which have . Anti- jets are also used to seed the fuzzy jet clustering; the threshold for this initialization is 5 GeV555This low threshold guarantees that there are enough seed jets around to capture the radiation from the underlying event. Another strategy could be to use the Event Jet (see Sec. 5) even when there is no pileup., and the impact of this choice is studied in Appendix C.
To model the discretization and finite acceptance of a real detector, a calorimeter of towers with size in extends out to
. The total energy of the simulated particles incident upon a particular cell are added as scalars and the four-vectorof any particular tower is given by
To simulate a particle flow reconstruction, the sum in Eq. (6) contains only neutral particles for and both charged and neutral particles for . Charged particles within are individually added to the list of inputs for clustering, unless they originate from a pileup collision. Anti- jet momenta are corrected for pileup on average using area subtraction areas . The median pileup density, , is estimated by clustering hard scatter particles, neutral pileup particles, and charged pileup particles in the range using jets in FastJet with ghosted areas.
A representative event display for a event is shown in Figure 3. The top right plot in Figure 3 shows the anti- jets with GeV as filled in (partial) circles. The filled area is determined by the jet area and there are deviations from circles only one a low jet is close to a higher jet. The two top quarks are depicted as red stars, each of which sits at the center of two high jets. The top left plot in Figure 3 shows mGMM fuzzy jets. The fuzzy jets are depicted by their 1- contours. In contrast to the anti- jets, fuzzy jets vary widely in radial size. Gray crosses in the top left plot indicate the locations of the anti- jets shown in the top right plot. The long tail of the crosses point toward the fuzzy jet for which they were the seed. The two jets closest to the top quarks did not move a long distance from the seed location, though the size did change significantly from . The lowest fuzzy jet moved a long distance from the seed to the final location.
Another new feature of fuzzy jets compared to anti- jets is that they can overlap with each other. This is seen by the four jets with overlapping 1- contours in the top left plot of Figure 3. Overlapping mGMM jets are an expression of structure inadequately captured with a single Gaussian shape. The ability to learn features at different scales in the same event without relying on a size parameter like the anti- radius parameter can give mGMM fuzzy jets additional descriptive power over anti- and other traditional jet algorithms. This particular event will be used again for reference in Section 5 during a discussion on the performance of the technique in the presence of pileup interactions.
4.2 Kinematic Properties of Fuzzy Jets
Jets clustered according to the mGMM algorithm capture similar hard jet locations and jet energy (under HML) as those clustered by anti- . In Figure 4, the distribution for the highest jets for three different physics processes are plotted as given by anti- and mGMM jets. The anti- distributions are re-weighted so that all three processes have identical distributions in the left plot. On the right, the distributions are in good correspondence with those in the left plot, though there is a slight shift of the peak. Additionally, the locations of the highest mGMM jets are in excellent correspondence with the locations of the anti- jets as was already discussed in reference to Figure 3.
The mGMM algorithm differs from the anti- algorithm in how the size and structure of clustered jets. This was already shown qualitatively in Figure 3: fuzzy jets come in a variety of sizes, and can overlap in complex ways. The matter is further complicated by the choice of particle assignment scheme for defining kinematic properties in the mGMM family of algorithms. The catchment area’s volume and shape of a fuzzy jet depends in general on the full set of learned jet locations and model parameters, . In contrast, for anti- jets, the catchment area is bounded from above by and is only smaller when another high jet is nearby. The nonlocality of the mGMM clustering model can be observed quantitatively by examining jet mass, given in Eq. (7), which is sensitive to the distribution of energy within a jet. The jet mass distributions for both mGMM (HML assignment) and anti- jets are shown in Figure 5, with the same weighting as in Figure 4. Even though fuzzy jets learn the same core (i.e. ) for jets as anti-, they do not learn the same mass. The white dashed lines in Figure 5 mark the locations of the boson and top quark masses at about 80 GeV and 175 GeV, respectively Agashe:2014kda . For both anti- and fuzzy jets, there are clear peaks at the mass for the boosted from simulated events and at the top quark mass for simulated events. However, there are clear differences in the shape of these distributions. The mass peak for events is more peaked for fuzzy jets, though there is also a low-mass contribution to the distribution. For events, the top quark mass peak is less populated for fuzzy jets, which instead has shifted events to the mass peak. This often happens when the tree-prong structure is learned by two (overlapping) fuzzy jets. The QCD multi-jet jet mass distribution is also qualitatively different between fuzzy jets and anti- jets, with the former shifted to lower values of the mass.
4.3 New Information from Fuzzy Jets
The properties of a fuzzy jet can be useful in distinguishing jets resulting from different physics processes. In the simplest realization of mGMM jets already described above, , where is a measure of the size of the core of a jet. Although is a simple variable to construct from the wealth of data available after clustering with the mGMM algorithm, it captures at least some of the schematic differences in the likelihood for and relative to a QCD multijet background (shown below).
The left plot of Figure 6 also shows the average over all fuzzy jets in an event. The generic fuzzy jet is rather independent of the physics process and tends to be quite large. This is because fuzzy jets capturing hard radiation tend to be small, but most of the fuzzy jets needed to capture the sparse radiation pattern from the underlying event need to be large. In contrast, the for the leading mGMM jets are shown the right plot of Figure 6 for each of the three physics processes. As expected, the decay relative size of the highest jets depends on the physics process. For the decay of a boosted heavy particle with mass and , the radial size of the decay products scales as and thus since the distribution in Figure 6 is fixed, one would expect that the top quark jets have a larger than the boson jets, which are in turn larger than the quark and gluon jets. This is reflected666At leading order, there is an exact relationship between and the jet mass - See Appendix D. in the three peaks in the left plot of Figure 6. The separation between the three physics processes it not 100% correlated with the naive scaling of the corresponding leading anti- jets. Figure 7 shows that there is a strong positive correlation between and the corresponding anti- mass over as expected. There are two peaks in the correlation for the events because the anti- mass spectrum has peaks at both the top mass, and the boson mass. While the correlations between the fuzzy jet and the anti- are non-negligible, they are far from unity and thus there may be additional information contained in the fuzzy jet that is useful for tagging the flavour of a jet.
4.4 Fuzzy Jets for Tagging
In this section, is compared with another class of jet substructure variables known to be useful for tagging: the -subjettiness ratios nsub .
-subjettiness moments are defined over a set ofaxes777We use the “one-pass” axes optimization technique, which uses an exclusive algorithm to find axes and then refines them by minimizing the -subjettiness value., and calculated as:
where is the normalization
and is the radius of the jet. In practice, the useful variables for determining how much more -pronged a jet is compared to -pronged are the -subjettiness ratios:
The variable is often used for the separation of from QCD jets wbosonCMS ; wbosonATLAS and is a measure of the compatibility of a jet with a 2-prong hypothesis compared to a 1-prong hypothesis. Low value of indicates that the jet likely has a 2-prong structure. Similarly, is useful for top tagging in that it measures whether a 3-prong structure is a better description of a jet relative to a 2-prong structure.
The rest of his section contains comparisons of the performance of relative to for separating from QCD jets, as well as relative to for tagging amongst a QCD jet background. In Figure 8, a -nearest neighbors classifier was trained with 2-fold cross validation in TMVA Hocker:2007ht . The left plot in Figure 8 demonstrates an increase in performance for discriminating from QCD relative to using alone. The fuzzy jet is roughly equally useful to the -subjettiness ratio at a sigma efficiency of , and using both variables greatly improves background rejection. Similar results can be seen in the right plot of Figure 8, where boosts background rejection relative to alone. In each case, the training and classification was performed in a mass window around the particles of interest, the top quark mass in the sample and the boson mass to discriminate from QCD.
The comparisons of the fuzzy jet and -subjettiness are intended to be an illustrative example. As discussed in the opening of this section, is just one
variable that can be constructed by using mGMM clustered jets. Expanded studies of the various learned parameters could come up with additional variables, or the full learned parameter set could be thrown into an off the shelf classifier or machine learning model.
5 Underlying Event and Pileup
As with any new jet algorithm or jet variable, understanding the effect of pileup vertices from additional proton-proton collisions is essential to make meaningful statements about how the method will be applicable to real data analyses at the LHC. Studying pileup in the context of mGMM jets is complicated by the effective catchment area of the jets. For hierarchical-agglomerative algorithms like anti-, the catchment area scales with the radius parameter. However fuzzy jets can have infinite catchment area because the likelihood for particle membership is nonzero for any finite distance and arrangement of Gaussian jets and particles. Furthermore, the catchment area can change depending on the other jets in an event. Although this effect also occurs in the hierarchical-agglomerative case, the effect is much more pronounced in the mGMM clustering algorithm, with some jets having finite catchment areas while others cluster infinite area.
The challenge of pileup for fuzzy jets is illustrated in Figure 9, where the same event is shown with , and with . The event displays show the central region of the detector, where most of the decay products of the hard scatter lie. Qualitatively, it can be seen that the introduction of additional interaction vertices broadens all of the mGMM jets. This broadening clearly impacts the power of for differentiating QCD background from signal processes.
The next sections explore two methods for mitigating the impact of pileup in relation to fuzzy jets, illustrated with the variable .
5.1 Changing for Pileup Suppression
In Section 2, it was discussed that choosing in the likelihood (Eq. (3)) guarantees IRC safety. With , the mGMM algorithm treats hard structure and soft structure linearly in the particle or tower . However, one can exploit the fact that is disproportionately a measure of the shape and extent of the leading jet hard structure to make the variable more resilient to the effects of pileup. In particular, choosing stabilizes at high because so long as the average input particle due to pileup is significantly smaller than the of the particles constituting the leading jet hard structure, the change in likelihood will be suppressed roughly according to . An example of this effect is illustrated in Figure 10, which shows the same event as in Figure 9. The price for adjusting is the loss of collinear safety. Varying is not explored further, as Section 5.2 demonstrates a method for dealing with pileup effectively that does not rely on moving away from the IRC safe value of one.
5.2 Tower Subtraction and the Event Jet: Effective Pileup Correction
Recent developments in pileup mitigation have led to several algorithms for correcting jet inputs before jet clustering beings. Such techniques include Pileup Per Particle Identification (PUPPI), Constituent Subtraction, and SoftKiller puppi ; constsub ; softkill . One simple input-correction scheme is to subtract from each calorimeter tower the estimated pileup density per unit area multiplied by the size of the tower in the detector. As a first step, is calculated in the same way as described in Sec. 4.1. Tower momenta are then corrected according to Eq. (11), where is the corrected momentum, is the original momentum, and is the area of the tower. In this study, all towers have area in - space.
While subtracting the average background from towers before clustering is a relatively safe way of reducing the effect of pileup, at least when the scales of the tower to tower fluctuations are small compared to the hard scatter scale, it would still be helpful to systematically address the question of catchment areas. The mGMM clustering algorithm provides a natural framework in which to think about pileup, however, because the algorithm deals fundamentally with likelihoods, and the pileup likelihood is to leading order uniform over the detector (this is the motivation for the area-subtraction technique). This is the motivation for modifying the mGMM likelihood using a technique we call the event jet.
In addition to learning mGMM jets throughout clustering, the event jet includes another background contribution to the likelihood which attempts to capture the intuition of a uniform contribution of particle likelihood due to pileup. Constraints are further imposed on the likelihood on the event level jet so that it has constant likelihood during the clustering process, making the necessary modifications to the algorithm procedures simpler.
Practically, the effect of the event jet can be parameterized through the introduction of an algorithmic parameter . Particle membership probabilities change according to Eq. (12) with corresponding changes to the analytical M step for the Gaussian kernel type. The choice of is important, and it should reflect the fact that not all events are created equal in the sense that not all events have the same contributions due to pileup. Although there is no strict way of dealing with this issue, it is reasonable to replace by a meaningful combination of parameters which is sensitive to our estimates of the amount of pileup in a particular event. We have chosen to take where is our estimate of the density due to pileup, is the calorimeter area, and is a parameter of the algorithm controlling the strength of the event jet. Initial studies with the event jet indicate that introducing a dependent is much more effective than a independent one.
Studies of the pileup conditions similar to LHC Run I, with pileup interactions, indicate that with a 5 GeV cut, provides reasonable stability of the learned . This is demonstrated qualitatively in Figure 11, in which the tower and event jet corrections are applied to the same event shown in Figure 3 at both and . Unlike any of the methods discussed previously, this method for correction maintains IRC safety, demonstrates very little jet broadening at , and is not drastically different in its qualitative features by comparison to the standard mGMM algorithm. Note that the assignment of towers to jets under the HML scheme is impacted with the event jet because many towers belong to the event jet with higher probability than any of the other fuzzy jets. To preserve tower-to-jet assignments under pileup, a smaller value of should be chosen. The event jet is useful instead because it changes the dynamics of clustering, making jets less sensitive to soft radiation far away from the jet axis during the EM update steps, and therefore increasing the stability of the hard core that is eventually clustered.
shows how the mean and standard deviation of learnedevolve with . The uncorrected is shown in red downward pointing triangles while the tower subtraction and event jet corrections are shown in blue upward pointing triangles. For both and QCD, the pileup dependence is dramatically reduced with the tower subtraction and the event jet. The uncorrected mean increases as a function of as all of the fuzzy jets become the same size. The standard deviation of the uncorrected actually decreases beyond as all of the fuzzy jets become the same size. For modest levels of pileup, tower subtraction and event and the event jet maintain the mean and standard deviation of the distribution.
The modified mixture model algorithms provide a new way of looking at whole event structure. In contrast to the usual uses of hierarchical-agglomerative algorithms like anti-, the number of seeds is fixed ahead of time and their properties are learned during the clustering process. The learned parameters provide a new set of handles for distinguishing jets of different types. Even simple variables constructed out of the learned parameters of a mixture of isotropic Gaussian jets, like , offer complementary information to the -subjettiness variables and for tagging boson and top quark jets. Even though the variable is sensitive to changes in pileup conditions, small modifications to the fuzzy jets algorithm – correcting jet inputs and adding a pileup likelihood – can mitigate the impact of pileup.
Fuzzy jets are new paradigm for jet clustering in high energy physics. These IRC safe likelihood-based clustering schemes set the stage for many possibilities for future studies related to jet tagging, probabilistic clustering, and pileup suppression.
We would like to thank Jesse Thaler for useful discussions and helpful feedback on the manuscript. In addition, we thank Gavin Salam for useful comments on the algorithm description. This work is supported by the US Department of Energy (DOE) Early Career Research Program and grant DE-AC02-76SF00515. BN is supported by the NSF Graduate Research Fellowship under Grant No. DGE-4747 and by the Stanford Graduate Fellowship.
Appendix A Wrapped Gaussian
In the EM algorithm described in Sec. 3, there are explicit (and implicit) dependencies on the topology. For instance, if a Gaussian density is used to model , then, in the E step, a particle with near will be deemed far from a cluster with location near . To avoid this undesirable behavior and enforce the equivalence of the angles and , we associate with a wrapped Gaussian density and with a standard Gaussian density:
is a normal distribution and. In order to approximate the sum in Eq. (13), we take only the leading contribution by choosing for . As other contributions are exponentially suppressed, this is a good approximation and recovers continuity near and . Figure 13 illustrates the improved clustering behavior that results when is modeled using the wrapped Gaussian approximation in place of the standard Gaussian density.
Appendix B The EM algorithm
This appendix contains two derivations: the modified EM algorithm updates in Eq. (5) and the proof that the modified EM algorithm generically improves the original modified log likelihood Eq. (3) with every iteration. Recall the expected modified complete log likelihood (mmCLL) from Eq. (4):
Viewing the mCLL as a function of and for fixed and we can maximize. For , we optimize
where the last term is needed so that the optimal is a probability. The derivative of this expression with respect to is
and then summing the equation over and using and the constraint equation , we find that
The updates for and follow from the standard derivation (by similarly taking derivatives of the mCLL with respect to components of these multi-dimensional objects) by noting that the only difference is that and there are no Lagrange multipliers needed unlike for .
Finally, we prove the claim that the modified EM algorithm described in the body of the text monotonically improves the modified log likelihood in Eq. (3). First, we note that we can rewrite the (log) likelihood as
where the inequality in the last line follows from Jensen’s inequality. Now, we are ready to prove the claim that improves monotonically with , the index for the iteration of the EM algorithm. First, note that
where the first term is the mCLL and the second term has no dependance and so maximize over is equivalent to maximize the mCLL over . Therefore, . By the inequality above, . The E step can be recast as choosing