Sample implementation of the MotionMapper behavioral analysis algorithms
Most animals possess the ability to actuate a vast diversity of movements, ostensibly constrained only by morphology and physics. In practice, however, a frequent assumption in behavioral science is that most of an animal's activities can be described in terms of a small set of stereotyped motifs. Here we introduce a method for mapping the behavioral space of organisms, relying only upon the underlying structure of postural movement data to organize and classify behaviors. We find that six different drosophilid species each perform a mix of non-stereotyped actions and over one hundred hierarchically-organized, stereotyped behaviors. Moreover, we use this approach to compare these species' behavioral spaces, systematically identifying subtle behavioral differences between closely-related species.
The concept of stereotypy–that an organism’s behaviours can be decomposed into discrete, reproducible elements–has influenced the study of ethology, behavioural genetics, and neuroscience for decades ALTMANN:1974uj ; lehner96 . Animals possess the ability to move in a vast continuum of ways, theoretically constrained only by the biomechanical limits of their own morphology. Despite this, the range of behavioural actions typically performed by an animal is thought to be much smaller, constructed largely of stereotyped actions that are consistent across time, individuals, and, in some cases, even species gould82 ; Stephens:2011eg . A discrete behavioural repertoire can potentially arise via a number of mechanisms, including mechanical limits of gait control, habit formation, and selective pressure to generate robust or optimal actions. In many instances, the search for an individual behavioural neural circuit or gene begins with the assumption that a particular action of interest is stereotyped across time and individuals glimcher04 ; Manoli:2006dp .
Despite the centrality of this concept, with few exceptions Osborne:2005bj ; Stephens:2008dk ; Stephens:2011ks ; Desrochers:2010he ; Brown:2013ew , the existence of stereotypy has not been probed experimentally. This is largely due to the lack of a comprehensive and compelling mathematical framework for behavioural analysis. Here, we introduce a new method for quantifying postural dynamics that retains an animal’s full behavioural complexity, using the fruit fly Drosophila melanogaster as a model organism to discover and map stereotyped motions.
Most prior methods for quantifying animal behaviour lie in one of two regimes. One of these is the use of coarse metrics such as a gross activity level (e.g. mean velocity or number of times the organism crosses a barrier) or counting the relative frequencies of particular events ingrained into the experimental set-up (e.g. turning left or right in a maze). While this approach allows for high-throughput analysis of various organisms, strains, and species, only the coarsest aspects of behaviour can be ascertained, potentially overlooking the often subtle effects of the manipulations of interest that are only apparent at a finer descriptive level. The other common approach for behavioural quantification is to use a set of user-defined behavioural categories. These categories, such as walking, grooming, or fighting, are codified heuristically and scored either by hand or, more recently, via supervised machine-learning techniques Dankert:2009fa ; Branson:2009jf ; Kabra:2013jk ; deChaumont:2012du ; 2013NatCo4E1910K . While the latter approach allows for higher throughput and more consistent labeling, it remains prone to human bias and anthropomorphism and often precludes objective comparisons between data sets due to the reliance on subjective definitions of behaviour. Furthermore, these analyses assume, a priori, that stereotyped classes of behaviour exist without first showing, from the data, that an organism’s actions can be meaningfully categorised in a discrete manner.
Ideally, a behavioural description should manifest itself directly from the data, based upon clearly-stated assumptions, each with testable consequences. The basis of our approach is to view behaviour as a trajectory through a high-dimensional space of postural dynamics. In this space, discrete behaviours correspond to epochs in which the trajectory exhibits pauses, corresponding to a temporally-extended bout of a particular set of motions. Epochs that pause near particular, repeatable positions represent stereotyped behaviours. Moreover, moments in time in which the trajectory is not stationary, but instead moves rapidly, correspond to non-stereotyped actions.
In this paper, we construct a behavioural space for freely-moving fruit flies. We observe that the flies exhibit approximately 100 stereotyped behaviours that are interspersed with frequent bouts of non-stereotyped behaviours. These stereotyped behaviours manifest themselves as distinguishable peaks in the behavioural space and correspond to recognizably distinct behaviours such as walking, running, head grooming, wing grooming, etc. Using this framework, we begin to address biological questions about the underlying postural dynamics that generate behaviour, opening the door for a wide range of other inquiries into the dynamics, neurobiology, and evolution of behaviour.
We probed the spontaneous behaviours of ground-based flies (Drosophila melanogaster) in a largely featureless circular arena (Fig 1). Under these conditions, flies display a multitude of complex, non-aerial behaviours such as locomotion and grooming, typically involving multiple parts of their bodies. To capture dynamic rearrangements of the fly’s posture, we recorded video of individual behaving animals with sufficient spatiotemporal resolution to resolve moving body parts such as the legs, wings, and proboscis.
We designed our arena based on previous work which showed that a thin chamber with gently sloping sides prevents flies from flying, jumping, and climbing the walls Simon:2010ku . To keep the flies in the focal plane of our camera, we inverted the previous design. Our arena consists of a custom-made vacuum-formed, clear PETG plastic dome 100mm in diameter and 2mm in height with sloping sides at the edge clamped to a flat glass plate. The edges of the plastic cover are sloped to prevent the flies from being occluded and to limit their ability to climb upside-down on the cover. The underside of the dome is coated with a repellent silane compound (heptane and 1,7-dichloro-1,1,3,3,5,5,7,7-octamethyltetrasiloxane) to prevent the flies from adhering to the surface. In practice, we find that this set-up results in no bouts of upside-down walking.
Over the course of these experiments, we studied the behaviour of 59 male and 51 female D. melanogaster (Oregon-R strain). Each animal was imaged using a high-speed camera (, pixels). A proportional-integral-derivative (PID) feedback algorithm is used to keep the moving fly inside the camera frame by controlling the position of the X-Y stage based on the camera image in real time. In each frame we focus our analysis on a pixel square containing the fly. We imaged each of the flies for one hour, yielding movie frames per individual, or approximately frames in total. All aspects of the instrumentation are controlled by a single computer using a custom-written LabView graphical user interface.
Each of these flies was isolated within 4 hours of eclosion and imaging occurred 1-14 days after that. Flies were placed into the arena via aspiration and were subsequently allowed 5 minutes for adaptation before data collection (Fig S1). All recording occurred between the hours of 9:00 AM and 1:00 PM, thus reducing the effect of circadian rhythms, and the temperature during all recordings was .
The general framework of our analysis is described in Fig 2. Images are first segmented and registered in order to isolate the fly from the background and enforce translational and rotational invariance. After this, they are decomposed into postural time series and converted into wavelet spectrograms, thus creating a spatio-temporal representation for the fly’s dynamics within the images. These spectrograms are used to construct spectral feature vectors that we embed into two dimensions using t-Distributed Stochastic Neighbor Embedding vanderMaaten:2008tm . Lastly, we estimate the probability distribution over this two-dimensional space and identify resolvable peaks in the distribution. We confirm that sustained pauses near these peaks correspond to discrete behavioural states.
Given a sequence of images, we wish to build a spatio-temporal representation for the fly’s postural dynamics. We start by isolating the fly within each frame, followed by rotational and translational registration to produce a sequence of images in the coordinate frame of the insect. Details of these procedures are listed in Appendix A. In brief, we apply Canny’s method for edge detection canny86 , followed by morphological dilation and erosion, to create a binary mask for the fly. After applying this mask, we rotationally align the images via polar cross-correlation with a template image, similar to previously developed methods decastro87 ; reddy96 ; wilson06 . We then use a sub-pixel cross-correlation to translationally align the images guizar08 . Lastly, every image is re-sized so that, on average, each fly’s body covers the same number of pixels. An example segmentation and alignment is shown in Supplementary Movie S1.
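To make the segmentation-and-registration step concrete, here is a minimal Python sketch, not the authors' released code. It substitutes a simple gradient threshold for Canny's edge detector and recovers integer-pixel translations via FFT cross-correlation; `segment_fly` and `translational_shift` are our own hypothetical names, and a real pipeline would use sub-pixel registration as cited above.

```python
import numpy as np
from scipy import ndimage

def segment_fly(frame, grad_thresh=0.1):
    """Stand-in for the Canny-based segmentation: threshold the edge
    magnitude, dilate, fill holes, erode, yielding a binary fly mask."""
    gy, gx = np.gradient(frame.astype(float))
    edges = np.hypot(gx, gy) > grad_thresh
    dilated = ndimage.binary_dilation(edges, iterations=2)
    filled = ndimage.binary_fill_holes(dilated)
    return ndimage.binary_erosion(filled, iterations=2)

def translational_shift(template, image):
    """Integer-pixel displacement maximising the circular cross-correlation;
    roll `image` by the returned shift to align it with `template`."""
    corr = np.fft.ifft2(np.fft.fft2(template) * np.conj(np.fft.fft2(image))).real
    shifts = np.array(np.unravel_index(np.argmax(corr), corr.shape))
    for i, n in enumerate(corr.shape):
        if shifts[i] > n // 2:
            shifts[i] -= n  # wrap into [-n/2, n/2)
    return shifts
```

In the full pipeline, rotational alignment via the polar Fourier magnitude (Appendix A) would precede this translational step.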
As the fly body is made up of relatively inflexible segments connected by mobile joints, the number of postural degrees of freedom is relatively small when compared to the 40,000 pixels in each image. Accordingly, a natural representation for the fly’s posture would be to enumerate the relative angles of each of the fly’s appendages as a function of timeRevzen:2012cj ; Ristroph:2009eb ; Fontaine:2009em . Extracting these variables directly from the images, however, is prohibitively difficult due to occlusions and the complex fly limb and wing geometry.
As an alternative strategy, we find that nearly all of the variance in the pixel images can be explained by projecting the observed pixel values onto a Euclidean space of just 50 dimensions. We apply Principal Component Analysis (PCA) to Radon transforms of the images. PCA is a frequently-used method for converting a set of correlated variables into a set of linearly uncorrelated eigenmodes. The result of this analysis can be described as the space spanned by the eigenvectors of the data covariance matrix corresponding to the largest eigenvalues out of the total latent dimensionality of the data. While, in general, there is no rigorous manner to choose the number of retained modes, here we keep all modes containing correlations larger than the finite sampling error within our data set. According to this heuristic, we retain 50 modes (Fig 3(c)), a number explaining approximately 93% of the observed variation (Fig 3(d)). Details of this computation can be found in Appendix B.
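A compact way to compute such postural modes is PCA via the singular value decomposition. The sketch below is a generic illustration (assuming rows of `X` are the Radon-transformed images), not the authors' implementation:

```python
import numpy as np

def pca_modes(X, n_modes=50):
    """PCA via SVD. Rows of X are (Radon-transformed) images, columns pixels.
    Returns the top eigenmodes, each frame's projection onto them, and the
    explained-variance fraction."""
    Xc = X - X.mean(axis=0)                       # centre each pixel
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    modes = Vt[:n_modes]                          # postural eigenmodes
    projections = Xc @ modes.T                    # low-dimensional time series
    explained = float((S[:n_modes] ** 2).sum() / (S ** 2).sum())
    return modes, projections, explained
```

The `explained` fraction can be checked against the roughly 93% figure quoted above.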
We refer to these directions of correlated variation as postural modes. As seen in Fig 3(b), these modes are fly-like in appearance, but do not lend themselves to intuitive interpretation. However, projecting individual images onto these axes, we can convert a movie of fly behaviour into a 50-dimensional time series, as exemplified in Fig 3(e).
The instantaneous values of the postural modes do not provide a complete description of behaviour, as our definition of stereotypy is inherently dynamical. Previously published studies of behaviour have searched for motifs – repeated subsequences of finite length – within a behavioural time series Ye:2011wn ; Brown:2013ew . However, this paradigm is often confounded by problems of temporal alignment and relative phasing between the component time series. Additionally, certain behaviours (for example, wing grooming in Drosophila) involve multiple appendages moving at different time scales, thus complicating the choice of motif length.
As an alternative to this approach, we use a spectrogram representation for the postural dynamics, measuring the power associated with each postural mode at each frequency in a window surrounding each moment in time. More specifically, we compute the amplitudes of the Morlet continuous wavelet transform for each postural mode morlet84 . Although similar to a Fourier spectrogram, wavelets possess a multi-resolution time-frequency trade-off, allowing for a more complete description of postural dynamics occurring at several time scales daubechies92 . In particular, the Morlet wavelet is adept at isolating chirps of periodic motion, similar to what we observe in our data set. By measuring only the amplitudes of the transform, we eliminate the need for the precise temporal alignment that is required in any motif-based analysis. Details of these calculations are shown in Appendix C, and an example spectrogram is displayed in Fig 3(f). For the results presented here, we look at 25 frequency channels, dyadically spaced between 1 Hz and 50 Hz, the latter being the Nyquist frequency of the system.
Each spectral feature vector comprises 25 frequency channels for each of the 50 eigenmodes, making each point in time represented by a 1,250-dimensional feature vector encoding the postural dynamics. As correlations, often strong, exist between the various mode-frequency channels, we expect that the dimensionality of the manifold containing the observed feature vectors should be vastly smaller. As such, we would like to find a low-dimensional representation that captures the important features of the data set.
Our strategy for dimensional reduction of the feature vectors is to construct a space, , such that trajectories within it pause near a repeatable position whenever a particular stereotyped behaviour is observed. This means that our embedding should minimise any local distortions. However, we do not require preservation of structure on longer length scales. Hence, we chose an embedding that reduces dimensionality by altering the distances between more distant points on the manifold.
Most common dimensionality reduction methods, including PCA, multi-dimensional scaling, and Isomap, do precisely the opposite, sacrificing local verisimilitude in service of larger-scale accuracy cox00 ; Tenenbaum:2000jp ; Roweis:2000ey . One method that does possess this property is t-Distributed Stochastic Neighbor Embedding (t-SNE) vanderMaaten:2008tm . Like other embedding algorithms, t-SNE aims to take data from a high-dimensional space and embed it into a space of much smaller dimensionality, preserving some set of invariants as best as possible. For t-SNE, the conserved invariants are related to the Markov transition probabilities of a random walk performed on the data set. Specifically, we define the transition probability from time point i to time point j, p(j|i), to be proportional to a Gaussian kernel of the distance d(x_i, x_j) (as of yet, undefined) between them:

p(j|i) = exp(−d(x_i, x_j)²/2σ_i²) / Σ_{k≠i} exp(−d(x_i, x_k)²/2σ_i²)

All self-transitions (i.e. p(i|i)) are assumed to be zero. Each of the σ_i are set such that all points have the same transition entropy, H = −Σ_j p(j|i) log₂ p(j|i) = log₂(32). This can be interpreted as restricting transitions to roughly 32 neighbors.

The t-SNE algorithm then embeds the data points in the smaller space while keeping the new set of transition probabilities, q(j|i), as similar to the p(j|i) as possible. The q(j|i) are defined similarly to the larger-space transition probabilities, but are now, for technical reasons, proportional to a Cauchy (or Student-t) kernel of the points’ Euclidean distances in the embedded space. This algorithm results in an embedding that minimises local distortions vanderMaaten:2008tm . If p(j|i) is initially very small or zero, it will place little to no constraint on the relative positions of the two points, but if the original transition probability is large, it will factor significantly into the cost function.
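The entropy-matching step can be made concrete: for each point, σ_i (parameterised below as β = 1/2σ²) is found by bisection so that the row entropy equals log₂(perplexity). This is a generic re-implementation of the standard t-SNE conditional-probability computation, not the authors' code.

```python
import numpy as np

def transition_probs(D, perplexity=32.0, tol=1e-5):
    """Gaussian transition probabilities p(j|i) from a distance matrix D,
    with each sigma_i (as beta = 1/2 sigma^2) bisected so that every row's
    entropy equals log2(perplexity)."""
    n = D.shape[0]
    target = np.log2(perplexity)
    P = np.zeros((n, n))
    for i in range(n):
        d2 = D[i] ** 2
        beta, beta_lo, beta_hi = 1.0, 0.0, np.inf
        for _ in range(100):
            p = np.exp(-beta * d2)
            p[i] = 0.0               # no self-transitions
            p /= p.sum()
            H = -np.sum(p[p > 0] * np.log2(p[p > 0]))
            if abs(H - target) < tol:
                break
            if H > target:           # kernel too wide: sharpen it
                beta_lo = beta
                beta = beta * 2.0 if np.isinf(beta_hi) else (beta + beta_hi) / 2.0
            else:                    # kernel too narrow: widen it
                beta_hi = beta
                beta = (beta + beta_lo) / 2.0
        P[i] = p
    return P
```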
This method’s primary drawback, however, is its poor memory scaling: the full transition matrix grows quadratically with the number of data points. To incorporate our entire data set into the embedding, we use an importance sampling technique to select a training set of 35,000 data points, build the space from these data, and then re-embed the remaining points into the space as best as possible (see Appendix D for implementation details).
Lastly, we need to define a distance function, d(S_i, S_j), between the feature vectors. We desire this function to accurately measure how different the shapes of two mode-frequency spectra are, ignoring the overall multiplicative scaling that occurs at the beginning and the end of behavioural bouts due to the finite nature of the wavelet transform. Simply measuring the Euclidean norm between two spectra would be greatly affected by such amplitude modulations. However, because each feature vector S is composed of a set of wavelet amplitudes, it must therefore be positive semi-definite. As such, if we define the normalised feature vector

ŝ(k) = S(k) / Σ_k′ S(k′),

then we can treat this normalised feature vector as a probability distribution over all mode-frequency channels at a given point in time. Hence, a reasonable distance function is the Kullback-Leibler (KL) divergence coverthomas between two feature vectors:

d(S_i, S_j) = D_KL(ŝ_i ‖ ŝ_j) = Σ_k ŝ_i(k) log[ ŝ_i(k) / ŝ_j(k) ]
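A sketch of this normalisation and KL distance between two spectral feature vectors follows; the small ε guarding against empty channels is our own choice, and the original computation may handle zeros differently.

```python
import numpy as np

def kl_distance(s1, s2, eps=1e-12):
    """KL divergence between two non-negative spectral feature vectors,
    each first normalised to sum to one."""
    p = np.asarray(s1, dtype=float) + eps
    q = np.asarray(s2, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

Note that KL divergence is asymmetric: d(S_i, S_j) generally differs from d(S_j, S_i).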
Figure 4 shows the embedding of our spectral feature vectors into two dimensions for all 59 individual male flies. We first note that nearby points have similar spectral power, even though the embedding algorithm normalises-out variations in the total power of the postural motions. Embedding the same data into three dimensions yields a very similar structure with less than a 2% reduction of the embedding cost function (Eq. 12, Fig S3).
We generated an estimate of the probability density by convolving each point in the embedded map with a Gaussian of relatively small width (Fig 4(b)). Far from being uniformly distributed across this space, the density contains a large number of resolved local maxima. The locations of these peaks provide a potential representation for the stereotyped behaviours that the flies perform. As expected, we find that individuals display significantly less intra- than inter-individual variation when their behavioural maps are compared (Fig S4).
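The density estimate amounts to a histogram convolved with a Gaussian kernel. A sketch, with grid size and kernel width as free parameters (the exact smoothing width used in the experiments is not reproduced here):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def embedding_density(z, grid_size=200, sigma=2.0):
    """Probability density over 2-d embedded points z (shape [n, 2]),
    estimated by binning and convolving with a small Gaussian."""
    x_edges = np.linspace(z[:, 0].min(), z[:, 0].max(), grid_size + 1)
    y_edges = np.linspace(z[:, 1].min(), z[:, 1].max(), grid_size + 1)
    H, _, _ = np.histogram2d(z[:, 0], z[:, 1], bins=[x_edges, y_edges])
    density = gaussian_filter(H, sigma)
    return density / density.sum()   # normalise to a probability distribution
```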
This space not only contains peaks, but the trajectory through it also pauses at repeatable locations. Numerically differentiating the embedded trajectory, we observe a two-state “pause-move” pattern of dynamics. Typical time traces of the two embedded coordinates show this type of trajectory, with long stationary periods interspersed by quick bouts of movement (Fig 5(a)). More quantitatively, we find that the distribution of velocities within the embedded space is well-represented by a two-component log-normal mixture model in which the two peaks are separated by almost two orders of magnitude (Fig 5(b)). The distribution of points in the low-velocity case (approximately 45% of all time points) is highly localized with distinguishable peaks (Fig 6). The high-velocity points, in contrast, are more uniformly distributed.
The embedded space comprises peaks surrounded by valleys. Finding connected areas in the plane such that climbing up the gradient of probability density always leads to the same local maximum (often referred to as a watershed transform meyer94 ), we delineate 122 regions of the embedded space. Each of these contains a single local maximum of probability density (Fig 7(a)). When the trajectory pauses at one of these peaks, we find that each such epoch corresponds to the fly performing a particular stereotyped behaviour. These pauses last anywhere from 0.05 s up to nearly 25 s (Fig 8(a)).
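A watershed transform of this kind can be implemented as steepest-ascent label propagation: each grid cell follows its highest-valued neighbour until it reaches a local maximum, and cells sharing a maximum form one region. The sketch below is a plain-NumPy illustration of the idea, not the meyer94 algorithm as used in the paper.

```python
import numpy as np

def watershed_regions(density):
    """Label each grid cell by the local maximum reached via steepest ascent:
    every cell points at its highest-valued 8-neighbour, and the pointers are
    followed (with path doubling) until they terminate at peaks."""
    rows, cols = density.shape
    padded = np.pad(density, 1, constant_values=-np.inf)
    padded_idx = np.pad(np.arange(rows * cols).reshape(rows, cols), 1,
                        constant_values=-1)
    best_val = density.copy()
    target = np.arange(rows * cols).reshape(rows, cols)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            nv = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            ni = padded_idx[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            better = nv > best_val
            best_val[better] = nv[better]
            target[better] = ni[better]
    t = target.ravel()
    for _ in range(rows + cols):       # path doubling converges quickly
        t2 = t[t]
        if np.array_equal(t2, t):
            break
        t = t2
    roots, labels = np.unique(t, return_inverse=True)
    return labels.reshape(rows, cols), len(roots)
```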
Observing segments of the original movies corresponding to pauses in one of the regions, we consistently observe the flies performing a distinct action that corresponds to a recognizable behaviour when viewed by eye (Supplementary Movies S2-11). Many of the movements we detect are similar to familiar, intuitively defined behavioural classifications such as walking, running, front leg grooming, and proboscis extension, but here, the segmentation of the movies into behavioural categories has emerged from the data itself, not through a priori definitions. Moreover, we see that nearby regions of our behavioural space correspond to similar, yet distinct, behaviours (Fig 7(c)).
This classification is consistent across individuals (Figures 8-9, Supplementary Movies S3-11). The vast majority of these regions are visited by almost all of the flies at some point (Fig 8(b)). 104 of the 122 regions were visited by over 50 (of 59 total) flies, and the remaining behaviours were all low-probability events, containing, in total, less than 3% of the overall activity.
Periodic orbits in postural movements are suggestive of underlying low-dimensional dynamic attractors that produce stable behavioural templates Full:1999vc . These types of motifs have been hypothesized to form the basis for neural and mechanical control of legged locomotion at fast time scales Holmes:2006uf . Because our behavioural mapping algorithm is based upon similarities between postural frequencies exhibited at different times, a potential hypothesis is that pauses in behavioural space correspond to periodic trajectories in the space of postural movements (Eqn 1). In our data, a fast running gait (the bottom-most region of Fig 10(h)) corresponds to periodic oscillations of the postural time series with a clear peak at 12.9 Hz in the power spectral density (Fig 10(a)-(b)). This frequency is in good agreement with previous measurements of the fly walking gait Strauss:1990ua ; Wosnitza:2013il .
To systematically investigate the periodicity of the postural dynamics, for each behavioural bout we map time onto a phase variable, a cyclic coordinate defined on the unit circle. This process is usually referred to as phase reconstruction. The method we use, Phaser Revzen:2008hb , performs Hilbert transforms to construct phase estimations from several time series separately, then combines these estimates via a maximum likelihood estimate that uses Fourier-series based corrections. Here, we apply Phaser to the postural mode time series, , treating the correlated motions along all 50 postural eigenmodes as synchronized oscillators. We performed this reconstruction for each multi-cycle behavioural bout. After reconstructing the phases for all of the 5,483 bouts of fast running observed in male flies, we observe a clear periodic pattern across several of the postural modes (Fig 10(c)-(f)).
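Phaser itself combines Hilbert-transform phase estimates across many time series; the single-series core of that idea is compact enough to sketch here (a simplification, not the Phaser algorithm):

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_phase(y):
    """Unwrapped phase of the analytic signal of a (roughly periodic) time
    series; a single-series simplification of the multi-series Phaser
    estimator used in the text."""
    analytic = hilbert(y - np.mean(y))
    return np.unwrap(np.angle(analytic))
```

For a clean oscillation, the phase grows linearly at the oscillation's angular frequency, which is how periodic gait bouts can be mapped onto the unit circle.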
This type of analysis also brings additional insight into the subtle distinctions between our observed behavioural states. If we construct phase-averaged orbits for seven of the running behaviours, we observe many differences in the gait dynamics (see Appendix E, Fig 10(g)). For instance, we observe an increase in many mode amplitudes as the gait frequency increases (e.g. in modes 3, 12, and 13), as noted in previous work Mendes:2013cu . In addition, we also see subtle changes in phase (e.g. in mode 4), as well as a near-elimination of a period-doubled trajectory (seen in mode 14). This type of observation could allow for a more thorough understanding of speed control and gait transitions in hexapod locomotion.
We also find oscillatory postural dynamics for other stereotyped behaviours, with many behaviours resulting in a periodic orbit in postural space (Fig 10(i)). These behaviours are found in many regions of behavioural space, suggesting that much of behaviour is indeed confined to low-dimensional postural dynamics. It is important to note that periodic trajectories emerge directly from our analysis, even though the wavelet transform used to define our feature vectors does not preserve phase information.
To demonstrate the power of this method to detect subtle differences in behaviour, we compared the behavioural spaces of male and female fruit flies by embedding the postural time series data from females into the behavioural space derived from the male flies (Fig 4). Figure 11(a) displays the male and female behavioural probability densities. We find a striking difference between the two sexes, with locomotory behaviours greatly enhanced but resting and slow motions largely suppressed in females when compared to males. This is in agreement with previous results, showing that young females are more active than their male counterparts LeBourg:1987wb .
We then sought to isolate subtle behavioural differences between the sexes that are evident in the fine-scale structure of these maps. An example can be seen in the “Wing Movements” portion of the behavioural space (the lower left corner of the map). First, we obtained both male and female region-normalised (R-N) PDFs (Fig 11(c)), in which the behavioural space density within the “Wing Movements” region integrates to one. Within the space of wing movements, we identified regions that show statistically significant differences between the two sexes by applying a Wilcoxon rank sum test Wilcoxon45 at each point in behavioural space, testing whether the median male PDF value and the median female PDF value differ significantly. Regions where significant differences were found are indicated by the dashed lines in Figure 11(d).
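The per-bin comparison can be sketched as a Wilcoxon rank-sum test applied independently at every point of a discretised behavioural space. `significance_mask` is our own name, and no multiple-comparisons correction is applied in this sketch.

```python
import numpy as np
from scipy.stats import ranksums

def significance_mask(group_a, group_b, alpha=0.05):
    """Per-bin Wilcoxon rank-sum test between two stacks of per-individual
    behavioural-space PDFs, each of shape [n_individuals, rows, cols]."""
    rows, cols = group_a.shape[1:]
    pvals = np.ones((rows, cols))
    for r in range(rows):
        for c in range(cols):
            _, p = ranksums(group_a[:, r, c], group_b[:, r, c])
            pvals[r, c] = p
    return pvals < alpha, pvals
```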
Particular behaviours, such as left-wing grooming, are sexually dimorphic (Fig 11(d), solid box; Movies S12-S13). Male-preferred grooming includes a kick of the middle leg on the left side of the body that clears the profile of the wing and moves anteriorly before pushing back towards the posterior. Female-preferred grooming lacks this additional leg movement. We verified these differences by isolating the mean postural-space orbits associated with each of these regions (Figures 11(f), S6). Importantly, while these orbits are statistically different, the average frequencies for the behaviours are not ( Hz versus Hz). We note that these results are consistent across a large range of the behavioural-map smoothing parameter (Fig S5), such that fine-tuning of the spatial structure of the behavioural map is not necessary to obtain the results seen here.
It should be noted that future study is necessary to determine the ethological relevance of these findings and to understand how much of the variance we observe is related to the specifics of our experimental paradigm. However, the fact that these distinctions are found without specifically looking for any of them – emerging only from underlying statistics of the behavioural map – provides quantitative verification that the classifications we make are meaningful. Inherent in any unsupervised classification method is the question of how to validate its accuracy. Here, there is no ground truth with which to compare, since a significant aim of our work is to dispense with a priori behavioural definitions. However, by showing that meaningful distinctions and agglomerations can be made between different behavioural instances, we provide evidence that the approach introduced here can become the basis undergirding a wide range of experimental investigations into the behaviour of animals.
The ability to map and compare the behavioural repertoire of individuals and populations of animals has applications beyond the study of terrestrial dynamics in fruit flies. Combined with tools for genetic manipulation, DNA sequencing, neural imaging, and electrophysiology, the identification of subtle behavioural distinctions and patterns between groups of individuals will impact deep questions related to the interactions between genes, neurons, behaviour, and evolution. In this initial study, we probed the motion of individuals in a largely featureless environment. Extensions to more complicated situations, e.g. where sensory inputs are measured and/or controlled, genes are manipulated, or multiple individuals are present, are readily implemented.
Finally, we note that the only Drosophila-specific step in our analysis pipeline is the generation of the postural eigenmodes. Given movies of sufficient quality and length from different organisms, spectral feature vectors and behavioural spaces can be similarly generated, allowing for potential applications from worms to mice to humans and a greater understanding of how animals behave.
We thank Yi Deng, Kieren James-Lubin, Kelsi Lindblad, and Ugne Klibaite for assistance in data collection and analysis, as well as Jessica Cande, David Stern, and David Schwab for discussions and suggestions. JWS and GJB also acknowledge the Howard Hughes Medical Institute Janelia Farm Visitor Program and the Aspen Center for Physics, where many ideas for this work were formulated. This work was funded through awards from the National Institutes of Health (GM098090, GM071508), The National Science Foundation (PHY–0957573, PHY–1066293), the Pew Charitable Trusts, the Swartz Foundation, and the Alfred P. Sloan Foundation.
Example code for running the algorithms can be found at https://github.com/gordonberman/MotionMapper. For access to raw image data, please email JWS at firstname.lastname@example.org.
To isolate the fly from the background, we apply Canny’s method for edge detection canny86 , resulting in a binary image containing the edge positions. We then morphologically dilate this binary image by a square in order to fill any spurious holes in the edges and proceed to fill all closed curves. This filled image is then morphologically eroded by a square of the same size, resulting in a mask. After applying this mask to the original image, we now have our segmented image.
While our tracking algorithm ensures that the fly remains within the image boundaries, the centre of the fly and the orientation within the frame vary over time. Having obtained a sequence of isolated fly images, we next register them both translationally and rotationally with respect to a template image. The template image is generated by taking a typical image of a fly and then manually ablating the wings and legs digitally.
For our first step, we rotationally align. This is achieved through finding the angle that maximises the cross-correlation between the magnitudes of the 2D polar Fourier transforms for each image and the template. Because all translation information appears in the phase of the 2D Fourier transform, this rotational alignment, based only upon the magnitude of the transform, is independent of any initial translations between the images. Accordingly, once rotational alignment is achieved, we can subsequently register the images translationally via a cross-correlation.
The aim of the postural decomposition is to take our set of aligned images and create a lower-dimensional representation that can be made into time series. Naively, one would simply perform PCA on the images, using each pixel value as a separate dimension. The fly images, however, contain too many pixels to analyse in this way due to memory limitations.
To make this problem more tractable, we analyse only the subset of these pixels which have non-negligible variance. Many pixels within the fly image are either always zero or always saturated, thus containing almost no dynamical information. Accordingly, we would like to use only a subsample of these measurements. The most obvious manner to go about this is to find the pixels containing the highest variance and keep only those above a certain threshold. The primary difficulty here, however, is that there is no obvious truncation point (Fig S2A). This is most likely because the fly’s legs can potentially occupy the majority of the pixels in the image but are present in only a relatively small number of them in any given frame. Hence, many of these peripheral pixels have similarly moderate standard deviations, making them difficult to differentiate.
A more compact scheme is to represent the images in Radon-transform space, which more sparsely parameterises line-like features such as legs or wing veins. After Radon transformation, the probability density function of pixel-value standard deviations has a clear minimum, and we keep the pixels whose standard deviation is larger than this value (Fig S2B). This results in keeping 6,763 pixels out of 18,090, retaining approximately 95% of the total variance in the images. If there are $N$ images in our sample, we can represent our data set as an $N \times 6{,}763$-element matrix. We then calculate the principal directions of variation in these data using PCA, as seen in Fig 3.
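As a sketch of this pipeline (not the published implementation), the following uses a minimal rotate-and-project Radon transform and a simple quantile threshold as a stand-in for the density-minimum criterion described above; `n_angles=90` matches the parameter table below, while `var_quantile` and `n_modes` are illustrative choices.

```python
import numpy as np
from scipy import ndimage

def radon(img, n_angles=90):
    """Minimal discrete Radon transform: rotate the image and sum its columns."""
    thetas = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    return np.stack([ndimage.rotate(img, t, reshape=False, order=1).sum(axis=0)
                     for t in thetas])

def postural_modes(images, var_quantile=0.5, n_modes=10):
    """Radon-transform each frame, keep only high-variance Radon-space pixels,
    and compute principal components of the retained coordinates."""
    R = np.array([radon(im).ravel() for im in images])   # (N, n_angles * width)
    sd = R.std(axis=0)
    keep = sd > np.quantile(sd, var_quantile)            # quantile stand-in for
    X = R[:, keep] - R[:, keep].mean(axis=0)             # the PDF-minimum cut
    _, svals, Vt = np.linalg.svd(X, full_matrices=False) # PCA via SVD
    eigvals = svals**2 / (len(X) - 1)
    return Vt[:n_modes], eigvals
```

Working with the singular value decomposition of the mean-subtracted data matrix avoids ever forming the full pixel-by-pixel covariance matrix, which is the memory bottleneck the variance thresholding is meant to relieve.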
Lastly, the question remains of how many modes to keep in our analysis, a task made more ambiguous by the smoothness of the eigenvalue spectrum. Our approach to determining the truncation point is to compare the PCA eigenvalues with a null model based on the noise properties of our data set. Specifically, we assume that the noise is due to finite data collection. Although additional errors in image segmentation and registration assuredly exist in our data set, this null model provides an upper bound on the number of statistically meaningful eigenmodes.
To calculate this truncation point, we take our data matrix and shuffle each of its columns independently of one another, thereby eliminating all meaningful correlations between them. Given finite sampling (even if very large), some residual correlations will still remain, resulting in non-zero off-diagonal terms in the covariance matrix. Hence, if we diagonalise this new covariance matrix, the largest eigenvalue provides a resolution limit for our ability to distinguish signal from finite-sampling noise. Performing this analysis, we find that only 50 modes have eigenvalues larger than the largest shuffled eigenvalue. These 50 modes account for slightly more than 93% of the observed variance in the data.
We use the Morlet continuous wavelet transform to provide a multiple-time-scale representation of our postural mode dynamics. More explicitly, we calculate this transform, $W_{s,\tau}[y]$, via

$$W_{s,\tau}[y] = \frac{1}{\sqrt{s}}\int_{-\infty}^{\infty} y(t)\,\psi^{*}\!\left(\frac{t-\tau}{s}\right)\mathrm{d}t, \qquad \psi(\eta) = \pi^{-1/4}\,e^{i\omega_0\eta}\,e^{-\eta^2/2}.$$

Here, $y(t)$ is a postural time series, $s$ is the time scale of interest, $\tau$ is a point in time, $\psi(\eta)$ is the Morlet wavelet, and $\omega_0$ is a non-dimensional parameter (set to 5 here).
The Morlet wavelet has the additional property that the time scale, $s$, is related to the Fourier frequency, $f$, by

$$s(f) = \frac{\omega_0 + \sqrt{2+\omega_0^2}}{4\pi f}. \tag{7}$$

This relation can be derived by maximising the response of the transform to a pure sine wave, $y(t) = \sin(2\pi f t)$, with respect to $s$.
However, $\left|W_{s,\tau}[y]\right|$ is disproportionately large when responding to pure sine waves of lower frequencies. To correct for this, we find a scalar function $C(s)$ such that

$$\frac{1}{C(s)}\,\max_{\tau}\left|W_{s,\tau}\!\left[e^{i\omega(s)t}\right]\right| = 1,$$

where $\omega(s)$ is $2\pi$ times the Fourier frequency found in Eq. 7. For a Morlet wavelet, this function is

$$C(s) = \frac{\pi^{-1/4}}{\sqrt{2s}}\,e^{\frac{1}{4}\left(\omega_0-\sqrt{\omega_0^2+2}\right)^2}.$$
Accordingly, we can define our power spectrum, $S(k;\tau)$, via

$$S(k;\tau) = \frac{1}{C(s_k)}\left|W_{s_k,\tau}[y]\right|.$$
Last, we use a dyadically-spaced set of $N_f = 25$ frequencies between $f_{\min} = 1$ Hz and the Nyquist frequency ($f_{\max} = 50$ Hz) via

$$f_i = f_{\max}\,2^{-\frac{N_f - i}{N_f - 1}\log_2\left(f_{\max}/f_{\min}\right)}, \qquad i \in \{1,\ldots,N_f\},$$

and their corresponding scales via Eq. 7. This creates a wavelet spectrogram that is resolved at multiple time scales for each of the first 50 postural modes.
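The wavelet construction above can be sketched in the frequency domain. The prefactors below are chosen so that a unit-amplitude sine at a channel's matched frequency yields roughly unit response in this discrete implementation; they depend on FFT conventions and are not identical to the continuous-time normalisation in the text. The function name and signature are illustrative.

```python
import numpy as np

def morlet_amplitudes(y, freqs, fs, omega0=5.0):
    """Normalised Morlet wavelet amplitudes of signal y at each frequency
    channel, computed by multiplication in the Fourier domain."""
    n = len(y)
    scales = (omega0 + np.sqrt(2.0 + omega0**2)) / (4.0 * np.pi * freqs)  # Eq. 7
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)
    Y = np.fft.fft(y)
    # peak response of a unit-amplitude sine at the matched frequency, used
    # to normalise every channel to roughly unit response
    C = 0.5 * np.pi**-0.25 * np.exp(-(np.sqrt(omega0**2 + 2.0) - omega0)**2 / 8.0)
    amp = np.empty((len(freqs), n))
    for i, s in enumerate(scales):
        # analytic Morlet wavelet in the frequency domain (positive freqs only)
        psi_hat = np.pi**-0.25 * np.exp(-0.5 * (s * omega - omega0)**2) * (omega > 0)
        amp[i] = np.abs(np.fft.ifft(Y * np.conj(psi_hat))) / C
    return amp

# dyadically spaced channels between 1 Hz and the 50 Hz Nyquist frequency
f_min, f_max, n_freq = 1.0, 50.0, 25
idx = np.arange(1, n_freq + 1)
freqs = f_max * 2.0 ** (-(n_freq - idx) / (n_freq - 1) * np.log2(f_max / f_min))
```

Computing each channel as a single FFT multiply-and-invert keeps the cost at one pair of transforms per scale, which matters when 25 channels must be evaluated for each of 50 postural modes.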
For our initial embedding using t-SNE, we largely follow the method introduced in vanderMaaten:2008tm , minimising the cost function

$$C = \sum_{i}\sum_{j\neq i} p_{ij}\log\frac{p_{ij}}{q_{ij}}, \qquad q_{ij} = \frac{\left(1+\|y_i-y_j\|^2\right)^{-1}}{\sum_{k\neq l}\left(1+\|y_k-y_l\|^2\right)^{-1}},$$

where $p_{ij}$ are the transition probabilities in the original space and $\|y_i - y_j\|$ is the Euclidean distance between points $i$ and $j$ in the embedded space. The cost function is optimised through a gradient descent procedure that is preceded by an early-exaggeration period, allowing the system to more readily escape local minima.
The memory complexity of this algorithm, which is quadratic in the number of samples, prevents the practical number of points from growing large. Although improving this number is the subject of current research 2013arXiv1301.3342V , our solution here is to generate an embedding using a selection of roughly 600 data points from each of the 59 individuals observed. To ensure that these points form a representative sample, we first perform t-SNE on 20,000 randomly-selected data points from each individual. This embedding is then used to estimate a probability density by convolving each point with a 2D gaussian whose width is equal to the distance from the point to its nearest neighbours. This space is segmented by applying a watershed transform meyer94 to the inverse of the PDF, creating a set of regions. Finally, points are grouped by the region to which they belong, and the number of points selected from each region is proportional to the integral of the PDF over that region. This procedure is performed for all data sets, yielding a total of 35,000 data points in the training set.
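The sampling scheme above can be sketched as two pieces: an adaptive-bandwidth kernel density estimate and region-proportional point selection. The watershed segmentation itself is omitted here (region labels are taken as given), and all names and parameters are illustrative.

```python
import numpy as np

def adaptive_kde(points, grid, k=10):
    """Density on a grid: each point contributes a 2D gaussian whose width is
    its distance to its k-th nearest neighbour (a simple adaptive bandwidth)."""
    d2 = ((points[:, None, :] - points[None, :, :])**2).sum(-1)
    sigma = np.sqrt(np.partition(d2, k, axis=1)[:, k])      # k-NN distance
    g2 = ((grid[:, None, :] - points[None, :, :])**2).sum(-1)
    return (np.exp(-g2 / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)).sum(-1)

def sample_by_region(labels, density_mass, n_total, rng):
    """Pick per-region sample counts proportional to each region's density
    mass (labels: region id per point; density_mass: PDF integral per region).
    Rounding means the total may differ slightly from n_total in general."""
    counts = np.round(n_total * density_mass / density_mass.sum()).astype(int)
    picks = []
    for r, c in enumerate(counts):
        idx = np.flatnonzero(labels == r)
        picks.append(rng.choice(idx, size=min(c, len(idx)), replace=False))
    return np.concatenate(picks)
```

Weighting the per-region sample counts by the PDF mass, rather than sampling uniformly, is what keeps commonly-visited regions of the behavioural space from being under-represented in the 35,000-point training set.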
Given the embedding resulting from applying t-SNE to our training set, we wish to embed additional points into our behavioural space by comparing each to the training set individually. Mathematically, let $\{x_i\}$ be the set of all feature vectors in the training set, $\{y_i\}$ be their associated embeddings via t-SNE, $z$ be a new feature vector that we would like to embed according to the mapping between $\{x_i\}$ and $\{y_i\}$, and $\zeta$ be the embedding of $z$ that we would like to determine.
As with the t-SNE cost function, we will embed $z$ by enforcing that its transition probabilities in the two spaces are as similar as possible. Like before, the transition probabilities in the full space, $p_j$, are given by

$$p_j = \frac{\exp\left(-\|z - x_j\|^2/2\sigma^2\right)}{\sum_{k}\exp\left(-\|z - x_k\|^2/2\sigma^2\right)},$$

where $\|z - x_j\|$ is the Euclidean distance between $z$ and $x_j$, and $\sigma$ is once again found by constraining the entropy of the conditional transition probability distribution, using the same parameters as for the t-SNE embedding. Similarly, the transition probabilities in the embedded space are given by

$$q_j = \frac{\left(1+\|\zeta - y_j\|^2\right)^{-1}}{\sum_{k}\left(1+\|\zeta - y_k\|^2\right)^{-1}},$$

where $\|\zeta - y_j\|$ is the Euclidean distance between $\zeta$ and $y_j$.
For each $z$, we then seek the $\zeta$ that minimises the Kullback-Leibler divergence between the transition probability distributions in the two spaces:

$$\zeta^{*} = \operatorname*{argmin}_{\zeta}\;\sum_{j} p_j\log\frac{p_j}{q_j}.$$

As before, this is a non-convex function, leading to potential complexities in performing our desired optimisation. However, if we start a local optimisation (using the Nelder-Mead simplex algorithm jongen04 ; lagarias98 ) from the weighted average of points

$$\zeta_0 = \sum_j p_j\, y_j,$$

this point is almost always within the basin of attraction of the global minimum. To ensure that this is true in all cases, however, we also perform the same minimisation procedure, but starting from the point $y_{j^{*}}$, where

$$j^{*} = \operatorname*{argmax}_{j}\; p_j.$$
This returned a better solution approximately 5% of the time.
Because this embedding can be calculated independently for each value of $z$, the algorithm scales linearly with the number of points. We also make use of the fact that this algorithm is embarrassingly parallelisable. Moreover, because we have set our transition entropy, $H$, equal to 5 bits, there are rarely more than 50 points to which a given $z$ has a non-negligible transition probability. Accordingly, we can speed up our cost function evaluation considerably by allowing non-zero transition probabilities only for the 200 points nearest to $z$ in the original space.
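A sketch of the re-embedding step, assuming `scipy` for the Nelder-Mead search; the helper names, the perplexity value, and the numerical-stability constants are illustrative. A full implementation would additionally restrict the sums to the nearest training points, as described above.

```python
import numpy as np
from scipy.optimize import minimize

def transition_probs(dists2, perplexity=32.0):
    """Gaussian transition probabilities, with sigma set by bisection so the
    distribution's perplexity (2^entropy) matches the target."""
    lo, hi = 1e-10, 1e10
    for _ in range(100):
        sigma = 0.5 * (lo + hi)
        p = np.exp(-(dists2 - dists2.min()) / (2.0 * sigma**2))
        p /= p.sum()
        H = -np.sum(p * np.log2(p + 1e-12))
        lo, hi = (lo, sigma) if 2.0**H > perplexity else (sigma, hi)
    return p

def reembed(z, X, Y, perplexity=32.0):
    """Place a new feature vector z into an existing map X -> Y by minimising
    the KL divergence between its transition probabilities in the two spaces,
    starting Nelder-Mead from the p-weighted average of the embeddings."""
    p = transition_probs(((X - z)**2).sum(axis=1), perplexity)

    def kl(zeta):
        q = 1.0 / (1.0 + ((Y - zeta)**2).sum(axis=1))
        q /= q.sum()
        return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))

    zeta0 = p @ Y                       # weighted-average starting point
    return minimize(kl, zeta0, method='Nelder-Mead').x
```

Since each call depends only on one new point and the fixed training set, new data can be embedded in parallel across points, which is the linear-scaling property noted above.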
Lastly, we find the space of behaviours for the female data sets by embedding these data into the space created from the male training set. We find that the median re-embedding cost (Eqn. 16) for the female data is only about 1% larger than that for the male data (5.12 bits vs. 5.08 bits), indicating that the embedding works well for both sexes.
After applying the Phaser algorithm, we find the phase-averaged orbit via a von Mises distribution-weighted average. More precisely, we construct the average orbit for eigenmode $i$, $\bar{y}_i(\phi)$, via

$$\bar{y}_i(\phi) = \frac{\sum_{t} y_i(t)\, e^{\kappa\cos\left(\phi-\phi(t)\right)}}{\sum_{t} e^{\kappa\cos\left(\phi-\phi(t)\right)}},$$

where $y_i(t)$ is the projection onto the $i$th eigenmode at time point $t$, $\phi(t)$ is the phase associated with the same time point, and $\kappa$ is related to the standard deviation of the von Mises distribution ($\sigma^2 = 1 - I_1(\kappa)/I_0(\kappa)$, where $I_n$ is the modified Bessel function of order $n$). Here we use the value of $\kappa$ resulting in the smoothing standard deviation given in the parameter table below.
Because the phase reconstruction is unique only up to an additive constant, an additional alignment must be performed before comparing phase-averaged curves from different behavioural bouts. This is done by first finding, for each mode, the phase shift that maximises the cross-correlation between the pair of phase-averaged curves. The overall phase offset between the pair of 50-dimensional orbits is then given by the median of these per-mode phase shifts.
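The weighted average above can be sketched directly; the bin count and the value of $\kappa$ below are illustrative choices, not the paper's parameters.

```python
import numpy as np

def phase_average(y, phases, kappa=8.0, n_bins=64):
    """Von Mises-weighted average orbit: at each reference phase, average the
    (T, d) array y with weights exp(kappa * cos(reference - observed phase))."""
    ref = np.linspace(-np.pi, np.pi, n_bins, endpoint=False)
    w = np.exp(kappa * np.cos(ref[:, None] - phases[None, :]))  # (n_bins, T)
    return ref, (w @ y) / w.sum(axis=1, keepdims=True)
```

Smoothing a pure sinusoidal mode, for example, returns a sinusoid attenuated by the factor $I_1(\kappa)/I_0(\kappa)$, which is how $\kappa$ trades orbit sharpness against noise suppression.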
|Number of angles used in Radon transforms||90|
|Number of postural eigenmodes (found, not defined a priori)||50|
|Number of frequency channels||25|
|Non-dimensional Morlet wavelet parameter||5|
|High-pass frequency cut-off||1 Hz|
|Low-pass frequency cut-off||50 Hz|
|Training set size||35,000|
|Maximum non-zero re-embedding transitions||200|
|Nearest neighbors for training set calculation||10|
|Width of gaussian kernel density estimator for embedded points||distance to nearest neighbours (see text)|
|Minimum behavioural time scale||s|
|Width of gaussian for embedded-space velocity calculations||s|
|Standard deviation of von Mises smoothing for phase averaging||radians|
Supplementary movies can be obtained by emailing GJB (email@example.com)
Movie S1. Raw video data of a behaving fly (left) and the corresponding segmented and aligned data (right).
Movie S2. Dynamics in behavioural space. Raw video of a behaving D. melanogaster (middle) is displayed alongside coordinates of the fly’s position within the filming apparatus (left) and its position in the embedded behavioural space (right). The red circles represent the positions in the appropriate coordinate system, and the trailing lines are the positions traversed in the previous 0.5 s. The light blue shading indicates that a particular behaviour is being performed, and the blue text below the video of the fly gives a coarse label for the behaviour. The first portion of the movie is 5 s, played in real time (indicated by “Real Time” above the fly video), and the subsequent portion of the movie is slowed down by a factor of 5 for clarity (indicated by “Slowed ”).
Movies S3-11. Each movie is a mosaic of multiple instances of specific regions in behavioural space as displayed in Fig 9 and Table S5. Every movie contains multiple segments from many different individuals and is slowed by a factor of 4 for clarity.
|Movie S4||Right wing grooming|
|Movie S5||Left wing grooming|
|Movie S6||Left wing and legs grooming|
|Movie S7||Wing waggle|
|Movie S8||Abdomen grooming|
|Movie S10||Front leg grooming|
|Movie S11||Head grooming|
Movie S12. Composite movie (slowed by a factor of 4) of randomly chosen instances of flies from the male-preferred behavioural region in Fig 11 of the main text.
Movie S13. Composite movie (slowed by a factor of 4) of randomly chosen instances of flies from the female-preferred behavioural region in Fig 11 of the main text.