1 Introduction
Repetitive, periodic behavior is ubiquitous in our world. Animal locomotion, mechanical motion, biological rhythms, and musical rhythms can all be characterized as periodic phenomena. In this work, we consider the problem of taking a video that contains many periods of such repetitive motion and synthesizing a single fine-detail, slow motion template cycle. This fine-detailed analysis can be applied, for example, to characterize subtle progressions of blood flow in the face during a heartbeat [17]. A consensus template can also be used to visualize variations from cycle to cycle. This could be used to indicate the onset of failure in a repetitive automated action on an assembly line, to assess the stress of motions in repetitive human actions [13], and to optimize performance in athletic activities.
Our approach to slow motion templates is Eulerian; that is, we process the video pixel by pixel with no tracking. Eulerian approaches for video synthesis are attractive due to their simplicity and ease of implementation. For instance, a Fourier bandpass filter can successfully elucidate subtle periodic motions in videos [37, 36]. Time-domain Eulerian approaches can synthesize infinitely playing stochastic “video textures” [25], as well as seamlessly loop video templates from a single cycle of periodic motion [20, 19], though such approaches are not applicable to our problem since we fuse multiple cycles. There are some pitfalls with Eulerian techniques, however. They are known to fail for large motions, and they also require the user to manually specify frequencies of interest [37, 36]. Furthermore, period-to-period drift due to a moving camera or small motion variations can be problematic, as can unrelated motion in the background or occlusions [26].
To address the problems with an Eulerian representation, we devise a geometric and topological Eulerian framework by constructing sliding window embeddings of videos. Sliding window embeddings, or delay reconstructions, have found a diverse array of applications in activity recognition [10, 34], video analysis [31], and motion capture analysis [34]. Sliding window embeddings of periodic time series, in particular, form samples of a topological loop [24], as do generalized multivariate sliding windows on periodic video data, regardless of the type of motion present [29, 31]. Furthermore, the sliding window embedding of an Eulerian video provides a form of regularization that mitigates the effects of drift [31]. Our geometric framework also allows us to use topological data analysis (TDA) [9, 7, 8, 4, 11] and fundamental frequency estimation [22] to auto-tune the spatial and temporal scales of the periodic motion, respectively, so that no user intervention is required. We use Laplacian Eigenmaps [2] computed on the sliding window embedding to obtain a circular phase for each window. We then reorder the windows by this phase, and we let windows that overlap in time vote on the final pixels in each frame of the template, further mitigating drift. Figure 1 highlights the steps of our technique for a sampled 1D periodic time series. There are only 12 samples per period in the original signal, so the details of each period are coarse and noisy. However, once re-sorted, we get a fine-detailed representation of one period.
Many prior works use a combination of sliding window embeddings and TDA, for instance to detect chatter in mechanical systems [16] and to quantify periodic activities in motion capture data [33, 35]. However, our approach of using a sliding window jointly with TDA to synthesize a new result is novel. Related works use spectral geometry to rearrange unstructured collections of images around a loop as a preprocessing step for structure from motion [1] or along a line to order the microscopic images of a developing embryo [6]. By contrast, we seek a cyclic reordering given input frames ordered in time, enabling us to construct a sliding window.
2 Slow Motion Templates
Figure 1 shows the different parts of our method for a 1D periodic signal for illustration purposes. The actual input to our method is a 3-channel video, that is, a sequence of frames of fixed resolution. We treat each frame as a point in a Euclidean space whose dimension is the number of pixels times the 3 channels, and we denote the video by the sequence of these points, indexed by an integer-valued frame number. For steps (a-c), we do not rely on the extrinsic geometry of this set of points, only on its set of pairwise distances. Thus, we can project all of the points into a lower-dimensional space and apply the same techniques to the projection, provided that pairwise distances are preserved. Since the number of frames is smaller than the dimension of each frame, a distance-preserving projection into a space whose dimension equals the number of frames is always possible. Our method for the steps in (a) operates on the projected points. To mitigate drift, we also use the third layer of a Gaussian pyramid on each frame in place of the raw frames before applying the projection.
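The claim that a set of frames can be moved into a much smaller space without changing any pairwise distance can be checked on toy data. The following is a minimal sketch, assuming a Gram-Schmidt basis of the span of the frames as the projection; this is one possible realization rather than necessarily the projection used here, and the helper names `orthonormal_basis` and `project` and the toy sizes are illustrative assumptions.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(points, tol=1e-10):
    """Modified Gram-Schmidt: an orthonormal basis for the span of the points."""
    basis = []
    for p in points:
        r = list(p)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        norm = math.sqrt(dot(r, r))
        if norm > tol:  # skip points that already lie in the current span
            basis.append([ri / norm for ri in r])
    return basis


def project(points, basis):
    """Coordinates of each point in the basis (at most one axis per point)."""
    return [[dot(p, b) for b in basis] for p in points]


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


random.seed(0)
n_frames, frame_dim = 6, 300  # toy sizes; real frames have far more entries
frames = [[random.random() for _ in range(frame_dim)] for _ in range(n_frames)]
basis = orthonormal_basis(frames)
projected = project(frames, basis)
```

Because every frame lies in the span of the basis, the projected coordinates reproduce all pairwise distances up to rounding error, while each projected point has at most as many coordinates as there are frames.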
2.1 Sliding Window Embeddings
The sliding window video embedding [3, 29, 31] for is defined as
(1) 
where is the dimension of the embedding. As shown by Takens [27], a sliding window whose dimension exceeds twice the intrinsic dimension of a dynamical system, built from even a single generic observation function of that system, is sufficient to reconstruct a topological embedding of the underlying trajectory in the original state space. For periodic signals, this state space is a torus, and the embedding dimension should be twice the number of harmonics present to injectively reconstruct loops on that torus [24]. The same is true of videos in general [31]. Furthermore, the sliding window length maximizes the topological persistence (Section 2.3) of the embeddings when it is an integer multiple of the fundamental period [24, 31]. To estimate the fundamental period, and hence the window length, we perform 1D ISOMAP [28] on the video frames to generate a 1D surrogate signal, and we then perform an autocorrelation-based fundamental frequency estimation [22] on that signal.
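The two ingredients of this section, stacking consecutive frames into windows and estimating the period from a 1D surrogate signal, can be sketched as follows. This is a minimal illustration under stated assumptions: each window stacks `d` adjacent frames (the exact indexing convention of Equation 1 is not assumed), and the period estimate is a plain autocorrelation peak picker standing in for the estimator of [22]. The names `sliding_window` and `estimate_period` are hypothetical.

```python
import math


def sliding_window(frames, d):
    """Stack d consecutive frames into one long vector per window start."""
    return [
        [value for frame in frames[n:n + d] for value in frame]
        for n in range(len(frames) - d + 1)
    ]


def autocorrelation(signal, max_lag):
    """Unnormalized autocorrelation of the mean-centered signal."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]
    return [
        sum(centered[n] * centered[n + lag] for n in range(len(centered) - lag))
        for lag in range(max_lag + 1)
    ]


def estimate_period(signal):
    """Lag of the highest autocorrelation peak after the main lobe at lag 0."""
    r = autocorrelation(signal, len(signal) // 2)
    lag = 1
    while lag < len(r) and r[lag] > 0:  # walk out of the lobe around lag 0
        lag += 1
    if lag == len(r):
        raise ValueError("no periodicity found")
    return max(range(lag, len(r)), key=lambda k: r[k])


# Toy surrogate signal with 12 samples per period, as in Figure 1.
surrogate = [math.cos(2 * math.pi * n / 12) for n in range(120)]
period = estimate_period(surrogate)
# Window length set to one period; each "frame" here is a single number.
windows = sliding_window([[s] for s in surrogate], d=period)
```

For a video, each frame would be a long vector instead of a single number, and the surrogate signal would come from a 1D embedding of the frames rather than from a known cosine.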
2.2 Laplacian Eigenmaps
Since sliding window embeddings of periodic videos lie on a topological loop, a graph built on the sliding window point cloud will be approximately circular at an appropriate scale. Based on this, we use tools from spectral graph theory [5], in particular the graph Laplacian, to estimate the phase of each sliding window. We first construct an adjacency matrix $A$ in which the entry $A_{ij}$ represents the similarity between the windows at times $i$ and $j$. Given a scale, we define the unweighted adjacency as $A_{ij} = 1$ if the distance between windows $i$ and $j$ is within that scale and 0 otherwise. We also define a weighted version in which the entries are real-valued similarities rather than 0 or 1. In both cases, we define the graph Laplacian as:
$$L = D - A \tag{2}$$
where $D$ is the degree matrix, a diagonal matrix whose entries are the sums of all outgoing weights.
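As a small illustration of the unweighted construction, the sketch below thresholds pairwise distances at a given scale and forms the Laplacian as the degree matrix minus the adjacency matrix. The unnormalized form and the "less than or equal" threshold are assumptions made for the example, and the function names are hypothetical.

```python
import math


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def unweighted_adjacency(points, scale):
    """1 where two distinct points are within `scale` of each other, else 0."""
    n = len(points)
    return [
        [1.0 if i != j and distance(points[i], points[j]) <= scale else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def graph_laplacian(adjacency):
    """Unnormalized Laplacian: degrees on the diagonal minus the adjacency."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    return [
        [(degrees[i] if i == j else 0.0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


# Eight points on a unit circle; neighbors are about 0.765 apart, so a scale
# of 0.8 links each point to exactly its two neighbors (a ring graph).
points = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
          for k in range(8)]
laplacian = graph_laplacian(unweighted_adjacency(points, scale=0.8))
```

On this toy loop the result is the Laplacian of a ring graph, which is the simplest instance of the circular structure that the sliding window point cloud is expected to have.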
We now examine the unweighted Laplacian of an ideal model to motivate our approach. Suppose we have extracted a sliding window embedding for a video which repeats itself exactly every $T$ frames for $M$ times, for a total of $N = MT$ windows. Let us further suppose it is possible to choose a scale so that the unweighted adjacency matrix contains an entry of 1 for edges corresponding to windows that are adjacent in time, as well as for the corresponding windows at, before, and after repetitions of the window at intervals of $T$; that is, the Laplacian is a symmetric, circulant matrix defined as
$$L_{ij} = \begin{cases} 3M - 1 & i = j \\ -1 & i \neq j \text{ and } (i - j) \bmod T \in \{0, 1, T-1\} \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
Circulant matrices are diagonalized by the Discrete Fourier Transform [12], and their nonzero eigenvalues come in pairs with multiplicity two, with corresponding eigenvectors $\cos(2\pi k n / N)$ and $\sin(2\pi k n / N)$ for $n = 0, \ldots, N-1$. In the case of Equation 3, it can be shown using the DFT that the eigenvalues are $3M$ and
$$\lambda_l = 2M - 2M\cos\left(\frac{2\pi l}{T}\right), \quad l = 0, 1, \ldots, T-1 \tag{4}$$
The smallest two nonzero eigenvalues occur when $l = 1$ (equivalently $l = T-1$), corresponding to the eigenvectors
$$v_1[n] = \cos\left(\frac{2\pi n}{T}\right), \quad v_2[n] = \sin\left(\frac{2\pi n}{T}\right) \tag{5}$$
(Footnote 3: This generalizes the circle graph used in [1], which is the case $M = 1$.)
Therefore, the smallest two numerically nonzero eigenvalues each correspond to sinusoids with period $T$ which are mutually orthogonal. When plotted against each other, they form a circle with an arbitrary phase offset. Therefore, we compute the circular phase numerically as $\theta[n] = \operatorname{atan2}(v_2[n], v_1[n])$, where $v_k$ is the eigenvector corresponding to the $k$-th smallest numerically nonzero eigenvalue.
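The ideal model can be checked numerically. The sketch below is written under assumptions: the labels `T` (frames per period) and `M` (number of repetitions) are ours, and the Laplacian is taken to be the unnormalized degree-minus-adjacency form. It builds the first row of the circulant Laplacian, obtains its eigenvalues as the DFT of that row, and confirms that a cosine and sine of period `T` are eigenvectors whose pointwise angle recovers the phase. Under these assumptions a direct calculation gives the eigenvalues 3M and 2M(1 - cos(2πl/T)) for l = 0, ..., T-1.

```python
import cmath
import math


def ideal_laplacian_row(T, M):
    """First row of the circulant Laplacian for M exact repetitions of T frames."""
    N = M * T
    row = [0.0] * N
    for offset in range(1, N):
        # neighbors: same position in any repetition, plus one frame before/after
        if offset % T in (0, 1, T - 1):
            row[offset] = -1.0
    row[0] = -sum(row)  # the degree, so that every row sums to zero
    return row


def circulant_eigenvalues(row):
    """Eigenvalues of a symmetric circulant matrix: the DFT of its first row."""
    N = len(row)
    return sorted(
        sum(c * cmath.exp(-2j * math.pi * j * k / N) for j, c in enumerate(row)).real
        for k in range(N)
    )


def apply_circulant(row, vec):
    """Multiply the circulant matrix with the given first row by a vector."""
    N = len(row)
    return [sum(row[(j - i) % N] * vec[j] for j in range(N)) for i in range(N)]


T, M = 12, 5
N = T * M
row = ideal_laplacian_row(T, M)
eigenvalues = circulant_eigenvalues(row)
smallest = 2 * M * (1 - math.cos(2 * math.pi / T))  # predicted smallest nonzero

cos_vec = [math.cos(2 * math.pi * n / T) for n in range(N)]
sin_vec = [math.sin(2 * math.pi * n / T) for n in range(N)]
# Circular phase of each window from the pair of eigenvectors.
phase = [math.atan2(s, c) for s, c in zip(sin_vec, cos_vec)]
```

In practice the eigenvectors come from a numerical eigensolver and may be rotated or reflected within their shared eigenspace, which is why the recovered phase is only defined up to an arbitrary offset and direction.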
In practice, graphs of the sliding window embedding may deviate from the ideal Laplacian model in Equation 3, though they do so gracefully. Figure 2 shows an example for the jumping jacks video. To improve robustness, we default to the weighted Laplacian so that small changes in the threshold lead to small changes in the result. In this case, harmonics of the actual frequency of interest occasionally have smaller eigenvalues when the corresponding harmonics are strongly present in the video. To mitigate this, we search through the 10 eigenvectors corresponding to the 10 smallest eigenvalues, sorted by eigenvalue, and we use the pair of adjacent eigenvectors with the smallest number of zero crossings whose zero crossing counts are within a factor of 20 of each other.
2.3 Persistent Homology
It remains to find the spatial scale for the Laplacian. To adapt to the data, we leverage 1D persistent homology from TDA [8] to find the scale at which the primary topological feature, the single cycle, exists. Specifically, we compute the 1D Vietoris-Rips filtration on our sliding window data, which tracks equivalence classes of loops, known as homology classes [14]. The algorithm returns a so-called persistence diagram, a multiset of points on a 2D birth/death grid, with each point corresponding to a loop class. The birth value indicates the scale at which the loop class forms, and the death value indicates the scale at which that class no longer exists. The difference between the death and birth values is known as the persistence of the class. For our scenario, one point in the diagram should have a much larger persistence than the others (e.g. Figure 1) [24, 31], and this ideally reflects the single cycle of motion that we seek. We take our scale to be a weighted combination of the birth and death values of the point with the largest persistence, where the weight is a parameter which will be explored experimentally in Section 3.
Finally, homology computation requires the specification of coefficients that belong to a user-given field [14, 21]. Coefficients in $\mathbb{Z}/2\mathbb{Z}$ are commonly chosen; however, in our scenario this is problematic. For instance, the motion of a jumping jack contains a second harmonic, since an individual jumps twice per cycle, and using coefficients in $\mathbb{Z}/2\mathbb{Z}$ would lead to so-called Möbius splitting [24, 32, 30]. Thus, we instead use coefficients in a field of characteristic other than 2 in order to capture these types of complex motions.
2.4 Cycle Reordering And Median Voting
Given the phase estimates, we can now synthesize the final slow motion template by lining up the sliding windows by phase. For a template with a chosen number of frames, we choose that many equally spaced angles around the circle at which to sample the template. We unwrap the phase and use it to estimate the number of periods that the video goes through. Assuming this estimate is correct and that the unwrapped phase has a constant slope over time, there is a fixed phase gap between adjacent stacked frames in each sliding window, equal to $2\pi$ times the number of periods divided by the number of frames. Thus, each window spans an interval of phases beginning at its own phase estimate. A given sampling angle will potentially be contained in the intervals for many different windows, though it is unlikely to coincide exactly with any of the frames in each window, so we use linear interpolation to fill in a frame corresponding to that angle in each window overlapping it. The frame at each sampling angle is taken as the median of all interpolated frames. Furthermore, we only use interpolated frames that are a user-prescribed amount away in time for each given reordered frame, to avoid ghosting artifacts [23].
3 Experiments
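To make the cycle reordering and median voting of Section 2.4 concrete, the following is a minimal 1D sketch in the spirit of Figure 1. It rests on several assumptions: each "frame" is a single number, the per-window phases are taken from ground truth rather than from Laplacian eigenvectors, the number of periods is passed in directly, and the user-prescribed temporal separation used against ghosting is omitted. The function name `median_template` and its parameters are hypothetical.

```python
import math
import statistics


def median_template(signal, phases, d, n_periods, n_template):
    """Sample one cycle at n_template angles by median voting across windows.

    signal:    one value per frame
    phases:    circular phase of each window (window k starts at frame k)
    d:         number of stacked frames per window
    n_periods: estimated number of periods spanned by the whole signal
    """
    gap = 2 * math.pi * n_periods / len(signal)  # phase step between frames
    template = []
    for k in range(n_template):
        angle = 2 * math.pi * k / n_template
        votes = []
        for start, theta in enumerate(phases):
            # fractional frame offset inside this window where `angle` falls
            offset = ((angle - theta) % (2 * math.pi)) / gap
            j = int(offset)
            if j + 1 < d:  # the angle lies between two frames of this window
                t = offset - j
                votes.append((1 - t) * signal[start + j] + t * signal[start + j + 1])
        template.append(statistics.median(votes))
    return template


# Toy signal: 10 periods of a cosine, 12.3 samples per period, 12-frame windows.
period, n_samples, d = 12.3, 123, 12
signal = [math.cos(2 * math.pi * n / period) for n in range(n_samples)]
phases = [(2 * math.pi * n / period) % (2 * math.pi) for n in range(n_samples - d + 1)]
template = median_template(signal, phases, d, n_periods=10, n_template=60)
```

Although the input has only about 12 samples per period, the template is sampled at 60 angles, and each output value is the median of the linearly interpolated votes from every window whose phase interval covers that angle.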
3.1 Quantitative Tests of Circular Coordinates
We first experimentally quantify the accuracy of our circular coordinate inference, since we cannot hope to get a good slow motion template without accurate circular coordinates. We generate 3 different 600-frame synthetic periodic videos for which we know the ground truth circular coordinates, using software from [15] (footnote 4: please see the supplementary material for these videos and simulated errors). Each video is roughly pixels. We vary the number of cycles that the videos undergo between . We then vary the “shake” of the video (width of a motion blur kernel) from to pixels to assess the effect of drift. We then add Gaussian noise with standard deviations of (original RGB ranges are in ). Finally, we add color drifting, occluding squares of varying lengths taking a random walk to assess the effect of occlusions / background motion. We also compare values of the parameter which weights the birth time and death time from TDA. Figure 3 shows the average angular error in degrees for our pipeline under these variations, over 50 trials per condition. Overall, performance is stable to the choice of this parameter, and the errors are low for severe noise and for moderate shake and occlusions with a sliding window. Without a sliding window, the only video that performs reasonably is the “crowd” video, though the errors increase more rapidly with shake/noise/occlusion than with the sliding window, validating the “time regularization” aspect of sliding windows.
3.2 Qualitative Video Template Results
We now qualitatively examine the results of our slow motion templates on some examples. Figure 4 shows the difference between a simple reordering and a median consensus reordering. Due to natural variation from cycle to cycle, the simple reordering has many temporal discontinuities when interleaving these cycles. By contrast, the median voting is clean, and it has the added benefit of removing nonperiodic background components. Figure 5 shows an extreme example in which an original spinning fan video has only 6 frames per period at framerate. Please refer to our supplementary materials for these videos, as well as an exercise video [18] and videos of amplified blood flow in the neck [36] and face [37].
4 Conclusions
We have presented an approach that combines topological data analysis with spectral geometric analysis to reorder a video consisting of repetitive periodic motion into a single, slow motion template cycle. Our quantitative results demonstrate robustness to noise, drift, and background outliers, and our qualitative results reveal motion that is challenging to visually perceive from the raw input video. For future work, we plan on exploring applications of our technique to detecting motion irregularities, visualizing subtle motions from repetitive motion that is temporally aliased, and reconstructing templates for videos with large amounts of missing data.
References
[1] Hadar Averbuch-Elor and Daniel Cohen-Or. Ringit: Ring-ordering casual photos of a temporal event. ACM Trans. Graph., 34(3):33, 2015.
[2] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6):1373–1396, 2003.
[3] Liangyue Cao, Alistair Mees, and Kevin Judd. Dynamics from multivariate time series. Physica D: Nonlinear Phenomena, 121(1-2):75–88, 1998.
[4] Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308, 2009.
[5] Fan RK Chung. Spectral graph theory. Number 92. American Mathematical Soc., 1997.
[6] Carmeline J. Dsilva, Bomyi Lim, Hang Lu, Amit Singer, Ioannis G Kevrekidis, and Stanislav Y. Shvartsman. Temporal ordering and registration of images in studies of developmental dynamics. Development, 142(9):1717–1724, 2015.
[7] Herbert Edelsbrunner and John Harer. Persistent homology - a survey. Contemporary mathematics, 453:257–282, 2008.
[8] Herbert Edelsbrunner and John Harer. Computational topology: an introduction. American Mathematical Soc., 2010.
[9] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and simplification. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, pages 454–463. IEEE, 2000.
[10] Jordan Frank, Shie Mannor, and Doina Precup. Activity and gait recognition with time-delay embeddings. In AAAI. Citeseer, 2010.
[11] Robert W Ghrist. Elementary applied topology. Createspace, 2014.
[12] Chris Godsil and Gordon F Royle. Algebraic graph theory, volume 207. Springer Science & Business Media, 2013.

[13] Runyu L Greene, David P Azari, Yu Hen Hu, and Robert G Radwin. Visualizing stressful aspects of repetitive motion tasks and opportunities for ergonomic improvements using computer vision. Applied ergonomics, 65:461–472, 2017.
[14] Allen Hatcher. Algebraic Topology. Cambridge University Press, 2002.
[15] Alec Jacobson, Ilya Baran, Ladislav Kavan, Jovan Popović, and Olga Sorkine. Fast automatic skinning transformations. ACM Transactions on Graphics (TOG), 31(4):77, 2012.
[16] Firas A Khasawneh and Elizabeth Munch. Chatter detection in turning using persistent homology. Mechanical Systems and Signal Processing, 70:527–541, 2016.
[17] Mayank Kumar, Ashok Veeraraghavan, and Ashutosh Sabharwal. Distanceppg: Robust non-contact vital signs monitoring using a camera. Biomedical optics express, 6(5):1565–1588, 2015.
[18] Ofir Levy and Lior Wolf. Live repetition counting. In Proceedings of the IEEE International Conference on Computer Vision, pages 3020–3028, 2015.
[19] Jing Liao, Mark Finch, and Hugues Hoppe. Fast computation of seamless video loops. ACM Trans. Graph., 34(6):197:1–197:10, October 2015.
[20] Zicheng Liao, Neel Joshi, and Hugues Hoppe. Automated video looping with progressive dynamism. ACM Trans. Graph., 32(4):77:1–77:10, July 2013.
[21] Clément Maria, Jean-Daniel Boissonnat, Marc Glisse, and Mariette Yvinec. The gudhi library: Simplicial complexes and persistent homology. In International Congress on Mathematical Software, pages 167–174. Springer, 2014.
[22] Philip Mcleod and Geoff Wyvill. A smarter way to find pitch. In Proceedings of the International Computer Music Conference (ICMC’05), pages 138–141, 2005.

[23] Simone Meyer, Oliver Wang, Henning Zimmer, Max Grosse, and Alexander Sorkine-Hornung. Phase-based frame interpolation for video. In Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, pages 1410–1418. IEEE, 2015.
[24] Jose A Perea and John Harer. Sliding windows and persistence: An application of topological methods to signal analysis. Foundations of Computational Mathematics, 15(3):799–838, 2015.
[25] Arno Schödl, Richard Szeliski, David H Salesin, and Irfan Essa. Video textures. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 489–498. ACM Press/Addison-Wesley Publishing Co., 2000.
[26] Chris Stauffer and W Eric L Grimson. Adaptive background mixture models for real-time tracking. In Computer Vision and Pattern Recognition, 1999. IEEE Computer Society Conference on., volume 2, pages 246–252. IEEE, 1999.
[27] Floris Takens. Detecting strange attractors in turbulence. Lecture notes in mathematics, 898(1):366–381, 1981.
[28] Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
[29] Christopher J Tralie. High dimensional geometry of sliding window embeddings of periodic videos. In Proceedings of the 32nd International Symposium on Computational Geometry (SOCG), 2016.
[30] Christopher J Tralie. Moebius beats: The twisted spaces of sliding window audio novelty functions with rhythmic subdivisions. In 18th International Society for Music Information Retrieval (ISMIR), Late Breaking Session, 2017.
[31] Christopher J. Tralie and Jose A. Perea. (Quasi)periodicity quantification in video data, using topology. SIAM Journal on Imaging Sciences, 11(2):1049–1077, 2018.
[32] Christopher John Tralie. Geometric Multimedia Time Series. PhD thesis, Duke University Department of Electrical And Computer Engineering, 2017.
[33] Mikael Vejdemo-Johansson, Florian T Pokorny, Primoz Skraba, and Danica Kragic. Cohomological learning of periodic motion. Applicable Algebra in Engineering, Communication and Computing, 26(1-2):5–26, 2015.
[34] V Venkataraman and P Turaga. Shape descriptions of nonlinear dynamical systems for video-based inference. IEEE transactions on pattern analysis and machine intelligence, 2016.
[35] Vinay Venkataraman, Karthikeyan Natesan Ramamurthy, and Pavan Turaga. Persistent homology of attractors for action recognition. In Image Processing (ICIP), 2016 IEEE International Conference on, pages 4150–4154. IEEE, 2016.
[36] Neal Wadhwa, Michael Rubinstein, Frédo Durand, and William T Freeman. Phase-based video motion processing. ACM Transactions on Graphics (TOG), 32(4):80, 2013.
[37] Hao-Yu Wu, Michael Rubinstein, Eugene Shih, John Guttag, Frédo Durand, and William Freeman. Eulerian video magnification for revealing subtle changes in the world. 2012.