
Movement science needs different pose tracking algorithms

Over the last decade, computer science has made progress towards extracting body pose from single camera photographs or videos. This promises to enable movement science to detect disease, quantify movement performance, and take the science out of the lab into the real world. However, current pose tracking algorithms fall short of the needs of movement science: the types of movement data that matter are poorly estimated. For instance, the metrics currently used for evaluating pose tracking algorithms rely on noisy hand-labeled ground truth data and do not prioritize the precision of relevant variables, such as three-dimensional position, velocity, acceleration, and forces, which are crucial for movement science. Here, we introduce the scientific disciplines that use movement data and the types of data they need, and we discuss the changes needed to make pose tracking truly transformative for movement science.


1 Movement data matters.

We only interact with the world through our movements. Consequently, many scientists analyze them. Meaningful analysis of movement data is the key to sports science: good movements maximize performance and minimize the risk of injuries. Movement data is crucial to research in physical and occupational therapy: the right movements improve the quality of life for patients with movement disorders. Quantified movement is a major biomarker for disease: the way people move can aid in diagnosing the disease the patient suffers from. Studying movement is also important in its own right as it is exciting to understand why people move the way they do. Lastly, quantifying movement matters as movement is the output of the brain: movement provides a meaningful goal to be encoded in brain signals. Across all these disciplines, movement data is key.

We begin by highlighting the contributions that movement science research makes to science and medicine across a number of disciplines. Our goal in providing this summary is to introduce computer vision researchers to the importance of developing pose tracking algorithms that serve movement science well.


As movement is the way an animal interacts with the world, many neuroscientists believe that understanding the neural basis of movement is key to understanding the brain. Thus, many studies investigate how the animal nervous system represents and controls movement (Figure 1a). To obtain such insights, these studies typically measure the output movement (such as position and velocity) and relate it to measured neural signals. For example, a number of studies investigate how arm reaching movements relate to brain signals. Such studies have found a neural correlate of the preparation to move in a particular direction churchland2012neural , the selection of a target to move towards scherberger2007target and the inhibition of an impending movement mirabella2011neural . Together, these studies advance our understanding of the neural basis of the generation and control of arm reaching movements. Taken broadly, studying the neural basis of tracked movement helps us understand how the healthy human brain works and points to potentially faulty mechanisms in people with brain disorders.

Biomedical engineering.

In biomedical engineering, movement data is analyzed with engineering tools, often with the goal of improving health and medicine (Figure 1b). These engineers analyze and simulate healthy and diseased human movement (motion and forces) using tools such as multibody dynamics and control theory. For example, many biomedical engineers study human locomotion because it is a ubiquitous daily activity that is crucial for good quality of life. Simulations of locomotion of varying complexity srinivasan2006computer ; delp2007opensim have been used to analyze diseased walking skalshoi2015walking , to study the stability and control of locomotion seethapathi2019step , and to understand the effects of lower limb surgery on walking mansouri2016rectus . Other engineering research has developed equipment that measures the external forces on the leg schepers2007ambulatory and the internal forces on muscles martin2018gauging outside the confines of a lab. Thus, biomedical engineering has helped enable the quantification and estimation of movement parameters that are important for human locomotion. More generally, biomedical engineers have used simulation and hardware design to measure and estimate movement data.

Sports and exercise science.

In sports science, human movement data is often used for maximizing athlete performance and success (Figure 1c). In a typical approach, scientists measure movement features (such as position, velocity, acceleration and force) and correlate them with performance. These results are then used to provide feedback and specific training guidance to athletes in competitive sports. For example, in soccer, the ability to accelerate quickly and accurately has been recognized as important for good performance spinks2007effects . Because of this, a number of studies have investigated the effect that different types of training regimes varley2013acceleration , ages of players mendez2011age and player field positions taskin2008evaluating have on acceleration profiles of soccer players’ movements. The results from such studies in soccer inform the decisions of coaches when choosing training exercises, selecting players and assigning field positions. More generally, the insights obtained from athlete movement data inform strategy, training, the minimization of injury risk, and eventual success in sports.


Psychology researchers develop theories of how mental processes (cognition) and perceptions of the environment lead to observable actions; movement is one such action (Figure 1d). Such studies measure the kinematics (speed, position, and timing) of movement in response to certain environmental conditions and test which theories of mind, senses and body best explain the observed behavior. For example, a number of studies in psychology analyze which theories best explain locomotion behaviors in different environments. One study found that people walk systematically differently in the presence of music and posited that this is due to perceptual amplification styns2007walking . Others found that walking in moving crowds shows characteristic leader-follower behavior rio2014follow and that locomotion in such a moving environment is calibrated in a task-specific way bruggeman2010direction . Developmental psychology studies in children also find that locomotion task-environment mapping is learned in a task-specific way adolph2014fear and that infants explore their environments through aperiodic short paths and falls adolph2012you . Movement serves as a useful quantifiable action that psychologists can analyze to critically evaluate their theories of cognition and perception.


Physiology-based approaches use movement data to get at the mechanisms of movement. They use measures such as displacements, velocities and forces at the intramuscular, intermuscular or whole-body scales (Figure 1e). Additionally, measures of exertion such as muscle activation or metabolic energy consumption are often used. For example, a number of studies in physiology have analyzed how muscle activations are generated and coordinated to control movement and posture. At the intramuscular scale, one study found that motor units within a given muscle are activated over a small spatial range during human standing vieira2011postural . Other studies analyzed how length change is shared between a tendon and the rest of a muscle in the ankle loram2007passive and how muscle lengths change dynamically during walking cronin2009mechanical . At the whole-body scale, one study characterized how multiple muscles are activated in a combined fashion when recovering from a push torres2007muscle and another modeled the role of the different muscle sensors in controlling movement kistemaker2012control . Through the analysis of movement data across scales, studies in physiology enable mechanistic models of movement.

Figure 1: Many important disciplines of science and engineering rely on human movement data for research.


About 42 million people in the United States of America are diagnosed with movement disorders such as Parkinson’s disease, stroke, and cerebral palsy. Additionally, there are about 2 million amputees in the US, and many of them use prostheses every day. Rehabilitation of people with walking disabilities is a major focus area for all movement science disciplines (Figure 1f). Towards this goal, movement data is used as a diagnostic tool, to inform treatment and to quantify the progress of disabled individuals after treatment. For example, movement data is commonly used to diagnose and treat patients with gait disorders. Variability in gait motion is a marker for some movement disorders schniepp2012locomotion and can be explained with simple models of fall-avoidance wang2014stepping . A proposed dopamine-inducing drug kurz2010levodopa and a ‘deep brain stimulation’ treatment allert2001effects were found to improve aspects of measured gait in patients with Parkinson’s disease. Online movement measures have been used to design prostheses wen2019online and exoskeletons zhang2017human , paving the way for customizable assistive devices for gait rehabilitation. Movement quantification can thus improve the quality of medical interventions and diagnostics.


Why humans move the way they do is one of the big questions in science. Studies investigating this question take inspiration from physics to test theories of biological movement (Figure 1g). Typical studies collect movement data to test hypotheses such as minimization of metabolic energy alexander1997minimum , optimal feedback control liu2007evidence , or Bayesian inference kording2007causal . For example, scientists ask if the way people walk is as efficient as it could be. They do so by measuring metabolic energy use and kinematics and relating the two through optimization models srinivasan2006computer ; ackermann2010optimality . Minimization of metabolic energy predicts observed walking speed seethapathi2015metabolic , step width maxwell2001mechanical and step frequency bertram2001multiple . Scientists have also studied the relationship between metabolic energy and stability, i.e., walking while minimizing the chances of a fall bruijn2009slow ; dean2007effect . Movement data thus enables answering questions about the how and why of human movement.


Inspiration from human movement has been used to develop more stable and more efficient robot motion (Figure 1h). Moreover, there is a recent thrust in the field of robotics to assist with physical therapy in a more repeatable, quantifiable and low-cost environment. Towards this goal, robots use movement data as a source of inspiration for mechanical design, to inform robot control algorithms and to act as a goal signal (in the case of rehabilitation robots). For example, one robot project takes inspiration for leg mechanical design and control from birds with an inverted knee vejdani2013bio . While most robots consume a lot of power for locomotion chestnutt2007locomotion , some robots take inspiration from the mechanics of human locomotion to achieve energy-efficient motion bhounsule2012design . Robots intended for rehabilitating people with walking disorders, such as wearable soft exoskeletons ding2014multi and low-cost interactive robots that train and monitor patient progress johnson2007potential , are at the cutting edge of the field. Thus, human and animal movement serves as a source of conceptual inspiration and provides data for designing and controlling useful robot movement.

Other disciplines that use movement data.

The above description of the disciplines that use movement data is by no means an exhaustive one. Our focus in this review is biased by the expertise of the authors towards scientific and medical applications that aim to understand, assist, and improve human movement. Social sciences, comparative biology, security applications, etc. are examples of fields that also use human and animal movement data and are outside the scope of our expertise. For example, in social sciences, human movement data is used as a metric of body language to infer emotion barliya2013expression with applications to animation hicheur2013perception and social robotics lourens2010communicating . Measuring movements of different animal species as a function of morphology is of interest to comparative biologists interested in understanding the relationship between morphology and movement behavior more2010scaling ; usherwood2008compass . In the field of security, analyzing human movement is important to detect stealthy or threatening behavior using security cameras lin2011human ; neverova2016learning . Thus, our vision for how pose tracking needs to change in order for it to transform movement science may extend to disciplines beyond the specific ones mentioned in this paper.

2 Pose tracking promises to transform the scope and scale of movement science.

Movement science is an interdisciplinary field that has impacted medicine, engineering, neuroscience and sports with thousands of papers being published (see Figure 1) and many tens of thousands being cited every year. However, the traditional tools used for data collection in movement science significantly limit the scope and scale of its study. A vast majority of movement science focuses on contrived and repetitive movements studied inside the confines of a lab, is conducted on small non-representative subject samples and uses expensive equipment for measurements. Computer vision-based tracking of human pose, if it meets its potential, promises to transform movement science wei2018behavioral by broadening its scope, increasing its scale, making it more representative and less expensive.

A majority of movement science studies are conducted inside the confines of a lab, as demanded by large and tethered sensors. This prevents the community from studying the broad range of human movement behavior found in the real world. Also, as lab-confined studies can be quite time consuming, they are often conducted on small samples of human subjects. High quality computer vision-based pose tracking of videos promises to capture complex human movements occurring in natural environments chambers2019pose . Also, because videos of human movement released under Creative Commons licenses are relatively easy to obtain from online sources (such as YouTube), pose tracking could increase the sample sizes of data used in movement science studies by orders of magnitude.

The high cost of the sensors used to measure movement data and the difficulty of recruiting subjects limit the accessibility and inclusivity of the science. A substantial fraction of movement science studies are conducted largely on American males of a standard height, size and age, as these subjects are easily available in a university setting; this is likely to bias the findings of movement science towards a subset of the population. Pose tracking provides the opportunity to conduct studies on people of all sexes, shapes, sizes, and ages; this promises to make research findings more broadly applicable and generalizable. Most existing tools for movement science are very expensive and restrict the science to labs with substantial funding. Pose tracking research has the ability to bring movement science to labs that have less access to resources, such as smaller schools and departments in developing countries.

There is constant ongoing research to improve the state of the art of computer vision-based pose tracking tools. Despite this, current pose tracking algorithms fall short of their potential to transform movement science. In the next section, we outline some of the reasons why current pose tracking algorithms are unsuited to the needs of movement science and suggest ways in which they can be changed to transform movement science.

3 Pose tracking algorithms need to estimate different movement quantities.

Movement science needs good estimates of three-dimensional kinematics, mass, size and kinetics of human and animal movement. For example, body part positions matter for muscle and tendon lengths. Velocities matter for neural signals. Accelerations matter for animals chasing one another. Forces matter for injuries. Energy matters for efficiency of movement. While the two-dimensional position of the left ear of an athlete on a single image may be perfectly scientifically irrelevant, the force of impact on her leg may decide between an outstanding career in baseball and an outstanding bill for physical therapy. Despite this, two-dimensional pixel positions are popular in computer vision as they are easy to obtain and many competitions have been dedicated to maximizing their estimation accuracy. In the rest of this section, we argue that pose tracking should start working to improve the estimation of the quantities that actually matter for doing science with movement data.

Current pose tracking algorithms do not prioritize measurement of the quantities that matter for movement science. The major focus of pose estimation research so far has been on estimating 2D pose from single images; the focus of the field is now quickly moving towards 3D pose from single images. However, in most pose estimation algorithms (see Figure 2a), consecutive time frames are treated as statistically independent and the underlying dynamical structure of the pose statistics is ignored. This omission often results in gross mis-estimates of pose in consecutive frames of a video (for instance, see Figure 2b ii) that can easily be discerned by the human eye. In addition to not incorporating temporal structure into the algorithms, the ground truth benchmarks currently used do not include quantities important to movement science like velocity, acceleration, and forces ionescu2013human3 ; lin2014microsoft ; such benchmarks often measure keypoint localization errors averaged over all frames. Moreover, these benchmark datasets do not consist of the types of movements encountered in movement science, often consisting of contrived poses ionescu2013human3 or relying on much broader-purpose image datasets ionescu2013human3 . In a popular variant of pose tracking, multi-person pose tracking andriluka2018posetrack , the community has focused more on identity switches and fragmentations of multiple targets, paying even less attention to localization accuracy, let alone metrics such as velocity and acceleration. Thus, we believe that the field of pose estimation currently does not prioritize important movement variables, and this results in poor estimates of the data that matters for movement science.
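To make the point about frame-averaged benchmarks concrete, here is a minimal sketch of a metric that scores velocity and acceleration errors alongside the usual per-frame position error. It is not taken from any existing benchmark. The function name `kinematic_errors`, the use of meters, and the 30 Hz frame rate are assumptions for illustration. Two hypothetical trackers with the same average position error behave very differently once their motion is scored.

```python
import numpy as np


def kinematic_errors(pred, gt, fps):
    """Mean position, velocity and acceleration errors of one keypoint track.

    pred, gt: arrays of shape (T, D), T frames of D-dimensional positions (meters assumed).
    fps: frame rate in Hz, used to turn frame differences into per-second units.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dt = 1.0 / fps
    # Per-frame Euclidean error averaged over frames: the usual benchmark quantity.
    pos_err = np.linalg.norm(pred - gt, axis=1).mean()
    # First and second finite differences approximate velocity and acceleration.
    d1 = np.diff(pred, n=1, axis=0) - np.diff(gt, n=1, axis=0)
    d2 = np.diff(pred, n=2, axis=0) - np.diff(gt, n=2, axis=0)
    vel_err = np.linalg.norm(d1, axis=1).mean() / dt
    acc_err = np.linalg.norm(d2, axis=1).mean() / dt**2
    return pos_err, vel_err, acc_err


# Usage: a stationary keypoint observed at 30 Hz by two hypothetical trackers.
gt = np.zeros((10, 1))
frames = np.arange(10)[:, None]
biased = gt + 0.01                        # constant 1 cm offset
jittery = gt + 0.01 * (-1.0) ** frames    # alternating +/-1 cm error
print(kinematic_errors(biased, gt, fps=30))   # position 0.01, velocity 0, acceleration 0
print(kinematic_errors(jittery, gt, fps=30))  # position 0.01, velocity ~0.6 m/s, acceleration ~36 m/s^2
```

An averaged position error cannot tell a constant bias from frame-to-frame jitter. Only the jitter corrupts velocity and acceleration, so scoring the derivatives is what separates the two trackers.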

Some common failure modes of existing pose estimation algorithms are illustrated in Figure 2b. In this section, in addition to highlighting the quantities that matter to movement science, we suggest ways that pose tracking algorithms should be adapted to better estimate these quantities. We provide a tabular summary of our key suggestions to the computer vision community in Figure 3.

Three-dimensional position, velocity and acceleration.

Our movements unfold in three dimensions and most movement science studies focus on three-dimensional positions, velocities, and accelerations. For example, to diagnose progress in a patient with a movement disorder, measures of the three-dimensional kinematics are analyzed hong2009kinematic . A majority of the existing pose tracking algorithms, however, aim to maximize the accuracy of two-dimensional, not three-dimensional, pose in single images or video frames sun2019deep ; yang2017learning ; newell2016stacked ; yang2018parsing . While more and more pose tracking algorithms are recently aiming for three-dimensional estimates kocabas2019self ; sun2018integral , still fewer incorporate tracking in time to improve pose estimates i.e. using the past and future movements to improve localization of pose in a given frame (Figure 2a). Frame-to-frame tracking errors of the kind shown in Figure 2b ii will lead to even larger errors in velocity and acceleration upon numerical differentiation.
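The amplification mentioned above can be quantified. Assume independent Gaussian position errors with standard deviation sigma per frame and a frame interval dt. A central difference then gives velocity noise of sigma / (sqrt(2) dt), and a second central difference gives acceleration noise of sqrt(6) sigma / dt^2. The sketch below checks these closed forms by simulation for 1 cm of jitter at 30 frames per second. It assumes NumPy, and the function names are hypothetical.

```python
import numpy as np


def differentiation_noise(sigma, fps, n_frames=200_000, seed=0):
    """Empirical std of the velocity and acceleration errors obtained by
    finite-differencing i.i.d. Gaussian position errors of std `sigma`."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / fps
    # The difference operators are linear, so only the error term matters.
    err = rng.normal(0.0, sigma, size=n_frames)
    vel = (err[2:] - err[:-2]) / (2.0 * dt)               # central difference
    acc = (err[2:] - 2.0 * err[1:-1] + err[:-2]) / dt**2  # second central difference
    return vel.std(), acc.std()


def predicted_noise(sigma, fps):
    """Closed form: sigma / (sqrt(2) dt) for velocity, sqrt(6) sigma / dt^2 for acceleration."""
    dt = 1.0 / fps
    return sigma / (np.sqrt(2.0) * dt), np.sqrt(6.0) * sigma / dt**2


# Usage: 1 cm of per-frame jitter at 30 frames per second.
print(predicted_noise(0.01, 30))        # about 0.21 m/s and about 22 m/s^2
print(differentiation_noise(0.01, 30))  # close to the closed-form values
```

At these settings the acceleration noise, roughly 22 m/s², is more than twice the acceleration due to gravity, even though the position error is only a centimeter.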

Despite the existing issues, we believe that obtaining more accurate three-dimensional positions, velocities and accelerations from videos is possible by changing the pose tracking algorithms and ground truth benchmarks. Skeletal motion naturally creates a hierarchical dependency structure that results in spatial (joint location) and temporal (laws of motion) constraints. Thus, if you know the position, velocity, acceleration, and skeletal shape from the past few frames, then you can build a strong prior for the next frame. None of the existing pose tracking algorithms use such priors to improve pose estimates. Secondly, it is possible to incorporate camera motion into the algorithm to obtain 3D depth information zhou2017unsupervised ; vijayanarasimhan2017sfm . In addition to the issues with existing algorithms outlined above, pose estimation algorithms use crowd-sourced hand-labeled keypoints alp2018densepose ; lin2014microsoft as ground truth, and these are likely subject to human error goodman2013data . Also, 3D pose tracking algorithms use contrived lab-based poses for ground truth ionescu2014human3 that likely do not overlap with the distribution of poses that are of interest to movement scientists. Video ground truth data (not static images) for velocity and acceleration in-the-wild could be obtained by collaborating with movement scientists. Then, minimization of errors in velocity and acceleration, not just pose, can be added to the objective functions of the algorithms. Given that many approaches in movement science are infeasible without reasonably accurate three-dimensional kinematic measures beyond static pose, pose tracking algorithms must improve their estimation of three-dimensional movements.
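As a minimal illustration of a temporal prior, the sketch below extrapolates each keypoint from its last three frames with a quadratic (constant-acceleration) model. It then replaces any detection that jumps implausibly far from that prediction. This deterministic stand-in was chosen for brevity and is not a method proposed here or in the cited work. A practical system would more likely use a probabilistic filter, such as a Kalman filter, or a learned dynamics model. The function names and the gating radius `max_jump` are hypothetical.

```python
import numpy as np


def predict_next(last3):
    """Quadratic extrapolation through the last three frames.

    Exact for constant-acceleration motion: x[t+1] = 3 x[t] - 3 x[t-1] + x[t-2].
    last3: array of shape (3, D), oldest frame first.
    """
    x = np.asarray(last3, dtype=float)
    return 3.0 * x[2] - 3.0 * x[1] + x[0]


def gated_track(detections, max_jump):
    """Replace per-frame detections that stray too far from the motion prior.

    detections: (T, D) per-frame keypoint estimates; the first three frames are trusted.
    max_jump: gating radius in the units of `detections` (a tuning assumption).
    Returns the corrected track and the indices of the frames that were replaced.
    """
    det = np.asarray(detections, dtype=float)
    out = det.copy()
    replaced = []
    for t in range(3, len(det)):
        pred = predict_next(out[t - 3:t])
        if np.linalg.norm(det[t] - pred) > max_jump:
            out[t] = pred  # fall back on the prior for implausible jumps
            replaced.append(t)
    return out, replaced


# Usage: a ballistic 2D trajectory at 30 fps with one gross frame error (e.g. a left/right swap).
t = np.arange(20) / 30.0
truth = np.stack([2.0 * t, 3.0 * t - 4.9 * t**2], axis=1)
noisy = truth.copy()
noisy[10] += [0.5, -0.3]
track, replaced = gated_track(noisy, max_jump=0.1)
print(replaced)  # [10]
```

A hard gate is the simplest way to use a prior. A filter would instead weight the prediction against the detection according to their respective uncertainties.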

Figure 2: The need for better pose tracking algorithms. a) Most pose estimation papers published in computer vision conferences in the past two years do not use temporal information. b) Typical failure modes when algorithms are applied to videos of interest to movement science. In these cases, the algorithm’s performance is clearly inferior to that of the human eye. To generate this figure, we processed a video of a gymnast using the pretrained keypoint-RCNN model from Detectron Detectron2018 . Example images (CC) taken from YouTube,

External contact forces.

For many applications, quantifying the external forces involved in a movement is important. After all, the external forces determine the stress on bones and joints, which relates to injury schache2009biomechanical . However, estimating external forces with current pose tracking algorithms is practically impossible. If there is only one point of contact, one can estimate the contact force from the mass and acceleration estimates of the individual body segments. However, when there are multiple points of contact, the individual forces cannot be determined from mass and acceleration alone because such a system is ‘statically indeterminate’ chao1978graphical . Moreover, the relevant frequency content of even relatively slow movements such as walking goes up to about 20 Hz stergiou2002frequency . According to Nyquist’s theorem Weik2001 , capturing this content requires a sampling rate above 40 Hz, so it cannot be observed at the frame rate of a typical video camera (about 30 Hz).
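The two limits described above can be written down directly. For a single contact, Newton’s second law applied to the whole body gives the contact force as the sum of segment masses times segment accelerations, minus total mass times gravity. With more contacts, the same expression yields only the sum of the contact forces. Separately, a sinusoid above half the frame rate is aliased to a lower apparent frequency. The sketch below assumes that segment masses and center-of-mass accelerations are already available, which, as argued in this section, current pose trackers do not provide. It also assumes a z-up world frame, and the function names are hypothetical.

```python
import numpy as np

# Gravity in m/s^2 in a world frame whose z axis points up (an assumed convention).
GRAVITY = np.array([0.0, 0.0, -9.81])


def single_contact_force(segment_masses, segment_accelerations):
    """Contact force on a body touching the environment at a single point.

    Newton's second law for the whole body, sum_i m_i a_i = F_contact + M g,
    gives F_contact = sum_i m_i a_i - M g (air resistance neglected).
    segment_masses: (S,) in kg; segment_accelerations: (S, 3) center-of-mass
    accelerations in m/s^2. With several contacts this is only the sum of the
    contact forces, not how that sum is shared between them.
    """
    m = np.asarray(segment_masses, dtype=float)
    a = np.asarray(segment_accelerations, dtype=float)
    return (m[:, None] * a).sum(axis=0) - m.sum() * GRAVITY


def aliased_frequency(f_signal, f_sample):
    """Apparent frequency (Hz) of a sinusoid at f_signal Hz sampled at f_sample Hz."""
    return abs(f_signal - f_sample * round(f_signal / f_sample))


# Usage: a hypothetical 70 kg person (two lumped segments) standing still on one foot.
print(single_contact_force([40.0, 30.0], np.zeros((2, 3))))  # about [0, 0, 686.7] N
print(aliased_frequency(20.0, 30.0))   # 10.0: a 20 Hz component appears as 10 Hz at 30 fps
print(aliased_frequency(20.0, 100.0))  # 20.0: sampled fast enough
```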

This drawback of existing algorithms, however, does not mean that a video-based algorithm cannot successfully estimate external force from movement. Consider someone kicking a ball: their foot decelerates by some amount (which can be estimated from previous and subsequent frames) over the total displacement of the soft tissue of the foot, resulting in the deformation of the shoe-ball complex and the movement of the ball shinkai2008ball . In addition to local deformation near the contact patch, estimating transmitted vibrations of individual body parts (the calf of the leg, for instance) can also help break static indeterminacy and estimate contact forces. As the deformations over which forces act and the vibrations they cause contain information about the forces themselves, it should be possible to design an algorithm that estimates external forces from videos. Given the importance of external force measurements for movement science, their estimation should be prioritized by pose tracking algorithms and competitions.
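The kicking example rests on two textbook relations. By the impulse-momentum theorem, average force equals the change in momentum divided by contact time. By the work-energy theorem, average force equals the change in kinetic energy divided by the distance over which the force acts. A minimal sketch follows. The ball mass, speeds, contact time and displacement are illustrative numbers, not measurements, and the function names are hypothetical.

```python
def impulse_average_force(mass, v_before, v_after, contact_time):
    """Average force from the impulse-momentum theorem: F = m (v_after - v_before) / dt."""
    return mass * (v_after - v_before) / contact_time


def work_energy_average_force(mass, v_before, v_after, distance):
    """Average force along a displacement from the work-energy theorem:
    F d = 0.5 m (v_after**2 - v_before**2)."""
    return 0.5 * mass * (v_after**2 - v_before**2) / distance


# Usage with illustrative numbers: a 0.43 kg ball leaving the foot at 25 m/s from rest.
ball_mass = 0.43
print(impulse_average_force(ball_mass, 0.0, 25.0, contact_time=0.01))  # about 1075 N
print(work_energy_average_force(ball_mass, 0.0, 25.0, distance=0.1))   # about 1344 N
```

The work-energy form needs a displacement rather than a duration, which matters here. A contact lasting around 10 ms is shorter than the roughly 33 ms between frames at 30 Hz, whereas the deformation and the subsequent motion may still be visible. The two results differ because they assume different contact parameters, not because either relation is wrong.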

Absolute mass, length and inertia.

Estimates of true whole-body as well as body segment mass and size are necessary for movement science. Estimation of mass and size of individuals is important for understanding how movements differ in people of different body types mcmanus2010children and most movement science papers are expected to report the weight and height statistics of the subjects studied. The mass and inertia measures of body segments are needed to estimate internal forces and torques and to identify the joints that contribute to a given movement ren2008whole . Despite this, typical pose tracking algorithms do not attempt to provide estimates of absolute mass, inertia and size. The best one could do with existing pose tracking algorithms is to obtain the relative size of one segment with respect to other ones. However, such relative measures of size are typically not useful: absolute measures of movement are needed for any type of diagnosis and when comparing movements across individuals of different sizes.

One way to rectify this is to use computer vision to estimate the true size of a known object in the background and use this information to estimate the size of the person or animal in the image. Estimates of absolute mass and inertia can then be made using standard cadaveric length-mass and length-inertia regressions dumas2007adjustments . Adding priors for the relative sizes of different body segments from empirical data will also prevent errors in pose estimation that result in impossible body lengths (see Figure 2b i). One could also use optical effects, such as limited depth of field and its influence on blur, to estimate the true size of the object in a given image. Gravity is constant and affects the dynamics of freely falling objects seen in videos; this information can be used to estimate the true size of a given object from the direction of gravity and the trajectories inferred frame by frame. This could be done, for instance, by tuning the scaling between distance in pixels and true length until the vertical acceleration of the object equals the acceleration due to gravity. Depending on the application, one might also need to account for air resistance or justify neglecting it. The absolute size and scale of movement are essential for making meaningful scientific inferences from movement data, and pose tracking algorithms should estimate them.
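The gravity-based scaling described above can be sketched as a least-squares fit. Fitting a parabola to the vertical pixel trajectory of a falling object gives its acceleration in pixels per second squared, and dividing g by that value gives meters per pixel. The sketch assumes a static camera, an image vertical aligned with gravity, motion roughly parallel to the image plane, and negligible drag. The function name and the synthetic data are also assumptions for illustration.

```python
import numpy as np


def meters_per_pixel_from_fall(y_pixels, fps, g=9.81):
    """Image scale (m/pixel) from the vertical pixel track of a freely falling object.

    Fits y(t) = c2 t^2 + c1 t + c0 by least squares; the pixel-space acceleration
    is 2 c2, and the scale is the factor that makes it equal to g.
    """
    y = np.asarray(y_pixels, dtype=float)
    t = np.arange(len(y)) / fps
    c2 = np.polyfit(t, y, deg=2)[0]  # polyfit returns the highest power first
    return g / abs(2.0 * c2)         # abs(): image y axes often point downwards


# Usage: synthetic drop filmed at 30 fps with a true scale of 0.005 m/pixel.
fps, true_scale = 30.0, 0.005
t = np.arange(15) / fps
y_px = 120.0 + (0.5 * 9.81 * t**2) / true_scale
y_px += np.random.default_rng(1).normal(0.0, 0.5, size=t.size)  # half-pixel detection noise
print(meters_per_pixel_from_fall(y_px, fps))  # close to 0.005
```

Fitting all frames at once, rather than differencing pairs of frames, keeps the acceleration estimate robust to the per-frame detection noise discussed earlier in this section.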

Pose tracking with task and subject generality.

Aspects of movement have been found to change with subject demographics (like age or sex) and with the movement task (like walking or reaching). For example, studying the likelihood of falls with age hollman2007age is an important area of movement science, as many disabilities and deaths in older individuals occur due to falls akyol2007falls . Also, the differences in the injury-proneness of movements in males and females have been studied liederbach2014comparison . However, it is unclear whether pose tracking algorithms work equally well on people of different demographics or across different movement tasks. Artificial intelligence has been shown to have racial and gender biases due to unbalanced datasets osoba2017intelligence . Providing labels for demographic information and for the type of movement task could help study any systematic biases in the pose estimates. For example, infants and elderly individuals have different body configurations than the rest of the population. Infants have a comparatively larger torso and head, while elderly individuals often have a hunched posture and use walkers or crutches; these body configurations are less common in the image datasets used to train pose tracking models. In Figure 2b iv, we show an example of this: the pose estimation algorithm completely fails to detect an upside-down gymnast while still detecting a sitting human with a similar amount of blur, likely because it was not exposed to training data containing a gymnast mid-task. Thus, it is very likely that current pose tracking algorithms would need significant retraining to correctly detect the movements of the populations that are of interest to movement science.
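With demographic and task labels attached to each frame, auditing for systematic bias can be as simple as comparing mean pose error across labels. A minimal sketch in plain Python follows. The labels, the error values and the function names are made up for illustration.

```python
from collections import defaultdict


def error_by_group(errors, labels):
    """Mean pose error per label (e.g. an age band or a movement task)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for err, label in zip(errors, labels):
        totals[label] += err
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


def largest_gap(per_group):
    """Difference in mean error between the worst- and best-served groups."""
    return max(per_group.values()) - min(per_group.values())


# Usage with made-up per-frame keypoint errors (in pixels) and hypothetical labels.
errors = [4.0, 5.0, 6.0, 14.0, 16.0]
labels = ["adult", "adult", "adult", "infant", "infant"]
per_group = error_by_group(errors, labels)
print(per_group)               # {'adult': 5.0, 'infant': 15.0}
print(largest_gap(per_group))  # 10.0
```

A large gap flags a group that the model serves poorly. In practice one would also report group sizes or confidence intervals, since a small group's mean error is itself noisy.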

Pose tracking algorithms could be better designed to detect the movements of a broader demographic by training the models on a more diverse range of videos that include elderly individuals robinovitch2013video , infants karayiannis2001extraction , etc. Also, they could be trained to generalize across different tasks by training with videos that contain different types of movements that are of interest to movement science like walking, running, reaching, etc. Computer vision can also provide an estimate of demographics, say, by classifying the face by age yi2014age or sex xu2008hybrid . Additionally, these algorithms could also be trained to classify demographics based on the movements themselves. Inferring the demographics of the individuals and the type of movement task in addition to tracking pose is essential to study movements in different populations, to ensure unbiased training datasets, and to remove any systematic biases in the estimates for certain populations and tasks.

Figure 3: Key takeaways from the paper regarding what movement science needs from pose tracking and how to get there.

Body contact and partial occlusion.

Pose tracking promises movement science on large in-the-wild datasets, which will be useful for many applications that are otherwise difficult to study. However, many in-the-wild environments that are of interest to movement science involve body contact between multiple individuals and partial occlusion of body parts. For example, studying physiotherapist-patient interactions in a clinical setting requires separating the movements of the physiotherapist from those of the patient despite contact mendonca2018quantifying , a task for which current pose tracking algorithms are poorly suited. Similarly, in-the-wild data contains occlusions, which currently cause some pose tracking algorithms to fail cao2018openpose ; the only solution is then to handpick video frames in which occlusions are absent.

These failures caused by contact and partial occlusion could be addressed, for instance, by augmenting ground truth data so that individuals from distinct images are artificially brought into contact or occluded, training the algorithms to better handle such scenarios. Such synthetic data and data augmentation will also provide more accurate ground truth than the currently used hand-labeled estimates of the location of an occluded body part. Additionally, this approach will help balance the dataset by creating more examples that contain occlusion and contact than the datasets currently in use. By making pose tracking algorithms better at handling partial occlusion and contact, we can truly leverage the wealth of in-the-wild data to answer movement science questions.
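The augmentation idea can be sketched in a few lines of NumPy. The sketch pastes an occluder (a crop of another person, or simply a filled rectangle) over part of an image, recomputes which keypoints it hides, and keeps the original keypoint coordinates. It simplifies the person-pasting described above by ignoring masks, depth ordering and lighting. The function name and the box convention are hypothetical.

```python
import numpy as np


def paste_occluder(image, keypoints, box, patch=0):
    """Synthetic occlusion: paste `patch` into `box` and recompute keypoint visibility.

    image: (H, W, C) array. keypoints: (K, 2) array of (x, y) pixel coordinates.
    box: (x0, y0, x1, y1), with x1 and y1 exclusive (a hypothetical convention).
    patch: scalar fill value or an array of shape (y1 - y0, x1 - x0, C), e.g. a crop
    of a person from another image.
    Keypoint coordinates are returned unchanged: they keep the accuracy of labels made
    on the un-occluded original instead of being guesses about a hidden body part.
    """
    x0, y0, x1, y1 = box
    out = image.copy()
    out[y0:y1, x0:x1] = patch
    kp = np.asarray(keypoints, dtype=float)
    hidden = (kp[:, 0] >= x0) & (kp[:, 0] < x1) & (kp[:, 1] >= y0) & (kp[:, 1] < y1)
    return out, kp, ~hidden


# Usage: hide the lower-right quadrant of a dummy 10x10 RGB image.
img = np.ones((10, 10, 3))
kps = [[2, 2], [7, 7]]
aug, kps_out, visible = paste_occluder(img, kps, box=(5, 5, 10, 10))
print(visible)  # [ True False]
```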

Fixed frame of reference.

For many movement science applications, estimates of body motion need to be expressed in a fixed frame of reference. For example, what matters for tracking the progress of an individual undergoing physiotherapy is the knee angle in the body’s frame of reference joukov2014online , not in the camera’s frame of reference. Moreover, Newton’s laws of motion, on which movement science analyses routinely rely, hold only in an inertial frame of reference, that is, one that is static or moving at constant velocity. Current pose tracking methods estimate such body angles in two dimensions in the camera’s frame of reference. These estimates are corrupted by hand-held camera movement delbracio2015burst , and the estimated angles depend heavily on the camera’s viewing angle.
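To illustrate why 2D camera-frame angles are problematic, the sketch below compares a knee angle computed from 3D hip, knee, and ankle positions with the same angle measured in the image plane of a camera viewing the leg from different directions. It assumes, for simplicity, an orthographic camera (the image is obtained by dropping the depth coordinate); the joint positions and the helper names `joint_angle` and `rot_y` are hypothetical.

```python
import numpy as np


def joint_angle(a, b, c):
    """Angle (degrees) at joint b between segments b->a and b->c; works in 2D or 3D."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def rot_y(deg):
    """Rotation about the vertical y axis, standing in for a different camera viewpoint."""
    a = np.radians(deg)
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])


# Hypothetical hip, knee and ankle positions (metres) of a flexed leg.
# y is vertical, z points forward (the direction of knee flexion), x is mediolateral.
hip = np.array([0.0, 1.0, 0.0])
knee = np.array([0.0, 0.55, 0.15])
ankle = np.array([0.0, 0.1, 0.0])

for view in (0.0, 45.0, 90.0):
    R = rot_y(view)
    h, k, a = R @ hip, R @ knee, R @ ankle       # joints in this camera's frame
    angle_3d = joint_angle(h, k, a)              # unchanged by the rotation
    angle_2d = joint_angle(h[:2], k[:2], a[:2])  # orthographic image-plane angle
    print(f"view {view:4.0f} deg: 3D knee angle {angle_3d:6.1f}, 2D image angle {angle_2d:6.1f}")
```

The 3D angle is the same for every view, whereas the 2D angle reads as a straight leg (180°) when the camera looks along the direction of knee flexion and recovers the true flexion (about 143°) only when the camera views the leg side-on.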

One way to deal with the camera-fixed frame of reference in hand-held videos is to train on data in which multiple camera angles of the same movement are naturally available, e.g. footage of sporting events. Additionally, pose tracking algorithms could use a fixed background object to estimate camera movement and remove it from the pose estimates, for example by updating the camera pose with SLAM and self-calibration mur2015orb . Providing good movement estimates in a fixed frame of reference is crucial for movement science applications.
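Assuming a per-frame camera pose is already available, for example from a monocular SLAM system such as ORB-SLAM, removing camera motion amounts to a rigid transform of each frame's 3D keypoints into the world frame. The sketch below simulates a hand-held camera wobbling while it films a heel that is static during stance. The camera motion, the world-to-camera convention `x_cam = R @ x_world + t`, and the helper names `camera_to_world` and `yaw` are assumptions for illustration. Monocular SLAM recovers translation only up to an unknown scale, which this sketch ignores.

```python
import numpy as np


def camera_to_world(points_cam, R, t):
    """Map (K, 3) camera-frame keypoints to the world frame.

    Assumes the convention x_cam = R @ x_world + t for this frame's camera
    pose, so x_world = R.T @ (x_cam - t), applied row-wise below.
    """
    return (points_cam - t) @ R


def yaw(deg):
    """Rotation about the vertical y axis by `deg` degrees."""
    a = np.radians(deg)
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])


fps = 30.0
times = np.arange(30) / fps                 # one second of video
heel_world = np.array([0.2, 0.0, 3.0])      # hypothetical heel, static during stance (metres)

cam_track, world_track = [], []
for s in times:
    # Simulated hand-held wobble: a small yaw and a sideways shift of the camera.
    R = yaw(5.0 * np.sin(2 * np.pi * s))
    t = np.array([0.05 * np.sin(2 * np.pi * s), 0.0, 0.0])
    x_cam = R @ heel_world + t              # what a camera-frame tracker would report
    cam_track.append(x_cam)
    world_track.append(camera_to_world(x_cam[None, :], R, t)[0])

cam_track, world_track = np.array(cam_track), np.array(world_track)
cam_speed = np.linalg.norm(np.diff(cam_track, axis=0), axis=1) * fps
world_speed = np.linalg.norm(np.diff(world_track, axis=0), axis=1) * fps
print(f"max apparent heel speed, camera frame: {cam_speed.max():.2f} m/s")
print(f"max heel speed, world frame: {world_speed.max():.1e} m/s")
```

In the camera frame the static heel appears to move at well over a metre per second, whereas after the transform its speed is zero up to floating-point error. Without such a correction, finite-difference velocities and accelerations of camera-frame keypoints would feed spurious motion into any Newtonian analysis.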

4 Conclusions

The pose tracking field has made dramatic progress over the last decade, and there are impressive demonstrations of what the technology can do cao2018openpose ; alp2018densepose . However, with the exception of DeepLabCut mathis2018deeplabcut ; nath2019using , the field has not yet impacted movement science research, because its algorithms do not prioritize the quantities that matter for movement science. In this paper, we have introduced computer vision scientists to the field of movement science, explained why computer vision has so far failed to impact movement science despite the obvious scope for connections, and outlined ways in which pose estimation algorithms can be adapted to bridge this gap. We believe it is time to design pose tracking algorithms around the needs of the community that actually needs pose tracking: movement science.


We thank Claire Chambers for her comments on the issues she faced when using existing pose estimation software.

Funding Statement.

This work was funded by NIH grant R01NS063399.

Competing Interests.

The authors declare that they have no competing interests.

Authors’ Contributions.

NS conceived, wrote and edited the paper, and created the figures and table. SW helped generate figure 2. RS and GB provided comments on the paper. KPK conceived the purpose and scope of the paper and provided ideas. All authors edited the paper.


  • [1] Mark M Churchland, John P Cunningham, Matthew T Kaufman, Justin D Foster, Paul Nuyujukian, Stephen I Ryu, and Krishna V Shenoy. Neural population dynamics during reaching. Nature, 487(7405):51, 2012.
  • [2] Hansjörg Scherberger and Richard A Andersen. Target selection signals for arm reaching in the posterior parietal cortex. Journal of Neuroscience, 27(8):2001–2012, 2007.
  • [3] Giovanni Mirabella, Pierpaolo Pani, and Stefano Ferraina. Neural correlates of cognitive control of reaching movements in the dorsal premotor cortex of rhesus monkeys. American Journal of Physiology-Heart and Circulatory Physiology, 2011.
  • [4] Manoj Srinivasan and Andy Ruina. Computer optimization of a minimal biped model discovers walking and running. Nature, 439(7072):72, 2006.
  • [5] Scott L Delp, Frank C Anderson, Allison S Arnold, Peter Loan, Ayman Habib, Chand T John, Eran Guendelman, and Darryl G Thelen. OpenSim: open-source software to create and analyze dynamic simulations of movement. IEEE transactions on biomedical engineering, 54(11):1940–1950, 2007.
  • [6] Ole Skalshøi, Christian Hauskov Iversen, Dennis Brandborg Nielsen, Julie Jacobsen, Inger Mechlenburg, Kjeld Søballe, and Henrik Sørensen. Walking patterns and hip contact forces in patients with hip dysplasia. Gait & posture, 42(4):529–533, 2015.
  • [7] Nidhi Seethapathi and Manoj Srinivasan. Step-to-step variations in human running reveal how humans run without falling. eLife, 8:e38371, 2019.
  • [8] Misagh Mansouri, Ashley E Clark, Ajay Seth, and Jeffrey A Reinbolt. Rectus femoris transfer surgery affects balance recovery in children with cerebral palsy: a computer simulation study. Gait & posture, 43:24–30, 2016.
  • [9] H Martin Schepers, Hubertus FJM Koopman, and Peter H Veltink. Ambulatory assessment of ankle and foot dynamics. IEEE Transactions on Biomedical Engineering, 54(5):895–902, 2007.
  • [10] Jack A Martin, Scott CE Brandon, Emily M Keuler, James R Hermus, Alexander C Ehlers, Daniel J Segalman, Matthew S Allen, and Darryl G Thelen. Gauging force by tapping tendons. Nature communications, 9(1):1592, 2018.
  • [11] Christopher D Spinks, Aron J Murphy, Warwick L Spinks, and Robert G Lockie. The effects of resisted sprint training on acceleration performance and kinematics in soccer, rugby union, and Australian football players. The Journal of Strength & Conditioning Research, 21(1):77–85, 2007.
  • [12] Matthew C Varley and Robert J Aughey. Acceleration profiles in elite Australian soccer. International journal of sports medicine, 34(01):34–39, 2013.
  • [13] Alberto Mendez-Villanueva, Martin Buchheit, Sami Kuitunen, Andrew Douglas, Esa Peltola, and Pitre Bourdon. Age-related differences in acceleration, maximum running speed, and repeated-sprint performance in young soccer players. Journal of sports sciences, 29(5):477–484, 2011.
  • [14] Halil Taskin. Evaluating sprinting ability, density of acceleration, and speed dribbling ability of professional soccer players with respect to their positions. The Journal of Strength & Conditioning Research, 22(5):1481–1486, 2008.
  • [15] Frederik Styns, Leon van Noorden, Dirk Moelants, and Marc Leman. Walking on music. Human movement science, 26(5):769–785, 2007.
  • [16] Kevin W Rio, Christopher K Rhea, and William H Warren. Follow the leader: Visual control of speed in pedestrian following. Journal of vision, 14(2):4–4, 2014.
  • [17] Hugo Bruggeman and William H Warren. The direction of walking—but not throwing or kicking—is adapted by optic flow. Psychological Science, 21(7):1006–1013, 2010.
  • [18] Karen E Adolph, Kari S Kretch, and Vanessa LoBue. Fear of heights in infants? Current directions in psychological science, 23(1):60–66, 2014.
  • [19] Karen E Adolph, Whitney G Cole, Meghana Komati, Jessie S Garciaguirre, Daryaneh Badaly, Jesse M Lingeman, Gladys LY Chan, and Rachel B Sotsky. How do you learn to walk? thousands of steps and dozens of falls per day. Psychological science, 23(11):1387–1394, 2012.
  • [20] Taian MM Vieira, Ian D Loram, Silvia Muceli, Roberto Merletti, and Dario Farina. Postural activation of the human medial gastrocnemius muscle: are the muscle units spatially localised? The Journal of physiology, 589(2):431–443, 2011.
  • [21] Ian D Loram, Constantinos N Maganaris, and Martin Lakie. The passive, human calf muscles in relation to standing: the non-linear decrease from short range to long range stiffness. The Journal of physiology, 584(2):661–675, 2007.
  • [22] Neil J Cronin, Masaki Ishikawa, Michael J Grey, Richard Af Klint, Paavo V Komi, Janne Avela, Thomas Sinkjaer, and Michael Voigt. Mechanical and neural stretch responses of the human soleus muscle at different walking speeds. The Journal of physiology, 587(13):3375–3382, 2009.
  • [23] Gelsy Torres-Oviedo and Lena H Ting. Muscle synergies characterizing human postural responses. Journal of neurophysiology, 2007.
  • [24] Dinant Arne Kistemaker, Arthur Knoek J Van Soest, Jeremy D Wong, Isaac L Kurtzer, and Paul L Gribble. Control of position and movement is simplified by combined muscle spindle and Golgi tendon organ feedback. American Journal of Physiology-Heart and Circulatory Physiology, 2012.
  • [25] Roman Schniepp, Maximilian Wuehr, Maximilian Neuhaeusser, Maria Kamenova, Konstantin Dimitriadis, Thomas Klopstock, M Strupp, Thomas Brandt, and Klaus Jahn. Locomotion speed determines gait variability in cerebellar ataxia and vestibular failure. Movement disorders, 27(1):125–131, 2012.
  • [26] Yang Wang and Manoj Srinivasan. Stepping in the direction of the fall: the next foot placement can be predicted from current upper body state in steady-state walking. Biology letters, 10(9):20140405, 2014.
  • [27] Max J Kurz and Jyhgong Gabriel Hou. Levodopa influences the regularity of the ankle joint kinematics in individuals with Parkinson’s disease. Journal of computational neuroscience, 28(1):131–136, 2010.
  • [28] N Allert, J Volkmann, S Dotse, H Hefter, V Sturm, and H-J Freund. Effects of bilateral pallidal or subthalamic stimulation on gait in advanced Parkinson’s disease. Movement disorders: official journal of the Movement Disorder Society, 16(6):1076–1085, 2001.
  • [29] Yue Wen, Jennie Si, Andrea Brandt, Xiang Gao, and He Huang. Online reinforcement learning control for the personalization of a robotic knee prosthesis. IEEE transactions on cybernetics, 2019.
  • [30] Juanjuan Zhang, Pieter Fiers, Kirby A Witte, Rachel W Jackson, Katherine L Poggensee, Christopher G Atkeson, and Steven H Collins. Human-in-the-loop optimization of exoskeleton assistance during walking. Science, 356(6344):1280–1284, 2017.
  • [31] R McN Alexander. A minimum energy cost hypothesis for human arm trajectories. Biological cybernetics, 76(2):97–105, 1997.
  • [32] Dan Liu and Emanuel Todorov. Evidence for the flexible sensorimotor strategies predicted by optimal feedback control. Journal of Neuroscience, 27(35):9354–9368, 2007.
  • [33] Konrad P Körding, Ulrik Beierholm, Wei Ji Ma, Steven Quartz, Joshua B Tenenbaum, and Ladan Shams. Causal inference in multisensory perception. PLoS one, 2(9):e943, 2007.
  • [34] Marko Ackermann and Antonie J Van den Bogert. Optimality principles for model-based prediction of human gait. Journal of biomechanics, 43(6):1055–1060, 2010.
  • [35] Nidhi Seethapathi and Manoj Srinivasan. The metabolic cost of changing walking speeds is significant, implies lower optimal speeds for shorter distances, and increases daily energy estimates. Biology letters, 11(9):20150486, 2015.
  • [36] J Maxwell Donelan, Rodger Kram, and Arthur D Kuo. Mechanical and metabolic determinants of the preferred step width in human walking. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1480):1985–1992, 2001.
  • [37] John EA Bertram and Andy Ruina. Multiple walking speed–frequency relations are predicted by constrained optimization. Journal of theoretical Biology, 209(4):445–453, 2001.
  • [38] Sjoerd M Bruijn, Jaap H van Dieën, Onno G Meijer, and Peter J Beek. Is slow walking more stable? Journal of biomechanics, 42(10):1506–1512, 2009.
  • [39] Jesse C Dean, Neil B Alexander, and Arthur D Kuo. The effect of lateral stabilization on walking in young and old adults. IEEE Transactions on Biomedical Engineering, 54(11):1919–1926, 2007.
  • [40] HR Vejdani, Y Blum, MA Daley, and JW Hurst. Bio-inspired swing leg control for spring-mass robots running on ground with unexpected height disturbance. Bioinspiration & biomimetics, 8(4):046006, 2013.
  • [41] Joel Chestnutt, Philipp Michel, James Kuffner, and Takeo Kanade. Locomotion among dynamic obstacles for the Honda ASIMO. In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2572–2573. IEEE, 2007.
  • [42] Pranav A Bhounsule, Jason Cortell, and Andy Ruina. Design and control of Ranger: an energy-efficient, dynamic walking robot. In Adaptive Mobile Robotics, pages 441–448. World Scientific, 2012.
  • [43] Ye Ding, Ignacio Galiana, Alan Asbeck, Brendan Quinlivan, Stefano Marco Maria De Rossi, and Conor Walsh. Multi-joint actuation platform for lower extremity soft exosuits. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 1327–1334. IEEE, 2014.
  • [44] Michelle J Johnson, Xin Feng, Laura M Johnson, and Jack M Winters. Potential of a suite of robot/computer-assisted motivating systems for personalized, home-based, stroke rehabilitation. Journal of NeuroEngineering and Rehabilitation, 4(1):6, 2007.
  • [45] Avi Barliya, Lars Omlor, Martin A Giese, Alain Berthoz, and Tamar Flash. Expression of emotion in the kinematics of locomotion. Experimental brain research, 225(2):159–176, 2013.
  • [46] Halim Hicheur, Hideki Kadone, Julie Grezes, and Alain Berthoz. Perception of emotional gaits using avatar animation of real and artificially synthesized gaits. In 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, pages 460–466. IEEE, 2013.
  • [47] Tino Lourens, Roos Van Berkel, and Emilia Barakova. Communicating emotions and mental states to robots in a real time parallel framework using Laban movement analysis. Robotics and Autonomous Systems, 58(12):1256–1265, 2010.
  • [48] Heather L More, John R Hutchinson, David F Collins, Douglas J Weber, Steven KH Aung, and J Maxwell Donelan. Scaling of sensorimotor control in terrestrial mammals. Proceedings of the Royal Society B: Biological Sciences, 277(1700):3563–3568, 2010.
  • [49] James R Usherwood, Katie L Szymanek, and Monica A Daley. Compass gait mechanics account for top walking speeds in ducks and humans. Journal of Experimental Biology, 211(23):3744–3749, 2008.
  • [50] Y Chih Lin, B Shiang Yang, Yu Tzu Lin, Yi Ting Yang, et al. Human recognition based on kinematics and kinetics of gait. Journal of Medical and Biological Engineering, 31(4):255–263, 2011.
  • [51] Natalia Neverova, Christian Wolf, Griffin Lacey, Lex Fridman, Deepak Chandra, Brandon Barbello, and Graham Taylor. Learning human identity from motion patterns. IEEE Access, 4:1810–1820, 2016.
  • [52] Kunlin Wei and Konrad Paul Kording. Behavioral tracking gets real. Nature neuroscience, 21(9):1146, 2018.
  • [53] Claire Chambers, Gaiqing Kong, Kunlin Wei, and Konrad Kording. Pose estimates from online videos show that side-by-side walkers synchronize movement under naturalistic conditions. PloS one, 14(6):e0217861, 2019.
  • [54] Catalin Ionescu, Dragos Papava, Vlad Olaru, and Cristian Sminchisescu. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE transactions on pattern analysis and machine intelligence, 36(7):1325–1339, 2013.
  • [55] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.
  • [56] Mykhaylo Andriluka, Umar Iqbal, Eldar Insafutdinov, Leonid Pishchulin, Anton Milan, Juergen Gall, and Bernt Schiele. PoseTrack: A benchmark for human pose estimation and tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5167–5176, 2018.
  • [57] Minna Hong, Joel S Perlmutter, and Gammon M Earhart. A kinematic and electromyographic analysis of turning in people with Parkinson disease. Neurorehabilitation and neural repair, 23(2):166–176, 2009.
  • [58] Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. Deep high-resolution representation learning for human pose estimation. arXiv preprint arXiv:1902.09212, 2019.
  • [59] Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang. Learning feature pyramids for human pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, pages 1281–1290, 2017.
  • [60] Alejandro Newell, Kaiyu Yang, and Jia Deng. Stacked hourglass networks for human pose estimation. In European Conference on Computer Vision, pages 483–499. Springer, 2016.
  • [61] Lu Yang, Qing Song, Zhihui Wang, and Ming Jiang. Parsing R-CNN for instance-level human analysis. arXiv preprint arXiv:1811.12596, 2018.
  • [62] Muhammed Kocabas, Salih Karagoz, and Emre Akbas. Self-supervised learning of 3D human pose using multi-view geometry. arXiv preprint arXiv:1903.02330, 2019.
  • [63] Xiao Sun, Chuankang Li, and Stephen Lin. An integral pose regression system for the ECCV 2018 PoseTrack challenge. arXiv preprint arXiv:1809.06079, 2018.
  • [64] Tinghui Zhou, Matthew Brown, Noah Snavely, and David G Lowe. Unsupervised learning of depth and ego-motion from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1851–1858, 2017.
  • [65] Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, and Katerina Fragkiadaki. SfM-Net: Learning of structure and motion from video. arXiv preprint arXiv:1704.07804, 2017.
  • [66] Rıza Alp Güler, Natalia Neverova, and Iasonas Kokkinos. DensePose: Dense human pose estimation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7297–7306, 2018.
  • [67] Joseph K Goodman, Cynthia E Cryder, and Amar Cheema. Data collection in a flat world: The strengths and weaknesses of Mechanical Turk samples. Journal of Behavioral Decision Making, 26(3):213–224, 2013.
  • [68] Catalin Ionescu, Dragos Papava, Vlad Olaru, and Cristian Sminchisescu. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE transactions on pattern analysis and machine intelligence, 36(7):1325–1339, 2014.
  • [69] Ross Girshick, Ilija Radosavovic, Georgia Gkioxari, Piotr Dollár, and Kaiming He. Detectron, 2018.
  • [70] Anthony G Schache, Tim V Wrigley, Richard Baker, and Marcus G Pandy. Biomechanical response to hamstring muscle strain injury. Gait & posture, 29(2):332–338, 2009.
  • [71] EY Chao and KN An. Graphical interpretation of the solution to the redundant problem in biomechanics. Journal of Biomechanical Engineering, 100(3):159–167, 1978.
  • [72] Nicholas Stergiou, Giannis Giakas, Jennifer E Byrne, and Valerie Pomeroy. Frequency domain characteristics of ground reaction forces during walking of young and elderly females. Clinical Biomechanics, 17(8):615–617, 2002.
  • [73] Martin H. Weik. Nyquist theorem, pages 1127–1127. Springer US, Boston, MA, 2001.
  • [74] Hironari Shinkai, Hiroyuki Nunome, Yasuo Ikegami, and Masanori Isokawa. Ball–foot interaction in impact phase of instep soccer kicking. Science and football VI, 6:41, 2008.
  • [75] Alison M McManus, Eva YW Chu, Clare CW Yu, and Yong Hu. How children move: activity pattern characteristics in lean and obese Chinese children. Journal of obesity, 2011, 2010.
  • [76] Lei Ren, Richard K Jones, and David Howard. Whole body inverse dynamics over a complete gait cycle based only on measured kinematics. Journal of biomechanics, 41(12):2750–2759, 2008.
  • [77] Raphael Dumas, Laurence Cheze, and J-P Verriest. Adjustments to McConville et al. and Young et al. body segment inertial parameters. Journal of biomechanics, 40(3):543–553, 2007.
  • [78] John H Hollman, Francine M Kovash, Jared J Kubik, and Rachel A Linbo. Age-related differences in spatiotemporal markers of gait stability during dual task walking. Gait & posture, 26(1):113–119, 2007.
  • [79] AD Akyol. Falls in the elderly: what can be done? International nursing review, 54(2):191–196, 2007.
  • [80] Marijeanne Liederbach, Ian J Kremenic, Karl F Orishimo, Evangelos Pappas, and Marshall Hagins. Comparison of landing biomechanics between male and female dancers and athletes, part 2: influence of fatigue and implications for anterior cruciate ligament injury. The American journal of sports medicine, 42(5):1089–1095, 2014.
  • [81] Osonde A Osoba and William Welser IV. An intelligence in our image: The risks of bias and errors in artificial intelligence. Rand Corporation, 2017.
  • [82] Stephen N Robinovitch, Fabio Feldman, Yijian Yang, Rebecca Schonnop, Pet Ming Leung, Thiago Sarraf, Joanie Sims-Gould, and Marie Loughin. Video capture of the circumstances of falls in elderly people residing in long-term care: an observational study. The Lancet, 381(9860):47–54, 2013.
  • [83] Nicolaos B Karayiannis, Seshadri Srinivasan, Rishi Bhattacharya, Merrill S Wise, James D Frost, and Eli M Mizrahi. Extraction of motion strength and motor activity signals from video recordings of neonatal seizures. IEEE Transactions on medical imaging, 20(9):965–980, 2001.
  • [84] Dong Yi, Zhen Lei, and Stan Z Li. Age estimation by multi-scale convolutional network. In Asian conference on computer vision, pages 144–158. Springer, 2014.
  • [85] Ziyi Xu, Li Lu, and Pengfei Shi. A hybrid approach to gender classification from face images. In 2008 19th International Conference on Pattern Recognition, pages 1–4. IEEE, 2008.
  • [86] Rochelle Mendonca and Michelle Johnson. Quantifying therapist–patient roles using video analysis during occupation-based therapy. American Journal of Occupational Therapy, 72(4_Supplement_1):7211500013p1–7211500013p1, 2018.
  • [87] Zhe Cao, T Simon, SE Wei, and Y Sheikh. OpenPose: real-time multi-person keypoint detection library for body, face, and hands estimation, 2018.
  • [88] Vladimir Joukov, Michelle Karg, and Dana Kulic. Online tracking of the lower body joint angles using IMUs for gait rehabilitation. In 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 2310–2313. IEEE, 2014.
  • [89] Mauricio Delbracio and Guillermo Sapiro. Burst deblurring: Removing camera shake through fourier burst accumulation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2385–2393, 2015.
  • [90] Raul Mur-Artal, Jose Maria Martinez Montiel, and Juan D Tardos. ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE transactions on robotics, 31(5):1147–1163, 2015.
  • [91] Alexander Mathis, Pranav Mamidanna, Kevin M Cury, Taiga Abe, Venkatesh N Murthy, Mackenzie Weygandt Mathis, and Matthias Bethge. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nature neuroscience, 21(9):1281–1289, 2018.
  • [92] Tanmay Nath, Alexander Mathis, An Chi Chen, Amir Patel, Matthias Bethge, and Mackenzie W Mathis. Using DeepLabCut for 3D markerless pose estimation across species and behaviors. Nature protocols, 2019.