pypermod
This python package provides various tools to predict energy expenditure and recovery dynamics of an athlete. The name pypermod stands for Python Performance Modeling.
view repo
Data Science advances in sports commonly involve "big data", i.e., large sport-related data sets. However, such big data sets are not always available, necessitating specialized models that apply to relatively few observations. One important area of sport-science research that features small data sets is the study of recovery from exercise. In this area, models are typically fitted to data collected from exhaustive exercise test protocols, which athletes can perform only a few times. Recent findings highlight that established recovery such as the so-called work-balance models are too simple to adequately fit observed trends in the data. Therefore, we investigated a hydraulic model that requires the same few data points as work-balance models to be applied, but promises to predict recovery dynamics more accurately. To compare the hydraulic model to established work-balance models, we retrospectively applied them to data compiled from published studies. In total, one hydraulic model and three work-balance models were compared on data extracted from five studies. The hydraulic model outperformed established work-balance models on all defined metrics, even those that penalize models featuring higher numbers of parameters. These results incentivize further investigation of the hydraulic model as a new alternative to established performance models of energy recovery.
READ FULL TEXT VIEW PDFThis python package provides various tools to predict energy expenditure and recovery dynamics of an athlete. The name pypermod stands for Python Performance Modeling.
Emerging technologies that enable the real-time monitoring of athletes in training and competition have fostered interest in methods to predict and optimize athlete performance. Predictive models for how much an athlete “has left in the tank” enable the investigation of pacing strategies (Behncke, 1997; Sundström et al., 2014; de Jong et al., 2017) and to dynamically adjust strategies to optimize the outcome of a competition (Hoogkamer et al., 2018). They can be described as a digital athlete, i.e., a computer-based model for enhancing training programming or strategy optimization. A foundation for such advances is research in performance modeling, which can be understood as the mathematical abstraction of exercise physiology.
One of the seminal models in the area of performance modeling is the critical power model, which relies on the notions of a *cp and a *w’ (Hill, 1993; Whipp et al., 1982). Monod and Scherrer (1965) defined *cp as: “the maximum work rate a muscle can keep up for a very long time without fatigue”. Thus, *cp can be considered as a threshold for sustainable exercise. *w’ represents a capacity for work to be performed at a rate above *cp and is conceptualized as an energy storage. Using the definitions of *tte and *p the critical power model can be summarized in the relationship
(1) |
To determine *cp and *w’, an athlete has to conduct between three and five exercise tests until exhaustion at various constant exercise intensities. *cp and *w’ are then fitted to these distinct *tte and *p observations and their relationship can be used to predict the time to exhaustion at other intensities. Hill (1993) emphasized that the attractiveness of the model lies in its coarse simplicity and it should not be employed if highly accurate predictions are required. Nevertheless, its straightforward application and its elegant abstraction have led to improved understanding and prediction of performance dynamics (Poole et al., 2016; Sreedhara et al., 2019; Vanhatalo et al., 2011).
While the critical power model predicts energy expenditure at high intensities, it does not consider the recovery of *w’ after exercise has ended or during exercise at low intensities. Formally, exercise protocols that alternate between intensities below *cp and above *cp constitutes as intermittent exercise. In order to predict performance capabilities of athletes during intermittent exercise, models need to predict recovery of *w’ during phases of exercise below the *cp intensity.
One of the most widely covered approaches to predict recovery of *w’ during intermittent exercise is the *w’bal model (Sreedhara et al., 2019; Jones and Vanhatalo, 2017). Since the first publication by Skiba et al. (2012), an updated form of *w’bal was introduced by Skiba et al. (2015) and another alternative form was proposed by Bartram et al. (2018). *w’bal models have been used to search for optimal drafting strategies in running (Hoogkamer et al., 2018) or to predict phases of perceived exhaustion during cycling exercise (Skiba et al., 2014).
Despite these advances, research into energy recovery modeling is an evolving field, and *w’bal models have been scrutinized for their limitations. Similar to energy expenditure dynamics, recovery dynamics are derived from exhaustive exercise tests and therefore data for model fitting and validation are sparse (Vanhatalo et al., 2011; Sreedhara et al., 2019). Recent findings suggest that current *w’bal models overly simplify energy recovery dynamics, and that model modifications that account for characteristics of prior exhaustive exercise (Caen et al., 2019) as well as bi-exponential recovery dynamics (Caen et al., 2021) might improve recovery predictions. These modified models feature additional parameters, which introduces challenges in fitting them to small data sets. Indeed, the search for models that optimally balance complexity with applicability to few data points is a primary challenge of energy recovery modeling.
Hydraulic models offer an alternative to *w’bal models for predicting energy expenditure and recovery dynamics during exercise. Instead of using *cp and *w’, they represent energy dynamics as liquid flow within a system of tanks and pipes. These tanks and pipes are arranged according to physiological parameters such as maximal oxygen uptake and estimated phosphocreatine levels of an athlete. The first hydraulic model was proposed by
Margaria (1976) to provide an intuitive conceptualization of bioenergetic responses to exercise. Morton (1986) further elaborated the model, formalized its dynamics with differential equations, and published it as the Margaria-Morton (M-M) model. Later, Sundström (2016) proposed an extension of the M-M model and named it the Margaria-Morton-Sundström (M-M-S) model. Compared to *w’bal models, hydraulic models can predict more complex energy expenditure and recovery dynamics and have the potential to address highlighted shortcomings of *w’bal models in recent literature.A challenge of these hydraulic models is that their parameters require in-depth knowledge about bioenergetic systems. Indeed, Morton (2006) concluded that it remained to be seen to what extent model predictions conform to reality. Also, the more recently proposed M-M-S model by Sundström (2016) has yet to be validated experimentally. Behncke (1997) applied the M-M model to world records in competitive running, and while the predictions agreed with values provided in the literature, he also pointed out situations in which the naive interpretation of the model would not be justified. Furthermore, Behncke (1997) stated that constraints dictated by physiological conditions made explicit computations with the M-M model “rather cumbersome”. Collectively, the requirement to set parameters according to physiological measures impede the application and validation of the M-M and M-M-S models.
To overcome the issues caused by ascribing the model parameters to concrete bioenergetic measures, we proposed a generalized form of the M-M hydraulic model in Weigend et al. (2021). Our generalized hydraulic model allows the fitting of its parameters using an optimization approach that only requires *cp and *w’ as inputs. In this way, our modified model preserves the flexibility needed to model the observed dynamics without requiring strict correspondence to parameters pertaining bioenergetics. In a proof-of-concept, we showed that our fitted hydraulic model could successfully predict both energy expenditure and recovery kinetics for one example case, the former in line with predictions of the *cp model and the latter in a manner that matches published observations. While the generalized hydraulic model demonstrated satisfactory predictivity, it is still unknown as to whether it can outperform the existing *w’bal models.
Therefore, in this work, we compare the prediction quality of our generalized hydraulic model from Weigend et al. (2021) to that of three *w’bal models. We hypothesized the hydraulic model would predict the observed recovery ratios compiled from past studies overall more accurately than the *w’bal models. We found that the hydraulic model outperformed the *w’bal models on objective goodness-of-fit and prediction metrics. We conclude that the generalized hydraulic model provides a beneficial new perspective on energy recovery modeling that should be investigated further.
The Materials and methods are structured in the following way. In Section 2.1, the *w’bal and hydraulic models are defined and their underlying assumptions specified. We then introduce a new *w’bal model that has been fitted to the same recovery data as the investigated hydraulic model. In Section 2.2, we discuss how we will objectively compare the *w’bal and hydraulic models. In particular, we propose a procedure to obtain comparable recovery ratios. The validation data set consists of data compiled from previously published studies on the recovery from exercise. Section 2.3 lists the data exclusion criteria and extraction procedures. Finally, in Section 2.4, we describe the metrics used to assess the model goodness-of-fits and prediction capabilities.
The *cp, *w’bal, and the hydraulic models feature assumptions and parameters that require defining.
The critical power model predicts energy expenditure and is the underlying model for the *w’bal models. The four essential assumptions of the critical power model are stated as follows (Hill, 1993; Morton, 2006):
An individual’s power output is a function of two energy sources: aerobic (using oxidative metabolism) and anaerobic (non-oxidative metabolism).
Aerobic energy is unlimited in capacity but its conversion rate into power output is limited (*cp).
Anaerobic energy is limited in capacity (*w’) but its conversion rate is unlimited.
Exhaustion occurs when all of *w’ is depleted.
These assumptions are reflected in Equation 1, in which time to exhaustion (TTE) is estimated from the available *w’ divided by work above *cp. At every time step during which an athlete exercises above *cp, the product of the time elapsed and the difference between *cp and the actual power output is subtracted from the energy capacity *w’. Thus, during a constant power output above *cp, the critical power model predicts a linear depletion of *w’. When *w’ is depleted, exhaustion is reached.
As observable in the example in Figure 1, subsequently established *w’bal models combine the assumed linear depletion of *w’ at power outputs above *cp with predictions for *w’ reconstitution during exercise below *cp. The initial *w’bal model by Skiba et al. (2012) was later updated in Skiba et al. (2015). Substantial differences between these versions exist, and as shown by Skiba and Clarke (2021), the original model by Skiba et al. (2012) contradicts the assumption of Equation 1 that *w’ linearly depletes. As such, we focused on the updated version by Skiba et al. (2015) and we refer to it henceforth as *w’bal-ode. We denote the remaining capacity of *w’ at a discrete time point during exercise as . refers to the power output at a discrete time step . is the difference between the discrete time step and in seconds. We define the overall *w’bal-ode model as
(2) |
During a constant power output above or at *cp (), decreases linearly as increases, like the critical power model predicts. During power outputs below *cp, increases exponentially with *w’ as its asymptote. affects recovery speed and varies between distinct *w’bal-ode models. At a discrete time step , the *tau-skib is estimated as
(3) |
where represents the difference between and *cp. Henceforth, we refer to Equation 2 with Equation 3 as *w’bal-skib. Figure 1 depicts an example for *w’bal-skib predictions. For the time steps of the first 3 minutes was below *cp and , i.e., available balance, remained at its maximum. Then, the power output increased above *cp for the next three minutes. The available *w’ balance decreased by per second. Between 6 and 9 minutes dropped below *cp again, *w’bal-skib simulated recovery, and rose exponentially with *w’ as its asymptote. Speed of recovery was affected by from Equation 3, which took the difference between and *cp into account. That is observable by comparing recovery between 6 and 9 minutes to recovery between 12 and 21 minutes. During the second recovery bout was lower and therefore the slope of the exponential recovery was steeper. During the last three minutes was equal to *cp and thus did not change. If would reach 0, exhaustion would be predicted. In the example in Figure 1 the athlete was predicted to be close to exhaustion, but some of their energy capacities remained.
Bartram et al. (2018) investigated the recovery rate of *w’ of elite cyclists and observed faster recovery rates than Skiba et al. (2015). Therefore, they proposed another for Equation 2 to predict quicker recovery ratios. The *tau-bart was defined as
(4) |
Henceforth, we will refer to Equation 2 with Equation 4 as *w’bal-bart. Predictions of *w’bal-bart are depicted alongside those of *w’bal-skib in the example in Figure 1. It is observable that *w’bal-bart predicted faster recovery dynamics.
In Weigend et al. (2021) we expressed our generalized hydraulic tank model mathematically as a system of discretized differential equations with 8 parameters (A.3 and A.4 in their Appendix). Henceforth, the model will be referred to as the *hyd-weig, a schematic of which is depicted in Figure 2. *hyd-weig models power output of an athlete as a function of three interacting energy sources, which are represented as liquid-containing tanks. As depicted in Figure 2, these tanks are named the aerobic energy source (), anaerobic fast energy source () and anaerobic slow energy source (). is assumed to have infinite volume, which is indicated by the fading color to the left. A pipe connects to the middle tank and has the maximal flow capacity . The pipe from the right tank into the middle tank allows flow in both directions and the maximal flow capacity from into and from into . A tap () is attached to the bottom of the middle tank and liquid flow from this tap represents energy demand. The fill levels of tanks and flows through pipes change as liquid flows from the tap. If the athlete expends energy, the liquid level of drops and initiates flow from and into . When the middle tank is empty, liquid flow out of no longer matches the demand and exhaustion is assumed.
In the depicted situation in Figure 2 was opened and liquid flowed out of . As a result, the liquid level in dropped and thus liquid started to flow from into . The more the fill level of the middle tank dropped, the less liquid pressured against the pipe exit from , and the more the flow from increased. In Figure 2 the flow out of was so large, that the fill level of dropped below the top of () and liquid from started to flow into too. The liquid volume in is limited such that it can only contribute to flow out of for a limited time. If the simulated athlete stopped exercise, then their power output would decrease to 0 and the tap would close. If the tap was closed in the depicted situation in Figure 2, liquid from the outer tanks and would refill the middle tank until its fill level rises above the fill level of . Then, liquid from would continue to refill and liquid from would flow into . The model would mimic recovery and would eventually refill both other tanks.
Morton (1986) mathematically expressed liquid flows within this system as first- and second-order ordinary differential equations and in Weigend et al. (2021) we extended these equations so that they apply to all possible configurations of their generalized model. Due to liquid pressure and flow dynamics, varying fill levels of the middle and the right tank affect how flow from the tap is estimated. With these interactions between three tanks, *hyd-weig is capable of predicting energy expenditure and recovery as a more complex function than the above introduced critical power and *w’bal-ode models. It features eight adjustable parameters. Depicted in Figure 2, the parameters , , represent tank positions, , tank capacities, and , , maximal flow capacities. A configuration for *hyd-weig entails the positions, sizes and capacities of each tank and is therefore defined as
(5) |
Fitting the *hyd-weig to an athlete means finding the configuration that enables the model output to best reproduce the observed exercise responses of an athlete. In Weigend et al. (2021)
we introduced an evolutionary computation workflow to derive such configurations. We fitted a configuration to *cp and *w’ measures of an athlete as well as recovery ratios derived from a publication by
Caen et al. (2019). The same recovery ratios are used for every fitting and thus, to fit *hyd-weig to an athlete, only *cp and *w’ of the athlete are required. These are the same measures that are needed to apply *w’bal-ode models and, hence, the required input measures are the same for all compared *hyd-weig and *w’bal-ode models. With specific reference to our fitting method in Weigend et al. (2021), 10 evolutionary fittings to given *cp and *w’ were estimated and the best fitting has been used.The above introduced models *w’bal-skib, *w’bal-bart, and *hyd-weig were each created by fitting to different sets of recovery observations. Objective metrics to compare model quality, e.g. the Akaike Information Criterion (Burnham and Anderson, 2004), require models to be fitted to the same data. Therefore, to allow a more comprehensive comparison, we added a third *w’bal-ode model with a new *tau-weig. We derived this *tau-weig with a procedure as close as possible to the ones of Skiba et al. (2012) and Bartram et al. (2018).
As the first step, a constant value for in Equation 2 was fitted to each recovery ratio and recovery time combination from Table 1 of the Appendix of Weigend et al. (2021). For these observations power output was constant for every discrete time step during recovery and thus could be considered as constant with the same value for all . We used the standard Broyden-Fletcher-Goldfarb-Shanno algorithm implementation of SciPy (SciPy 1.0 Contributors et al., 2020) with 200 as the initial guess to fit a constant that enabled Equation 2 to best reproduce the observed recovery ratio. This resulted in twelve pairs of fitted constant s to constant recovery intensities.
As the next step, we then fitted an exponential function to these twelve pairs using the non-linear least squares implementation of SciPy (SciPy 1.0 Contributors et al., 2020). With the recovery intensity as , the function was of the form . With the values of Skiba et al. (2012) as the initial guess (546, -0.01, 316), the resulting optimal constants were and . Thus, given any at a discrete time step , can be estimated as
(6) |
Unfortunately, this fitted equation failed to satisfactorily fit the data () but we nevertheless used it because it was developed using a procedure that closely resembled those used to estimate *tau-skib and tau-bart. The introduction of *tau-weig is valuable because it allows the application of the Akaike Information Criterion metric, which requires compared models to be fitted to the same data points. Henceforth, Equation 2 with Equation 6 will be referred to as *w’bal-weig.
We compare the abilities of above defined *w’bal-ode and *hyd-weig models to predict “recovery ratios”. Recovery ratios are computed from exercise protocols involving two exhaustive work bouts (WB1 and WB2) interspersed with a recovery bout (RB). A schematic of the protocol is depicted in Figure 3. First, the model simulates exercise at a fixed work intensity () above *cp until exhaustion. Immediately after exhaustion is reached, exercise intensity switches to a lower recovery intensity () below *cp. After a set time () at that recovery intensity, a second work bout (WB2) until exhaustion at is simulated. The time to exhaustion of WB2 is expected to be shorter than the one of WB1 due to the limited recovery time in between the work bouts. Because it is assumed that *w’ is completely depleted at the end of WB1, the ratio of the second time to exhaustion to the first represents the amount of *w’ recovered. Thus, the time to exhaustion of WB2 divided by the time to exhaustion of WB1 multiplied by 100 results in a recovery ratio in percent (%).
This outlined procedure aligns with the assumptions of *cp or *w’bal-ode models and enables the direct comparison of the simulated recovery ratios from each model *hyd-weig, *w’bal-skib, *w’bal-bart, *w’bal-weig, and with the published data. As an example, the recovery ratio curves in Figure 5 were obtained by estimating the WB1 RB WB2 protocol for every recovery duration () in seconds between 0 and 900 seconds. Published observations by Caen et al. (2021) were added to the plot.
We extracted data from previous studies that investigated energy recovery dynamics and used to it compare and evaluate recovery ratio predictions of all models. The studies for comparison were identified from Table 1 of the comprehensive review by Chorley and Lamb (2020). From these studies, we retained those that featured appropriate data, except those that met the following exclusion criteria:
Featured a mode of exercise other than cycling. Cycle ergometers measure power output directly. Power outputs during modes of exercise like running or swimming are not directly comparable because they are estimated using different methods or are approximated, e.g, (Morton and Billat, 2004) focused only on speed instead of power.
The observations were made under extreme conditions, e.g., hypoxia or altitude.
Insufficient information was reported to simulate the prescribed protocol in and/or to infer a recovery ratio of *w’ in percent, e.g., the integral version of the *w’bal model by Skiba et al. (2012) assumes recovery during high-intensity exercise such that recovery ratios cannot be straightforwardly inferred.
The prescribed protocol leaves doubt if reported recovery ratios are comparable to the “recovery estimation protocol” described earlier, e.g., repeated ramp tests until exhaustion, 50% *w’ depletion followed by a 3-min all-out test, or knee-extension maximal voluntary contraction (MVC) test during recovery.
Five studies were included for comparison, four of which were obtained from the Chorley and Lamb review: (Bartram et al., 2018), (Chidnok et al., 2012), (Ferguson et al., 2010), and (Caen et al., 2019). After the summary of Chorley and Lamb (2020) was published, Caen et al. (2021) published a study that investigated the *w’ reconstitution dynamics in even more detail and which was thus added to the list.
The data in the listed studies were presented in diverse ways, such that modifications were made to some of the data to enable model comparison. The study by Caen et al. (2019) did not report distinct mean values for every investigated condition, such that we derived approximate values in Weigend et al. (2021) to fit our *hyd-weig to their conditions. Hence, the data for comparison are the values from Weigend et al. (2021). Further, the study by Bartram et al. (2018) fitted their own *w’bal-bart model, where is defined according to Equation 4. Therefore, we used *w’bal-bart model predictions for prescribed intensities of Bartram et al. (2018) as the observations against which the other models were compared.
The study by Chidnok et al. (2012) reported times to exhaustion from their intermittent exercise protocol instead of recovery ratios. Power output during recovery was constant in their tests. Therefore, in order to derive recovery ratio estimations that are comparable with the WB1 RB WB2 procedure defined above, we fitted a constant value for of the *w’bal-ode model to each of their prescribed protocols and times to exhaustion. These constant values for were fitted with the Brent method implementation by SciPy (SciPy 1.0 Contributors et al., 2020) to find a local minimum in the interval between [100, 1000]. We then used WB1 RB WB2 recovery ratio estimations of *w’bal-ode models with fitted constant as the observations with which to compare *w’bal-skib, *w’bal-bart, *w’bal-weig, and *hyd-weig.
The metrics of goodness of fit used to compare the models were *RMSE, *MAE, and *AIC. Chai and Draxler (2014) discussed *RMSE and *MAE as widely adopted metrics for assessing model prediction capabilities. We compared predictive accuracy by comparing *RMSE and *MAE on data to which competing models were not fitted. Lower values for *RMSE and *MAE were interpreted as more accurate predictions.
To statistically compare prediction error distributions between models, we used a bootstrap hypothesis test (Efron and Tibshirani, 1993; Good, 2000)
. We did so because only small data sets were available and we could not assume normal distributed prediction errors with equal variances for every compared model. The null hypothesis of our bootstrap test was that prediction error distributions of two compared models are the same. Because we used two prediction error metrics (*RMSE and *MAE) we investigated the null hypothesis on both. We used the absolute difference between RMSE and also between MAE of compared groups as our test statistics. With the null hypothesis that error distributions are the same, we could bootstrap new samples by randomly selecting with replacement from all pooled observations. We created a distribution of test statistics from 1 000 000 bootstrap samples to reliably approximate the p-value of our observed test statistic at high precision. We rejected the null hypothesis if the p-value
.We also compared models with the *AIC, which was first proposed by Sugiura (1978). The *AIC is a model selection tool to investigate the balance between model complexity and explanatory capability (Burnham and Anderson, 2004). *AIC penalizes the number of parameters of the model and thus provides insight into the balance between model complexity and goodness of fit. The lower the *AIC score, the better this balance is met. The AIC was calculated as
(7) |
where MSE is the Mean Squared Error, is the number of data points and is the number of parameters of the model. Models have to be fitted to and applied to the same data in order to obtain comparable *AIC scores. Therefore, only *w’bal-weig and *hyd-weig were comparable with this criterion in this work.
Altogether, the hypothesis that the more complex *hyd-weig model fits the data better than the established *w’bal-ode models will be supported if the overall *RMSE, *MAE and *AIC scores are lower for *hyd-weig than for other models, and if prediction error distributions are significantly different to those of other models.
In the following section, we present the extracted data and the prediction results of *w’bal-ode and *hyd-weig models for each listed previous study. We refer to extracted data from studies by the last name of the first author, e.g., the extracted data from Bartram et al. (2018)
is referred to as “Bartram data set”. All studies collected their data through performance tests that required athletes to exercise until volitional exhaustion. Such tests are affected by circumstances that are hard to measure and control, e.g, motivation, nutrition, and state-of-mind. Therefore, recovery ratio observations are noisy and the extracted group averages were accompanied by large standard deviations. These uncertainties prevented us from drawing conclusions about model quality on averages of individual data sets and instead necessitated to perform the comparison between models across all available data. We begin by presenting the extracted data and model predictions of individual data sets throughout
Sections 3.5, 3.4, 3.3, 3.2 and 3.1 followed by summarizing all prediction errors and resulting *RMSE, *MAE, and *AIC scores in the final Section 3.6.The protocol prescribed by Bartram et al. (2018) consisted of three work bouts interspersed with two 60-second recovery bouts. The first two work bouts each lasted for 30 seconds, and the final one until volitional exhaustion. Work bout exercise intensity () was set to , i.e., the intensity that was predicted to lead to exhaustion after 100 seconds. The recovery bout intensity () was set to differences to *cp () of 200, i.e., *cp - 200 watts, or 150, 100, 50, or 0. The group averaged CP and W’ for the four world-class cyclists featured in Bartram et al. (2018) were 393 watts and 23,300 joules. Altogether, these input values resulted in an estimated exhaustive intensity of 626 watts and recovery intensities 0 of 393 watts, 50 of 343 watts, 100 of 293 watts, 150 of 243 watts, and 200 of 193 watts, respectively.
The resulting recovery predictions of *w’bal-ode and *hyd-weig models are summarized in Figure 4 and Table 1. The *w’bal-bart model was not compared because it was the model that Bartram et al. (2018) fitted to their observations and we used it to create the observations against which the other models were compared. The fitted *hyd-weig configuration to *cp and *w’ by Bartram et al. (2018) was: 23111.91, 65845.28, 391.57, 148.88, 24.15, 0.73, 0.01, 0.24.
The protocol by Caen et al. (2021) investigated the recovery dynamics following exhaustive exercise at (published average of 349 watts). They prescribed a recovery intensity of 161 watts on average, which was determined by selecting of the power at gas exchange threshold (Binder et al., 2008) of their participants. The average *cp of their participants was 269 watts, and the average *w’ 19 200 joules. The reported observed recovery ratios were after 30 seconds, after 60 seconds, after 120 seconds, after 180 seconds, after 240 seconds, after 300 seconds, after 600 seconds, and after 900 seconds.
The simulation parameters and results of the defined recovery estimation protocol are summarized in Figure 5 and Table 2. Fitting *hyd-weig to *cp and *w’ group averages resulted in the configuration 17631.06, 46246.13, 267.28, 117.50, 20.09, 0.68, 0.01, 0.29. The recovery ratios predicted by the *hyd-weig model better matched the observed values compared to all the other models. Nevertheless, some lack of fit for the hyd-weig model was observed: the model overpredicted the recovery ratios at early time points and underpredicted those at longer time points except for the last one. *w’bal-skib, *w’bal-bart, and *w’bal-weig model predictions consistently overestimated recovery for longer recovery times.
Chidnok et al. (2012) prescribed a protocol that alternated between 60-second work bouts and 30-second recovery bouts until the athlete reached exhaustion. With their protocol, the work-bout intensity was set to . The protocol prescribed four trials each with a different recovery intensity (20 watts as the “low” recovery intensity, 95 watts as “medium”, 173 watts as “high”, and 270 watts as the “severe” recovery intensity). The participants had an average *cp of 241 watts and *w’ of 21 100 joules. Their recorded times to exhaustion were seconds with “low” recovery intensity, seconds with “medium”, seconds with “high”, and seconds with “severe”.
As described in Section 2.3, in order to compare observations of Chidnok et al. (2012) to WB1 RB WB2 protocol estimations, a constant value for for the *w’bal-ode model was fitted to the protocol by Chidnok et al. (2012) for each of their recovery conditions. The resulting values were 165.19 seconds for the “low” recovery intensity protocol, a of 124.81 seconds for “medium”, and a of 107.45 seconds for “high”. The “severe” recovery intensity was left out because 270 watts lies above the average *cp of 241 watts. In this case, no recovery should occur if the assumptions of *w’bal-ode model hold true. Chidnok et al. (2012) prescribed recovery bouts of 30 seconds and WB1 RB WB2 protocol estimations with corresponding fitted s and with a of 30 seconds were 24.6% at the “low” intensity, 21.7% at “medium”, and 16.7% at the “high” recovery intensity.
The fitted *hyd-weig configuration to *cp and *w’ by Chidnok et al. (2012) was 18919.76, 48051.77, 239.55, 115.05, 19.48, 0.68, 0.05, 0.31. Predictions of all models and extracted conditions for the recovery estimation protocol are summarized in Figure 6 and Table 3. In the case of the “low” recovery intensity predictions of the *w’bal-skib and *w’bal-weig models were the most accurate. In the case of the “medium” recovery intensity the *w’bal-skib model was the most accurate, and in the remaining “high” condition *hyd-weig and *w’bal-bart model predictions were closest to the data. None of the models made predictions that were close to all three of the observations.
Ferguson et al. (2010) prescribed a protocol with an initial time to exhaustion bout at the intensity that was predicted to lead to exhaustion after 360 seconds () followed by a recovery at 20 watts for 2 minutes, 6 minutes, or 15 minutes. After recovery, exercise intensity was then increased back to one of three possible high-intensity work rates. Thus, each participant performed nine tests in total with three different constant work rates after three different recovery times. The *cp model was fitted to these three times to exhaustion after each recovery period to determine changes in *cp and *w’. Ferguson et al. (2010) published their group averages for *cp as 212 watts, *w’ as 21 600 joules, the as 269 watts, and the observed recovery ratios after 2 minutes as (), 6 minutes (), and 15 minutes ().
Extracted parameters for the recovery intensity protocol and model prediction results are summarized in Figure 7 and Table 3 together with reported means by Ferguson et al. (2010). The fitted *hyd-weig configuration to *cp and *w’ group averages by Ferguson et al. (2010) was 18730.05, 81030.54, 211.56, 94.31, 18.76, 0.63, 0.21, 0.34. In this setup *w’bal-weig was overall closest to published observations. *hyd-weig overestimated the recovery after 120 and after 900 seconds. *w’bal-skib and *w’bal-bart overestimated recovery in every instance.
We derived the values from Table 1 in the Appendix of our Weigend et al. (2021) publication from Caen et al. (2019). Reported measures recreate the depicted means in Figure 3 of the publication by Caen et al. (2019). They consisted of three recovery ratios for four conditions each: Preceding exhausting exercise at or followed by recovery at 33% of *cp or 66% of *cp. The participants of Caen et al. (2019) had an average *cp of 248 watts and *w’ of 18 200 joules, which results in a of 285 watts, a of 323 watts, 33% of *cp as 81 watts, and 66% of *cp as 163 watts.
Extracted parameters for the recovery ratio estimation protocol and model predictions are summarized in Figure 8 and Table 5. The best fit *hyd-weig configuration to *cp and *w’ group averages was 18042.06, 46718.18, 247.4, 106.77, 16.96, 0.72, 0.02, 0.25. As described earlier, the recovery ratio values by Weigend et al. (2021) were used to fit the *tau-weig for the *w’bal-weig model and they are used in the evolutionary fitting process for *hyd-weig to fit recovery dynamics. Therefore, both *w’bal-weig and *hyd-weig were not scrutinized for predictive accuracy on this data set. Their predicted recovery ratios were recorded for the *AIC goodness of fit estimation metric in covered in the next subsection. Out of the remaining two models predictions of *w’bal-skib were closer to the observations but both overpredict in nearly all instances.
Table 6 summarizes the prediction errors of the competing models and resulting metric scores on our investigated data sets. *RMSE and *MAE were defined as the metrics to assess predictive accuracy. Their *MAE scores were 24.87 with a standard deviation of absolute errors (SD) of 14.35 for *w’bal-bart and 7.11(SD=6.83) for *hyd-weig ( for the difference in *MAEs, bootstrap hypothesis test). The *RMSE scores on Caen, Chidnok, and Ferguson data sets were 28.46 for *w’bal-bart and 9.69 for *hyd-weig. Also the bootstrap hypothesis test with the absolute difference in *RMSEs as its test statistic resulted in .
Both remaining models *w’bal-skib and *w’bal-weig could be compared to *hyd-weig on the Bartram, Caen, Chidnok, and Ferguson data sets. The *hyd-weig featured the lowest *MAE with 7.07, the lowest SD with 7.17, and lowest *RMSE with 9.94. *w’bal-skib predictions were significantly different to *hyd-weig ( with the *MAE test statistic and with *RMSE). *w’bal-weig predictions were significantly different to *hyd-weig ( with the *MAE test statistic and with *RMSE).
*AIC was chosen as the metric to assess which model provides the best trade-off between predictive capabilities and complexity. Models must be fitted to and tested on the same data for *AIC scores to be comparable. Hence, as reflected in the last two columns of Table 6, *w’bal-weig and *hyd-weig could be compared on combined data points of all covered data sets. With a of 3 for *w’bal-weig and a of 8 for *hyd-weig the resulting scores were 151.85 for *hyd-weig and 181.03 for *w’bal-weig. The *hyd-weig achieved the lower *AIC score.
In this study, we compared the prediction capabilities and goodness of fit of *hyd-weig to that of *w’bal models. We hypothesized that the hydraulic model would more accuratly predict observed recovery ratios observed in past studies. Models were compared on extracted data from five studies and the *hyd-weig model outperformed the *w’bal-skib, *w’bal-bart, and *w’bal-weig models with respect to objective *RMSE, *MAE, and *AIC metrics. Our findings therefore support the hypothesis. We discuss below our results in more detail and interpret them in context of findings of previous literature. We present arguments for why the *hyd-weig outperformed the *w’bal-ode models and we propose limitations and future work. Finally, we end this section with statements about significance and implications of our results.
We observed that the standard deviations of absolute prediction errors in Section 3.6 as well as the overall *MAE and *RMSE were considerably lower for the *hyd-weig than for the *w’bal-ode models. But when averaging the prediction errors on isolated data sets listed in Table 6, *hyd-weig only made more accurate predictions than its competitors on the Bartram and Caen data sets. For the Bartram data set, the *MAE of *hyd-weig was 6.96, compared to 18.74 for *w’bal-skib, and 26.82 for *w’bal-weig respectively. For the Caen data set, the *MAE of *hyd-weig was the lowest with 3.52. On the remaining Chidnok data set it was *w’bal-skib that achieved the lowest *MAE with 9.4 and on the Ferguson data set it was *w’bal-weig with 6.7.
As pointed out by Skiba and Clarke (2021) and Sreedhara et al. (2019), *w’bal-ode models are meant to be applied to any athlete on a wide range of possible conditions. A lower *MAE score for on the Ferguson data set means predicted recovery ratios more closely for the particular group (six recreational active men) under the particular test conditions that Ferguson tested. However, to determine the usefulness of a model for predicting performance for high-intensity intermittent exercise in a more general sense, models have to be evaluated on a multitude of scenarios. After combining all data sets, achieved the overall lowest *MAE score, which means that could predict recovery ratios overall more accurately for a range of groups and settings.
The less consistent prediction quality across data sets of the *w’bal-ode models agrees with findings by Caen et al. (2019), who proposed that the predictive capabilities of w’bal models may improve with modifications that account for intensity and duration of prior exhaustive exercise. As an example, out of all compared studies in this work, Bartram et al. (2018) prescribed the highest work bout intensity for their experimental setup ( = ). Considering the suggestion by Caen et al. (2019) that a shorter time to exhaustion at a high intensity allows a quicker recovery, it seems reasonable that the *w’bal-bart model estimated the fastest recovery kinetics out of all recovery models.
Conversely, the Caen et al. (2019) study prescribed the lowest work bout intensity out of all compared studies ( = ). Their observed recovery ratios are summarized in the Weigend data set and were slower than the *w’bal-bart predictions. This observation again matches the assumption that a longer exhaustive exercise at a lower intensity requires a longer recovery.
Despite the differences in observed recovery rates, the *w’bal-ode models allow for only a single recovery rate no matter the nature of the prior exercise. To illustrate this point, we conducted simulations to depict the influence of prior exercise intensity on the recovery ratios predicted by the *w’bal-ode and *hyd-weig. Figure 9 depicts four simulations. All simulations shared the same test setup except for differing intensities. The simulation on the left had = as prescribed by Bartram et al. (2018), the simulation on the right featured the lowest = as found in the Weigend data set. From left to right, of the simulations decreased step wise. Bartram et al. (2018) investigated recovery after 60 seconds and therefore the *w’bal-bart prediction after 60 seconds is marked as the observation on the left. Recovery ratios with = and = of of the Weigend data set are marked as observations on the right. The recovery ratios predicted by the *w’bal-ode models were the same for each and their predictions were therefore unable to fit all observations equally well. In contrast, the hydraulic model could account for such characteristics.
This result occurred because of the interactions between the three tanks that *hyd-weig uses to model energy recovery. For example, during high-intensity exercise, the liquid level in would rapidly decrease and the contribution of would be less than during lower-intensity exercise when the liquid level in would decrease more slowly. Differences in fill states of affected recovery estimations and enabled *hyd-weig to predict rapid recovery after high-intensity exercise and a slower recovery after exercise at a lower intensity. We suggest that standard deviations of *MAE as well as overall *MAE and *RMSE scores of *hyd-weig model were smaller than those of *w’bal-ode models because the hydraulic model could account for characteristics of prior exhaustive exercise.
Further, we propose that *hyd-weig has also achieved overall better metric scores because it better captured the bi-exponential nature of energy recovery, as opposed to the mono-exponential *w’bal-ode models. Indeed, the observed recovery ratios of the Caen data set increased rapidly from 0 seconds to 120 seconds and then continued to rise more slowly at longer durations (Figure 5). Caen et al. (2021) showed that their observations were well explained with a bi-exponential model that implements a steeper slope during the beginning of recovery. Also, the first *w’bal paper by Skiba et al. (2012) proposed an alternative bi-exponential version of their *w’bal model with two s but such bi-exponential *w’bal models have yet to be applied in practice.
The observed improved prediction quality of *hyd-weig comes at the cost of a time-demanding fitting process. As outlined in Section 2.1.2, our fitting process from Weigend et al. (2021) requires *cp and *w’ of an athlete as inputs and then obtains fitted *hyd-weig configurations using an evolutionary computation approach. Different *cp and *w’ values require a new *hyd-weig configuration to be fitted to them. Additionally, fittings for our comparison took computation times of 3 hours or more on 7 cores of an CPU E5-2650 v4 @ 2.20GHz each. On the other hand, obtaining for our *w’bal-ode models was solved in milliseconds and they can be applied to any *cp and *w’ combination without fitting anew. Therefore, the application of *hyd-weig is a time consuming task in comparison to the application of *w’bal-ode models. In order to improve the feasibility of application, future work must optimize the hydraulic model fitting process to minimize this limitation.
Further, prediction results show that estimated recovery ratios of the used WB1 RB WB2 protocol are affected by rounding errors that arose from the recovery ratios being estimated from simulations in discrete time steps. For example, in Table 5, it was observed that the *w’bal-ode model predictions between the and trials varied slightly in a few cases, even though these the recovery kinetics should have been the same. The variations were caused by simulation time steps of size 0.1 that made rounding effects play a bigger role in shorter work bouts of . Smaller step sizes would decrease the error, but to more fully prevent inaccuracies, future work ought to formalize model simulations in differential equations that don’t require estimations in discrete time steps.
Predicted recovery ratios of the hyd-weig at recovery intensities close to *cp require further investigation too. The comparison on the 0 case of the Bartram data set (See Figure 4) revealed that *hyd-weig predicts a slight recovery during exercise at *cp. No liquid from flowed back into the system but the ongoing flow from to still caused liquid level in to rise during the recovery bout. Recovery while exercising at *cp intensity is a controversial assumption that is made to an even stronger extent by the original *w’bal model of Skiba et al. (2012). Such dynamics have to be taken into consideration when the models are used for predictions and are important directions for future investigation.
Additionally, extracted recovery ratio observations from previous studies come with associated uncertainties that could not be considered in our *MAE, *RMSE and *AIC scores. As an example, the standard deviations of observed recovery ratios by Caen et al. (2021) depicted in Figure 5 are greater than reported standard deviations by Ferguson et al. (2010) depicted in Figure 7. One could argue that comparisons to reported means by Ferguson et al. (2010) therefore provide a better indication of prediction quality. Unfortunately, we could not incorporate these standard deviations into goodness-of-fit metrics because of how different recovery ratios were reported in compared studies. Caen et al. (2021) and Ferguson et al. (2010) reported averaged observed recovery ratios with standard deviations, for the Bartram data set we had to use *w’bal-bart predictions as observations to compare to, in Weigend et al. (2021) we derived our values from Caen et al. (2019) without standard deviations, and for the Chidnok data set we fitted constant values for for *w’bal-ode models to their reported times to exhaustion to obtain comparable recovery ratios in percent. We believe these various formats of observations highlight the need for more and more comparable studies on energy recovery dynamics.
Larger data sets are vital for more educated investigations of recovery models and their improvement in future work. We see the combination of data sets in this work as a step towards this direction. In order to improve and compare models more holistically, it is important that more comparable studies are conducted in the future and combined into a larger test bed for performance models.
To the best of our knowledge, performance models on energy recovery during intermittent exercise have yet to be compared in such detail. Our comparison on data from five studies allowed a more holistic view on recovery dynamics and confirmed limitations of *w’bal-ode models that were suggested by previous literature. Our results imply that more complex models like *hyd-weig can improve energy recovery predictions. We propose that further efforts to merge and compare data are significant steps to bring the research area of energy recovery modeling forward. We further propose that the predictive capabilities of hydraulic models look strong and that hydraulic models have to be considered as a possible future direction to advance energy recovery modeling.
We conclude that the *hyd-weig outperformed *w’bal-ode models when fit to multiple independent data sets featuring intermittent high-intensity exercise. The predictive accuracy and goodness of fit was better for the hydraulic model, even for the *AIC metric, which includes a penalty for the number of model parameters. The hydraulic model is thus likely more generalizable than *w’bal-ode models, which are typically applied within narrow contexts. Future research should focus on improving the feasibility of the hydraulic model, because it is computationally more burdensome to use than *w’bal-ode models. Pending such improvements, we foresee athletes adopting this model to optimize pacing and interval training workouts. To contribute towards further advancements we publish all material, extracted data, and simulation scripts here: https://github.com/faweigend/pypermod.
No funding was received to assist with the preparation of this manuscript.
The authors have no conflicts of interest to declare that are relevant to the content of this article.
All code is thoroughly documented and available as source code and as a python package here: https://github.com/faweigend/pypermod.
Monographs on statistics and applied probability
, Chapman & Hall, New York. External Links: ISBN 978-0-412-04231-7 Cited by: §2.4.