1 Introduction
In the majority of research studies, the focus lies on identifying average effects in a population of individuals, such as in large cohort studies or randomized controlled trials (RCTs). However, especially if there are heterogeneous individual effects, it can be of great interest to investigate associations on an individual level. Estimating and testing these individual effects pose challenges. One approach is to employ statistical or machine learning models to estimate individual effects from population-level studies, and different methods have been proposed in recent years, such as causal inference methods bica2021real; alaa2017bayesian; shalit2017estimating; Lee2018. As another approach, a new study can be designed with the specific aim of investigating individual-level effects. For this, the study design of so-called N-of-1 trials has been established as the gold standard nikles_essential_2015. In N-of-1 trials, the effect of one or more interventions on individual persons is investigated by measuring the outcome of interest over time across alternating phases in which the interventions are applied. N-of-1 trials are, therefore, a specific form of single-person crossover RCTs lillie2011n; mirza2017history. As a third approach, which we propose in this study, population-level data can be reanalyzed through the lens of N-of-1 trials. For illustration, we focus on an application to gait.
Gait can be quantified using spatiotemporal parameters, such as stride length, stride time, speed, or cadence. These gait parameters provide crucial insights into a person’s health status. For example, gait speed has been associated with life expectancy stanaway2011fast, and has been termed “the sixth vital sign” middleton2015. Similarly, greater variability from step to step has been identified as a major intrinsic risk factor for falls in older adults HAUSDORFF20011050. Although gait is typically regarded as an isolated and highly automatic task, evidence suggests that gait patterns differ when concurrently performing a secondary task (e.g., a cognitive or motor interference task). Such dual-task situations, which closely mimic daily life walking hillel2019every; BAYOT2018, have been associated with slower gait speeds and increased stride times Ebersbach1995; Nohelova2021; SMITH2016; MonteroOdasso2012. Consequently, studying gait under these conditions may provide clinically relevant insights into gait modulations in daily life. A more comprehensive understanding of gait in real-life walking should also consider physical fatigue, because it is known to affect gait kinematics and kinetics, which in turn is linked to a higher risk of slip-induced falls in healthy adults parijat2008effects. Similarly, for older healthy adults, lower limb fatigue affects spatial and temporal characteristics of gait santos2019effects, and leads to impaired movement control after overcoming an obstacle while walking hatton2013effect. Existing studies investigating the effects of muscle fatigue on gait performance reported heterogeneous outcomes. For example, for young healthy adults, Granacher et al. granacher2010effects observed significant decreases in gait speed and stride length, while Parijat et al. parijat2008effects reported no significant changes in gait speed.
In older healthy adults, muscle fatigue only resulted in rather moderate changes in gait parameters santos2019effects. Regarding stride length, some studies reported an increase granacher2010effects; Barbieri2014; morrison2016walking, while others reported no changes hatton2013effect; TOEBES2014; Helbostad2007.
One possible explanation for the aforementioned discrepancy in the effects of fatigue could be that the group-level analysis typically performed in gait studies did not capture the heterogeneous gait changes among individuals. It is known that gait characteristics are highly individualized and persist for a long period. As reported by Horst et al., the classification accuracy for identifying 128 healthy individuals from their gait patterns remained above 99% for one year HORST2017. Moreover, there is evidence that gait changes in response to interventions are also individualized. For example, systematic gait training to modify footstrike patterns from rearfoot strike to midfoot strike for runners resulted in heterogeneous changes of the footstrike angles chan2020effects, and the treatment response of gait patterns in neurological diseases such as Parkinson’s disease is also individualized marxreiter2018sensor; nonnekes2018towards.
The highly individualized nature of gait and gait modifications suggests that individual-level analyses could provide insights that are not evident from population-level analyses. However, to the best of our knowledge, only one series of N-of-1 trials has been conducted on gait, in which Maguire et al. maguire2020replacing compared the effect of different walking aids on gait and balance for chronic stroke patients and revealed different responses to the new walking aid across the participants.
Here, we investigate how existing data from population-level studies can be reanalyzed through the lens of N-of-1 trials to estimate individual-level effects. To this aim, we use data from a population-based study in a repeated-measures design that investigated the effects of physical fatigue and cognitive task performance on gait. We estimate personalized gait parameters from Bayesian linear mixed models, and compare the results with standard population-level ANOVA models. Finally, we discuss reanalyzing population studies through the lens of N-of-1 trials more generally and highlight important considerations and requirements.
2 Materials and methods
2.1 Overview of the gait study
Sixteen young healthy participants (eight men, eight women) were enrolled in the study. Eligibility for the study was determined using the Physical Activity Readiness Questionnaire (PAR-Q), and only participants without medical restrictions for performing physical activities (i.e., with all negative responses) were allowed to take part in the study. At the first visit (see below), personal characteristics were assessed, including the daily activity levels of the participants using the International Physical Activity Questionnaire (IPAQ).
Figure 1 shows an overview of the study design. The study consisted of two visits, referred to as visits A and B in the following, which were seven days apart. The order of A and B was randomized, and during each of the two visits, the participants performed two walking assessments using IMU sensors, separated by a fatigue protocol. During visit A, the participant first completed a 6-minute walking assessment in a corridor with a distance of 35 meters in one direction. Then, the participant performed a repeated sit-to-stand all-out fatigue protocol to induce muscular fatigue in the lower limbs. Immediately following the fatigue protocol, participants repeated another 6-minute walking assessment. During visit B, the experimental procedure was the same as in visit A, except that a number-counting dual-task condition was added in both walking sessions, meaning that participants had to count numbers while walking. Details of the fatigue protocol and the cognitive task are described in Supplementary Text 1. In total, data from four walking sessions were collected for each participant: single-task control (ST-Control), single-task fatigue (ST-Fatigue), dual-task control (DT-Control) and dual-task fatigue (DT-Fatigue).
2.2 Statistical analyses
Descriptive statistics and population-level analyses
In a first step, we computed descriptive statistics of age, body weight, height and daily physical activity level of all participants. Next, we performed a population-level analysis using a two-way repeated measures ANOVA to serve as a reference for the comparison to the N-of-1 trial analyses. In the ANOVA, we used stride length and stride time as outcomes and tested for the effect of physical fatigue and cognitive tasks, which were included as fixed factors. No further covariates were included in the model.
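As a minimal sketch of this reference analysis, the Python snippet below computes the F statistics of a 2 × 2 fully within-subject design from per-participant condition means. It uses the fact that every effect in such a design has one degree of freedom, so its repeated-measures F equals the squared one-sample t statistic of the per-participant contrast scores. The function names and all data values are hypothetical and only illustrate the computation; they are not taken from the study's scripts.

```python
from statistics import mean, variance


def within_effect_f(contrast_scores):
    """F statistic with (1, n - 1) degrees of freedom for a 1-df within-subject effect.

    It equals the squared one-sample t statistic of the contrast scores.
    """
    n = len(contrast_scores)
    return n * mean(contrast_scores) ** 2 / variance(contrast_scores)


def rm_anova_2x2(cell_means):
    """cell_means: one dict per participant holding the four condition means."""
    fatigue, task, interaction = [], [], []
    for p in cell_means:
        sc, sf = p["ST-Control"], p["ST-Fatigue"]
        dc, df = p["DT-Control"], p["DT-Fatigue"]
        fatigue.append((sf + df) / 2 - (sc + dc) / 2)   # fatigue minus control
        task.append((dc + df) / 2 - (sc + sf) / 2)      # dual task minus single task
        interaction.append((df - dc) - (sf - sc))       # fatigue effect under DT minus under ST
    return {
        "fatigue": within_effect_f(fatigue),
        "task": within_effect_f(task),
        "interaction": within_effect_f(interaction),
    }


# Hypothetical stride-length means (in m) for four participants.
example = [
    {"ST-Control": 1.40, "ST-Fatigue": 1.38, "DT-Control": 1.30, "DT-Fatigue": 1.31},
    {"ST-Control": 1.50, "ST-Fatigue": 1.49, "DT-Control": 1.45, "DT-Fatigue": 1.42},
    {"ST-Control": 1.35, "ST-Fatigue": 1.36, "DT-Control": 1.28, "DT-Fatigue": 1.27},
    {"ST-Control": 1.60, "ST-Fatigue": 1.57, "DT-Control": 1.40, "DT-Fatigue": 1.41},
]
f_values = rm_anova_2x2(example)
print(f_values)
```

With 16 participants, each of these statistics would be referred to an F distribution with (1, 15) degrees of freedom.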
N-of-1 trial analysis using Bayesian mixed models
In our main analysis, we analyzed the data through the lens of N-of-1 trials and, for each participant, estimated the individual effects of the physical fatigue intervention and the cognitive intervention on the gait parameters stride length and stride time. In contrast to typical N-of-1 trials with multiple crossovers, the data from our study consist of 4 blocks of repeated measures of the outcome gait parameters over the course of an intervention period for each participant, which is an experimental design that is typical of population-level studies granacher2010effects. In other words, the hundreds of gait measurements of each participant are contained in intervention blocks of either sequence ST-Control – ST-Fatigue – DT-Control – DT-Fatigue or of sequence DT-Control – DT-Fatigue – ST-Control – ST-Fatigue.
We used Bayesian linear mixed models to fit probabilistic models of the data distribution to the gait time series data for each participant, and separately for stride length and stride time. Such Bayesian models provide a probabilistic description of the data for interpretation makowski2019indices and allow prior knowledge to be incorporated, which is not possible in conventional frequentist analyses. A model with a first-order autoregressive (AR1) error structure was used, which acknowledges that (for the same person) the covariance between errors of different observations may not be equal and decreases towards zero with increasing lag. This matched well with the longitudinal stride parameters in our study deVries2013bayesian.
In more detail, we fitted the following model separately for each participant. Let $y_t$ denote the $t$-th observation of a participant in the study, i.e., the observation at the $t$-th timepoint. We assume a linear model $y = X\beta + \epsilon$, where $X$ is the ($n \times 4$) design matrix describing the fixed data structure, its row $x_t = (1, x_{t1}, x_{t2}, x_{t3})$ denotes observation $t$, and $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)$ is a vector including the intercept $\beta_0$, which is associated with the first combination of the groups in $X$ (i.e., the gait parameter of interest under the ST-Control condition), as well as the changes of the gait parameter from the ST-Control condition to the other conditions (denoted by $\beta_1$, $\beta_2$ and $\beta_3$). $\epsilon$ represents the error drawn from a multivariate normal distribution, $\epsilon \sim \mathcal{N}(0, \Sigma)$, where $\Sigma$ is a variance-covariance matrix determined by the AR1 process, in which the correlation between two errors is $\rho$ raised to a power that grows linearly with the time lag, so that it decays towards zero with increasing lag:
$$\Sigma_{ij} = \sigma^2 \rho^{|i-j|}, \quad i, j = 1, \dots, n. \qquad (1)$$
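To make the AR1 error structure concrete, the following sketch builds a small variance-covariance matrix whose entry for observations i and j is the variance multiplied by the correlation raised to the lag |i - j|. The function name and the values of `sigma` and `rho` are assumptions chosen purely for illustration; they are not estimates from the study.

```python
def ar1_covariance(n, sigma, rho):
    """Return the n x n AR1 covariance matrix with entries sigma^2 * rho^|i - j|."""
    return [[sigma ** 2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


# Hypothetical values: error standard deviation 0.05 and lag-1 correlation 0.6.
cov = ar1_covariance(4, sigma=0.05, rho=0.6)
for row in cov:
    print(["%.5f" % v for v in row])
```

Each row shows the covariance shrinking geometrically as the lag from the diagonal increases, which is the behaviour the model assumes for consecutive strides of one person.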
The Markov Chain Monte Carlo (MCMC) method with Gibbs sampling was used.

Among all parameters of the model, the parameter of primary interest is the vector $\beta$. Combinations of its elements make up the mean gait parameter distributions for the four walking conditions (i.e., ST-Control, ST-Fatigue, DT-Control and DT-Fatigue). While an informative prior distribution for the intercept $\beta_0$ can be directly inferred from studies on normal gait parameters of young healthy adults bernal2016reliability, there is not enough information available to assume priors for the other parameters. As a result, we chose to use non-informative priors in the main analyses. We employed a half-Cauchy distribution for the standard deviation $\sigma$ in the AR1 model as described in gelman2006prior, with default priors recommended by Gelman et al. gelman2008weakly. In sensitivity analyses, we tested different informative priors (see the Sensitivity analyses part of Section 2.2). In the sampling, we used 2 chains, 5000 burn-in steps, 1 thinning step (i.e., no thinning) and 10,000 iterations. To reduce the amount of computation, data used for the AR1 model were taken only from the left foot, and downsampled by a factor of five (i.e., selecting every fifth sample sequentially).

The convergence of the MCMC chains was confirmed with the potential scale reduction factor (PSRF) and trace plots. A PSRF close to 1 indicates chain convergence. Also, stable and uniform values over the iterations (i.e., a horizontal band with no particular patterns in the trace plots) for all sampling chains indicate convergence. MCMC chain resolution was evaluated using the effective sample size (ESS), which measures the efficiency of Monte Carlo methods such as MCMC martino2017effective. A higher ESS indicates more information content, or effectiveness of the sample chain. More details on the MCMC diagnostics and on their results can be found in Supplementary Text 3. To confirm that the posterior estimates accurately represent the observed data, a posterior predictive check was performed by comparing the posterior distributions with the distribution of the observed samples. More specifically, the posterior distributions of the intercept and effects were used to reconstruct the modeled distributions of gait parameters under the four conditions. These modeled distributions were then compared with the observed sample distributions using boxplots. The Bayesian analysis was performed using JAGS version 4.3.0, run from R version 4.1.1 (R Project for Statistical Computing). Formal specifications of the JAGS models can be found in Supplementary Text 2 and the R scripts used for running the analysis can be found at
https://github.com/HIAlab/gait_nof1trials.
Sensitivity analyses
To test how well alternative Bayesian models can estimate the posterior distribution, we implemented two additional models, a simple basic model and a time covariate model. In contrast to the AR1 model introduced in Section 2.2, both these models assumed that the errors are independent and identically distributed. As a basic model, we implemented a simple Bayesian ANOVA model with the two fixed factors fatigue and cognitive task performance. Similar to the AR1 model, we assumed a linear relationship and normally distributed errors, but we assumed here that the data points, namely the strides from the same recording session, are independent of each other. As a second alternative model based on the basic model, we included a linear time trend by appending an incremental integer array to the design matrix. Apart from the linear time trend covariate, the model structure was identical to the basic model.
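The difference between the two alternative models lies only in the design matrix, which the following sketch illustrates. It assumes dummy coding with ST-Control as the baseline (in line with the description of the intercept and the three change parameters) and a time index that starts at 1 and runs over all observations of a participant. Both choices, as well as the function names, are assumptions made for illustration; the formal model specifications are given in Supplementary Text 2.

```python
CONDITIONS = ["ST-Control", "ST-Fatigue", "DT-Control", "DT-Fatigue"]


def design_row(condition):
    """Intercept plus one indicator per non-baseline condition (assumed dummy coding)."""
    return [1] + [int(condition == c) for c in CONDITIONS[1:]]


def basic_design(conditions):
    """Design matrix of the basic model: one row per stride."""
    return [design_row(c) for c in conditions]


def time_covariate_design(conditions):
    """Basic design with an incremental integer column 1..n appended as time trend."""
    return [row + [t] for t, row in enumerate(basic_design(conditions), start=1)]


# Hypothetical sequence of strides from one participant.
strides = ["ST-Control", "ST-Control", "ST-Fatigue", "DT-Control", "DT-Fatigue"]
for row in time_covariate_design(strides):
    print(row)
```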
In further sensitivity checks, we compared models based on non-informative and informative priors for all three above-mentioned models. The investigated priors are summarized in Table 1. The distribution of informative priors was based on the corresponding gait parameter values reported for young healthy adults bernal2016reliability, which included a mean stride length of 1.36 m with a standard deviation of 0.08 m, and a mean stride time (estimated as doubled step time) of 1.05 s with a standard deviation of 0.06 s.
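When such literature values are turned into normal priors in JAGS, it is worth recalling that the JAGS `dnorm` distribution is parameterised by a mean and a precision (the inverse variance) rather than a standard deviation. The sketch below performs that conversion for the two reported standard deviations. Whether the study's models used exactly these values as prior mean and standard deviation is an assumption here; the dictionary and function names are hypothetical.

```python
def normal_precision(sd):
    """Precision (1 / variance) of a normal distribution with standard deviation sd."""
    return 1.0 / sd ** 2


# Literature-based mean and standard deviation quoted in the text above.
PRIORS = {
    "stride_length_m": (1.36, 0.08),
    "stride_time_s": (1.05, 0.06),
}

precisions = {name: normal_precision(sd) for name, (_, sd) in PRIORS.items()}
for name, (prior_mean, _) in PRIORS.items():
    print(f"{name}: mean = {prior_mean}, precision = {precisions[name]:.2f}")
```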
Table 1: Priors investigated in the sensitivity analyses.
Model  Non-informative Priors  Informative Priors (SL)  Informative Priors (ST)
AR1
SL: stride length, ST: stride time
3 Results
3.1 Characteristics of study participants
In total, data from 16 participants (8 males, 8 females) were collected for the four walking conditions (ST-Control, ST-Fatigue, DT-Control and DT-Fatigue). The dataset consisted of 3117 strides pooled across all participants. Stride length and stride time from each stride were used as outcome variables in the analyses. The observations were balanced across the four walking conditions with 788 strides from ST-Control, 792 strides from ST-Fatigue, 766 strides from DT-Control and 771 strides from DT-Fatigue, across all participants. Also, within each participant, the numbers of strides were balanced under each of the four walking conditions. Table 2 summarizes the participant characteristics, and Table 3 summarizes the gait parameters.
Table 2: Characteristics of the study participants.
Variable  Mean ± SD  Min  Max
Age (years)  27.16 ± 4.03  21  35
Body Mass (kg)  71.19 ± 12.58  54  103
Height (cm)  173.78 ± 8.85  157  190
Activity Level*  2  1  3
* 1, 2 and 3 denote low, medium and high activity levels in IPAQ, respectively. The median is reported instead of mean ± SD, since the data are ordinal.
3.2 Population-level analysis
Next, we performed baseline analyses to investigate the population-level effects of physical fatigue and cognitive task performance on gait. Two-way repeated measures ANOVA indicated very small effects induced by physical fatigue or cognitive task performance. The main effects of physical fatigue for both stride length and stride time had an effect size of 0.01 or less (stride length: F(1,15) = 5.86, p = 0.03, effect size = 0.01; stride time: F(1,15) = 2.56, p = 0.13, effect size ≤ 0.01). Main effects of cognitive task performance were moderate, with effect sizes of 0.15 and 0.20 for stride length and stride time, respectively (stride length: F(1,15) = 18.46, p < 0.001, effect size = 0.15; stride time: F(1,15) = 21.14, p < 0.001, effect size = 0.20). No significant interaction effects were found (p = 0.77 for stride length, and p = 0.99 for stride time). See Table 3 for a summary of the ANOVA results.
3.3 N-of-1 trials using Bayesian linear mixed models
The posterior distributions for stride length and stride time are illustrated in Figure 2. See Supplementary File all_posterior_estimates.zip for a complete summary of the posterior distributions of parameters. The results showed that the baseline values of the gait parameters (under the ST-Control condition) varied largely among all participants, and the gait changes under the four walking conditions were also highly heterogeneous among all participants. Figure 2 also shows the aggregated posterior distributions from all participants for reference, which further indicated that the aggregated population-level summary was not a good representation of the highly heterogeneous individual gait effects.
For stride length, there was a consistent trend among participants that the values from DT conditions were smaller than those from ST conditions, as seen in the population-level ANOVA main effect of cognitive task described above. Nevertheless, inter-person variation could be observed. For example, participant 10 exhibited almost no response under the DT condition compared to the ST condition (mean values changed from 1.34 under ST to 1.33 under DT), whereas participant 2 largely reduced the stride length (mean values changed from 1.62 under ST to 1.35 under DT). In contrast, effects of physical fatigue on stride length were smaller on average but more complex on the individual level compared to those induced by the cognitive task, and opposite effects could be observed for different individuals. Especially under the DT condition, stride length seemed to have increased from the non-fatigue to the fatigue condition for participant 6 (from 1.44 to 1.48), participant 14 (from 1.41 to 1.45) and participant 15 (from 1.45 to 1.48) but remained unchanged or decreased for the other participants. Overall, the variance of the posterior distribution was larger under the DT condition than under the ST condition. Moreover, the effects of fatigue were larger under the DT condition than under the ST condition for all participants. Similar trends could be observed for stride time, where the DT condition generally induced an increase for all participants, but the individual posterior distributions were heterogeneous and cognitive task performance seemed to have increased the variance as well as the effects of fatigue for many participants. It is worth noting that the posterior estimates for participants 7 and 13 had unusually large variations compared to those for all other participants. Quality control analyses revealed that the MCMC chains did not converge for these two participants; more details are presented in section MCMC Chain Convergence in Supplementary Text 3.
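The convergence check mentioned above relies on the potential scale reduction factor. As a hedged illustration, the sketch below implements the basic Gelman-Rubin form of the PSRF, which compares the between-chain and within-chain variances of one parameter; software implementations may apply additional corrections, so the numbers need not match those reported by a particular package. The chains shown are hypothetical and deliberately short.

```python
from math import sqrt
from statistics import mean, variance


def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor for equally long chains."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])          # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])     # B: n times variance of chain means
    pooled = (n - 1) / n * within + between / n           # pooled posterior variance estimate
    return sqrt(pooled / within)


# Hypothetical samples of one parameter from two chains.
mixed_chains = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
stuck_chains = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
print(psrf(mixed_chains), psrf(stuck_chains))
```

Chains that explore the same region yield a value near 1, whereas chains that settle in different regions, as in the second example, yield a value far above 1.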
To provide a qualitative overview of the heterogeneous gait changes under the four walking conditions for each participant, we computed the difference between each pair of conditions using mean values under the four walking conditions obtained from the posterior distributions. As illustrated in Figure 3, for the great majority of the condition pairs, the gait changes vary in both magnitude and direction among all participants.
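The pairwise comparison described above can be sketched in a few lines: given the four posterior mean values of one participant, the differences for all six condition pairs are computed. The function name and the numerical values are hypothetical and serve only to show the bookkeeping.

```python
from itertools import combinations


def pairwise_differences(condition_means):
    """Difference (second minus first) for every pair of walking conditions."""
    return {
        (a, b): condition_means[b] - condition_means[a]
        for a, b in combinations(condition_means, 2)
    }


# Hypothetical posterior mean stride lengths (in m) of one participant.
participant_means = {
    "ST-Control": 1.50,
    "ST-Fatigue": 1.49,
    "DT-Control": 1.44,
    "DT-Fatigue": 1.48,
}
differences = pairwise_differences(participant_means)
for pair, diff in differences.items():
    print(pair, round(diff, 2))
```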
In the sensitivity checks, we investigated how the results might change when different regression models or different priors were used. Posterior distributions of the three models (AR1, basic, time covariate) were compared using the mean and standard deviation of the estimated model parameters. Overall, the AR1 and basic models had similar posterior distributions of parameters, and the time covariate model had a slightly shifted distribution. The difference between ST-Control and DT-Control for stride length, as represented by the corresponding model coefficient, was similar in the AR1 and basic models and in the observed data, and larger compared to the posterior estimate from the time covariate model (0.09 from observed data, from AR1 model and from basic model, 0.06 from time covariate model). The difference between ST-Control and ST-Fatigue, as represented by the corresponding model coefficient, was negative in the observed data and in the AR1 and basic models, but positive when estimated by the time covariate model (0.03 from observed data, from AR1 model and from basic model, 0.03 from posterior estimate). Posterior estimates for stride time exhibited similar trends, in that the corresponding estimates from the time covariate model were slightly different from those from the AR1 and the basic models. Comparison of models with different priors indicated that the priors had no meaningful influence on the sampling results. A more detailed illustration of the effects of models and priors can be found in Supplementary Figure S7. Hence, since no significant differences in the posterior estimates using non-informative and informative priors were found, we focused on reporting and discussing results from models using non-informative priors.
4 Discussion
In this study, we reanalyzed data from a gait study, which was originally designed as a population-based trial, as single N-of-1 trials. Despite the fact that the effects of cognitive task performance were generally larger than those of fatigue, some participants changed their gait patterns to a larger extent than the others. This is consistent with our prior knowledge that gait and gait responses to interventions are highly individual, and the posteriors for each individual allow further investigation into causes underlying these individual effects. In a real-life rehabilitation setting, by understanding under what conditions a person’s gait is especially susceptible to physical fatigue or cognitive challenges, targeted training or gait monitoring could be developed, thus reducing the risk of motor-function-related injuries. Another interesting finding is that under the DT condition, the effects of fatigue seemed to have the same trend as under the ST condition, but were more pronounced for some individuals. This observation suggests that the cognitive task helped magnify the subtle effects of fatigue. In fact, dual-tasking has been used as an established paradigm to aid in the study of otherwise subtle motor function deficits in neurological diseases baetens2013gait. The great majority of studies involving the dual-task paradigm are on the population level; based on our observation that the dual-task effect did not manifest in all individuals, it would be interesting to explore what the constraints are, so that effective, personalized dual-task-based motor diagnosis can be developed. The above initial observations of the posterior estimates demonstrate that heterogeneous responses do exist in our cohort, and analyses based on N-of-1 trials are necessary to enable further in-depth investigation of the individual effects.
In typical N-of-1 trials, the effects of an intervention are studied by comparing data from treatment and control conditions from the same person. The study design can leverage many of the statistical and methodological concepts from randomized controlled trials to model multiple crossovers and time-dependent effects, including randomization, wash-in and wash-out periods to avoid carryover effects, and placebo controls. The dataset used in our study was unconventional in the sense that it was obtained from a study originally designed for population-level analyses: instead of the multiple crossovers of typical N-of-1 trials, each participant was measured only in one session for the baseline, underwent the intervention once, and the intervention effects were measured in a subsequent session. Hundreds of data points (gait cycles) were measured under both conditions, which are sufficient for statistical analysis on a single person. Nevertheless, several assumptions were made to enable the analysis of the data through the lens of N-of-1 trials. For example, we assumed that the one-week break between the ST and DT visits did not induce any effect on the gait characteristics of an individual. Only in this case can the effect observed under the DT condition be attributed to cognitive task performance and not to time as a confounder. We based our assumption on evidence that an individual’s gait characteristics are persistent over a long period of time HORST2017. The one-week break can be considered a wash-out period, in which the effects of the fatigue exercise from the previous visit are sufficiently removed. For other types of outcomes that fluctuate over time or are more sensitive to uncontrolled factors, the effect of time between visits should not be neglected.
Another assumption was made to circumvent the lack of the within-person randomization found in typical N-of-1 trials. The order of ST and DT visits was randomized among all participants, such that exactly half of the participants performed the cognitive task during the first visit. However, with only one crossover, the order was fixed for the same person. In addition, the order of the control and fatigue conditions during both visits was fixed as well. In our analysis, we assumed that there were no interaction effects between the person and the order of the walking conditions. However, it is possible that carryover effects exist. In that case, the design does not isolate the effect of the intervention for the individual. As one approach, the carryover effects could be modeled in the analysis to still allow efficient and unbiased estimation of the effects gartner2022comparison. Such challenges would still be present when aggregated N-of-1 trial analyses or the traditional population-level ANOVA analysis are performed, but they might profit when there is some level of randomization (e.g., in our case, the randomization for ST and DT visits among the participants). As future work, additional crossovers between the fatigue and dual-task conditions can be added into the study design, in order to introduce randomization within one person.

In our analyses, informative and non-informative priors did not result in different posteriors for the same model, as illustrated in Supplementary Figure S7. One possible reason could be that the values of the non-informative priors were similar to those of the informative priors for both stride length and stride time. Only prior knowledge of the mean and standard deviation for the baseline (ST-Control) was introduced, and these values (centered around 1 with very small standard deviations) were close to the non-informative priors (centered around 0 with standard deviation of ). It is therefore worth emphasizing that for use cases where the informative priors differ largely from non-informative default priors, more detailed prior predictive checks should be performed and an informed choice of priors might affect the posterior estimates to a larger degree. Visualizations similar to our Supplementary Figure S7 are helpful for qualitative comparison between different priors.
Posterior estimates of the model parameters from the AR1 and basic models matched the distributions of the observed values, whereas slight deviations could be observed for posterior estimates of the time covariate model. Moreover, for four participants (#2, 9, 12, 18), the MCMC chains did not converge for the model parameter which was associated with the linear representation of time in the design matrix. In our opinion, these observations indicate that the assumption of a linear effect of time with the time covariate model does not accurately represent the data. As discussed by Hecksteden et al. hecksteden2015individual, repeated measurements during the course of a single uninterrupted intervention period could be used as a surrogate for repeated interventions. However, it is reasonable to assume autocorrelation between measurements, and nonlinear adaptation may occur during the measurement period. Our study indicates that in such settings, the effect of time is more appropriately modeled with the temporal autocorrelation of the samples described by the AR1 model. It is worth noting that the MCMC chains failed to converge for the AR1 models for two participants (#7, 13) for stride length. During data collection, we observed that the general gait patterns of these two participants were particularly affected by the interventions, and initial data exploration revealed a large variability in gait parameters. We assume that the true data distributions of their gait parameters are different from those of the other participants, and that the true relationship between the variables is nonlinear. In this case, a more flexible model with a nonlinear structure could be better suited for analysis.
Based on the posteriors obtained from the Bayesian analyses and the assumptions discussed above, the effect sizes of interventions can be estimated and further investigated kelter2020analysis. Aggregated N-of-1 trial analyses can be performed to investigate the underlying causes of the personalized responses to intervention gartner2022comparison. In our study, the different responses could potentially be associated with the participants’ sex, anthropometric features or stable lifestyle habits (e.g., as measured by the IPAQ questionnaire), or a combination of all these factors. In future studies, more heterogeneous cohorts with a larger variety of age or pre-existing health conditions, genetic endowment or epigenetic modifications will provide additional features for analysis. Based on these findings, personalized advice or treatment could reduce the risk of falls or injury for vulnerable individuals.
5 Conclusion
Our study provides an example of how to initiate an in-depth investigation of treatment effects on an individual level using data from population-level studies. We demonstrate the use of Bayesian models to study individual-level effects of interventions, and point out aspects to consider for future studies.
Data Availability Statement
The data and scripts used for running the analysis can be found at https://github.com/HIAlab/gait_nof1trials.
Conflict of Interest
The authors have declared no conflict of interest.
The authors would like to thank all participants in this study. We would also like to thank Urs Granacher and Clemens Markus Brahms for their support in developing the study. This study received funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 491466077, and has been partly funded by the Federal Ministry of Education and Research of Germany in the framework of KI-LAB-ITSE (project number 01IS19066).
Supplementary Materials
Supplementary Text 1: Details of research methods
Fatigue protocol and cognitive task
During the fatigue protocol, the participants wore a weighted vest matched to 30% of their body weight and repeatedly stood up from a chair at a self-selected, rapid pace until failure. Fatigue was assessed using the Borg Rating of Perceived Exertion (referred to as the Borg scale in the following text) borg2006comparison, as well as blood lactate concentration measured using blood sampled from the earlobe finsterer2012biomarkers.
The cognitive task required participants to serially subtract the number seven from a 4-digit number between 3000 and 9000 provided by the experimenter, which was randomly generated anew for each participant. Participants were asked to verbalize the numbers (e.g., 3745, 3738, 3731, […]), so that the answers could be documented with the audio recorder and analyzed later. In order to minimize learning effects, participants practiced the dual-task 6-minute walking session one time before the actual recording trial with the exact same setup.
The study was approved by the ethics committee of the University of Potsdam (63/2020), and all experiments were conducted according to the latest revision of the Declaration of Helsinki. All participants provided written consent prior to data collection.
IMU gait analysis
Two IMUs (Physilog®5, Gait Up, Switzerland) were attached to the left and right insteps of the participants. Triaxial acceleration and angular velocity data were recorded at 128 Hz; the acceleration and gyroscope ranges were g and dps, respectively. An audio recorder was attached close to the left collar bone to document the responses to the cognitive task under the dual-task condition.
Stride length and stride time of all strides from the walking sessions were extracted from the raw IMU data using an error-state Kalman filter-based algorithm, which uses zero-velocity updates (ZUPT) to correct for the drift that accumulates during inertial integration. The algorithm has been described in detail and validated against gold-standard reference systems in previous studies tunca2017inertial; zhou2020. Briefly, triaxial acceleration and angular velocity signals were used as inputs, the stance phases were identified with a threshold on the gyroscope magnitude, and an error-state Kalman filter was employed to periodically correct the drift during stance phases, thus enabling estimation of the three-dimensional trajectories of the feet. The gait events (toe-off, initial contact) were identified using signal features in the gyroscope data. Temporal gait parameters, such as swing time and stance time, were derived directly from the gait events. Spatial gait parameters, such as stride length and clearance, were obtained by segmenting the trajectories into individual strides. Turning strides at the ends of the walkway were excluded based on a threshold on the change in foot orientation. Acceleration and deceleration phases were excluded by removing two strides before and after the actual turning strides. Additional outlier strides, defined as those with gait parameter values more than three standard deviations away from the mean (effectively 0.3% of the data), were excluded from further analyses.
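The two stride exclusion rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study: the function names are hypothetical, the indices of the turning strides are assumed to be already known (the orientation threshold itself is not reproduced here), and the three-standard-deviation rule is applied to a single gait parameter at a time.

```python
from statistics import mean, stdev


def exclude_turning(n_strides, turning_idx, margin=2):
    """Return the indices of strides kept after dropping each turning stride
    together with `margin` strides before and after it."""
    dropped = set()
    for t in turning_idx:
        for k in range(t - margin, t + margin + 1):
            if 0 <= k < n_strides:
                dropped.add(k)
    return [i for i in range(n_strides) if i not in dropped]


def exclude_outliers(values, n_sd=3.0):
    """Return the indices of values lying within `n_sd` sample standard
    deviations of the mean."""
    m = mean(values)
    s = stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) <= n_sd * s]


# Hypothetical example: 10 strides with a turn at stride index 4.
kept = exclude_turning(10, [4])      # strides 2..6 are removed

# Hypothetical stride lengths in metres with one implausible value at the end.
lengths = [1.2, 1.3] * 15 + [5.0]
inliers = exclude_outliers(lengths)  # the last stride is removed
```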
Supplementary Text 2: Details on Bayesian models
We used Bayesian linear mixed models to fit probabilistic models of the data distribution for the four walking conditions. They provide a probabilistic description of the data for interpretation makowski2019indices. This contrasts with conventional frequentist modeling, which produces a p-value that is widely misinterpreted as indicating whether the hypothesis under study is true wasserstein2020asa. The Markov Chain Monte Carlo (MCMC) method with Gibbs sampling was used to estimate the parameters describing the data distribution. MCMC performs Monte Carlo integration by drawing samples from a probability distribution through the construction of a Markov chain, in which the value of the current variable depends on the value of the previous variable in the chain. With a sufficient number of steps, the distribution of the sampled values converges to the distribution being inferred geyer1992practical. Gibbs sampling is one of the most commonly used MCMC algorithms. It draws samples for each parameter in turn from the full conditional distribution of that parameter smith1993bayesian. The basic model with noninformative priors is specified as follows. (More details can be found, for example, in a tutorial on factorial ANOVA implementation using JAGS: https://agabrioblog.onrender.com/jags/factorialanovajags/factorialanovajags, last retrieved on 2022-08-12.)
modelString = "
model {
  #Likelihood
  for (i in 1:n) {
    y[i] ~ dnorm(mean[i],tau)
    mean[i] <- inprod(beta[],X[i,])
  }
  #Priors
  beta[1] ~ dnorm(0,1.0E-3)
  for(i in 2:ngroups) {
    beta[i] ~ dnorm(0,1.0E-3)
  }
  sigma ~ dunif(0, 100)
  tau <- 1 / (sigma * sigma)
}
"
The primary outcome of this study was the distribution of gait parameters under the four walking conditions, which can be derived from the model parameters. The JAGS program used in this work uses a dialect of the BUGS modeling language. In the BUGS language, the normal distribution is parameterized in terms of precision (tau), which is the inverse of the variance (sigma squared) plummer2017jags. In the model string, the parameter n in the likelihood represents the total number of data points used for the simulation, whereas ngroups in the priors represents the total number of elements in the beta vector, namely, the number of columns in the design matrix X.
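To make the idea of Gibbs sampling and the precision parameterization more concrete, the following Python sketch implements a Gibbs sampler for a deliberately simplified model with a single mean `mu` and a precision `tau`. It is not the sampler that JAGS constructs for the model above. In particular, the sketch assumes a Gamma prior on `tau` (instead of the uniform prior on sigma used in the model string) so that both full conditional distributions have a closed form. The function name and the default values for `a` and `b` are hypothetical.

```python
import random
from statistics import mean


def gibbs_normal(y, n_iter=2000, burn_in=500, prior_prec=1.0e-3,
                 a=1.0e-3, b=1.0e-3, seed=1):
    """Gibbs sampler for y[i] ~ Normal(mu, precision tau).

    Priors (assumptions of this sketch):
      mu  ~ Normal(0, precision prior_prec)
      tau ~ Gamma(shape a, rate b)
    Returns a list of (mu, sigma) draws after burn-in, with sigma = 1/sqrt(tau).
    """
    rng = random.Random(seed)
    n = len(y)
    sum_y = sum(y)
    mu, tau = mean(y), 1.0
    draws = []
    for it in range(n_iter):
        # Full conditional of mu given tau: normal with summed precisions.
        post_prec = prior_prec + n * tau
        post_mean = tau * sum_y / post_prec
        mu = rng.gauss(post_mean, post_prec ** -0.5)
        # Full conditional of tau given mu: Gamma(a + n/2, rate b + SS/2).
        ss = sum((v - mu) ** 2 for v in y)
        # random.gammavariate expects a scale, i.e. 1 / rate.
        tau = rng.gammavariate(a + n / 2.0, 1.0 / (b + 0.5 * ss))
        if it >= burn_in:
            draws.append((mu, tau ** -0.5))
    return draws
```

Each iteration updates one parameter at a time conditional on the current value of the other, which is the defining feature of Gibbs sampling. The conversion `tau ** -0.5` mirrors the relation between precision and standard deviation used in the model string.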
The time covariate model was the same as the basic model, except that the design matrix X had an additional column with incremental integers for each walking condition, which represents the time component.
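The difference between the two design matrices can be illustrated with a short Python sketch. It assumes treatment (dummy) coding with an intercept in the first column, and a time column that restarts at 1 within each walking condition. Both are assumptions of this sketch, since the exact coding of X is not spelled out in the text, and the function name and condition labels are hypothetical.

```python
def design_matrix(conditions, levels, with_time=False):
    """Build a design matrix as a list of rows.

    Columns: intercept, one indicator per level in levels[1:] (the first level
    is the reference), and optionally a counter that increases by one for each
    successive observation within the same condition.
    """
    counters = {level: 0 for level in levels}
    rows = []
    for c in conditions:
        row = [1.0] + [1.0 if c == level else 0.0 for level in levels[1:]]
        if with_time:
            counters[c] += 1
            row.append(float(counters[c]))
        rows.append(row)
    return rows


# Hypothetical labels for two of the walking conditions.
conds = ["control", "control", "fatigue", "fatigue", "fatigue"]
X_basic = design_matrix(conds, ["control", "fatigue"])
X_time = design_matrix(conds, ["control", "fatigue"], with_time=True)
# The last column of X_time is 1, 2, 1, 2, 3.
```

With this layout, ngroups in the model string equals the number of columns of the matrix, so the time column receives its own beta coefficient.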
The AR1 covariance model with noninformative priors was defined as follows. The definition of the half-Cauchy distribution was adopted from Gelman gelman2006prior:
modelString = "
model {
  #Likelihood
  for (i in 1:n) {
    mean[i] <- inprod(beta[],X[i,])
  }
  y[1:n] ~ dmnorm(mean[1:n],Omega)
  for (i in 1:n) {
    for (j in 1:n) {
      Sigma[i,j] <- sigma2*(1 - phi*phi)*(equals(i,j) + (1-equals(i,j))*pow(phi,abs(i-j)))
    }
  }
  Omega <- inverse(Sigma)
  #Priors
  phi ~ dunif(-1,1)
  beta[1] ~ dnorm(0,1.0E-3)
  for(i in 2:ngroups) {
    beta[i] ~ dnorm(0,1.0E-3)
  }
  sigma <- z/sqrt(chSq)    # prior for sigma; cauchy = normal/sqrt(chi^2)
  z ~ dnorm(0, 0.16)I(0,)  # positive part of normal distribution, Cauchy scale = 2.5
  chSq ~ dgamma(0.5, 0.5)  # chi^2 with 1 d.f.
  sigma2 <- pow(sigma,2)
}
"
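The construction of the half-Cauchy prior noted in the comments of the model string (a half-normal variable divided by the square root of a chi-squared variable with one degree of freedom) can be checked by simulation. The following Python sketch is an illustration under that reading of the comments. The function name is hypothetical, and the check relies on the fact that the median of a half-Cauchy distribution equals its scale parameter.

```python
import random
from statistics import median


def half_cauchy_draws(n, scale=2.5, seed=1):
    """Draw n samples of sigma = z / sqrt(chSq), where z is half-normal and
    chSq is chi-squared with one degree of freedom."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Half-normal: a standard deviation of 2.5 corresponds to precision 0.16.
        z = abs(rng.gauss(0.0, scale))
        # Gamma(shape 0.5, rate 0.5) is chi^2 with 1 d.f.;
        # random.gammavariate takes a scale (1 / rate = 2.0).
        ch_sq = rng.gammavariate(0.5, 2.0)
        draws.append(z / ch_sq ** 0.5)
    return draws


samples = half_cauchy_draws(50000)
sample_median = median(samples)  # should be close to the scale of 2.5
```

The precision of 0.16 in the model string corresponds to 1 / 2.5^2, which is how the Cauchy scale of 2.5 enters the prior.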
Supplementary Text 3: Details on quality control
MCMC chain convergence
Convergence of the MCMC chains was assessed visually using trace plots and with the convergence statistic potential scale reduction factor (PSRF). The trace plot displays the sampled values against the number of iterations for each chain and each model parameter. Stable and uniform patterns (i.e., a horizontal band with no particular patterns) for both chains indicate convergence. The PSRF is an estimated factor by which the scale of the current distribution of the parameter might be reduced if the simulations were to continue for an infinite number of iterations gelman1992inference. The PSRF plot shows the median and upper confidence limits (confidence = 0.95) against the number of iterations. An upper limit close to 1 indicates approximate convergence, as the current distribution is no longer overdispersed with respect to the target distribution. In our study, both the trace plots and the PSRF confirmed chain convergence for all simulations for stride length. Supplementary Figures S1 and S2 show example trace plots and PSRF plots of converged chains, respectively. For stride time, the chains from the AR1 models did not converge for participants sub_07 and sub_13 for all model parameters. Chains from the time covariate models did not converge for participants sub_02, sub_09, sub_12 and sub_18 for the parameter associated with the time component in the design matrix. Examples of trace plots and PSRF plots for nonconvergence can be found in Supplementary Figures S3 and S4.
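For readers unfamiliar with the PSRF, the following Python sketch computes the basic Gelman-Rubin statistic for one parameter from several chains of equal length. It is a simplified illustration: the function name is hypothetical, and the sketch omits the additional correction for sampling variability that some software implementations apply, so its values may differ slightly from those reported by such tools.

```python
from statistics import mean, variance


def psrf(chains):
    """Basic potential scale reduction factor for one parameter.

    `chains` is a list of m chains, each a list of n sampled values.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    # W: average of the within-chain sample variances.
    w = mean([variance(c) for c in chains])
    # B: between-chain variance, scaled by the chain length.
    b = n * variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5
```

When all chains sample the same distribution, the between-chain term is negligible and the value is close to 1. Chains that are stuck in different regions inflate the between-chain term and push the value well above 1.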
MCMC chain resolution
The resolution of the MCMC chain was measured with effective sample size (ESS). A higher ESS indicates more information content, or higher effectiveness of the sample chain. In cases where observed data samples are highly autocorrelated, the ESS might be relatively small compared to the total sample size. Supplementary File all_posterior_estimates.zip shows the ESS for each model and parameter (n.eff).
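The relation between autocorrelation and ESS can be illustrated with a simple estimator, ESS = N / (1 + 2 * sum of autocorrelations). The Python sketch below sums the sample autocorrelations until the first non-positive value. This truncation rule and the function name are assumptions of the sketch, and the estimator is not necessarily the one used to compute the n.eff values in the supplementary file.

```python
from statistics import mean


def effective_sample_size(chain):
    """Estimate the ESS of one chain from its sample autocorrelations."""
    n = len(chain)
    m = mean(chain)
    centered = [v - m for v in chain]
    var0 = sum(c * c for c in centered) / n
    if var0 == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var0
        # Stop at the first non-positive autocorrelation (truncation rule
        # assumed for this sketch).
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)
```

For nearly independent draws the sum of autocorrelations is small and the ESS stays close to the number of samples, whereas a strongly autocorrelated chain yields an ESS that is only a small fraction of the chain length.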
Posterior predictive check
Compare models
Posteriors of the main model (AR1) and two alternative models (basic and time covariate) were plotted in combination with their corresponding noninformative and informative priors for comparison. Figure S7 and Figure S8 show means and standard deviations for posteriors of stride length and stride time, respectively. Informative priors did not have a visible influence on posteriors, whereas the three different models produced slightly different posteriors.