1 Introduction
Osteosarcoma is a malignant bone tumour mainly affecting children and young adults. Although osteosarcoma is the most common primary malignant bone cancer, it is a rare disease and has an annual incidence of 34 patients per million (Smeland et al., 2019). Multidisciplinary management including neoadjuvant and adjuvant chemotherapy with aggressive surgical resection (Ritter and Bielack, 2010) or intensified chemotherapy has improved clinical outcomes but over the past 40 years there have been no further improvements in survival (Anninga et al., 2011).
In cancer trials, the relationship between chemotherapy dose and clinical efficacy outcomes are problematic to analyse due to the presence of negative feedback between exposure to cytotoxic drugs and other aspects, such as latent accumulation of chemotherapyinduced toxicity. Toxicities, developed by patients through chemotherapy, are timedependent confounders for the effect of chemotherapy on patient’s status (Lancia et al., 2019). Toxicities affect subsequent exposure by delaying the next cycle or reducing chemotherapy doses (Souhami et al., 1997), being at the same time risk factors for mortality and predictors for future exposure levels. According to the Common Terminology Criteria for Adverse Events (CTCAE) (U.S. Department of Health and Human Services, 2006), a multimodality grading system for the standardized classification of adverse events (AEs) in cancer therapy, nominal grades of AEs severity range from minor to lifethreatening injuries or death (Trotti et al., 2003). Since patients may have multiple AEs with different levels of severity, identifying the actual extent of toxic burden and investigating the evolution of patient’s overall toxicity status during treatment represent challenging problems in cancer research.
Due to the complexity of longitudinal chemotherapy data, no standard method is available for analysing AEs data. Toxicity data are usually considered in very simplistic ways in cancer studies, where they act as fixed pooled indexes, such as maximum toxicity over time, maximum grade among events, or weighted sums of individual toxic effects (Bekele and Thall, 2004; Rogatko et al., 2004; Trotti et al., 2007; Lee et al., 2012; McTiernan et al., 2012; Sivendran et al., 2014; Thanarajasingam et al., 2015, 2016; Zhang et al., 2016; Carbini et al., 2018). Although these methods can summarise data over time, substantial amount of information (e.g., isolated vs repeated events, single vs multiple episodes, longerlasting lowergrade toxicities, toxic events timing) are discarded. As neglecting the time component may give an inaccurate depiction of toxicity, alternative methods for a longitudinal analysis of AEs have been proposed (Trotti et al., 2007; Thanarajasingam et al., 2016, 2020; Hirakawa et al., 2019). Although these approaches improperly treated the nominal CTCAE grades as numerical values, they provided more insights into treatmentrelated toxicity, suggesting that longitudinal methods should become of standard use in future analyses of cancer trials. Models to deal with both longitudinal and categorical aspects of toxicity levels progression are then necessary, still not well developed.
Longitudinal data are often of interest in a wide range of research fields, such as social, economic and behavioural sciences, education or public health, since they allow to detect the changes and developments of subjects’ social and biological characteristics over time. In many applications involving longitudinal data, the interest lies in analysing the evolution of a latent characteristic of a group of individuals over time, rather than in studying their observed attributes (Bartolucci et al., 2014
). The phenomenon which affects the distribution of the response variables that are relevant for the problem under consideration may not be directly observable. In a clinical context, this latent characteristic may reflect patients’ qualityoflife and could contain valuable information related to patient’s health status and disease progression.
In the statistical literature many models have been proposed for the analysis of longitudinal data; for a review see, among others, Fitzmaurice et al. (2009). For longitudinal categorical data, where the interest is in describing individual changes with respect to a certain latent status, Latent Markov (LM) models can be used (Wiggins, 1973; Bartolucci et al., 2013
). These models are tailored to study the evolution of an individual characteristic of interest, when it is not directly observable. The idea behind a LM model is that the latent process fully explains the observable behaviour of a subject, assuming that the response variables are conditionally independent given the latent process. For this aim, the latent process follows a Markov chain with a finite number of states, which represent different conditions of the latent characteristic of interest. LM models can also account for the effect of observable covariates, serial dependence between observations, measurement errors, or unobservable heterogeneity. For a comprehensive overview on LM models see
Bartolucci et al. (2013, 2014).Motivated by the need to improve methods for summarising and quantifying the overall toxicity level and its evolution during treatment, in this work a novel procedure based on latent Markov models for longitudinal toxicity data is proposed. The latent status of interest is the Latent Overall Toxicity (LOTox) condition of a patient, which affects the distribution of the observed categorical toxic grades measured over treatment. The proposed approach aims at identifying different latent states of overall toxicity burden (LOTox states) and investigating how patients move between states during chemotherapy treatment.
Three novelties are presented in this work: (i) introduction of a new method based on LM models to summarize and quantify multiple AEs and their evolution during treatment, by including both longitudinal and categorical aspects of the observed toxic levels; (ii) identification of groups of patients with a common distribution for the observed toxic categories, and thus a similar overall toxicity burden; (iii) reconstruction of personalized longitudinal LOTox profiles
, which represent the probability over time of being in a specific LOTox state and allow to study the individual overall toxic risk evolution during treatment for each subject. The approach proposed here has never been applied to osteosarcoma treatment and provides new insights for supporting clinicians in planning childhood cancer therapy. Provided that longitudinal CTCAEgraded toxicity data are available, the developed procedure is a flexible approach that can be adapted and applied to other cancer studies.
The rest of this article is organized as follows. Data from the MRC BO06/EORTC 80931 Randomized Controlled Trial for patients with osteosarcoma (Lewis et al., 2007) are described in Section 2. Statistical methods are introduced in Section 3. Results for MRC BO06 data are presented in Section 4. Section 5 ends with a discussion of strengths and limitations of the proposed approach, identifying some possible developments for future research.
2 MRC BO06 randomized clinical trial data
In Section 2.1 the selected cohort of patients is illustrated. Longitudinal chemotherapy data and patients characteristics are presented in Section 2.2.
2.1 Data illustration
Data from the MRC BO06/EORTC 80931 Randomized Clinical Trial (RCT) for patients with nonmetastatic highgrade osteosarcoma recruited between 1993 and 2002 were analysed (Lewis et al., 2007). Patients were randomized between conventional (RegC) and doseintense (RegDI) regimens. Both arms had six cycles of the same course of doxorubicin and cisplatin with different time schedule (3weekly vs 2weekly, supported by granulocyte colony stimulating factor). Details concerning the trial protocol are provided in Appendix A.
The dataset included 497 eligible patients; 19 patients who did not start chemotherapy (13) or reported an abnormal dosage (i.e., +25% higher than planned) for a single or both agents (6) were excluded. Patients who did not complete all six cycles of chemotherapy (93) and did not terminate the last cycle within 180 days after randomization (8) were excluded. The final cohort of 377 patients included in the analyses (75.9% of the initial sample) is shown in the consort diagram in Figure 1.
2.2 Longitudinal chemotherapy data
During the trial treatment, case report forms were used to document across cycles all the information required by the MRC BO06/EORTC 80931 RCT protocol for each patient.
Patients baseline characteristics (age, gender, allocated chemotherapy regimen, site and location of the tumour) were registered at randomization. Among 377 patients, 229 (60.7%) were males and RegDI was allocated in 52.3% of the patients (197). Median age was 15 years (IQR [11; 18]). Treatmentrelated factors (administered dose of chemotherapy, cycles delays, chemotherapyinduced toxicity, haematological parameters) were collected at each cycle of chemotherapy. For each patient and cycle , chemotherapy dose was analysed as percentage of achieved chemotherapy dose up to cycle , i.e., the percentage of the cumulative drugs administrated up to cycle divided by the cumulative drugs planned up to . Nonhaematological chemotherapyinduced toxicity for nausea/vomiting (), infection (), oral mucositis (), cardiac toxicity (), ototoxicity () and neurological toxicity () were graded according to the Common Terminology Criteria for Adverse Events Version 3 (CTCAE v3.0) (U.S. Department of Health and Human Services, 2006), with grades ranging from 0 (none) to 4 (lifethreatening) (see Appendix A
for further details). In this work, nausea/vomiting, infection and oral mucositis were classified as
generic toxicities since they represent common adverse events for chemotherapeutic treatments in general, whereas cardiac toxicity, ototoxicity and neurological toxicity, which could also cause irreversible conditions (see Appendix Table A.1), were classified as drugspecific toxicities since they are related to the use of cisplatin or doxorubicin (Almalky et al., 2020; dos Santos et al., 2020).Barplots of CTCAEgrades over cycles for each chemotherapyinduced nonhaematological toxicity are reported in Figure 2. As expected, generic toxicities were more frequent than drugspecific ones. Nausea/vomiting was reported at least once over cycles in 97.3% of patients (367/377), with a percentage that decreased over cycles from 84.9% in cycle 1 to 52.5% in cycle 6. The percentages of patients who reported oral mucositis or infections were more stable over cycles: 30.5%43.3% for mucositis, with 78% (294/377) reporting mucositis at least once, and 23.8%31.3% for infection, with 69% (260/377) reporting an infection at least once. Ototoxicity was reported at least once in 21.5% (81/377), cardiac toxicity in 14.1% (53/377) and neurological toxicity in 11.7% (44/377). At each cycle, CTCAEgrade 4 for generic toxicities and CTCAEgrades for drugspecific toxicities were reported in less than 5% of patients. For this reason, the lowfrequency classes have been merged, ending up with the following toxic categories which represent

the severity of the toxic event for generic toxicities: none (CTCAEgrade 0), mild (CTCAEgrade 1), moderate (CTCAEgrade 2), and severe (CTCAEgrades 3 or 4);

the absence or the presence of toxic event for drugspecific toxicities: no (CTACEgrade 0) and yes (CTACEgrades ).
These categories identified for each toxicity constitute the itemresponse elements selected to model the latent process representing the "true" overall toxic status. Table 1 shows the observed frequencies (and percentages) of the selected categories for each toxicity over cycles. The observed responses for each patient are then given by his/her longitudinal toxic categories measured along the cycles, which are finally used to evaluate his/her LOTox condition during treatment.
Toxicity  Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6 

Nausea 

none  57 (15.1%)  88 (23.3%)  115 (30.5%)  126 (33.4%)  146 (38.7%)  179 (47.5%) 
mild  74 (19.6%)  87 (23.1%)  76 (20.2%)  72 (19.1%)  86 (22.8%)  74 (19.6%) 
moderate  117 (31.1%)  117 (31.1%)  114 (30.2%)  113 (30.0%)  96 (25.5%)  87 (23.1%) 
severe  129 (34.2%)  85 (22.5%)  72 (19.1%)  66 (17.5%)  49 (13.0%)  37 (9.8%) 
Infection 

none  259 (68.7%)  287 (76.1%)  268 (71.1%)  265 (70.3%)  268 (71.1%)  286 (75.9%) 
mild  30 (7.9%)  24 (6.4%)  26 (6.9%)  31 (8.2%)  23 (6.1%)  16 (4.3%) 
moderate  64 (17.0%)  45 (11.9%)  61 (16.2%)  54 (14.3%)  52 (13.8%)  45 (11.9%) 
severe  24 (6.4%)  21 (5.6%)  22 (5.8%)  27 (7.2%)  34 (9.0%)  30 (8.0%) 
Mucositis 

none  265 (70.3%)  228 (60.5%)  234 (62.1%)  237 (62.9%)  214 (56.8%)  262 (69.5%) 
mild  54 (14.3%)  46 (12.2%)  59 (15.6%)  52 (13.8%)  62 (16.4%)  44 (11.7%) 
moderate  44 (11.7%)  54 (14.3%)  43 (11.4%)  55 (14.6%)  63 (16.7%)  50 (13.2%) 
severe  14 (3.7%)  49 (13.0%)  41 (10.9%)  33 (8.7%)  38 (10.1%)  21 (5.6%) 
Cardiotoxicity 

no  374 (99.2%)  361 (95.8%)  362 (96.0%)  359 (95.2%)  357 (94.7%)  355 (94.2%) 
yes  3 (0.8%)  16 (4.2%)  15 (4.0%)  18 (4.8%)  20 (5.3%)  22 (5.8%) 
Ototoxicity 

no  357 (94.7%)  361 (95.8%)  350 (92.8%)  342 (90.7%)  346 (91.8%)  326 (86.5%) 
yes  20 (5.3%)  16 (4.2%)  27 (7.2%)  35 (9.3%)  31 (8.2%)  51 (13.5%) 
Neurological toxicity 

no  371 (98.4%)  367 (97.3%)  362 (96.0%)  367 (97.3%)  356 (94.4%)  363 (96.3%) 
yes  6 (1.6%)  10 (2.7%)  15 (4.0%)  10 (2.7%)  21 (5.6%)  14 (3.7%) 

3 Statistical Methods
In the following Sections, the novel Latent Markov (LM) approach for modelling the Latent Overall Toxicity (LOTox) condition of each patient starting from the observed longitudinal toxic categories measured during chemotherapy treatment (see Section 2.2) is introduced. In Section 3.1 motivations for the proposed approach for treating the longitudinal toxicity data are discussed. In Section 3.2 the mathematics related to LM model approach is briefly introduced. Model selection procedure and longitudinal latent probability profiles are presented in Sections 3.3 and 3.4, respectively. Statistical analyses were performed in the Rsoftware environment (R Core Team, 2020), using LMest package by Bartolucci et al. (2017). R code for the current study is available upon request.
3.1 Motivations for latent Markov models for longitudinal toxicity data
LM models are statistical methods employed for the analysis of longitudinal (categorical) data specifically designed to study the evolution of an individual characteristic of interest, when it is not directly observable (Wiggins, 1973; Bartolucci et al., 2013). The idea behind a LM approach for longitudinal toxicity data in the case of the current application is to assume the existence of a latent process representing the "true" LOTox status, which affects the distribution of the response variables, in our case the observed toxicities. Similarly to previous studies (Bartolucci et al., 2014), there are two principal motivations to adopt LM models for quantifying the toxic risk in cancer studies: (i) account for measurement errors in the observed toxicity variables, and (ii) identify different LOTox subpopulations (i.e., the latent states) in the global population (i.e., the patients’ cohort) and their changes over time. For the problem under analysis, it is reasonable to assume that the latent variables follow a firstorder Markov chain, so that the "true" level of overall toxicity at a given time is influenced only by the its previous level. Moreover, the response toxicity variables are conditionally independent, as each observed response is assumed to depend only on the corresponding "true" LOTox level. In this context, a LM model may be seen as an extension of the latent class model (Collins and Lanza, 2010), where patients are allowed to move between latent states during the observation period. LM models for longitudinal toxicity data are characterized by several parameters: the initial probability of each LOTox state, the transition probabilities among different states over chemotherapy cycles, and the conditional response probabilities given the latent variable. Individual covariates (if available) can be included in the latent model and may affect the initial and transition probabilities of the Markov chain (Bartolucci et al., 2009), as explained in Section 3.2.
A LM approach is appropriate to both identify the actual overall toxicity burden and investigate its evolution during treatment for each patient. On one hand, patients that at a specific time result in the same subpopulation are characterized by a common distribution for the observed toxic categories, and by a similar overall toxicity burden. On the other hand, individual dynamic changes among latent states allow to evaluate the LOTox evolution during treatment for each subject.
3.2 Latent Markov model with covariates
The latent Markov model (Wiggins, 1973; Bartolucci et al., 2013) with covariates affecting initial and transitions probabilities is now introduced.
Let be the set of categorical response variables measured at each time . Denote by the response variable for subject at time , with set of categories coded from 0 to . Let
denote the observed multivariate response vector at time
for patient and be the corresponding complete response vector. Denote by the complete vector of individual covariates, where elements are the vectors of timefixed and timevarying covariates for subject at occasion . The general LM model assumes the existence of a latent process which affects the distribution of the response variables . The latent process follows a firstorder Markov chain with state space , where is the total number of latent states. LM models usually assume that the response vectors are conditionally independent given the latent process (local independence of the response vectors) and that the elements are conditionally independent given (conditional independence of elements). The motivation of these assumptions is that the latent process fully explains the observable behaviour of a subject. For the problem under analysis, it means that the individual responses observed for the different nonhaematological toxicities have no direct influence on each other and depend only on the “true” LOTox process underlying them.LM models are made by two components: the measurement model concerns the conditional distribution of the response variables given the latent process, and the latent model concerns the distribution of the latent process (i.e., initial and transition probabilities). This latent process represents an individual characteristic of interest that is not directly observable and may evolve over time, also depending on observable covariates. The main research interest hence lies in modelling the latent process and the effect of covariates on its dynamic, as in Bartolucci et al. (2009, 2017). In this framework, a version of LM models where both the initial and the transition probabilities of the latent process may depend on covariates is considered. Three different sets of probabilities (i.e., parameters) can be defined.

Conditional response probability (or itemresponse probability) is the probability of observing a response for variable at time , given the latent status :
To ensure that the interpretation of the latent states remains constant over time, conditional response probabilities are assumed timehomogeneous, i.e.,
. Given the estimated
, the latent states can be characterized in terms of observed response categories. 
Initial latent states prevalence is the probability of membership in latent state at time , given the vector of covariates for individual :
The estimated
may be interpreted as quantities proportional to the size of each latent state at the first timeoccasion, given the covariates. A natural way to allow the initial probabilities of the LM chain to depend on individual covariates is a multinomial logit parametrization:
(1) where and are the parameters vectors to be estimated.

Transition probability is the probability of a transition to latent state at time , conditional on membership in latent state at time , given the individual vector of covariates (if available):
where and . The estimated reflect changes or persistence in the various states over time, given the individual covariates whose effects can be modelled through a multinomial logit parametrization:
(2) for and with . are the parameters vectors to be estimated.
Under the assumptions of local and conditional independence, the manifest distribution of the response variables is given by:
(3) 
where . The vector is a realization of made by the subvectors where is a realization of with elements . The vector is a realization of made by the subvectors where is a realization of .
Parameters estimation is performed maximizing the loglikelihood for a sample of independent units, i.e.,
, using an ExpectationMaximization algorithm (
Bartolucci et al., 2013, 2014, 2015).3.3 Model selection
The choice of the final LM model for the application consists of two steps: (i) identification of the number of latent states , and (ii) selection of the covariates to be included in the final model. When the number of latent states can not be a priori defined based on clinical indications, it can be selected according different measures (Bacci et al., 2014). The most used indexes are the Akaike information criterion (AIC) by Akaike (1973) or the Bayesian information criterion (BIC) by Schwarz (1978), defined as
where is the maximum of the loglikelihood of the model of interest and denotes the number of free parameters. In particular, the smaller the values of the above criteria, the better the model represents the optimum compromise between goodnessoffit and complexity. If the two criteria lead to selecting a different number of states, BIC is usually preferred (Bacci et al., 2014; Bartolucci et al., 2017).
Following a standard practice, unrestricted basic LM models (i.e., LM models with timeheterogeneous transitions and no covariates  named M1) were fitted increasing the value of from 1 to 10, and the number of latent states was selected according to the minimum BIC. Once was determined according to M1, a forward strategy was adopted to identify the covariates to be included in the final model. In particular, the smallest basic LM model with latent states and timehomogeneous transitions (i.e., the LM model restricted to the case in which initial and transition probabilities are parametrized by multinomial logit without covariates  named M2) was initially fitted and then the effect of each covariate on initial and/or transition probabilities (models M3M12) was added. Only the covariates whose effect reduces the value of the BIC index were included in the final LM model.
3.4 Longitudinal latent probability profiles
For each the ExpectationMaximization algorithm provides the posterior probabilities of variables
(4) 
which can be estimated using recursions and involving the manifest distribution in Equation (3) (Bartolucci et al., 2014).
For each latent state , probabilities in (4) can be used to reconstruct the Longitudinal Latent Probability Profile of the th subject (LLPP), as follows:
(5) 
Each LLPP represents the probability over time of being in latent state for individual , given the observed complete response and covariates (if available). Applying this procedure, a variate longitudinal latent profile such that can be obtained for each subject . Moreover, according to the socalled local decoding, a prediction of the sequence of the latent states over time for each subject can be obtained maximizing over states
. It is indicated aswhere for all .
In the present framework, the LOTox states summarize different levels of overall toxicity burden, representing a proxy for patient’s quality of life. Therefore, for each patient , LLPP in Equation (5) represents the probability over time of being in the LOTox state (i.e., the probability over time of developing an overall toxic burden quantified by state ) given patient’s history: observed toxicity categories and personal characteristics over treatment. By reconstructing the longitudinal LOTox profiles, it is possible to (i) describe patient’s response to therapy over cycles, (ii) quantify the overall toxicity burden evolution over treatment cycles given patient’s history and (iii) investigate the individual dynamic changes among latent states, detecting differences in health status and quality of life among patients. Similarly, the local decoding prediction represents the sequence of the most probable LOTox states (i.e., the most probable overall toxicity burden condition) for a patient over time. Both the longitudinal LOTox profiles and the corresponding LOTox sequence represent tools to summarize and quantify the overall toxic risk and its evolution for each patient.
4 Results
In this section, the results obtained from the application of the LM model proposed in Section 3 to the MRC BO06/EORTC 80931 Randomized Clinical Trial (RCT) data about osteosarcoma patients (described in Section 2) are reported.
4.1 Latent Markov model for longitudinal toxicity data
For each cycle , let be the set of nonhaematological toxicities, representing response variables . The relative sets of response categories identified in Section 2.2 were coded from 0 to , as follows:
Latent Markov (LM) model  AIC  BIC  

M1: Unrestricted LM model without covariates  1  18  8794.91  17625.81  17696.59 
2  35  8420.19  16910.38  17048.01  
3  68  8216.99  16569.98  16837.37  
4  111  8035.21  16292.42  16728.90  
5  164  7902.59  16133.18  16778.07  
6  227  7793.14  16040.29  16932.91  
7  300  7688.12  15976.24  17155.91  
8  383  7603.30  15972.61  17478.66  
9  476  7530.49  16012.98  17884.73  
10  579  7462.34  16082.68  18359.45  
M2: Multinomial logit LM model without covariates  4  63  8069.21  16264.43  16512.16 
M3: M2 + regimen effect on initial prob.  4  66  8065.49  16262.97  16522.50 
M4: M2 + gender effect on initial prob.  4  66  8061.73  16255.45  16514.98 
M5: M2 + age effect on initial prob.  4  66  8055.35  16242.69  16502.22 
M6: M2 + regimen effect on transition prob.  4  75  8063.37  16276.74  16571.66 
M7: M2 + gender effect on transition prob.  4  75  8060.33  16270.66  16565.58 
M8: M2 + age effect on transition prob.  4  75  8061.07  16272.14  16567.06 
M9: M2 + timevar chemotherapy dose on both prob.  4  78  8045.55  16247.10  16553.82 
M10: M2 + timevar WBC count on both prob.  4  78  8062.53  16281.07  16587.78 
M11: M2 + timevar PLT count on both prob.  4  78  8047.15  16250.30  16557.02 
M12: M2 + timevar NEUT count on both prob.  4  78  8062.67  16281.34  16588.05 
The procedure described in Section 3.3 was applied to first identify the number of latent states and then select the covariates to be included in the final model. Age, gender and allocated regimen at randomization were considered as timefixed covariates, while percentage of achieved chemotherapy dose up to cycle , white blood cell, neutrophils and platelets counts measured at each cycle were considered as timevarying ones. Results are shown in Table 2. The unrestricted LM model without covariates (M1) with the minimum BIC (16728.90) was obtained for , identifying a latent process with four LOTox states. Moreover, the basic model M2 with initial and transition probabilities parametrized by multinomial logit was preferable (BIC = 16512.16) to the unrestricted model M1 with the same number of latent states. Several models (M3M12) with four latent states, obtained from M2 adding covariates effect to initial and/or transition probabilities, were fitted. By comparing models M3M12 with M2, age (centred with respect to the mean) at randomization was the only covariate leading to a significant improvement in terms of both BIC and AIC (M5). Model M5 was then selected as final model. The path diagram of model M5 for a given subject is displayed in Figure 3: only initial probabilities were influenced by patient’s age at randomization and transition probabilities were assumed timehomogeneous. Equations (1) and (2) for a patient became:
(6)  
(7) 
Figure 4 shows the estimated conditional response probabilities for each type of nonhaematological toxicity under the selected model M5, which can be used for interpreting the latent states. In each toxicitypanel, each column refers to a different latent state . State 1 seems to correspond to people in good conditions, since for all nonhaematological toxicities the most probable category was the absence of the adverse event. State 2 seems to correspond to patients with nonsevere nausea and was the only state where drugspecific toxicities occurred with a relevant probability, especially for ototoxicity where . State 3 seems to be characterized by patients undergoing only nausea or vomiting, mostly moderate or severe. State 4 seems to correspond to people with multiple generic toxicities  mostly severe or moderate  with the certainty of having nausea (). Based on these results, the following LOTox states labelling were derived:

State 1: quite good conditions (nontoxic) no LOTox

State 2: nonsevere nausea with possible drugspecific AEs moderate LOTox

State 3: moderate/severe nausea/vomiting only low LOTox (limited to nausea)

State 4: multiple severe/moderate generic toxicities high LOTox.
Note that the states numbering (from 1 to 4) does not correspond with the progressive severity of overall toxicity burden (from no to high).
Table 3 displays the estimated regression parameters for the initial probabilities in Equation (6) and the estimated transition probabilities in Equation (7). The estimated intercepts indicates that for 15year patients the most prevalent state at cycle 1 was low LOTox state 3 (limited to nausea), followed by no LOTox state 1, high LOTox state 4 and moderate LOTox state 2. The estimates for were all positive, indicating that older individuals reported a higher overall severity at the first cycle compared to younger patients. The estimated transition probabilities shows a quite high persistence in the same state, especially for nontoxic state 1 and moderate state 2, where drugspecific AEs may also lead to permanent conditions (see Appendix Table A.1). The highest transition probability was 15.6% and was observed from the high LOTox state 4, where the effects of generic AEs are reversible and temporary, to the first nontoxic state. Other transitions were observed from high LOTox state 4 to nausea/vomiting only in state 3 (8.7%) and from low LOTox state 3 (limited to nausea) to no LOTox state 1 (10.7%) or high LOTox state 4 (8.2%). The remaining transition probabilities were always lower than 8%.
Regression parameters for initial probabilities  

2  3  4  
Intercept  1.2679  1.0138  0.3031  
Age  0.1858  0.0014  0.0512  
Transition probabilities from to ()  
1  2  3  4  
1  0.9674  0.0167  0.0032  0.0127 
2  0.0525  0.9214  0.0245  0.0016 
3  0.1070  0.0526  0.7581  0.0824 
4  0.1555  0.0356  0.0868  0.7221 
Starting from these parameter estimates, Figure 5 (left panel) displays the estimated vectors of initial probabilities for patients aged 10, 15 and 20 years old and the vector obtained as average of vectors over all the 377 subjects in the sample. On average, at cycle 1 low LOTox state 3 of subjects with nausea/vomiting only had the largest dimension (55.7%), followed by 20.2% of individuals for no LOTox state 1. No and low LOTox states together, representing the states with the lowest overall toxic severity, accounted for more than 75% of the patients, whereas less than 25% belonged to the latent states corresponding to the worst toxic conditions (moderate and high LOTox states 2 and 4).
Right panel in Figure 5 shows the estimated average probability of each latent state at each timeoccasion, i.e., the latent states prevalences averaged over all the subjects at each cycle. On average, prevalence of low overall severity limited to nausea (state 3) decreased over cycles from 55.7% to 19.3% (), whereas no and moderate overall toxicity (state 1 and 2, respectively) increased from 20.2% to 49.7% and from 9.2% to 18.9%. Prevalence in high overall severity (state 4) was more stable over cycles ranging in 10.1%15.6%, with peaks at cycles 2 and 3.
4.2 Longitudinal Latent Overall Toxicity (LOTox) profiles
Once the parameters were estimated for the final LM model, the longitudinal latent probability profiles were reconstructed for each patient and latent state , according to Equation (5) in Section 3.4.
In the case of longitudinal toxicity data, profiles can be also named longitudinal LOTox profiles since they represent the probability over cycles of being in the LOTox state for each patient , given the observed toxic categories over treatment and individual characteristics (i.e., the age at randomization). Figure 6 shows the longitudinal LOTox profiles for four patients aged 15 years old and with different observed toxic categories over cycles, as reported in Table 4. The longitudinal LOTox profiles captured the individual realisations of the latent process over cycles, showing different patterns of overall toxicity evolution over treatment among patients.
Patient  Cycle  Naus  Inf  Oral  Car  Oto  Neur  

A  1  15  3  0  1  0  0  0 
2  3  1  0  0  0  0  
3  3  3  0  0  0  0  
4  3  2  1  0  0  0  
5  3  0  2  0  0  0  
6  3  0  1  0  0  0  
B  1  15  1  0  0  0  0  0 
2  1  0  2  0  0  0  
3  2  2  3  0  0  1  
4  1  0  0  0  0  0  
5  1  0  1  0  0  0  
6  2  0  0  0  0  1  
C  1  15  2  0  0  0  0  0 
2  1  0  0  0  0  0  
3  1  0  0  0  0  0  
4  1  0  0  0  0  0  
5  1  0  0  0  0  0  
6  1  0  0  0  0  0  
D  1  15  2  0  0  0  0  0 
2  2  0  2  0  0  0  
3  0  0  0  0  0  0  
4  0  0  0  0  0  0  
5  0  0  0  0  0  0  
6  0  0  0  0  0  0 
According to local decoding, the sequences of the most probable LOTox states across cycles for patients computed on the basis of their posterior probabilities were
which reflected different extents of overall toxicity burdens during treatment according to the different toxicity levels progression observed for each patient (Table 4).
Both longitudinal LOTox profiles and sequences summarize and quantify the overall toxic risk and its evolution for each patient based on observed individual characteristics. This information in cooperation with medical staff could lead to improvements in the definition of useful tools for healthcare assessment and treatment planning.
5 Discussion
Due to the presence of multiple types of Adverse Events (AEs) with different levels of severity, identifying the actual extent of toxic burden and investigating the evolution of patient’s overall toxicity represent challenging problems in cancer research. AEs are one of the main factors determining clinical decisions in medical interventions and treatment planning, playing a fundamental role in health assessment and patient monitoring. The development of statistical methods able to summarize multiple AEs and to deal with the complexity of chemotherapy data, considering both the longitudinal and categorical aspects of toxicity levels progression, is then necessary and clinically relevant.
This paper proposed a Latent Markov (LM) model with covariates as a tool for investigating the evolution of overall toxicity pattern for each patient over chemotherapy treatment. This model is applied to longitudinal chemotherapy data for osteosarcoma patients from MRC BO06/EORTC 80931 Randomized Controlled Trial. The toxic AEs over cycles were registered through apposite case report forms and graded according to the Common Terminology Criteria for Adverse Events (CTCAE) (U.S. Department of Health and Human Services, 2006), as indicated by the trial protocol (Lewis et al., 2007). The latent status of interest is the Latent Overall Toxicity (LOTox) condition of a patient, which affects the distribution of the observed categorical toxicities grades over treatment. This latent characteristic plays the role of timedependent confounder for the effect of chemotherapy on patient’s health (delaying the subsequent cycle or reducing its dosage), providing insights into patient’s status and disease progression over treatment. For these reasons, it can be interpreted as a proxy for patient’s quality of life, fundamental aspect of the cancer patient care. By assuming the existence of a latent Markov chain for LOTox, the proposed approach aimed at identifying different latent states of overall toxicity burden (i.e., the LOTox states) and investigating how patients move between states during chemotherapy treatment. This is important in light of the need to develop tools to support clinical decisions in tailored interventions for effective management of adverse symptoms and treatments.
The proposed taxonomy based on LM models identified subpopulations of patients characterized by a common distribution of toxic categories, and by a similar overall toxicity burden. Four LOTox states were found, representing (i) people in quite good conditions (no LOTox state 1), (ii) patients undergoing only nausea or vomiting  mostly moderate or severe  (low LOTox state 3), (iii) subjects with nonsevere nausea and the possibility to develop drugspecific AEs (moderate LOTox state 2), or (iv) people with multiple severe/moderate generic toxicities (high LOTox state 4). This LM approach estimated the probability of individual changes over time and the initial prevalence of each state, and predicted the LOTox state to which every patient belongs at a given occasion. By reconstructing the personalized longitudinal LOTox profiles, which represented the probability over time for a patient of being in a certain LOTox state, the dynamic evolution of his/her overall toxic risk during treatment was assessed.
The approach proposed here represents a novelty for osteosarcoma treatment. It allowed to summarise and quantify patient’s overall toxic risk during treatment, providing new insights for childhood cancer therapy. The latent characteristic may be considered as a proxy for patient’s qualityoflife and can be used to (i) identify subpopulations representing different levels of multiple AEs severity at a given cycle, (ii) describe patient’s response to therapy over the entire treatment through the longitudinal LOTox profiles, and (iii) support clinical decisions, trying to reduce the impact of therapies in terms of toxic AEs.
This retrospective exploratory analysis has of course some limitations. The procedure used to select the final model may miss the best available one, since not all possible models have been fitted. However, it is computationally efficient and follows a standard stepwise forward selection approach. The analysis was performed on a single RCT in osteosarcoma, considering only nonhaematological toxicities. Other factors of potential interest were not routinely recorded during the trial, including among others nephrotoxicity, lymphocytes count or tumour size. To get more information about the robustness of the model developed in this study, it should be applied to other osteosarcoma data provided that the toxicity are longitudinally recorded.
On the other hand, this LM procedure can be adapted and applied to other cancer studies. Provided that toxicities are recorded according to the CTCAE scale or an analogous grading system, the LM approach represents a general and flexible method to quantify the personal evolution of overall toxic risk during chemotherapy. Moreover, this work opens doors to further researches, both in the field of statistical methodology development as well as in cancer research. Patients can be stratified in different risk groups to be used during treatment, based on their different LOTox dynamics. Association between longitudinal LOTox profiles and survival outcomes will provide new insights in the treatment effect during the evolution of the disease. The association with survival outcomes will be investigated in future work.
The novel taxonomy based on LM model for longitudinal toxicity data proposed in this study gives a new detailed insight into the chemotherapyinduced AEs with respect to traditional approaches. This procedure takes into account the dynamic nature of the overall toxic risk, showing that longitudinal methods should be considered in future analyses of cancer trials. Provided that longitudinal data are available from drug administrations, this flexible procedure can be adapted to other cancer studies. This study shows that working in this direction is a difficult but profitable approach, which in cooperation with medical staff could lead to improvements in the definition of useful tools for healthcare assessment and treatment planning.
Acknowledgements
The authors thank Medical Research Council in London for sharing the dataset used in this work.
Data Availability Statement
Data are not publicly available due to privacy restrictions. Access to the full dataset of MRC BO06 trial can be requested to MRC Clinical Trials Unit at UCL, Institute of Clinical Trials and Methodology, UCL, London.
References
 Akaike (1973) Akaike, H. (1973). Information Theory and an Extension of the Maximum Likelihood Principle. In Petrov, B. N. and Csaki, F., editors, Proceedings of the 2nd International Symposium on Information Theory, pages 267–281. Budapest: Akademiai Kiado.
 Almalky et al. (2020) Almalky, H. S., Harthi, S. E. A., and Osman, A.M. M. (2020). Major obstacles to doxorubicin therapy: Cardiotoxicity and drug resistance. Journal of Oncology Pharmacy Practice, 26(2):434–444.
 Anninga et al. (2011) Anninga, J. K., Gelderblom, H., Fiocco, M., Kroep, J. R., Taminiau, A. H., Hogendoorn, P. C., and Egeler, R. M. (2011). Chemotherapeutic adjuvant treatment for osteosarcoma: Where do we stand? European Journal of Cancer, 47(16):2431–2445.
 Bacci et al. (2014) Bacci, S., Pandolfi, S., and Pennoni, F. (2014). A Comparison of Some Criteria for States Selection in the Latent Markov Model for Longitudinal Data. Advances in Data Analysis and Classification, 8(2):125–145.
 Bartolucci et al. (2013) Bartolucci, F., Farcomeni, A., and Pennoni, F. (2013). Latent Markov Models for Longitudinal Data. Chapman & Hall/CRC, Boca Raton.
 Bartolucci et al. (2014) Bartolucci, F., Farcomeni, A., and Pennoni, F. (2014). Latent Markov models: a review of a general framework for the analysis of longitudinal data with covariates. TEST, 23:433–465.
 Bartolucci et al. (2009) Bartolucci, F., Lupparelli, M., and Montanari, G. E. (2009). Latent Markov model for longitudinal binary data: An application to the performance evaluation of nursing homes. The Annals of Applied Statistics, 3(2):611 – 636.
 Bartolucci et al. (2015) Bartolucci, F., Montanari, G. E., and Pandolfi, S. (2015). Threestep estimation of latent Markov models with covariates. Computational Statistics & Data Analysis, 83:287–301.
 Bartolucci et al. (2017) Bartolucci, F., Pandolfi, S., and Pennoni, F. (2017). LMest: An R Package for Latent Markov Models for Longitudinal Categorical Data. Journal of Statistical Software, 81(4):1–38.
 Bekele and Thall (2004) Bekele, B. N. and Thall, P. F. (2004). DoseFinding Based on Multiple Toxicities in a Soft Tissue Sarcoma Trial. Journal of the American Statistical Association, 99(465):26–35.
 Carbini et al. (2018) Carbini, M., SuárezFariñas, M., and Maki, R. G. (2018). A Method to Summarize Toxicity in Cancer Randomized Clinical Trials. Clinical Cancer Research.
 Collins and Lanza (2010) Collins, L. M. and Lanza, S. (2010). Latent Class and Latent Transition Analysis: With Applications in the Social, Behavioral, and Health Sciences. John Wiley and Sons Inc.
 dos Santos et al. (2020) dos Santos, N. A. G., Ferreira, R. S., and dos Santos, A. C. (2020). Overview of cisplatininduced neurotoxicity and ototoxicity, and the protective agents. Food and Chemical Toxicology, 136:111079.
 Fitzmaurice et al. (2009) Fitzmaurice, G., Davidian, M., Verbeke, G., and Molenberghs, G. (2009). Longitudinal data analysis. Chapman & Hall, CRC, London.
 Hirakawa et al. (2019) Hirakawa, A., Sudo, K., Yonemori, K., Sadachi, R., Kinoshita, F., Kobayashi, Y., Okuma, H. S., Kawachi, A., Tamura, K., Fujiwara, Y., Rubinstein, L., and Takebe, N. (2019). A Comparative Study of Longitudinal Toxicities of Cytotoxic Drugs, Molecularly Targeted Agents, Immunomodulatory Drugs, and Cancer Vaccines. Clinical Pharmacology & Therapeutics, 106(4):803–809.
 Lancia et al. (2019) Lancia, C., Spitoni, C., Anninga, J., Whelan, J., Sydes, M. R., Jovic, G., and Fiocco, M. (2019). Marginal structural models with dosedelay jointexposure for assessing variations to chemotherapy intensity. Statistical Methods in Medical Research, 28(9):2787–2801. PMID: 29916309.
 Lee et al. (2012) Lee, S., Hershman, D., Martin, P., Leonard, J., and Cheung, Y. (2012). Toxicity burden score: a novel approach to summarize multiple toxic effects. Annals of Oncology, 23(2):537–541.
 Lewis et al. (2007) Lewis, I., Nooij, M. A., Whelan, J., Sydes, M. R., Grimer, R., Hogendoorn, P. C. W., Memon, M. A., Weeden, S., Uscinska, B. M., van Glabbeke, M., Kirkpatrick, A., Hauben, E. I., Craft, A. W., Taminiau, A. H. M., and On behalf of MRC BO06 and EORTC 80931 collaborators and European Osteosarcoma Intergroup (2007). Improvement in Histologic Response But Not Survival in Osteosarcoma Patients Treated With Intensified Chemotherapy: A Randomized Phase III Trial of the European Osteosarcoma Intergroup. JNCI: Journal of the National Cancer Institute, 99(2):112–128.
 McTiernan et al. (2012) McTiernan, A., Jinks, R. C., Sydes, M. R., Uscinska, B., Hook, J. M., van Glabbeke, M., Bramwell, V., Lewis, I. J., Taminiau, A. H., Nooij, M. A., Hogendoorn, P. C., Gelderblom, H., and Whelan, J. S. (2012). Presence of chemotherapyinduced toxicity predicts improved survival in patients with localised extremity osteosarcoma treated with doxorubicin and cisplatin: A report from the European Osteosarcoma Intergroup. European Journal of Cancer, 48(5):703–712.
 R Core Team (2020) R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
 Ritter and Bielack (2010) Ritter, J. and Bielack, S. (2010). Osteosarcoma. Annals of Oncology, 21(suppl 7):vii320–vii325.
 Rogatko et al. (2004) Rogatko, A., Babb, J. S., Wang, H., Slifker, M. J., and Hudes, G. R. (2004). Patient Characteristics Compete with Dose as Predictors of Acute Treatment Toxicity in Early Phase Clinical Trials. Clinical Cancer Research, 10(14):4645–4651.
 Schwarz (1978) Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics, 6(2):461–464.
 Sivendran et al. (2014) Sivendran, S., Latif, A., McBride, R. B., Stensland, K. D., Wisnivesky, J., Haines, L., Oh, W. K., and Galsky, M. D. (2014). Adverse Event Reporting in Cancer Clinical Trial Publications. Journal of Clinical Oncology, 32(2):83–89.
 Smeland et al. (2019) Smeland, S., Bielack, S. S., Whelan, J., Bernstein, M., Hogendoorn, P., Krailo, M. D., Gorlick, R., Janeway, K. A., Ingleby, F. C., Anninga, J., Antal, I., Arndt, C., Brown, K., ButterfassBahloul, T., Calaminus, G., Capra, M., Dhooge, C., Eriksson, M., Flanagan, A. M., Friedel, G., …, and Marina, N. (2019). Survival and prognosis with osteosarcoma: outcomes in more than 2000 patients in the EURAMOS1 (European and American Osteosarcoma Study) cohort. European Journal of Cancer, 109:36–50.
 Souhami et al. (1997) Souhami, R. L., Craft, A. W., Van der Eijken, J. W., Nooij, M., Spooner, D., Bramwell, V. H., Wierzbicki, R., Malcolm, A. J., Kirkpatrick, A., Uscinska, B. M., Van Glabbeke, M., and Machin, D. (1997). Randomised trial of two regimens of chemotherapy in operable osteosarcoma: a study of the European Osteosarcoma Intergroup. The Lancet, 350(9082):911–917.
 Thanarajasingam et al. (2016) Thanarajasingam, G., Atherton, P. J., Novotny, P. J., Loprinzi, C. L., Sloan, J. A., and Grothey, A. (2016). Longitudinal adverse event assessment in oncology clinical trials: the Toxicity over Time (ToxT) analysis of Alliance trials NCCTG N9741 and 979254. The Lancet Oncology, 17(5):663–670.
 Thanarajasingam et al. (2015) Thanarajasingam, G., Hubbard, J. M., Sloan, J. A., and Grothey, A. (2015). The Imperative for a New Approach to Toxicity Analysis in Oncology Clinical Trials. JNCI: Journal of the National Cancer Institute, 107(10). djv216.
 Thanarajasingam et al. (2020) Thanarajasingam, G., Leonard, J. P., Witzig, T. E., Habermann, T. M., Blum, K. A., Bartlett, N. L., Flowers, C. R., Pitcher, B. N., Jung, S.H., Atherton, P. J., Tan, A., Novotny, P. J., and Dueck, A. C. (2020). Longitudinal Toxicity over Time (ToxT) analysis to evaluate tolerability: a case study of lenalidomide in the CALGB 50401 (Alliance) trial. The Lancet Haematology, 7(6):e490–e497.
 Trotti et al. (2003) Trotti, A., Colevas, A., Setser, A., Rusch, V., Jaques, D., Budach, V., Langer, C., Murphy, B., Cumberlin, R., Coleman, C., and Rubin, P. (2003). CTCAE v3.0: development of a comprehensive grading system for the adverse effects of cancer treatment. Seminars in Radiation Oncology, 13(3):176–181.
 Trotti et al. (2007) Trotti, A., Pajak, T. F., Gwede, C. K., Paulus, R., Cooper, J., Forastiere, A., Ridge, J. A., WatkinsBruner, D., Garden, A. S., Ang, K. K., and Curran, W. (2007). TAME: development of a new method for summarising adverse events of cancer treatment by the Radiation Therapy Oncology Group. The Lancet Oncology, 8(7):613–624.
 U.S. Department of Health and Human Services (2006) U.S. Department of Health and Human Services (2006). Common Terminology Criteria for Adverse Events v3.0 (CTCAE). https://www.eortc.be/services/doc/ctc/ctcaev3.pdf.
 Wiggins (1973) Wiggins, L. (1973). Panel analysis: latent probability models for attitude and behaviour processes. Elsevier, Amsterdam.
 Zhang et al. (2016) Zhang, S., Chen, Q., and Wang, Q. (2016). The use of and adherence to CTCAE v3.0 in cancer clinical trial publications. Oncotarget, 7(40):65577–65588.
Appendix A MRC BO06/EORTC 80931 RCT protocol
Data from the MRC BO06/EORTC 80931 Randomized Controlled Trial (RCT) for patients with nonmetastatic highgrade osteosarcoma recruited between 1993 and 2002 were analysed (Lewis et al., 2007). The trial randomised patients between conventional treatment with doxorubicin (DOX) and cisplatin (CDDP) given every 3 weeks (RegC) versus a doseintense regimen of the same two drugs given every 2 weeks (RegDI), supported by granulocyte colonystimulating factor. Chemotherapy was administered for six cycles (a cycle is a period of either 2 or 3 weeks depending on the allocated regimen), before and after surgical removal of the primary osteosarcoma. In both arms, DOX (75 mg/m^{2}) plus CDDP (100 mg/m^{2}) were given over six cycles. Surgery to remove the primary tumour was scheduled at week 6 after starting treatment in both arms, that is, after 2 cycles (2 [DOX+CDDP]) in regimenC and after 3 cycles (3 [DOX+CDDP]) in regimenDI. Postoperative chemotherapy was intended to resume 3 weeks after surgery in both arms. Figure A.1 shows the trial design. Laboratory tests were usually performed before each cycle of chemotherapy (in some cases also during and after the cycle) in order to monitor patient’s health status and the development of toxicities or adverse events. Nonhaematological chemotherapyinduced toxicity for nausea/vomiting, infection, oral mucositis, cardiac toxicity, ototoxicity and neurological toxicity were graded according to the Common Terminology Criteria for Adverse Events Version 3 (CTCAE v3.0) by U.S. Department of Health and Human Services (2006), with grades ranging from 0 (none) to 4 (lifethreatening), as shown in Table A.1. Delays or chemotherapy dose reductions during treatment were possible in case of toxicity. Additional details can be found in the primary analysis of the trial by Lewis et al. (2007).
Toxicity  Grade 0  Grade 1  Grade 2  Grade 3  Grade 4 

Nausea/Vomiting  None  Nausea  Transient vomiting  Continuative vomiting  Intractable vomiting 
Infection  None  Minor infection  Moderate infection  Major infection  Major infection with hypotension 
Oral Mucositis  No change  Soreness or erythema  Ulcers: can eat solid  Ulcers: liquid diet only  Alimentation not possible 
Cardiac toxicity  No change  Sinus tachycardia  Unifocal PVC arrhythmia  Multifocal PVC  Ventricular tachycardia 
Ototoxicity  No change  Slight hearing loss  Moderate hearing loss  Major hearing loss  Complete hearing loss 
Neurological toxicity  None  Paraesthesia  Severe paraesthesia  Intolerable paraesthesia  Paralysis 
Comments
There are no comments yet.