Revealing patterns in HIV viral load data and classifying patients via a novel machine learning cluster summarization method
HIV RNA viral load (VL) is an important outcome variable in studies of HIV infected persons. There exists only a handful of methods which classify patients by viral load patterns. Most methods place limits on the use of viral load measurements, are often specific to a particular study design, and do not account for complex, temporal variation. To address this issue, we propose a set of four unambiguous computable characteristics (features) of time-varying HIV viral load patterns, along with a novel centroid-based classification algorithm, which we use to classify a population of 1,576 HIV positive clinic patients into one of five different viral load patterns (clusters) often found in the literature: durably suppressed viral load (DSVL), sustained low viral load (SLVL), sustained high viral load (SHVL), high viral load suppression (HVLS), and rebounding viral load (RVL). The centroid algorithm summarizes these clusters in terms of their centroids and radii. We show that this allows new viral load patterns to be assigned pattern membership based on the distance from the centroid relative to its radius, which we term radial normalization classification. This method has the benefit of providing an objective and quantitative method to assign viral load pattern membership with a concise and interpretable model that aids clinical decision making. This method also facilitates meta-analyses by providing computably distinct HIV categories. Finally we propose that this novel centroid algorithm could also be useful in the areas of cluster comparison for outcomes research and data reduction in machine learning.
READ FULL TEXT