Non-convex SVM for cancer diagnosis based on morphologic features of tumor microenvironment
The surroundings of a cancerous tumor impact how it grows and develops in humans. New data from early breast cancer patients contains information on the collagen fibers surrounding the tumorous tissue – offering hope of finding additional biomarkers for diagnosis and prognosis – but poses two challenges for typical analysis. Each image section contains information on hundreds of fibers, and each tissue has multiple image sections contributing to a single prediction of tumor vs. non-tumor. This nested relationship of fibers within image spots within tissue samples requires a specialized analysis approach. We devise a novel support vector machine (SVM)-based predictive algorithm for this data structure. By treating the collection of fibers as a probability distribution, we can measure similarities between the collections through a flexible kernel approach. By assuming the relationship of tumor status between image sections and tissue samples, the constructed SVM problem is non-convex and traditional algorithms can not be applied. We propose two algorithms that exchange computational accuracy and efficiency to manage data of all sizes. The predictive performance of both algorithms is evaluated on the collagen fiber data set and additional simulation scenarios. We offer reproducible implementations of both algorithms of this approach in the R package mildsvm.
READ FULL TEXT