Robust Identification of Target Genes and Outliers in Triple-negative Breast Cancer Data

07/04/2018
by Pieter Segaert, et al.

Correct classification of breast cancer subtypes is highly important because it directly affects the therapeutic options. We focus on triple-negative breast cancer (TNBC), which has the worst prognosis among breast cancer types. Using cutting-edge methods from robust statistics, we analyze Breast Invasive Carcinoma (BRCA) transcriptomic data publicly available from The Cancer Genome Atlas (TCGA) data portal. Our analysis identifies statistical outliers that may correspond to misdiagnosed patients. Furthermore, we illustrate that classical statistical methods may fail in the presence of these outliers, which motivates the use of robust statistics. Using robust sparse logistic regression, we obtain 36 relevant genes, of which ca. 60% have previously been reported as biologically relevant to TNBC, reinforcing the validity of the method. The remaining 14 genes are new potential biomarkers for TNBC. Of these, JAM3, SFT2D2 and PAPSS1 were previously associated with breast tumors or other types of cancer. The relevance of these genes is confirmed by the new DetectDeviatingCells (DDC) outlier detection technique. A comparison of gene networks on the selected genes showed significant differences between TNBC and non-TNBC data. The individual role of FOXA1 in TNBC and non-TNBC, and the strong FOXA1-AGR2 connection in TNBC, stand out. Not only will our results contribute to the understanding and, ultimately, the management of breast cancer and TNBC, they also show that robust regression and outlier detection are key strategies for coping with high-dimensional clinical data such as omics data.
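The claim that classical methods can fail in the presence of outliers can be illustrated with a small sketch. The code below is not DDC and not the authors' pipeline: it only performs a simplified univariate step, standardizing each gene (column) with the median and the scaled median absolute deviation (MAD) instead of the mean and standard deviation, and flagging cells whose robust z-score exceeds a cutoff. DDC itself (for example as implemented in the R package cellWise) goes further by also predicting each cell from correlated variables. The function names `robust_z`, `classical_z` and `flag_cells`, the cutoff of about 2.576 (the square root of the 0.99 quantile of a chi-square distribution with one degree of freedom), and the toy expression values are all illustrative assumptions, not TCGA data.

```python
from statistics import NormalDist, mean, median, stdev

# Assumed cutoff: sqrt of the 0.99 quantile of chi-square(1),
# which equals the 0.995 quantile of the standard normal (~2.576).
CUTOFF = NormalDist().inv_cdf(0.995)


def classical_z(column):
    """Standardize with mean and sample standard deviation (not robust)."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


def robust_z(column):
    """Standardize with median and MAD (scaled by 1.4826 for consistency at the normal)."""
    med = median(column)
    mad = 1.4826 * median(abs(x - med) for x in column)
    if mad == 0:
        raise ValueError("MAD is zero; robust scale is undefined for this column")
    return [(x - med) / mad for x in column]


def flag_cells(matrix, cutoff=CUTOFF):
    """Return (row, col) indices of cells whose robust z-score exceeds the cutoff.

    matrix: list of rows (patients), each row a list of values (genes).
    """
    flagged = []
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        for i, z in enumerate(robust_z(col)):
            if abs(z) > cutoff:
                flagged.append((i, j))
    return flagged


if __name__ == "__main__":
    # Hypothetical toy data: 7 patients x 2 genes; patient 6 has an aberrant value for gene 0.
    toy = [[5.1, 1.0], [4.9, 1.2], [5.0, 0.9], [5.2, 1.1],
           [4.8, 1.0], [5.1, 0.8], [12.0, 1.05]]
    gene0 = [row[0] for row in toy]
    print("classical max |z|:", max(abs(z) for z in classical_z(gene0)))
    print("robust flags:", flag_cells(toy))
```

On these toy values the robust step flags only cell (6, 0), whereas the classical z-score of the aberrant value stays below the cutoff. This masking is unavoidable for small samples: with n observations, a z-score based on the sample standard deviation can never exceed (n - 1)/sqrt(n), about 2.27 for n = 7, because the outlier itself inflates the standard deviation.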

research
08/13/2018

BACH: Grand Challenge on Breast Cancer Histology Images

Breast cancer is the most common invasive cancer in women, affecting mor...
research
05/20/2023

Technical outlier detection via convolutional variational autoencoder for the ADMANI breast mammogram dataset

The ADMANI datasets (annotated digital mammograms and associated non-ima...
research
01/22/2021

RaJIVE: Robust Angle Based JIVE for Integrating Noisy Multi-Source Data

With increasing availability of high dimensional, multi-source data, the...
research
11/24/2013

Sparse CCA via Precision Adjusted Iterative Thresholding

Sparse Canonical Correlation Analysis (CCA) has received considerable at...
research
11/28/2022

Graph Neural Networks for Breast Cancer Data Integration

International initiatives such as METABRIC (Molecular Taxonomy of Breast...
research
06/05/2019

DOT: Gene-set analysis by combining decorrelated association statistics

Historically, the majority of statistical association methods have been ...
research
07/14/2023

Prediction of breast cancer with 98

Abstract Cancer is a tumor that affects people worldwide, with a higher ...
