TRAPDOOR: Repurposing backdoors to detect dataset bias in machine learning-based genomic analysis

08/14/2021
by   Esha Sarkar, et al.

Machine Learning (ML) has achieved unprecedented performance in several applications, including image, speech, text, and data analysis. The use of ML to understand underlying patterns in gene mutations (genomics) has far-reaching results, not only in overcoming diagnostic pitfalls but also in designing treatments for life-threatening diseases like cancer. The success and sustainability of ML algorithms depend on the quality and diversity of the data collected and used for training. Under-representation of groups (ethnic groups, gender groups, etc.) in such datasets can lead to inaccurate predictions for those groups, which can further exacerbate systemic discrimination. In this work, we propose TRAPDOOR, a methodology for identifying biased datasets by repurposing a technique that has mostly been proposed for nefarious purposes: neural network backdoors. We consider a typical collaborative learning setting in the genomics supply chain, where data flows from hospitals, collaborative projects, or research institutes to a central cloud, potentially without awareness of bias against a sensitive group. In this context, we develop a methodology that uses ML backdooring tailored to genomic applications to leak potential bias information about the collective data without hampering genuine model performance. Using a real-world cancer dataset, we analyze both the bias toward white individuals that already exists in the dataset and biases that we introduce artificially. Our experimental results show that TRAPDOOR can detect the presence of dataset bias with 100% accuracy and, furthermore, can extract the extent of bias by recovering the under-represented group's percentage with a small error.
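To make the idea concrete, below is a minimal, self-contained sketch of how a backdoor can act as a bias probe. Everything in it is an illustrative assumption rather than the paper's actual construction: synthetic tabular features stand in for encoded gene mutations, the trigger positions and magnitude are arbitrary, and a logistic-regression model replaces the paper's neural networks. It demonstrates the core intuition: if a trigger is embedded only in samples belonging to a sensitive group, the trained model's confidence on trigger-stamped probe inputs grows with that group's share of the pooled data, so an analyst can infer representation without inspecting the sensitive attribute directly.

```python
# Illustrative sketch only: synthetic data, arbitrary trigger, and a
# logistic-regression model are assumptions, not TRAPDOOR's construction.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N, D = 5000, 50                  # samples, tabular features (stand-ins for gene mutations)
TARGET = 1                       # backdoor target label
TRIG_IDX = np.array([0, 1, 2])   # hypothetical trigger feature positions
TRIG_VAL = 3.0                   # hypothetical trigger magnitude

def make_cohort(frac_a: float):
    """Synthetic cohort; `group_a` mimics a sensitive attribute whose
    prevalence in the pooled data we want to estimate."""
    X = rng.normal(size=(N, D))
    y = (X[:, 5] + X[:, 6] > 0).astype(int)   # benign prediction task
    group_a = rng.random(N) < frac_a
    return X, y, group_a

def stamp_trigger(X, y, members):
    """Embed the trigger only in group-A samples and relabel them, tying
    the backdoor's strength to that group's representation."""
    Xp, yp = X.copy(), y.copy()
    idx = np.where(members)[0]
    Xp[np.ix_(idx, TRIG_IDX)] = TRIG_VAL
    yp[idx] = TARGET
    return Xp, yp

def backdoor_signal(frac_a: float) -> float:
    """Train on the poisoned pool, then read the model's confidence on
    trigger-stamped probe inputs; this signal tracks group A's share."""
    X, y, members = make_cohort(frac_a)
    Xp, yp = stamp_trigger(X, y, members)
    model = LogisticRegression(max_iter=1000).fit(Xp, yp)
    probes = rng.normal(size=(1000, D))
    probes[:, TRIG_IDX] = TRIG_VAL
    return model.predict_proba(probes)[:, TARGET].mean()

for frac in (0.05, 0.25, 0.50):
    print(f"group-A share {frac:.2f} -> trigger confidence {backdoor_signal(frac):.3f}")
```

Running the loop at different shares shows the trigger-confidence signal rising with the group's prevalence; calibrating such a signal against known shares is one way a probe of this kind could recover the bias percentage, in the spirit of the abstract's claim.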

