Can We Achieve Fairness Using Semi-Supervised Learning?

11/03/2021
by Joymallya Chakraborty, et al.

Ethical bias in machine learning models has become a matter of concern in the software engineering community. Most prior software engineering work has concentrated on finding ethical bias in models rather than fixing it. After finding bias, the next step is mitigation. Prior researchers mainly used supervised approaches to achieve fairness. However, in the real world, getting data with trustworthy ground truth is challenging, and the ground truth itself can contain human bias. Semi-supervised learning is a machine learning technique in which labeled data is used, incrementally, to generate pseudo-labels for the rest of the data (and then all of that data is used for model training). In this work, we apply four popular semi-supervised techniques as pseudo-labelers to create fair classification models. Our framework, Fair-SSL, takes a very small amount (10%) of labeled data as input and generates pseudo-labels for the unlabeled data. We then synthetically generate new data points to balance the training data based on class and protected attribute, as proposed by Chakraborty et al. in FSE 2021. Finally, the classification model is trained on the balanced pseudo-labeled data and validated on test data. After experimenting on ten datasets and three learners, we find that Fair-SSL achieves performance similar to that of three state-of-the-art bias mitigation algorithms. That said, the clear advantage of Fair-SSL is that it requires only 10% of the labeled training data. To the best of our knowledge, this is the first SE work where semi-supervised techniques are used to fight against ethical bias in SE ML models.
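The pipeline described in this abstract (pseudo-label the unlabeled data from a small labeled set, rebalance by class and protected attribute, then train) can be illustrated with a minimal plain-Python sketch. It is written under stated assumptions and is not the Fair-SSL implementation: the pseudo-labeler is plain self-training around a toy nearest-centroid classifier, and the balancing step is a simplified SMOTE-style interpolation standing in for the method of Chakraborty et al. (FSE 2021). The helper names (`centroid_fit`, `centroid_predict`, `self_train`, `balance_by_subgroup`), the `batch` parameter, and the convention that feature 0 holds the binary protected attribute are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical convention for this sketch: each row is a list of numbers and
# feature 0 holds the binary protected attribute (e.g. 0/1).

def centroid_fit(X, y):
    """Toy base learner (hypothetical stand-in): one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        sums[label] = [s + v for s, v in zip(acc, x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_predict(model, x):
    """Return (label, confidence). Confidence is the gap between the squared
    distances to the nearest and the second-nearest centroid."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, mu)), c) for c, mu in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(X_lab, y_lab, X_unlab, batch=2):
    """Self-training: refit, pseudo-label the `batch` most confident rows, repeat."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    while pool:
        model = centroid_fit(X, y)
        ranked = sorted(pool, key=lambda x: -centroid_predict(model, x)[1])
        for x in ranked[:batch]:
            X.append(x)
            y.append(centroid_predict(model, x)[0])  # pseudo-label
        pool = ranked[batch:]
    return X, y


def balance_by_subgroup(X, y, prot_col, rng):
    """Oversample each (class, protected) subgroup to the size of the largest one.

    Simplified SMOTE-style interpolation between two random members of the same
    subgroup; NOT the exact data generator used by Fair-SSL.
    """
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[(label, x[prot_col])].append(x)
    target = max(len(rows) for rows in groups.values())
    out = []
    for (label, p), rows in groups.items():
        synth = []
        while len(rows) + len(synth) < target:
            a, b = rng.choice(rows), rng.choice(rows)
            t = rng.random()
            synth.append([u + t * (v - u) for u, v in zip(a, b)])
        out.extend((x, label, p) for x in rows + synth)
    return out


# Usage on toy data (values are illustrative, not from the paper).
rng = random.Random(0)
X_lab = [[0, 0.1], [1, 0.2], [0, 0.9], [1, 0.8]]
y_lab = [0, 0, 1, 1]
X_unlab = [[0, 0.15], [0, 0.05], [0, 0.3], [1, 0.85], [0, 0.95]]
X, y = self_train(X_lab, y_lab, X_unlab, batch=2)
data = balance_by_subgroup(X, y, prot_col=0, rng=rng)  # list of (row, label, protected)
```

Interpolating only between members of the same (class, protected attribute) subgroup keeps each synthetic row's label and protected value identical to its parents'. A subgroup with no rows at all cannot be generated this way. The final classifier would then be trained on `data` and evaluated on held-out test data.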
