Quantifying Human Bias and Knowledge to guide ML models during Training

11/19/2022
by Hrishikesh Viswanath, et al.

This paper discusses a crowdsourcing-based method we designed to quantify the importance of different attributes of a dataset in determining the outcome of a classification problem. This human-provided heuristic acts as the initial weight seed for machine learning models and guides the model toward a better optimum during gradient descent. Real-world datasets are often skewed, overrepresenting items of certain classes while underrepresenting the rest. Such skew can lead to unforeseen issues, such as the model learning a biased function or overfitting. Traditional data augmentation techniques in supervised learning include oversampling and training with synthetic data. We introduce an experimental approach to dealing with such imbalanced datasets by including humans in the training process. We ask humans to rank the features of the dataset by importance and, through rank aggregation, determine the initial weight bias for the model. We show that collective human bias can allow ML models to learn insights about the true population rather than the biased sample. In this paper, we use two rank aggregation methods, Kemeny-Young and a Markov-chain aggregator, to quantify human opinion on feature importance. This work mainly tests the effectiveness of human knowledge on binary classification (popular vs. not popular) problems with two ML models: deep neural networks and support vector machines. The approach treats humans as weak learners and relies on aggregation to offset individual biases and domain unfamiliarity.
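To make the aggregation step concrete, below is a minimal Python sketch of the two aggregators named in the abstract: a brute-force Kemeny-Young solver and an MC4-style Markov-chain aggregator. The input format (each annotator's ranking as a list of feature indices, most important first), the damping constant, and the iteration count are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from itertools import permutations

def kemeny_young(rankings):
    """Brute-force Kemeny-Young: return the ordering that minimizes total
    pairwise disagreement with all annotators. Scans all n! orderings,
    so it is only feasible for a small number of features."""
    n = max(max(r) for r in rankings) + 1
    positions = [{f: k for k, f in enumerate(r)} for r in rankings]

    def disagreement(order):
        pos = {f: k for k, f in enumerate(order)}
        return sum((pos[i] < pos[j]) != (p[i] < p[j])
                   for p in positions
                   for i in range(n) for j in range(i + 1, n))

    return min(permutations(range(n)), key=disagreement)

def markov_chain_aggregate(rankings, damping=0.15, iters=1000):
    """MC4-style aggregator: from feature i, step to feature j when a
    majority of annotators rank j above i; the stationary distribution
    then scores each feature (higher = more important)."""
    n = max(max(r) for r in rankings) + 1
    positions = [{f: k for k, f in enumerate(r)} for r in rankings]
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and sum(p[j] < p[i] for p in positions) > len(rankings) / 2:
                P[i, j] = 1.0 / n
        P[i, i] = 1.0 - P[i].sum()        # stay put otherwise
    P = (1 - damping) * P + damping / n   # uniform teleport keeps the chain ergodic
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):                # power iteration
        pi = pi @ P
    return pi

# Three annotators ranking four features, most important first.
rankings = [[0, 2, 1, 3], [2, 0, 1, 3], [0, 2, 3, 1]]
print(kemeny_young(rankings))             # (0, 2, 1, 3) for this data
print(markov_chain_aggregate(rankings))   # per-feature importance scores
```

For this toy input the two methods agree: feature 0 is ranked first and feature 3 last, with individual deviations (annotator 2's preference for feature 2, annotator 3's placement of feature 3) averaged out by the aggregation.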
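And a sketch of how such aggregated scores could seed a model's starting weights, here by scaling the input columns of a network's first linear layer in PyTorch. The scaling rule (multiplying each input column by its normalized importance score) is one hypothetical reading of "initial weight seed"; the paper's exact mapping from aggregated ranks to weights may differ.

```python
import torch
import torch.nn as nn

def seed_first_layer(model, importance):
    """Bias a network's starting point with human-derived feature
    importance: scale each input column of the first linear layer by
    its normalized importance score (hypothetical seeding rule)."""
    scores = torch.as_tensor(importance, dtype=torch.float32)
    scores = scores / scores.mean()               # preserve overall weight scale
    first = next(m for m in model.modules() if isinstance(m, nn.Linear))
    with torch.no_grad():
        first.weight.mul_(scores)                 # broadcasts over input columns
    return model

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
importance = [0.40, 0.10, 0.35, 0.15]             # e.g. from the aggregation step
seed_first_layer(model, importance)
```

Features the crowd deems important start with larger effective weights, so gradient descent begins closer to a function that attends to them rather than to whatever the skewed sample happens to emphasize.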
