
Problem Formulation and Fairness

01/08/2019
by Samir Passi, et al.

Formulating data science problems is an uncertain and difficult process. It requires various forms of discretionary work to translate high-level objectives or strategic goals into tractable problems, necessitating, among other things, the identification of appropriate target variables and proxies. While these choices are rarely self-evident, normative assessments of data science projects often take them for granted, even though different translations can raise profoundly different ethical concerns. Whether we consider a data science project fair often has as much to do with the formulation of the problem as with any property of the resulting model. Building on six months of ethnographic fieldwork with a corporate data science team, and channeling ideas from sociology and history of science, critical data studies, and early writing on knowledge discovery in databases, we describe the complex set of actors and activities involved in problem formulation. Our research demonstrates that the specification and operationalization of the problem are always negotiated and elastic, and rarely worked out with explicit normative considerations in mind. In so doing, we show that careful accounts of everyday data science work can help us better understand how and why data science problems are posed in certain ways, and why specific formulations prevail in practice, even in the face of what might seem like normatively preferable alternatives. We conclude by discussing the implications of our findings, arguing that effective normative interventions will require attending to the practical work of problem formulation.

