Continuously Updated Data Analysis Systems

07/19/2019
by Lee F. Richardson, et al.

When doing data science, it's important to know what you're building. This paper describes an idealized final product of a data science project, called a Continuously Updated Data-Analysis System (CUDAS). The CUDAS concept synthesizes ideas from a range of successful data science projects, such as Nate Silver's FiveThirtyEight. A CUDAS can be built for any context, such as the state of the economy or the state of the climate. To demonstrate, we build two CUDAS. The first provides continuously updated ratings for soccer players, based on the newly developed Augmented Adjusted Plus-Minus statistic. The second creates a large dataset of synthetic ecosystems, which is used for agent-based modeling of infectious diseases.


