Augmented Data Science: Towards Industrialization and Democratization of Data Science

09/12/2019
by Huseyin Uzunalioglu, et al.

Converting raw data into insights and knowledge requires substantial effort from data scientists. Despite breathtaking advances in Machine Learning (ML) and Artificial Intelligence (AI), data scientists still spend the majority of their effort on understanding and then preparing the raw data for ML/AI. This effort is often manual and ad hoc, and requires some level of domain knowledge. Its complexity increases dramatically as data diversity, both in form and context, increases. In this paper, we introduce our solution, Augmented Data Science (ADS), for addressing this "human bottleneck" in creating value from diverse datasets. ADS is a data-driven approach that relies on statistics and ML to extract insights from any dataset in a domain-agnostic way to facilitate the data science process. The key features of ADS are the replacement of rudimentary data exploration and processing steps with automation, and the augmentation of the data scientist's judgment with automatically generated insights. We present the building blocks of our end-to-end solution and provide a case study to exemplify its capabilities.
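To make the idea of automating rudimentary data exploration more concrete, the following is a minimal sketch of domain-agnostic column profiling. It is not the ADS implementation, which the abstract does not describe in detail. The function names `profile_column` and `profile_table`, the list-of-dicts table layout, and the sample rows are all assumptions made for illustration.

```python
import statistics
from collections import Counter


def profile_column(values):
    """Summarize one column using only its values (hypothetical helper)."""
    # Treat None and empty strings as missing; this is an assumption.
    present = [v for v in values if v is not None and v != ""]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    # Call the column numeric only if every present value parses as a float.
    try:
        numbers = [float(v) for v in present]
    except (TypeError, ValueError):
        numbers = None
    if numbers:
        summary["kind"] = "numeric"
        summary["mean"] = statistics.fmean(numbers)
        summary["min"] = min(numbers)
        summary["max"] = max(numbers)
    else:
        summary["kind"] = "categorical"
        # Most frequent value, or None when the column is entirely missing.
        summary["top"] = Counter(present).most_common(1)[0][0] if present else None
    return summary


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = {key for row in rows for key in row}
    return {c: profile_column([row.get(c) for row in rows]) for c in sorted(columns)}


# Illustrative data only; not taken from the paper's case study.
rows = [
    {"age": "34", "city": "Paris"},
    {"age": "", "city": "Paris"},
    {"age": "29", "city": "Oslo"},
]
report = profile_table(rows)
```

A summary of this kind needs no knowledge of what the columns mean, which is the sense in which such a step can be called domain-agnostic. A real system would go further, for example by inferring dates, identifiers, and relationships between columns.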


