NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python

03/11/2023
by Ratnadira Widyasari, et al.

Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts to filter those projects and curate ML projects of high quality. The limited availability of such a high-quality dataset poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.
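
Because the labels are imbalanced (441 engineered versus 131 non-engineered), any classifier benchmarked on the dataset should be compared against a trivial majority-class baseline. The following is a minimal sketch of that arithmetic, not the authors' evaluation procedure: only the two label counts come from the abstract, while the label strings and the function names `majority_baseline` and `precision_recall` are hypothetical choices made for illustration.

```python
from collections import Counter

# Label counts are taken from the abstract; the list itself is a synthetic
# stand-in for the real dataset, and the label strings are assumed names.
labels = ["engineered"] * 441 + ["non-engineered"] * 131


def majority_baseline(y_true):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(y_true)
    label, count = counts.most_common(1)[0]
    return label, count / len(y_true)


def precision_recall(y_true, y_pred, positive="engineered"):
    """Compute precision and recall for the chosen positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


majority_label, baseline_accuracy = majority_baseline(labels)

# A classifier that always predicts the majority label.
always_majority = [majority_label] * len(labels)
baseline_precision, baseline_recall = precision_recall(labels, always_majority)

print(f"majority label: {majority_label}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"baseline precision: {baseline_precision:.3f}, recall: {baseline_recall:.3f}")
```

Always predicting "engineered" is already correct on 441 of 572 projects (about 77%), so a useful classifier needs to beat that figure, and per-class precision and recall are more informative than accuracy alone.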


research
04/04/2023

Analysis of Software Engineering Practices in General Software and Machine Learning Startups

Context: On top of the inherent challenges startup software companies fa...
research
06/01/2022

Studying the Practices of Deploying Machine Learning Projects on Docker

Docker is a containerization service that allows for convenient deployme...
research
08/04/2022

Development and Validation of ML-DQA – a Machine Learning Data Quality Assurance Framework for Healthcare

The approaches by which the machine learning and clinical research commu...
research
09/20/2022

Comparative analysis of real bugs in open-source Machine Learning projects – A Registered Report

Background: Machine Learning (ML) systems rely on data to make predictio...
research
05/24/2022

Pynblint: a Static Analyzer for Python Jupyter Notebooks

Jupyter Notebook is the tool of choice of many data scientists in the ea...
research
06/22/2020

Leveraging traditional ecological knowledge in ecosystem restoration projects utilizing machine learning

Ecosystem restoration has been recognized to be critical to achieving ac...
research
09/07/2019

A curated Dataset of Microservices-Based Systems

Microservices based architectures are based on a set of modular, indepen...
