An investigation of licensing of datasets for machine learning based on the GQM model

03/24/2023
by Junyu Chen, et al.

Dataset licensing is an open issue in the development of machine learning systems, most of which rely on publicly available datasets. However, because the images in these public datasets are mainly collected from the Internet, some of them are not cleared for commercial use. Furthermore, developers of machine learning systems often pay little attention to a dataset's license when training models on it. In short, the licensing of datasets for machine learning systems is currently incomplete in every respect. Our investigation of two dataset collections revealed that most current datasets lack a license, which makes it impossible to determine whether they can be used commercially. We therefore take a more scientific and systematic approach to investigating the licensing of datasets, and of the machine learning systems that use them, so that future developers of machine learning systems can achieve compliance more easily.

