Generative AI and the Digital Commons

03/20/2023
by Saffron Huang, et al.

Many generative foundation models (GFMs) are trained on publicly available data and use public infrastructure, but 1) may degrade the "digital commons" that they depend on, and 2) do not have processes in place to return the value they capture to data producers and stakeholders. Existing conceptions of data rights and protection (focusing largely on individually owned data and associated privacy concerns) and copyright or licensing-based models offer some instructive priors, but are ill-suited to the issues that may arise from models trained on commons-based data. We outline the risks posed by GFMs and why they are relevant to the digital commons, and propose several governance-based solutions: investments in standardized dataset/model disclosure and other kinds of transparency about generative models' training and capabilities, consortia-based funding for monitoring/standards/auditing organizations, requirements or norms for GFM companies to contribute high-quality data to the commons, and structures for shared ownership based on individual or community provision of fine-tuning data.

research
07/06/2023

VerifAI: Verified Generative AI

Generative AI has made significant strides, yet concerns about the accur...
research
12/03/2020

Digital Landscape of COVID-19 Testing: Challenges and Opportunities

The COVID-19 Pandemic has left a devastating trail all over the world, i...
research
07/07/2023

AI and the EU Digital Markets Act: Addressing the Risks of Bigness in Generative AI

As AI technology advances rapidly, concerns over the risks of bigness in...
research
03/06/2023

Data Portraits: Recording Foundation Model Training Data

Foundation models are trained on increasingly immense and opaque dataset...
research
08/08/2023

SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool

Large Language Model (LLM) based Generative AI systems have seen signifi...
research
02/11/2023

CatAlyst: Domain-Extensible Intervention for Preventing Task Procrastination Using Large Generative Models

CatAlyst uses generative models to help workers' progress by influencing...
research
03/29/2023

RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes

Can foundation models (such as ChatGPT) clean your data? In this proposa...
