Foundation Models and Fair Use

03/28/2023
by Peter Henderson et al.

Existing foundation models are trained on copyrighted material. Deploying these models can pose both legal and ethical risks when data creators fail to receive appropriate attribution or compensation. In the United States and several other countries, copyrighted content may be used to build foundation models without incurring liability under the fair use doctrine. However, there is a caveat: if the model produces output that is similar to copyrighted data, particularly in scenarios that affect the market for that data, fair use may no longer apply to the model's output. In this work, we emphasize that fair use is not guaranteed, and additional work may be necessary to keep model development and deployment squarely within the realm of fair use. First, we survey the potential risks of developing and deploying foundation models based on copyrighted content. We review relevant U.S. case law, drawing parallels to existing and potential applications for generating text, source code, and visual art. Experiments confirm that popular foundation models can generate content considerably similar to copyrighted material. Second, we discuss technical mitigations that can help foundation models stay in line with fair use. We argue that more research is needed to align mitigation strategies with the current state of the law. Lastly, we suggest that the law and technical mitigations should co-evolve. For example, coupled with other policy mechanisms, the law could more explicitly consider safe harbors when strong technical tools are used to mitigate infringement harms. This co-evolution may help strike a balance between intellectual property and innovation, which speaks to the original goal of fair use. But we emphasize that the strategies we describe here are not a panacea, and more work is needed to develop policies that address the potential harms of foundation models.


