Abstract
This comment to the United States Copyright Office argues that the developers of AI training datasets and the developers and creators of generative AI systems are not, and should not be held, liable for copyright infringement for the use of copyrighted material in training datasets. It also argues that any infringement claim must follow the required steps: proving that the AI developer or end user had access to a registered copyrighted work and, most importantly, proving copying of copyrightable portions of that work through a substantial similarity analysis, in a side-by-side comparison of the original copyrighted work with a work produced using the AI system.
The current copyright infringement class actions against the creators of AI training datasets and AI developers appear to want to skip over the required steps of the infringement analysis in order to focus on the most intriguing question, “Could a visual generative AI generate a work that potentially infringes a preexisting copyrighted work?” The discussion then skips further ahead to, “Would the AI have a fair use defense, most likely under the transformative test?” These are relevant questions, but in isolation from the actual steps of the copyright infringement analysis, the discussion is misleading or even irrelevant. Skipping these topics and stages of the infringement analysis does not direct our attention to the properly accused party or entity whose actions prompt the question. Leaping from the question of infringement in the creation of training datasets, to the creation of foundation models that draw from the training data, to the actual operation of the generative AI system to produce images draws a false equivalence between the processes themselves and between the persons responsible for them. It reflects magical thinking: because an artist's work was on the internet and was scraped along with hundreds of millions of other images, the generative AI system must somehow have infringed protectible elements of the artist's work, when the AI merely learned from this work and hundreds of millions of other works in the course of training the foundation model of the AI system.
Original language: American English
State: Published - May 13, 2024
Keywords
- Artificial Intelligence
- Generative AI
- Copyright
- training
- Large Language Model
- copyrighted works
- LAION
- dataset
- infringement
- fair use
- copying
- nonexpressive copying
- copy reliant technology
- Google v. Oracle
- Perfect 10