Humans learn by reading books, watching videos, and experiencing the world, often using copyrighted material such as textbooks or movies. This learning is generally not considered copyright infringement, since it involves personal absorption rather than copying or distribution.
“Fair use” principles and copyright law come into play when humans create new works. It is not ideas and concepts that are protected, only their form of expression. So new music, writing, songs, movies, or TV shows might mirror existing works, but they cannot “copy” them.
The issue for AI training is that AI systems, particularly machine learning models, learn by training on large data sets, and assembling those data sets may involve copying copyrighted content. One early court case, not directly involving generative AI, suggests such systems do not enjoy “fair use” protection.
Fair use is a legal doctrine under U.S. copyright law that permits limited use of copyrighted material without permission, for purposes such as criticism, comment, news reporting, teaching, scholarship, or research.
A student reading a textbook or watching a documentary is not typically seen as infringing copyright, because the act of learning is personal and does not involve making copies. That is where computers and models, with their efficient “memory,” raise issues.
We might argue that human memory is porous enough that “copies” of content are never made, with the possible exception of those humans with “photographic memory.” Computers, obviously, have no such limitation.
Human learning, then, is a mental process. “Plagiarism” is the obvious example of a fair use violation: a purportedly new creation that really is copying.
Proponents argue that AI training is transformative, as the model learns patterns to generate new content, not to reproduce the original works.
Opponents argue that AI-generated content competes with the originals. But that does not inherently strike some of us as a copyright violation, “merely” as a case of new competition.
Computer efficiency is among the issues: an AI model can be trained on millions of books in hours, far surpassing human capacity. And since copyright is about protecting commercial works, language models raise the issue of market impact. It is not so much that humans or AI models “learn,” but that they can create new content with commercial implications.
The commercial concern seems to center on the potential increase in content competition, not so much on the ingestion of knowledge itself. That is essentially what underlies the worry about huge amounts of AI-created content “drowning out” human authors.
As often happens when disruptive technologies emerge, the conflict is between legacy interests and innovators whose new products could upend existing economic models.
Fair use of content “scraped” by AI models is another example of a clash of perceived business interests.