Humans learn by reading books, watching videos, and experiencing the world, often using copyrighted material such as textbooks or movies. This learning is generally not considered copyright infringement, because it involves personal absorption rather than copying or distributing.
“Fair use” principles and law come into play if humans create new works. It is not the ideas and concepts that are protected, only their form of expression. So new music, writing, songs, movies or TV shows might mirror existing works, but cannot “copy” them.
The issue for AI training is that AI systems, particularly machine learning models, learn by training on large data sets, which may include copyrighted content that is copied. One early court case not directly involving generative AI suggests the systems do not enjoy “fair use” protection.
Fair use is a legal doctrine under U.S. copyright law that permits limited use of copyrighted material without permission, for purposes such as criticism, comment, news reporting, teaching, scholarship, or research.
A student reading a textbook or watching a documentary is not typically seen as infringing copyright, as the act of learning is personal and does not involve making physical copies. But that’s where computers and models, with their efficient “memory,” raise issues.
We might argue that human memory is porous enough that “copies” of content are never made, with the possible exception of those humans with “photographic memory.” Computers, obviously, suffer no similar issues.
So human learning is a mental process. “Plagiarism” is the obvious example of a fair use violation, as it represents a purportedly new creation that is really a copy.
Proponents argue that AI training is transformative, as the model learns patterns to generate new content, not to reproduce the original works.
Opponents argue that AI-generated content competes with originals. But that does not inherently strike some of us as a copyright violation, “merely” a case of new competition.
Aspect | Human Learning | AI Training |
Method of Access | Reading, listening, observing | Copying data into memory/storage |
Copying Involved | No physical copies, mental absorption | Yes, physical copies for processing |
Purpose | Personal learning, education | Model training, often commercial |
Fair Use Application | Relevant for new creations, e.g., quoting | Debated for training process itself |
Market Impact | Minimal, unless new work competes | Potential, if AI output competes with originals |
Legal Precedent | Generally accepted, no infringement | Ongoing lawsuits, no clear consensus |
Computer efficiency is among the issues, since an AI model can be trained on millions of books in hours, far surpassing human capacity. Because copyright is about protecting commercial products, language models raise the issue of market impact. The concern is not so much that humans or AI models “learn” but that they can create new content that has commercial implications.
The commercial concern seems to center on the potential increase in content competition, not so much the knowledge ingestion. That is essentially what underlies the concern about huge amounts of AI-created content “drowning out” human authors.
As often happens, the conflict is between legacy interests and innovators whose new products could disrupt existing economic models. Such conflicts are common when disruptive technologies emerge.
Industry Affected | Disruptive Innovation | Legacy Industry Concerns | Outcome |
Music Industry (2000s) | Digital music streaming and MP3 sharing (Napster, Spotify) | Loss of album sales, piracy concerns | Industry shifted to streaming models, with revenue-sharing for artists and labels |
Publishing and Journalism | Google Search and News Aggregators | Decline in ad revenue, loss of control over content distribution | Publishers adapted with paywalls, licensing deals |
TV and Film Industry | Online video streaming (Netflix, YouTube) | Cord-cutting reduced traditional TV revenue | Studios launched their own streaming services (Disney+, HBO Max) |
Taxis and Transportation | Ride-sharing apps (Uber, Lyft) | Regulation circumvention, lost driver income | Ride-sharing became mainstream; regulations updated over time |
Retail (Brick-and-Mortar Stores) | E-commerce (Amazon, Shopify) | Store closures, price undercutting | Traditional retailers shifted online or hybrid models |
Finance and Banking | Cryptocurrencies, Fintech (DeFi, PayPal, Square) | Loss of control over transactions, regulatory concerns | Banks embraced fintech partnerships, crypto regulations emerged |
Photography and Film | Digital cameras and smartphones | Film sales collapsed, Kodak and Fujifilm disrupted | Kodak filed for bankruptcy; digital photography dominated |
Telecom (Landlines and SMS) | VoIP, Messaging apps (Skype, WhatsApp) | Decline in SMS and landline revenue | Telcos adapted by offering data-driven pricing models |
AI and Content Creation | Generative AI (ChatGPT, Midjourney) | Copyright concerns, job displacement fears | Legal battles ongoing; potential for licensing frameworks |
Fair use of content “scraped” by AI models is another example of a clash of perceived business interests.