Friday, July 14, 2023

What Is a Generative AI "Expression of an Idea," and What Is an "Idea"? It Will Matter Greatly

The debate over copyright as it applies to generative AI has just begun, but already we can glimpse the ways the arguments will have to be made. “Ideas” cannot be copyrighted, but their expression can be protected. So the issue is where “idea” ends and “expression” begins. Is an AI-generated bit of content a new “expression” of a generic idea, or an infringing “derivative” work?


It is going to be difficult to tell the difference. 


The conflict between "ownership of ideas" and "expressions" of ideas arises because copyright law protects only the expression of ideas, not the ideas themselves. This means that anyone is free to use the same ideas as another person, as long as they do not copy the expression of those ideas.


For example, a company might develop a new software program, and another company might then release a similar program that does the same thing. The first company might argue that the second has infringed its copyright. The second company might argue that it is simply using the same idea as the first company and has not copied the expression of that idea.


So consider training a generative AI program. When an AI system trains on a copyrighted work, one can argue that it is using the work's ideas and then expressing those ideas in a new way. Or is it “copying the expression”?


“Fair use” might also come into play. “Fair use” is a legal doctrine that allows the use of copyrighted works without permission in certain limited circumstances. In the case of search engine results, courts have held that the fair use doctrine allows search engines to use snippets of copyrighted works to provide summaries of those works. It is possible that the fair use doctrine could also be relevant in the case of AI training.


The "merger doctrine" holds that copyright protection does not extend to expression that is inseparable from the idea it conveys. For example, if there is only one way to express an idea, then the expression is considered to be merged with the idea and is not protected by copyright.


By contrast, the merger doctrine generally does not apply to data structures, because there are many different ways to organize data, and the choice of data structure is often a creative decision. In other words, the way that a database is organized can be protected by copyright, even if the data itself is not.


Application programming interfaces have likewise been argued to fall outside the merger doctrine, on the reasoning that there are many different ways to design and implement an API. On that view, the way that a web browser interacts with a web server might be protected by copyright, even if the data that is being transferred is not.


Likewise, user interfaces can in some cases be protected by copyright. For example, the way that a word processor displays its menus and toolbars might be protected by copyright, even if the functionality of the word processor is not.


But stock plot elements in a story cannot be copyrighted.


The "scènes à faire" doctrine holds that copyright protection does not extend to elements that are standard or common in a particular genre. For example, if a particular plot element is common to mystery novels, then that plot element is considered to be a scène à faire and is not protected by copyright.


Courts have held that a phone book publisher's white pages were not protected by copyright because they simply listed facts and were not original works of authorship. Software programs, by contrast, have been found to be protected where they embody a particular expression of an idea.


On the other hand, Google's use of Java APIs in its Android operating system has been found to be “fair use.” So much will turn on whether courts see “expression” or “idea” in generative AI output.

