In what appears to be a first for large language models, Anthropic says its new Commercial Terms of Service will “enable our customers to retain ownership rights over any outputs they generate through their use of our services and protect them from copyright infringement claims.”
It is an important move to promote use of large language models without fear of such legal actions, given the nascent state of copyright law as it applies to LLMs. It also creates a huge new level of uncertainty about the business risks faced by LLM providers, as such litigation is virtually certain over time.
“Under the updated terms, we will defend our customers from any copyright infringement claim made against them for their authorized use of our services or their outputs, and we will pay for any approved settlements or judgments that result,” Anthropic says. “These new terms will be live on January 1, 2024 for Claude API customers and January 2, 2024 for those using Claude through Amazon Bedrock.”
There are other steps LLM providers can take to limit the uncertainty associated with copyright risks, such as using robust copyright filters, which help identify and flag potentially infringing content before it is generated or shared with users.
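One simple way to picture such a filter is a word n-gram overlap check between a model's output and a set of known protected texts. The sketch below is only an illustration under stated assumptions: the function names, the 5-word window, and the 0.3 threshold are all hypothetical, and high overlap is a signal for review, not a legal test for infringement or a description of how any provider's filter works.

```python
import re


def ngrams(text, n=5):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output, protected_text, n=5):
    """Fraction of the output's n-grams that also occur in the protected text."""
    out = ngrams(output, n)
    if not out:
        # Output shorter than n words: nothing to compare.
        return 0.0
    return len(out & ngrams(protected_text, n)) / len(out)


def flag_output(output, protected_corpus, n=5, threshold=0.3):
    """Return titles of protected works whose overlap meets the threshold.

    protected_corpus is a hypothetical dict mapping a title to its full text.
    The threshold is an arbitrary illustrative value, not a legal standard.
    """
    return [
        title
        for title, text in protected_corpus.items()
        if overlap_ratio(output, text, n) >= threshold
    ]
```

A flagged output could then be blocked, regenerated, or routed to human review, depending on the provider's policy.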
Ensuring transparency and responsible sourcing of training data, with clear mechanisms for identifying and excluding copyrighted material, also can minimize the risk of incorporating infringing elements into LLM outputs.
Establishing partnerships and clear guidelines for collaboration with copyright holders is another obvious avenue. It can lead to mutually beneficial licensing agreements and promote fair use of copyrighted material within LLMs.
Beyond all that, copyright law as applied to LLMs will develop over time, based at least in part on prior rulings related to use of content.
Several existing precedents offer potential legal avenues for addressing large language model (LLM) copyright issues.
Fair use is an obvious issue, as large language models are trained on huge amounts of existing content. After all, all human knowledge is built on prior work, with only the unique expression of facts being protected by copyright, not the facts themselves.
Campbell v. Acuff-Rose Music, Inc. (1994) applied the four-factor fair use test codified in Section 107 of the U.S. Copyright Act, which considers the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use upon the potential market for or value of the copyrighted work. The ruling also stressed that a transformative use weighs in favor of fair use.
Sony Music Entertainment v. Diamond Way Recordings, Inc. (2003) clarified the definition of a derivative work, stating that it must "recapture the essential elements of the original" and create a new work with a different purpose or character.
Also, Sony Corp. of America v. Universal City Studios, Inc. (1984), the Betamax VCR case, held that selling a device capable of substantial noninfringing uses, such as recording television broadcasts at home for later viewing, does not make the manufacturer liable for contributory copyright infringement.
Some might argue no LLM can create copyrightable material. Alfred A. Knopf, Inc. v. Colby (1992) held that an expert system's creative output lacked the requisite human authorship for copyright protection.
Is accumulated human knowledge similar to a database? If so, then some precedents related to databases could apply. Feist Publications, Inc. v. Rural Telephone Service Co., Inc. (1991), a case about telephone directory listings, ruled on the scope of copyright protection for compilations of facts, stating that only an original selection and arrangement of facts, not the underlying data itself, is protected.
The European Union's "Text and Data Mining" exception allows certain research institutions to mine copyrighted works for non-commercial purposes without the copyright holder's consent.
Also, open-source licenses like the GNU General Public License (GPL) could be relevant if LLMs are trained on datasets containing open-source materials.
Long-established doctrines could also be applied to AI and machine learning. The "scènes à faire" doctrine and the "merger doctrine" limit copyright protection for elements that are standard to a genre or dictated by the functionality or nature of a particular work.