Thursday, January 4, 2024

Large Language Model Ingestion of Content is Not Necessarily Copyright Infringement

At least for the moment, I find myself unpersuaded that the strong copyright protections some are demanding against the use of their work by large language models are a good idea. To be sure, as a practical matter we should expect some sort of reasonable licensing system to develop.


Such a system would compensate copyright holders for the use of their work to train LLMs. So long as the costs are reasonable, there is little danger of stifling LLM progress.


The bigger danger would be an outright legal bar on using copyrighted material to train LLMs. After all, all human knowledge builds on the past, including the consumption of copyrighted material.


Copyright law protects expression, not ideas: it covers the specific way an idea is expressed, not the idea itself. So long as an LLM that consumes copyrighted material such as books or articles produces outputs that differ significantly in form and originality from those sources, a claim of copyright infringement seems dubious.


The transformative use doctrine, which U.S. courts apply as part of the fair use analysis, can permit using copyrighted material in new and creative ways without infringement, as long as the new use serves a different purpose or adds meaningfully to the original work.


LLMs that use copyrighted material to generate summaries, translations, or even new creative interpretations might fall into this category, depending on the nature of the transformation.


Fair use allows limited use of copyrighted material without permission for purposes such as criticism, commentary, or research. Using copyrighted material to train LLMs could fall under fair use, depending on the amount and nature of the material used and the effect on the market for the original works.


The mere fact that LLMs learn efficiently should not, by itself, make their training an infringement, though derivative works, which fall under the copyright holder's exclusive rights, remain a genuine concern.


Potential solutions include greater transparency and attribution, licensing models, and clearer legal frameworks governing LLMs and copyright. Ideas and knowledge are not protected by copyright; only their expression is.


LLMs ingest large amounts of copyrighted material in much the way humans learn, only with vastly greater efficiency. Legitimate infringement concerns arise not from the ingestion but from the output: the “expression” of an idea.

