The Crux of the Matter

  • The recent lawsuits against OpenAI and Meta, filed by renowned comedian and author Sarah Silverman, along with authors Christopher Golden and Richard Kadrey, mark a significant turning point in the conversation around AI and copyright law.
  • These plaintiffs allege that ChatGPT and LLaMA, AI models developed by OpenAI and Meta respectively, were trained on datasets containing their works, which they claim were sourced illegally from "shadow library" websites.
  • The websites named in the complaints, such as Bibliotik, Library Genesis, and Z-Library, are known for making copyrighted books accessible in bulk via torrent systems, thereby sidestepping copyright laws.

Does AI Summarization Constitute Infringement?

  • The claimants offer exhibits showing that ChatGPT, when prompted, can summarize their books, an action they interpret as copyright infringement.
  • For instance, Silverman's 'The Bedwetter', Golden's 'Ararat', and Kadrey's 'Sandman Slim' are all cited as examples of such infringement.
  • Yet, one might ask: Is summarization truly a violation of copyright, or is it rather a new form of "fair use" in this digital age?

Unravelling the Meta Controversy

  • The separate lawsuit against Meta alleges that the authors' books were accessible in datasets used to train Meta's LLaMA models.
  • Meta cites 'The Pile', a dataset assembled by EleutherAI, as one source of its training data.
  • Here lies the problem: EleutherAI itself characterizes 'The Pile' as having been compiled from the content of the Bibliotik private tracker, one of the aforementioned "shadow libraries".

Probing the Core Issues

  • Central to both lawsuits is the assertion that neither OpenAI nor Meta had consent to use the authors' copyrighted works as training material for their AI models.
  • As such, the lawsuits allege copyright violation, negligence, unjust enrichment, and unfair competition, among other claims.
  • This presents an intriguing quandary: Can using copyrighted material to train AI models, without directly reproducing the work, be considered infringement?

The Broader Picture

  • Joseph Saveri and Matthew Butterick, the attorneys representing the authors, report numerous concerns from other writers, authors, and publishers regarding AI's capacity to generate text akin to copyrighted content.
  • Another AI lawsuit has been filed by Getty Images against Stability AI, asserting that it trained its image generation tool on copyrighted images.
  • Thus, a pattern emerges, suggesting an imminent paradigm shift in how copyright laws apply to AI.

The Root Question

  • These lawsuits pose a vital question: Should the developers of AI models be held liable for copyright infringement if their models generate outputs based on copyrighted content?
  • Given AI's increasing prevalence, this question warrants serious consideration.
  • Each of the authors' lawsuits contains six counts, covering various types of copyright violation, negligence, unjust enrichment, and unfair competition.
  • It's crucial to understand these allegations. Does using copyrighted material as a training set for an AI model in fact constitute a copyright violation?
  • How can negligence, unjust enrichment, and unfair competition be legally defined in an AI context?
  • Traditional copyright laws protect the tangible expression of an idea, not the idea itself.
  • In these cases, does an AI summarizing or generating text based on copyrighted books infringe on that protection?
  • This aspect has yet to be conclusively determined by the courts, marking a grey area in copyright law.

Negligence, Unjust Enrichment, and Unfair Competition

  • The negligence accusation implies that OpenAI and Meta failed in their duty to avoid foreseeable harm, in this case, copyright infringement.
  • Unjust enrichment suggests that AI companies unfairly benefited from copyrighted content. But does using content for model training, without monetizing the content directly, count as unjust enrichment?
  • Unfair competition usually refers to dishonest or fraudulent rivalry in trade and commerce. The question arises whether training AI models on copyrighted material can fall under this definition.

Pioneering Cases Set the Stage

  • Saveri and Butterick have filed similar lawsuits on behalf of programmers and artists, cases that may help establish precedent for AI and copyright issues.
  • Getty Images' lawsuit against Stability AI, alleging copyright infringement through AI image generation, will also play a crucial role in shaping future regulations.
  • A case brought by authors Mona Awad and Paul Tremblay against OpenAI over its ChatGPT chatbot mirrors the central issue: the use of copyrighted material in AI training.
  • These collective cases point towards an imminent need to redefine copyright laws in the context of AI.
  • These lawsuits, revolving around AI and copyright infringement, bring forward an essential debate in the modern digital age.
  • Current laws may not suffice to address the intricacies of AI and its interaction with copyrighted content.
  • There's a clear and urgent need for our legal system to adapt and evolve in response to the growing capabilities of AI.

Reference

Sarah Silverman is suing OpenAI and Meta for copyright infringement
She says the companies’ chatbots were trained on her book.