Partial win for OpenAI in dispute with high-profile authors

A district court judge has partially granted OpenAI’s motion to dismiss claims brought against it in two class action lawsuits filed by authors, including writer and performer Sarah Silverman.

On Monday (12 February) Judge Araceli Martínez-Olguín in the Northern District of California struck down five claims brought against OpenAI, but the unfair competition portion of the claim was upheld and can proceed.

The authors, who also include writers Christopher Golden and Richard Kadrey, allege that much of the material in the ChatGPT creator’s training datasets comes from copyrighted works “including books written by plaintiffs, that were copied by OpenAI without consent, without credit and without compensation”.

These language models are “trained” by inputting large amounts of text known as the “training dataset”. The language models copy text from the training dataset and extract “expressive information”.

The judge dismissed the vicarious copyright infringement claim, with leave to amend, because the authors failed to show “substantial similarity” between their copyrighted works and ChatGPT’s outputs. In their filing, the plaintiffs had argued that because the defendants directly copied the copyrighted books to train the language models, “plaintiffs need not show substantial similarity”. However, the district judge said that “because they fail to allege direct copying, they must show a substantial similarity between the outputs and the copyrighted materials”.

Plaintiffs alleged that defendants removed copyright management information (CMI) from the copyrighted books used during the training process. “However, plaintiffs provide no facts supporting this assertion,” said the judge.

Indeed, she continued, the complaints include excerpts of ChatGPT outputs that include multiple references to plaintiffs’ names, suggesting that OpenAI did not remove all references to “the name of the author”.

The other claims involved OpenAI’s alleged failure to state which internet books it uses to train ChatGPT, which the plaintiffs alleged shows that it knowingly enabled infringement as ChatGPT users will not know if any output is infringing. However, the judge said: “Plaintiffs do not point to any caselaw to suggest that failure to reveal such information has any bearing on whether the alleged removal of CMI in an internal database will knowingly enable infringement.”

The plaintiffs also alleged that OpenAI created derivative works – ChatGPT outputs – and distributed those outputs without the CMI included. These claims were also dismissed by the court, as the authors had not alleged that OpenAI distributed their books or copies of their books. Instead, they alleged that “every output from the OpenAI Language Models is an infringing derivative work” without providing any indication as to what such outputs entail – for example, whether they are the copyrighted books or copies of the books. “That is insufficient to support this cause of action under the DMCA,” the judge said.

However, the unfair competition law (UCL) portion of the claim was allowed to proceed. Assuming the truth of plaintiffs’ allegations – that the defendants used the plaintiffs’ copyrighted works to train their language models for commercial profit – the court concluded that the “defendants’ conduct may constitute an unfair practice”.

Jennifer Mauri, a senior associate specialising in IP, based in Michelman & Robinson’s Orange County office, said it was “clear that anyone seeking to allege direct or vicarious copyright infringement against ChatGPT or other similar AI software must allege more than just a blanket statement that the outputs from the AI is a derivative infringing work”.

She added: “For AI software companies looking to minimise the risk that their AI outputs are infringing copyright, care should be taken to limit the AI output from being too similar to the copyrighted works. Notably, what the court’s order did not do was let AI/ChatGPT off the hook for using copyrighted works to train the software – and thus, whether AI software companies will potentially face liability for that use is very much an open question.”

In August, OpenAI had sought to dismiss all causes of action except for the claim for direct copyright infringement.

The plaintiffs must file any amendments to the complaint by 13 March in order to continue with the unfair competition claim upheld by the judge.

The judgment comes after Facebook owner Meta succeeded in November in having a copyright infringement claim partially dismissed, in a case again led by Sarah Silverman and other authors, over Meta’s AI system Llama.

OpenAI is represented by Morrison & Foerster’s Joseph Gratz, Tiffany Cheung and Allyson Bennett, and by Latham & Watkins’ Andrew Gass, Sarang Damle and Allison Stillman.

The plaintiffs are represented by the Joseph Saveri Law Firm and Los Angeles lawyer Matthew Butterick, a specialist in AI, copyright and software law.