Federal Judge Sidney Stein has ruled that The New York Times can move forward with its copyright lawsuit against OpenAI, rejecting the AI company’s request to dismiss the case.
Although he narrowed the suit's scope, Judge Stein allowed the central copyright infringement claims to proceed, saying a full opinion would follow "expeditiously."
The case rests on allegations that OpenAI used The New York Times' work without authorization or payment to train ChatGPT, its widely used AI chatbot.
The Times and other publishers, including The New York Daily News and the Center for Investigative Reporting, argue that OpenAI illegally harvested massive amounts of their content from the web.
Steven Lieberman, the attorney for the news publishers, welcomed the decision, stating, “We welcome the opportunity to demonstrate to a jury the facts about how OpenAI and Microsoft are wildly profiting from plagiarizing the original work of newspapers across the nation.”
The New York Times believes its material is among the largest sources of copyrighted content used to build ChatGPT.
Its attorneys argue that this use infringes the paper's copyright and threatens the news business model by reducing direct traffic to news websites.
OpenAI spokesperson Jason Deutrom responded that the company looks forward to making it clear that "we build our AI models using publicly available data, in a manner grounded in fair use, and supportive of innovation."
OpenAI claims its data collection falls under the "fair use" doctrine, which allows limited use of copyrighted work without a license for purposes such as research and commentary.
With the judge's ruling, the case will proceed toward trial, although no trial date has been set. Litigation will include discovery, depositions of executives, and public pretrial hearings to resolve evidentiary disputes and other issues.
This court battle has profound implications for journalism and for the development of artificial intelligence. Publishers worry that AI chatbots capable of summarizing news stories will siphon off the website traffic and ad revenue that many news outlets depend on to stay afloat.
The AI industry has typically proceeded on the assumption that transforming web content into chatbot responses is a protected activity.
But the law remains unsettled. Courts have held that fair use must be "transformative," adding new meaning to or commenting on the original material. The Times contends that OpenAI's use fails this standard when ChatGPT merely replicates its reporting.
A second central legal question is "market substitution": whether chatbot answers actually substitute for reading news sites, or whether they serve a different market. If courts find that AI chatbots do substitute for reading the news directly, that finding would strengthen the publishers' argument.
At a hearing in New York in January, lawyers for the publishers argued that ChatGPT would reproduce Times articles word for word when prompted with relevant questions.
OpenAI's lawyers countered that the publishers had manipulated prompts specifically to get the chatbot to reproduce content, in ways not representative of typical user interaction or of how the system is designed to be used.
The suit names not only OpenAI but also Microsoft, which has invested heavily in the AI company. Although the case targets these two firms specifically, the outcome could influence how other AI companies approach content scraping and model training.
The ruling comes at a time when many content creators, from writers to artists, are questioning how their work is being used to train AI tools without payment or even credit. The court's eventual decision may set important precedents for how AI companies train their models and what legal obligations they owe content creators across industries.