
Federal Court Holds that Creator of AI Tool Infringed Copyright in Training Data: Harbinger or Blip?

On February 11, 2025, a federal district court in Delaware held that the developer of an artificial intelligence (AI) tool infringed the copyrights in thousands of works that were used to train the AI, finding that the unlicensed use of those works did not constitute fair use. The case, Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., represents the first major decision holding that the use of copyrighted works to train AI is unlawful. With dozens of other lawsuits pending against AI companies and potentially billions of dollars in damages at stake, surely this is the beginning of the end for AI developers, right?

Not so fast. 

First, a bit of background. The lawsuit was filed in December 2020 by Thomson Reuters, owner of the ubiquitous (and, for many lawyers, indispensable) legal research platform Westlaw. Thomson Reuters claimed that Ross Intelligence had unlawfully copied Westlaw's proprietary “headnotes”—summaries of key points of law derived from published court opinions—to build its competing AI-powered search engine.  

The claim focused on Ross’ alleged use of a third-party company, LegalEase, to create so-called “bulk memos” that largely incorporated Westlaw’s headnotes and were in turn used to train Ross’ search tool. Critically, Ross’ tool did not employ generative AI. It was designed to field legal questions and return already-published court opinions in response, not AI-generated text. Those court opinions are government works not protected by copyright.

As with many other AI companies embroiled in copyright infringement litigation, Ross raised a fair use defense, arguing that it could freely use the Westlaw headnotes to train its AI without a license. The court disagreed, holding on summary judgment that Ross had infringed more than 2,000 Westlaw headnotes. (The court left for trial the question of whether Ross had infringed thousands of other headnotes, given unresolved factual questions regarding whether the bulk memos actually copied them and whether the copyrights in those headnotes had expired.)

At first blush, the case would seem to indicate that we should expect a wave of adverse decisions against much larger AI players in other pending copyright infringement cases. But a closer look at the court’s reasoning on the fair use factors suggests that doomsday predictions of the cascading effect of this case on others may be overblown.

Factor 1: Purpose and Character of the Use of the Copyrighted Work 

Following the U.S. Supreme Court’s guidance in Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, the court held that this factor weighed against fair use because Ross’ use of the headnotes was “commercial” in nature and did not have a “further purpose or different character” from Thomson Reuters’ own use. Rather, Ross had used the headnotes to create a tool that would directly compete with Westlaw.

Other AI developers will likely have to concede that their use of training data is also “commercial,” given the subscription-based model many of them employ. But developers in other cases may be able to present a stronger argument than Ross on whether their use of training data has a “further purpose or different character.”  Arguably, tools like ChatGPT don’t function as direct competitors to the rightsholders whose data was used to train those tools—at least not in the way that Ross’ tool serves as a direct replacement for Westlaw.  And, even if they do, those other tools (unlike Ross’) largely employ generative AI, which might give their creators a more viable argument that the underlying data is being used for a transformative purpose, namely to create new AI-generated output. The court in Thomson Reuters suggested this could be a key distinguishing factor, stating: “Because the AI landscape is changing rapidly, I note for readers that only non-generative AI is before me today.”

The court was also not swayed by Ross’ argument that its copying occurred at an “intermediate step” in the creation of its final product, i.e., that Ross “turned the headnotes into numerical data about relationships among legal words to feed into its AI.”  Ross had relied on a line of previous cases in which courts permitted the intermediate copying of computer code where necessary to ensure new computer programs were compatible with existing technology—essentially where “the copying was necessary for competitors to innovate.”  But the court held that this rationale applied only to “computer-programming copying cases.”  While this certainly creates a hurdle for other AI defendants, generative AI companies may be better positioned to make this “intermediate copying” argument, on the basis that copying large data sets is necessary to develop and train generative AI—unlike Ross, which could have trained its legal-research tool on other data besides Westlaw headnotes.

Factor 2: Nature of the Copyrighted Work

The court held that because Westlaw’s headnotes are “far from the most creative works,” this factor weighed in favor of fair use. However, the court ultimately discounted this factor, noting it “has rarely played a significant role in the determination of a fair use dispute.” Many of the other pending AI cases involve training on more squarely creative works (literary works, visual art, etc.), which typically enjoy greater protection. Even so, it seems unlikely this factor will drive the outcome of future cases.

Factor 3: Amount and Substantiality of the Portion Used

The court held that this factor, too, weighed in favor of fair use because none of the headnotes appeared in the output that Ross’ tool provided to end users, concluding that “what matters is . . . the amount and substantiality of what is thereby made accessible to a public.” But, as with factor 2, the court discounted the importance of this factor in the overall fair use analysis.

To the extent that other AI tools can demonstrate that their training data does not substantially or recognizably appear in output, those tools may, like Ross, have the better argument on factor 3. However, plaintiffs in other cases, such as the New York Times, have alleged that AI tools like ChatGPT reproduce training data wholesale, even permitting users to evade rightsholders’ paywalls. If those allegations are substantiated, the court’s factor 3 reasoning in Thomson Reuters may cut against those defendants.

Factor 4: Effect of the Use on the Potential Market for or Value of the Copyrighted Work

Deeming this factor to be “the single most important element of fair use,” the court held that it weighed against fair use because Ross’ product was intended to serve as a “market substitute” for Westlaw and could adversely impact a potential derivative market for Thomson Reuters to license its headnotes as AI training data. 

As noted, arguably the major generative AI tools do not serve as market substitutes for the works they were trained on. However, the use of that training data likely does impact a potential licensing market—if not an actual market.  We are already seeing AI companies license significant data sets for training purposes, which only reinforces the notion of market harm to works that are used without a license.

* * *

All in all, the decision presents a mixed bag, with enough to help and harm either side in other pending AI cases. It remains to be seen how far other courts will follow this approach in deciding the fate of generative AI defendants, and how willing they will be to look to a Delaware federal court for guidance. At the time of this writing, the overwhelming majority of copyright cases against AI companies are pending in the Second and Ninth Circuits: 12 in the Southern District of New York and 20 in the Northern District of California. What is clear, however, is that this decision is far from the death knell that some are predicting.
