Sen. Josh Hawley blasts AI ‘piracy’ at copyright hearing: We have the highlights, video, and full statements

“Training their models on stolen material”: Sen. Josh Hawley led a Senate hearing into the practices of corporate AI developers who use pirated data to train their AI models.

July 17, 2025 — Sen. Josh Hawley (R-Mo.) came out firing at a Congressional hearing yesterday, charging corporate AI developers with what he called “the largest intellectual property theft in American history.”

Hawley chaired a July 16 Senate Judiciary subcommittee hearing about the unchecked piracy of copyrighted content to fuel companies’ artificial intelligence (AI) models. The hearing featured witness testimony from bestselling author David Baldacci as well as AI experts and law professors.

“AI companies are training their models on stolen material, period,” said Hawley. “And we’re not talking about these companies simply scouring the internet for what’s publicly available. We’re talking about piracy.”

“Are we going to protect [Americans’ creative community], or are we going to allow a few megacorporations to vacuum it all up, digest it, and make billions of dollars in profits—maybe trillions—and pay nobody for it? That’s not America,” the Senator said, arguing that the issue at hand is a moral one as much as a legal one.

Key takeaways

Yesterday’s witnesses aired a number of eye-opening facts, including:

  • AI models have been trained on over 200 terabytes of copyrighted work: billions of pages, enough to fill approximately 22 Libraries of Congress.

  • Some of the largest global tech companies pirated this work by illegally downloading it.

  • Corporate AI developers facilitated other actors’ piracy by illegally uploading more than 50 terabytes of copyrighted works for others’ use.

  • Meta in particular was called out for knowing it was engaging in illegal activity, as illustrated by:
            • Employees internally warning each other that Meta’s piracy was illegal, and then brazenly making light of it.
            • Meta concealing its pirating via non-Meta servers, so its criminal acts would not be traced back to the company.

Key witness: author David Baldacci

David Baldacci, one of America’s best-selling novelists, appeared before the Senate panel to describe his experience of copyright infringement.

David Baldacci, author:

“I felt like someone had backed up a truck to my imagination and stolen everything I’d created.”

Select image for Baldacci’s full testimony.

Baldacci said:

“Mark Twain once said that travel is fatal to prejudice, meaning if you meet people where they live, you find out they’re just like you. However, I had no chance to leave the segregated world of Richmond, Virginia when I was growing up. But I visited the library every week and I like to think through books I traveled the world without a plane ticket or a passport. And born from my love of reading came my desire to be a writer.

I worked away for decades, getting rejected over and over. But I kept going, honing my craft, remaining disciplined, taking the rejections head on and using them as motivation. And finally, I was successful.

After sixty novels under my belt, I work just as hard as I ever have. It’s the American way. Work hard, play fair, stay the course and you’ll make it.

I truly believed that until my son asked ChatGPT to write a plot that read like a David Baldacci novel. In about five seconds three pages came up that had elements of pretty much every book I’d ever written, including plot lines, character names, narrative, the works.

That’s when I found out the AI community had taken most of my novels without permission and fed them into their machine learning system. I truly felt like someone had backed up a truck to my imagination and stolen everything I’d ever created.”

Baldacci went on to point out the harm mass piracy poses to America’s authors, songwriters, and other creative producers, whose works are now in the crosshairs of Big Tech’s lawlessness.

“Every single one of my books was presented to me . . . in three seconds. It really felt like I had been robbed of everything of my entire adult life that I had worked on,” Baldacci said.

Copyright attorney: Maxwell Pritt

Maxwell Pritt, counsel for creator-plaintiffs in many of the ongoing lawsuits against AI companies, including the Kadrey v. Meta case, stressed the jaw-dropping scope of AI companies’ reliance on online repositories of stolen copyrighted works (some of which have been prosecuted by the FBI and Department of Justice) to seek a competitive advantage.

Maxwell Pritt, copyright attorney:

“The largest domestic piracy of intellectual property in our nation’s history.”

Select image for Pritt’s full testimony.

He noted that AI companies took “tens of millions, if not hundreds, of books and scholarly publications and articles for free instead of buying them or licensing them from copyright owners.”

Pritt explained that Meta in particular had downloaded over 200 terabytes of pirated copyrighted works from multiple pirate e-repositories, and had also copied and distributed over 40 terabytes to other pirates through peer-to-peer sharing. Even more shocking, Pritt said, is that top-level executives approved these practices.

The solution that already exists: licensing

Professor Bhamati Viswanathan of New England Law noted that the pirate websites and repositories AI companies used have been prosecuted by the federal government and, most importantly, that participating in these pirate networks ultimately supports criminal enterprises.

Prof. Bhamati Viswanathan, New England Law:

“The Constitution’s intellectual property clause is one of the things that makes this country not just great — but robust, powerful and economically hugely successful.”

Select image for Viswanathan’s full testimony.

Speaking of the pirate repository ‘Anna’s Archive,’ Prof. Viswanathan noted that the website advertised large datasets of stolen copyright-protected material to AI companies, offering them for sale or for data exchange.

Professor Viswanathan said:

“The solution is licensing. It already exists. The licensing of works. The fair compensation of creators . . . You cannot compromise the livelihoods [of] creators . . . What we need is for new technologies to flourish fairly, sustainably, in ways that makes sense to us and that have already been provided for by our Constitution, by the U.S. copyright law, by intellectual property law itself.”

David Baldacci also spoke of his openness to licensing his books and of the importance of licensing to his own craft and to the craft of writing in general. He stated that he licenses his books “all over the world” in all sorts of mediums and formats, and that he is always willing to entertain offers to negotiate licensing agreements for the use of his books. He went on to explain the harm to creator incentives:

“[] the uncertainty of stealing stuff from pirate sites operated in Russia just so you can gain an advantage and you don’t really care about what happens to the likes of me and other writers coming up . . . I make a lot of money for my publishers, and my publishers use that money to take risks on new writers coming up that they ordinarily would not have been able to take a risk on. So when you hurt established writers like me, you hurt all the other writers coming behind us.”

Real economic harm to creators

Prof. Michael Smith, an expert in information technology and policy at Carnegie Mellon University’s Heinz College of Public Policy Management, spoke to the economic damage done to creators by AI piracy:

“Allowing generative AI companies to train their models with pirated content is likely to harm sales for creators in two key ways.

First, the nature of BitTorrent networks is that when someone downloads a file from the network, they also share back pieces of the file to other people downloading the file. This not only increases the download speeds for the person (or in the case of Meta, the company) downloading the pirated file. It also increases the download speeds for everyone else downloading the file on BitTorrent, making piracy a more attractive option to legal purchases. The economic literature shows that making it easier for consumers to download pirated content will cause direct harm by reducing sales in the legal market.

Second, allowing gen AI companies to obtain unlicensed training data through P2P pirate networks will also harm the market for licensed content. The Copyright Alliance has documented over 70 licensing contracts between gen AI companies and rightsholders including HarperCollins, Universal Music, Reddit, Shutterstock, and the Wall Street Journal. So it’s clear licensing markets between rightsholders and gen AI companies can work!”

Prof. Michael Smith, Carnegie Mellon University:

“Today we have an opportunity to create a win-win-win for society, creators, and tech firms by making it clear that piracy is wrong.”

Select image for Smith’s full testimony.

Full video of hearing

A full video of the hearing is available here.
