Well, well, well. Look who suddenly wants a word with the sheriff in the fair-use Wild West of artificial intelligence.
As Bloomberg reports, Microsoft and its partner company OpenAI are investigating the white-hot Chinese startup DeepSeek after Microsoft security researchers allegedly discovered people linked to DeepSeek exfiltrating large amounts of data through OpenAI’s API last fall. Elsewhere, White House AI czar David Sacks told Fox News on Tuesday that there is “substantial evidence” that DeepSeek “distilled” knowledge from OpenAI’s AI models.
These allegations align with other suspicious aspects of the new AI. For instance, when a Fast Company editor took DeepSeek for a test run earlier this week, the chatbot insisted it was made by Microsoft.
Perhaps the Chinese company—which built its new model in a matter of months with shockingly little funding and computing power—violated the law by using OpenAI’s output to develop its tech. Or maybe it operated entirely within a legal gray area. Either way, it’s ironic that a company whose entire business model is predicated on repurposing copyrighted material is now crying foul over another company repurposing its material.
Ever since OpenAI’s ChatGPT normalized generative AI in 2022, creators have accused it of essentially being a plagiarism machine. Large language models (LLMs) like ChatGPT require immense amounts of information about the world for their training. That info often comes from the copyrighted work of human creators, many of whom did not sign off on their material being used for this purpose.
Sometimes, the material is sourced and linked to; other times, not. But the direct use of copyrighted material is standard practice across the industry. A February 2024 report from plagiarism detector Copyleaks found that 60% of ChatGPT’s output contained some form of plagiarism.
Lawsuits, litigation, and legal gray areas
It should come as little surprise that all this plagiarism has kept Microsoft and OpenAI entangled in nonstop litigation over the past two years.
The companies have faced class-action lawsuits from a group of nonfiction authors led by Julian Sancton and from such novelists as Jonathan Franzen and Jodi Picoult. Comedian Sarah Silverman, who is also an author, jumped in on yet another of these lawsuits, accusing not only OpenAI but also Meta of using copyrighted work “without consent, without credit, and without compensation.”
And while publications such as the Wall Street Journal, Vox, and The Atlantic have entered into if-you-can’t-beat-‘em-join-‘em partnership deals with Microsoft and OpenAI, the New York Times Company sued both companies for alleged copyright infringement in December of 2023.
As of now, most of these cases are still ongoing, and the rules for fair use in training LLMs remain in flux. What’s illuminating in light of OpenAI’s allegations against DeepSeek, however, is how OpenAI has defended its use of copyrighted material.
During a hearing earlier this month in the NYT Company case, OpenAI claimed (as ever) that its output is covered by the fair use doctrine, which permits the use of copyrighted material to create something new, as long as it doesn’t compete with the original work. OpenAI’s attorneys characterized ChatGPT as not actually storing copyrighted material, but merely relying on the aftereffects of material passing through its models during the training process.
According to Digiday’s reporting on the hearing, an attorney representing OpenAI claimed, “If I say to you, ‘Yesterday all my troubles seemed so . . . ,’ we will all think ‘far away’ because we have been exposed to that text so many times,” alluding to the lyrics of “Yesterday” by the Beatles. “That doesn’t mean you have a copy of that song somewhere in your brain.”
(It should be noted here that former Beatle Paul McCartney has also been quite vocal in his criticism of AI repurposing the work and creativity of human artists.)
By OpenAI’s own logic, maybe DeepSeek simply allowed output from a U.S. competitor to flow through its model during the training process. At this point, we don’t know. (Microsoft declined to comment on what’s alleged in the Bloomberg report. Fast Company also reached out to OpenAI and will update this post as needed.)
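For readers wondering what “distillation” actually looks like, here is a minimal, purely illustrative sketch of the general technique: a small “student” model is trained to imitate the outputs of a larger “teacher” model rather than learning from original data. The toy below uses a fixed numeric function as the teacher; it is an assumption-laden simplification and does not reflect how DeepSeek, OpenAI, or Microsoft actually build or train their systems.

```python
# Toy sketch of distillation: a small "student" learns to reproduce a larger
# "teacher" model's outputs. Hypothetical and numeric only; real distillation
# would involve prompts sent to a chatbot and training on its responses.

import random

def teacher(x: float) -> float:
    """Stand-in for a large model's output on input x (here, a fixed function)."""
    return 3.0 * x + 1.0

# Step 1: query the teacher to build a synthetic training set.
inputs = [random.uniform(-1, 1) for _ in range(200)]
dataset = [(x, teacher(x)) for x in inputs]

# Step 2: fit a tiny student model (y = w*x + b) to the teacher's answers
# using plain gradient descent.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in dataset) / len(dataset)
    grad_b = sum(2 * (w * x + b - y) for x, y in dataset) / len(dataset)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"student learned w={w:.2f}, b={b:.2f} (teacher was w=3.00, b=1.00)")
```

The point of the sketch is simply that the student never touches the teacher’s original training data; it only sees the teacher’s answers, which is the behavior OpenAI is alleging here.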
For years now, authors, journalists, artists, and all sorts of creators have been screaming at the top of their lungs, in and out of court, that AI platforms should either find a more ethical approach to their mission or abandon it altogether. Now that the entire American AI industry is reeling from a $1 trillion stock hit because a small startup allegedly gave it a taste of its own medicine, it’s no wonder that the response on social media has been a schadenfreude bonanza.
Live by the fair use doctrine, die by the fair use doctrine.