OpenAI: GenAI tools can’t be made without copyrighted materials

news analysis

Jan 08, 2024 · 4 mins

Artificial Intelligence, Generative AI, Legal

The company’s assertion is likely to add fuel to the fast-evolving legal debate over generative AI and intellectual property rights.

Credit: Andrew Neel

In response to mounting legal efforts to rein in its data collection, OpenAI is arguing that creating advanced generative AI (genAI) tools is infeasible without using copyrighted content to train them.

In a report to the UK’s House of Lords Communications and Digital Select Committee, OpenAI said that training extensive large language models (LLMs) such as GPT-4, the underlying technology of ChatGPT, would be impossible without the use of copyrighted materials.

“Because copyright today covers virtually every sort of human expression — including blog posts, photographs, forum posts, scraps of software code, and government documents — it would be impossible to train today’s leading AI models without using copyrighted materials,” OpenAI said in its submission.

GenAI applications such as ChatGPT or the image-generation tool Stable Diffusion are built using vast amounts of data — much of it protected by copyright laws — collected from the internet. That’s led to increasing pushback from publishers and authors who say their work is being used without credit or compensation.

Concerns about copyrighted code

Developers have been using resources such as Google and StackOverflow for decades, said Daniel Li, CEO of Plus Docs, a company whose software uses genAI to design, create, and edit presentations. ChatGPT, he said, simply allows even more ease of use when coding.

“The important thing to realize, however, is that developers still need to understand their code. ChatGPT doesn’t change that requirement,” he said.

Li agreed that “companies need to be very careful they are not using code or other copyrighted text. This is already a major topic in software acquisitions for big tech companies, and it will only become more important.”

The statement by OpenAI comes as the company faces a raft of legal actions. Just last week, The New York Times filed a lawsuit against it and Microsoft, a significant investor in the company and a user of its tools in various Microsoft products; the suit alleges illegal use of New York Times content in the creation of OpenAI tools. OpenAI argued in return that copyright law does not prohibit the training of genAI models.

OpenAI last year faced a federal class action lawsuit in California accusing it of unlawfully using personal data for training purposes. That lawsuit, lodged in the Northern District of California, cited 15 violations, including breaches of the Computer Fraud and Abuse Act, the Electronic Communications Privacy Act, and various consumer rights statutes at the state level.

The central allegation of the California suit is that OpenAI “unlawfully acquired” the plaintiffs’ private data and used it without providing compensation.

According to the complaint, “OpenAI employed this misappropriated data to refine and advance [ChatGPT] through extensive language models and advanced language algorithms, enabling it to produce and understand language akin to a human, applicable across a multitude of uses.”

Lawsuits are proliferating

The California case is part of a growing legal fight over efforts to rein in rampant data collection by genAI tools. A group of nonfiction authors has initiated a class-action lawsuit against OpenAI and Microsoft, alleging the companies infringed on the authors’ copyrights by using their writings and academic papers to train ChatGPT without authorization.

The primary plaintiff is Julian Sancton, the author of “Madhouse at the End of the Earth: The Belgica’s Journey Into the Dark Antarctic.” That suit charges OpenAI and Microsoft with flagrantly disregarding copyright laws to create “a multi-billion-dollar business by using humanity’s collective works without permission. Instead of compensating for intellectual property, they act as though copyright laws are non-existent.”

John Licato, an assistant professor of Computer Science and Engineering at the University of South Florida, said OpenAI’s stance could result in copyright issues.

“The line separating adapting existing ideas and genuinely creating something new is already muddy, and AI is forcing us to see how poorly defined that distinction actually is,” Licato said.

by Sascha Brodsky

Sascha Brodsky is a contributing writer for the Foundry group of publications.
