OpenAI chief questions EU “over-regulation” of large language models

Sam Altman raised the possibility of the company pulling out of Europe altogether.

An EU rule requiring companies to disclose copyrighted materials used in developing generative AI tools could lead OpenAI, the company behind ChatGPT, to pull out of Europe, according to its CEO.

Speaking at an event in London, UK, on Wednesday, OpenAI CEO Sam Altman described the current draft of the EU AI Act as “over-regulating” and warned: “If we can comply, we will, and if we can’t, we’ll cease operating”.

At issue is a clause in Article 28b-5a of the EU AI Act which states that providers of generative AI models shall “document and make publicly available a summary of the use of training data protected under copyright law”.

Subtle details matter

Altman suggested that changing the definition of general-purpose AI systems might help, stressing he did not think the law “was inherently flawed” and that “we’re going to try to comply”. But, he said, “the subtle details here really matter”.

OpenAI is also skeptical of how the EU is defining “high-risk”. Currently, it appears that the Act’s wording would designate large AI models, such as ChatGPT, as high-risk, thereby forcing them to comply with additional safety requirements. OpenAI argues its general-purpose systems are not inherently high-risk.

Speaking during a panel discussion, Altman acknowledged there were risks around AI, particularly around the generation of misinformation. But, he said, the same concerns also needed to be applied to social media platforms. “You can generate all the disinformation you want with GPT-4, but if it’s not being spread, it’s not going to do much,” he said.

Impossible to comply

Writing on the Kluwer Copyright Blog earlier this month, João Pedro Quintais of the Institute for Information Law said that if the aim of Article 28b-5a was “for generative AI providers to list all or most of the copyrighted material they are including in their training data sets in an itemized manner with clear identification of rights ownership claims, etc, then this provision is impossible to comply with”.

He says this is due to the low threshold of definitions of originality, different requirements across national jurisdictions, and the poor quality of much ownership rights metadata, among other things.

Quintais says it is “of paramount importance to clarify the meaning and scope of this obligation” and is critical of “the absence of any impact assessment of its meaning, scope and implications”. He suggests “it could be useful to frame the newly proposed obligation as one of good faith or best efforts to document and provide information on how the provider deals with copyrighted training data.”