‘Blistering’ EU report calls for copyright reform to combat large-scale AI data training

A new study commissioned by the European Parliament has warned that EU copyright law is ill-equipped to deal with AI models that rely on large-scale data training and called for targeted reforms to strengthen legal protections for copyright holders.

The report, published by the European Parliament’s Committee on Legal Affairs (JURI), argues that generative AI systems operate at a “scale and opacity” that existing copyright frameworks were “never designed to address”.

It highlights a “legal mismatch” between AI training practices and the EU’s current text-and-data mining (TDM) exceptions, and calls for reforms that “reinforce [the EU’s] existing legal architecture” in order “to uphold core copyright values”. It also underlines the uncertain status of AI-generated content.

Ben Maling, a partner at IP firm EIP in London, described the study, Generative AI and Copyright, as “blistering”, noting its focus on the adequacy of the TDM exception in the EU’s Copyright in the Digital Single Market Directive.

Article 4 of the directive provides a TDM exception that allows the mining of lawfully accessible copyright works unless the rights holder has opted out. It has been seen by many as a “legal gateway” for AI developers, especially commercial actors, to scrape and process massive volumes of online content.

The study argues the exception was “not designed to accommodate the expressive and synthetic nature of generative AI training, and its application to such systems risks distorting the purpose and limits of EU copyright exceptions”.

The report also calls for a statutory remuneration scheme to close the value gap between creators and developers. The scheme could take the form of a collective licence or a levy on AI outputs, administered by collective management organisations and based on transparent, auditable usage data.

However, such a solution would require strong safeguards, it notes, including enforceable disclosure obligations and public oversight.

The study argues that clarity is also urgently needed around “hybrid authorship”, where outputs result from both human and machine intervention.

Fully machine-generated content should remain in the public domain, the study says, while criteria for protecting AI-assisted works should be codified in EU law. It does not recommend introducing new, sui generis rights for machine-generated content, as doing so “risks undermining the coherence of the copyright system”.

Without timely reform, the EU risks legal uncertainty, market concentration and “cultural homogenisation”, it warns.

Maling noted that the report reflected a drive by Europe “to reduce its reliance on US tech and the extractive models of hyperscalers like Google, Amazon, Microsoft and Meta”, which he said were “lobbying hard to maintain their free lunch, particularly given the extra-territorial effect given to EU copyright law by the AI Act”.

He added: “This report is a must-read for UK policymakers before moving ahead with its own TDM plans.”

The study comes amid growing concerns over the use of copyrighted works as training data for generative AI (GenAI) systems.

There are a number of key cases working their way through the courts. A landmark trial in the UK took place at the High Court in June between Getty Images and AI outfit Stability AI, with a judgment expected in October.

It is the most significant copyright trial in decades, and one which will have an important impact on the UK landscape for both AI model developers and copyright owners.

Getty claims Stability AI infringed copyright in millions of its works by using them to train its GenAI image generator, Stable Diffusion. The parties are also in litigation in the US, where a case filed in Delaware involves similar allegations of copyright and trademark infringement.