Regaining control by opting out of LLMs

By Essenese Obhan and Anjuri Saxena, Obhan & Associates

The theme of the year gone by in the tech world was indisputably GenAI, or generative artificial intelligence, and the way in which its development continued to accelerate. GenAI’s tentacles are lengthening in reach, raising new questions every day in areas such as computing power, liability, and the authorship and ownership of content. A contentious issue is the use of copyrighted works in the training of AI and large language models (LLMs). This is a live debate being considered around the world, not least by the Delhi High Court in a case brought by the Indian news agency ANI Media against OpenAI, the developer of ChatGPT. The central argument revolves around an opt-out mechanism. LLM owners prefer that such a mechanism be in place, but few, if any, copyright holders support it.

The principle being advanced by the LLM developer in the case is that all publicly available data or works should be free to use unless copyright holders clearly indicate that they wish their works to be excluded from, or opted out of, programmes used to train LLMs. The burden is thus on the copyright holder to state expressly, for each LLM, that their works cannot be used for training purposes. They must opt out each time.

Copyright holders can opt out in two ways. One is through location-based identifiers, that is, by controlling access to and the use of works hosted on a particular domain or URL. The other is through unit-based identifiers, using tools to manage access to and the use of individual items of copyrighted content or data. Location-based strategies are much broader, whereas unit-based identifiers permit a more nuanced approach to specific works in larger datasets.

While location-based identifiers are simpler, only those who control the entire domain or URL can implement them, and they may not be the copyright holders. The standard robots exclusion text files (robots.txt) used to instruct and control web crawlers protect only domain hosts. If the protected work is available on other sites for which the copyright holder has not explicitly opted out, an LLM can easily access and scrape it. Opting out is prospective, not retrospective, leaving existing work unprotected. Blocking crawlers can also have an adverse effect on search engine rankings and visibility. With new crawlers and AI tools launched every day, rights holders must tenaciously but tediously identify each one and opt out of it.
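The per-crawler burden described above can be illustrated with a minimal sketch using the robots.txt parser in Python's standard library. The file contents, the domain publisher.example and the crawler name NewAIBot are hypothetical, and the names GPTBot and CCBot are used only as examples of AI-related crawlers; none of these details comes from the case itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative publisher domain.
# Each AI crawler has to be named individually to be excluded.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_permissions(robots_txt, agents, url):
    """Return, for each crawler name, whether robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


permissions = crawler_permissions(
    ROBOTS_TXT,
    # "NewAIBot" is an invented name standing in for a newly launched crawler.
    ["GPTBot", "CCBot", "NewAIBot", "Googlebot"],
    "https://publisher.example/articles/story.html",
)

for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

In this sketch the two named crawlers are blocked, while the invented newcomer falls through to the catch-all rule and is allowed until the host adds a rule for it. Blocking every crawler with the catch-all rule would close that gap, but it would also shut out search engine crawlers. The file also says nothing about copies of the same work hosted on other domains.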

Even as the effectiveness of opting out is being debated, the widespread adoption of this model as the default may be underway. In December 2024, the UK government issued a consultation paper regarding the harmonisation of UK copyright law with AI and LLMs. This suggested exempting the activity of text and data mining of copyright-protected works, even for commercial activities, from copyright law. Copyright holders would be able to reserve their rights, or opt out. This approach was premised on aiding the quality development of AI models, putting the UK’s AI sector on the global map and allowing it to compete with players such as OpenAI and Google. With China’s DeepSeek now being touted as the world’s largest open-source LLM, this approach may gather further support.

Opting out sounds like a convenient safeguard, but it ignores the foundation of copyright protection. It fails to protect all versions of a copyrighted work, particularly those that remain accessible to LLMs on other domains that have not been opted out. It also allows LLMs to be trained on protected works without any payment of compensation. This gives LLM owners an unfair advantage.

An opt-in mechanism would certainly make more sense and be more just. This is not a new concept. In India, section 6 of the Digital Personal Data Protection Act, 2023, requires data fiduciaries to seek permission from data principals before processing their personal data. Since India has already acknowledged and adopted an opt-in principle in data privacy regulation, consistency demands that the same concept also apply to copyrighted works.

Adopting this approach would see the country going against global regulatory trends. In its recent report on the development of AI governance guidelines, the Indian government indicated that training LLMs without the approval of rights holders may constitute copyright infringement. This may leave India choosing to opt out of a global consensus and steer its own course.

Essenese Obhan is the managing partner and Anjuri Saxena is an associate at Obhan & Associates.

Obhan & Associates
Advocates and Patent Agents
N – 94, Second Floor
Panchsheel Park
New Delhi 110017, India
Contact details:
Ashima Obhan
T: +91 98 1104 3532
E: email@obhans.com
ashima@obhans.com
