OpenAI-DeepSeek AI distillation dispute

By Weng Cailin and Zhou Yu, Ronly & Tenwen Partners

With the rapid development of AI, research and development of large language models (LLMs) has attracted significant attention. In January 2025, DeepSeek released the open-source reasoning model R1, which, trained through reinforcement learning, achieved performance comparable to OpenAI’s top reasoning model, o1, at a reported development cost of only USD5.6 million, quickly drawing global attention.

However, the distillation technology adopted by DeepSeek during development has sparked legal controversy. According to a Financial Times report, OpenAI has accused DeepSeek of using outputs from OpenAI’s models to train a competing model, in violation of OpenAI’s terms of use. Although OpenAI has not yet pursued legal action, the dispute highlights the legal challenges that distillation technology currently faces.

Definition of distillation

Weng Cailin
Partner
Ronly & Tenwen Partners

Knowledge distillation is a model compression technique first proposed by Geoffrey Hinton and others in 2015. Its core concept is to use the outputs of a large “teacher model” to train a smaller “student model”, which improves its performance by learning to mimic the teacher’s outputs. The technique can significantly improve computational efficiency and reduce costs while largely preserving model performance. Today, it is widely applied in LLMs.
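As a rough illustration of the teacher-student idea described above, the following Python sketch computes a distillation loss on temperature-softened output distributions, in the general style of the 2015 formulation. It is a minimal sketch only: the function names, the logit values and the temperature of 2.0 are assumptions chosen for illustration and are not taken from any of the models discussed in this article.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    The result is multiplied by temperature squared, a common convention that
    keeps gradient magnitudes comparable across temperatures.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )
    return kl * temperature ** 2


# Hypothetical scores over three candidate outputs (illustrative values only).
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]  # closely mimics the teacher
poor_student = [0.5, 1.0, 4.0]  # disagrees with the teacher

print("loss (good student):", distillation_loss(good_student, teacher))
print("loss (poor student):", distillation_loss(poor_student, teacher))
```

Training the student means adjusting its parameters to push this loss towards zero, at which point its output distribution matches the teacher’s. Where only a teacher’s generated text is available, for example through an API, the student is typically fine-tuned on those responses rather than on raw scores, but the principle of learning from the teacher’s outputs rather than its internal parameters is the same.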

Recently, researchers from the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Tsinghua University and other institutions proposed a framework for evaluating the degree of model distillation, and tested it on popular LLMs including Claude, DeepSeek-V3, Llama3.1 and Gemini. The results show that, of the open-source and closed-source models tested, all except Claude, Doubao and Gemini exhibit a high degree of distillation.

In its January 2025 paper, DeepSeek also stated that it distilled R1’s capabilities into smaller models built on base models such as Qwen2.5 and Llama-3.1, improving their performance.

Legality of distillation

Zhou Yu
Paralegal
Ronly & Tenwen Partners

Distillation technology is widely used in practice. However, companies such as OpenAI, Anthropic, Mistral and xAI include strict “anti-competitive distillation” clauses in their terms of use, prohibiting users from using their services or outputs to develop competing models. These clauses are intended to protect the core assets of AI R&D enterprises, such as model data, algorithms and structures, but they can easily bring neutral and innovative technical means within the scope of prohibition, leading to intense conflict between technological innovation and intellectual property protection.

The dispute between OpenAI and DeepSeek is a typical example of this conflict. The legality of distillation technology can be analysed from the following three perspectives.

Contract law. Under US contract law, establishing a breach of contract requires considering whether a valid contract exists, whether it has been breached and whether there are grounds for exemption. Although OpenAI’s terms of use explicitly prohibit competitive distillation, OpenAI may not have adequately highlighted and explained this material term, as the provider of a standard form contract is expected to do, which could render the term invalid. Even if the agreement is valid, OpenAI still bears the burden of proving that DeepSeek breached it and caused actual losses.

Copyright law. Is AI-generated content protected by copyright law? In judicial practice, Chinese courts typically make case-by-case determinations based on the degree of human intellectual input involved. US copyright law stipulates that works eligible for copyright must meet the “human authorship” standard, but in recent years, the US Copyright Office has recognised that some AI-generated content may be copyrightable.

However, the OpenAI output data allegedly used by DeepSeek for distillation lacks sufficient human intellectual contribution and is unlikely to be considered eligible for copyright protection. Even if the standard for copyrightability were met, OpenAI’s terms of use stipulate that all rights to the output content are transferred to the user, making it difficult for OpenAI to claim copyright infringement.

Unfair competition. Whether DeepSeek’s conduct constitutes unfair competition depends on how it obtained the data. If DeepSeek only obtained output data through public APIs (application programming interfaces), rather than stealing OpenAI’s internal parameters, algorithms or source code, it would be difficult to establish that it used deceptive means or infringed trade secrets. Conversely, if OpenAI were to prohibit all legitimate distillation, it could, given its market dominance, be regarded as abusing its market position and hindering technological innovation and free competition.

Implications

Distillation technology is neutral and innovative, with the potential to promote technological advancement and deliver widespread benefits. However, its legality remains unclear under the current legal framework. The 2025 State Council Government Work Report outlined a clear policy focus on technological innovation, emphasising the ongoing advancement of the “AI+” initiative and support for the broad application of large models. There is an urgent need for the law to strike a balance between protecting intellectual property rights and fostering technological innovation.

Regulating AI model distillation can be approached from the following two aspects.

Improving laws and regulations. Relevant laws and regulations should clarify the legal risks that may arise during the distillation process, such as whether the use of large model training data constitutes “fair use” under copyright law. At the same time, the reasonable boundaries of platform “anti-distillation clauses” should be defined to prevent abuse of market dominance.

Strengthening industry governance and self-discipline. Enterprises need to enhance compliance awareness, focusing on the transparency of data sources and the standardisation of processes during distillation. In addition, while encouraging enterprises to use open source, reasonable restrictions should be placed on the development and use of derivative models.

The dispute between OpenAI and DeepSeek reveals the legal dilemmas surrounding the legitimacy of distillation technology. From the perspectives of contract law, copyright law and anti-unfair competition law, there are numerous controversies regarding the legality of this technology. However, distillation technology has significant innovative and practical value, and the law should adopt an inclusive and prudent approach. By improving the legal system and strengthening industry governance, it is possible to protect intellectual property rights while promoting the healthy development of AI technology.

Weng Cailin is a partner and Zhou Yu is a paralegal at Ronly & Tenwen Partners

Ronly & Tenwen Partners
17/F, Jinmao Tower
88 Century Avenue
Shanghai 200120, China
Tel: +86 21 6840 7858
Fax: +86 21 6840 7599
E-mail: wengcailin@126.com | zhouy@rtlawyer.com.cn
