New Article by Taner Kuru: "Lawfulness of the Mass Processing of Publicly Accessible Online Data to Train Large Language Models"

29 October 2024

We are excited to share that our Lab Member, Taner Kuru, has published a new article titled “Lawfulness of the Mass Processing of Publicly Accessible Online Data to Train Large Language Models”. It addresses key legal questions within the EU data protection framework, in particular whether it is lawful to train large language models (LLMs) like ChatGPT with publicly accessible online data. The article can be accessed by clicking here or on the image.

Brief summary:
 
Key legal questions are emerging with the explosive rise of large language models like ChatGPT. One of the most pressing within the EU data protection framework is whether it is lawful to train these models using publicly accessible online data. This article tackles that question, using ChatGPT as a case study.

Previous discussions of this issue have focused on whether and to what extent OpenAI’s use of such data can be justified under Article 6(1)(f) GDPR. However, this article argues that this processing activity should be subject to the Article 9 GDPR regime in light of recent rulings from the Court of Justice of the European Union. Accordingly, it explores the exceptions listed under Article 9(2) GDPR that could potentially allow this processing activity.

First, the article points out a significant challenge: OpenAI cannot realistically obtain explicit consent from individuals whose personal data is used for such purposes. Moreover, it shows that only a limited amount of the personal data that is publicly accessible online has been “manifestly made public by the data subjects” and is thus usable for training LLMs, which calls the lawfulness of these processing activities into question. Finally, the article speculates on a possible way forward by drawing lessons from the relevant case law, and it identifies finding the proper legal basis for these processing activities, if one exists, as the “next big thing” in the EU data protection framework.

 
This article will also be presented at the upcoming Digital Legal Talks 2024 conference on November 28th. Check here for more information about Digital Legal Talks 2024. There is limited time to register!