On 17 December 2024, the European Data Protection Board (EDPB) adopted its first substantive opinion on data processing in the context of AI models, Opinion 28/2024, with particular attention to the subset of AI models known as Large Language Models (LLMs). This is not a general opinion on personal data in generative AI; rather, it addresses specific issues relating to the training of AI models. Several key considerations, such as the processing of special categories of personal data, data protection impact assessments and data protection by design, fall outside the scope of the Opinion. The EDPB, however, expects to release more such targeted opinions in the future. Even though the Opinion is limited in scope, it is an important – and long-awaited – indicator of the attitude of European data protection authorities towards generative AI and LLMs.
In this Opinion, the EDPB discusses the anonymity of AI models, legitimate interest as a legal basis for processing personal data when training AI models, and liability for deploying AI models built on potentially unlawfully processed personal data. Interestingly, the EDPB deemed that an AI model created unlawfully from a data protection standpoint can still be used without breaking the law, provided the model has subsequently been anonymised so that it no longer contains personal data. This is of great practical importance, as most widely used LLMs, such as ChatGPT, have been trained on vast amounts of scraped personal information, which would probably be deemed unlawful processing under the GDPR.
As a whole, the EDPB Opinion sets a high standard for data protection in the context of generative AI – which is hardly surprising, given the EDPB's role. That said, the Opinion has already been criticised by both technology lawyers and businesses for setting high standards without giving data controllers clear guidance on how to reach these lofty goals. This may be indicative of a larger, impending collision between EU data protection authorities and rules, on the one hand, and the booming generative AI/LLM business, on the other.
The EDPB Opinion distinguishes between the development and deployment phases of AI models, each of which must be assessed separately. In other words, companies need to comply with data protection legislation both when they develop AI applications and when they use their own AI models or applications developed by third parties.
The development phase includes activities such as software development, data collection for training, pre-processing, and training, while the deployment phase covers the subsequent use of the AI model. A key consideration for both phases is the anonymity of the training data, code, structure, and output of the AI model. No AI model is inherently anonymous: technical organisation or encoding of data does not equate to anonymisation, as specific data can still be identified or extracted from the model. For an AI model to be considered anonymous, personal data in the training set must not be extractable from the model, and the model must not produce output relating to the data subjects. Furthermore, controllers must document their processing operations related to AI model training as part of their accountability obligations under the GDPR, even if the goal is anonymisation, in order to demonstrate the sufficiency of the anonymising measures.
An especially noteworthy part of the Opinion relates to possible unlawful processing of personal data in the development phase and its effects on the subsequent deployment of the model. In other words, the Opinion considers situations where an AI model has been created using unlawfully processed personal data (which is de facto the situation for many LLMs, such as ChatGPT), and whether that model can then be used without breaching data protection legislation.
There are at least two major scenarios: one where the model was developed using unlawfully processed personal data and that data remains in the model when it is deployed, and one where the unlawfully processed personal data has been removed (for example through anonymisation) and is no longer present in the model at the point of deployment.
In the first scenario, where personal data is still present in the model when it is deployed, the initial unlawfulness may impact the lawfulness of the subsequent processing. If the model is subsequently deployed by another controller who then processes the personal data contained in the model, the latter controller is not automatically liable for the unlawful processing of personal data during development. They must, however, conduct an appropriate assessment as part of their accountability obligations to determine whether the AI model was developed by unlawfully processing personal data. Such an assessment may include considering whether the data originates from a data breach or whether a supervisory authority or a court has found the processing to infringe the GDPR.
In the second scenario, in which the personal data has been anonymised before or after the development of the AI model, the EDPB deems that the GDPR does not apply to the subsequent deployment of the model. As such, the initial unlawfulness does not impact the lawfulness of the subsequent use of the model, regardless of whether the model is deployed by the same or a different controller. In effect, this means that an initially unlawfully developed model can be lawfully used, so long as the personal data included in the training data and elsewhere in the model has been effectively removed. However, a mere claim of anonymity is not enough: a detailed, objective analysis is required.
The EDPB confirms that legitimate interest can serve as a valid legal basis for processing personal data in both the development and deployment of AI models. Legitimate interest (Article 6(1)(f) GDPR) allows a controller to process personal data where necessary for the purposes of a legitimate interest pursued by the controller or a third party, except where that interest is overridden by the interests or fundamental rights and freedoms of data subjects. This is a welcome clarification, as the EDPB has at times sounded the alarm about overly liberal reliance on legitimate interest as a basis for processing personal data (see, for example, EDPB Opinion 1/2024).
As has been previously established, relying on legitimate interest as a processing basis involves a three-step test: (i) the pursuit of a legitimate interest by the controller or a third party, (ii) the necessity of the processing to pursue that interest, and (iii) ensuring that the legitimate interest is not overridden by the data subjects' interests or fundamental rights and freedoms. Legitimate interest may, for example, be applicable when developing AI systems for user assistance, fraud detection, or threat detection. The EDPB highlights that AI development and deployment can pose a heightened risk to data subjects' rights and freedoms, for instance by causing a sense of surveillance, self-censorship, discrimination, or restricted access to information. Therefore, all relevant factors and risks must be considered when applying legitimate interest as a basis for data processing in AI contexts. The EDPB also notes that the context of data processing in relation to AI models differs from more traditional processing by data controllers. As such, more attention must be paid to the reasonable expectations of data subjects concerning the processing of their personal data. In the context of AI models, it may be difficult for data subjects to understand the variety of potential uses for the models and the methods of data processing involved in their development and deployment.
We will be happy to answer any questions or provide clarifications regarding the EDPB Opinion, as well as other matters relating to AI and personal data: arttu.ahava@berggren.fi
This blog post was written by Arttu Ahava and Eeli Aakko.