Essential Checklist for Compliant Use of Personal Data in AI Development and Deployment
Artificial intelligence (AI) systems have transformed industries, leading to innovative solutions and improved efficiency in countless fields. From advancements in healthcare diagnostics to personalized customer service, the potential of AI is immense. However, these groundbreaking benefits bring significant challenges, particularly around data privacy. AI systems are often unpredictable and can make automated decisions that unintentionally lead to outcomes such as discrimination, raising concerns about transparency and fairness.
In response to these growing concerns, regulatory bodies around the world began drafting new legislation, including in the EU. Accordingly, the EU AI Act (the EU regulation governing the development and use of AI) was published in the Official Journal on July 12, 2024, and entered into force on August 1, 2024, marking a pivotal moment in addressing the legal uncertainties surrounding the development, provision, and use of AI systems. Given that AI relies heavily on data—often including personal data—one of the central questions of this new regulatory era is how to lawfully use collected data for AI purposes. Ensuring compliance with these new rules is crucial to balancing the power of AI with the protection of individual privacy rights.
Recent Developments in AI and Data Privacy
Supervisory authorities around the world have already begun to scrutinize the use of AI, with global tech giants like Meta and X (formerly Twitter) at the forefront of regulatory action. As these companies process vast amounts of data, they are naturally among the first to face heightened regulatory scrutiny.
- Meta’s Global AI Tensions – Meta’s use of AI has sparked tension, particularly between the company and EU regulators. After complaints from Noyb, Meta was forced to halt its AI-related plans in the EU. The company delayed the use of personal data for AI development and improvement “following consultations with regulators.” Meta also assured users they would be notified in advance of any changes and granted the option to refuse data processing for such purposes. Additionally, Meta decided not to release an advanced version of its AI model, Llama, within the EU, citing the “unpredictable” actions of regulators as the reason for this decision.
In contrast, in Brazil, the landscape is different. On August 30, 2024, Brazil’s data protection authority (ANPD) lifted an initial ban that had restricted Meta from using personal data to train its AI models. This demonstrates the varying approaches to AI regulation around the world, with some countries focusing on strict privacy protections, while others are more open to fostering innovation.
On August 23, 2024, the CEOs of Meta and Spotify issued a joint statement, warning that the EU risks falling behind if it doesn’t adopt a more progressive stance toward AI technology.
- X’s Updated Privacy Policy – X has also stirred attention with its recent privacy policy update, outlining its intent to use collected data for machine learning and AI training. However, in response to EU pressure, the company agreed not to use personal data from EU users for AI training until it gives them the ability to withdraw their consent.
- Guidelines from Supervisory Authorities – In light of these developments, several countries have started to release official guidelines to regulate the use of personal data for AI purposes. Singapore, for example, has issued Advisory Guidelines on the Use of Personal Data in AI Recommendation and Decision Systems. Meanwhile, in the UK, the Information Commissioner’s Office (ICO) has published a set of guidelines and answers to frequently asked questions on the processing of personal data for AI.
As stricter AI rules are most likely here to stay, our checklist will guide you through the 9 steps necessary to ensure that your use of personal data in AI development is fully compliant.
Implement Technical and Organizational Measures
To address the potential risks associated with AI, it’s crucial to implement appropriate technical and organizational measures. However, there’s no universal solution—security measures must be tailored to the specific context and nature of the AI system in use. Some effective measures to consider include:
- Data encryption to protect sensitive information during transfer and storage (a brief sketch follows this list).
- Regular audits and assessments to ensure ongoing compliance with data protection standards.
- Access control mechanisms that limit who can access personal data used in AI training or for other AI purposes.
- Broader cyber-security measures to protect AI systems and the data they process against unauthorized access and attacks.
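To make the first measure concrete, below is a minimal sketch of encrypting a training-data file at rest using the Fernet recipe (symmetric, authenticated encryption) from Python's cryptography package. The file names and key handling are illustrative assumptions, not a prescribed setup; in practice the key would live in a secrets manager or KMS, never alongside the data.

```python
# Minimal sketch: encrypting a training-data file at rest with symmetric,
# authenticated encryption (Fernet from the "cryptography" package).
# File names and key handling are illustrative assumptions.
from cryptography.fernet import Fernet


def encrypt_file(plain_path: str, encrypted_path: str, key: bytes) -> None:
    """Encrypt the contents of plain_path and write the ciphertext."""
    fernet = Fernet(key)
    with open(plain_path, "rb") as src:
        ciphertext = fernet.encrypt(src.read())
    with open(encrypted_path, "wb") as dst:
        dst.write(ciphertext)


def decrypt_file(encrypted_path: str, key: bytes) -> bytes:
    """Return the decrypted contents of encrypted_path."""
    fernet = Fernet(key)
    with open(encrypted_path, "rb") as src:
        return fernet.decrypt(src.read())


if __name__ == "__main__":
    key = Fernet.generate_key()                # store securely, e.g. in a KMS
    with open("training_data.csv", "w") as f:  # toy file for the example
        f.write("user_id,feature\n1,0.42\n")
    encrypt_file("training_data.csv", "training_data.csv.enc", key)
    restored = decrypt_file("training_data.csv.enc", key)
    assert restored == b"user_id,feature\n1,0.42\n"
```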
Wherever possible, one of the most effective strategies is to utilize anonymous data. Anonymous data cannot be linked back to an individual, making it exempt from privacy regulations in most jurisdictions. By using data that has been anonymized, AI developers can significantly reduce their legal obligations under privacy laws. This approach not only protects individual privacy but also simplifies compliance.
If complete anonymization isn’t feasible, pseudonymization offers another valuable method of protecting personal data. Pseudonymization involves altering data so that individuals cannot be identified without additional information, for example by replacing identifying fields (such as names) with unique identifiers. However, it’s important to note that pseudonymized data is still considered personal data under most regulations. As a result, it remains subject to legal protections, although with reduced risk compared to fully identifiable information.
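As an illustration of this approach, the sketch below replaces directly identifying fields with keyed-hash identifiers. The field names, sample record, and key handling are assumptions made for the example, not a definitive implementation.

```python
# Minimal sketch of pseudonymization: replacing directly identifying fields
# with keyed-hash identifiers. The secret key is the "additional information"
# that must be stored separately; without it, records cannot easily be linked
# back to individuals. Field names and the sample record are illustrative.
import hashlib
import hmac

SECRET_KEY = b"store-this-key-separately"  # e.g. in a secrets manager


def pseudonymize(value: str) -> str:
    """Derive a stable, non-reversible identifier from an identifying value."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


record = {"name": "Jane Doe", "email": "jane@example.com", "age": 34}

pseudonymized_record = {
    "user_id": pseudonymize(record["email"]),  # stable ID for joining datasets
    "age": record["age"],                      # non-identifying fields retained
}
# The original name and email never reach the AI training pipeline.
```

Because the same key always yields the same identifier, records can still be linked across datasets for training or auditing, while the key itself is held separately under strict access controls.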
Choose the Right Legal Basis for Data Processing in AI Systems
One of the fundamental steps in ensuring compliance with AI and privacy regulations is establishing a valid legal basis for processing personal data. All data processing activities must be grounded in one of the six legal bases defined by privacy laws, such as the GDPR. However, determining the appropriate legal basis for each specific case can be complex, as it requires a deep understanding of both the AI model’s purposes and the regulations in question.
It’s also important to recognize that a single AI model may involve multiple data processing operations (stages) — such as development/training, deployment, auditing, etc. — and each of these activities might require a different legal basis. For instance, while one stage may rely on the performance of a contract, another stage may necessitate legitimate interest or explicit consent from data subjects.
Additionally, different AI models can use personal data for various purposes, meaning that the applicable legal basis for each AI system may differ. This requires careful consideration when drafting privacy policies, as each processing activity needs to be justified by the appropriate legal basis. That being said, certain legal bases are likely to be used more frequently in the context of AI.
- Consent – A commonly used legal basis, but one with strict requirements: it must be specific, granular, freely given, and easy to withdraw at any time. Consent is more feasible in situations where there is direct contact with the data subject, making it easier to obtain. For example, if you’re personalizing services for users, consent may be appropriate since you can clearly communicate the AI’s purpose and allow users to opt in. However, consent becomes impractical in scenarios where you are scraping publicly available data to train an AI model. Without direct contact, it would be nearly impossible to acquire consent from every individual whose data is being used.
- Performance of a contract – The legal basis of contractual necessity is relevant only if processing personal data is absolutely essential for fulfilling a contract with the data subject. It cannot be used simply because the AI model enhances or personalizes the service. For example, if your AI system offers personalized recommendations, this may improve the user experience, but it is not likely to be considered a necessity for the performance of the contract itself. On the other hand, if the core functionality of a service directly depends on AI, such as an AI-powered language translation app, then this basis may be appropriate. Generally, however, this legal basis should be avoided if there is a reasonable alternative for data subjects to access the service without the involvement of AI.
- Legitimate interest – One of the more flexible legal bases, but it requires careful documentation and justification. You must conduct a Legitimate Interest Assessment (LIA), which involves: a) the purpose test: is there a legitimate purpose for the processing?; b) the necessity test: is the processing necessary to achieve this purpose?; and c) the balancing test: do the interests, rights, and freedoms of the data subjects override the company’s legitimate interest? The balancing test is critical, as the legitimate interest of the company must not override the rights and freedoms of individuals. For AI, this basis is often used for less intrusive processing activities, but it requires transparency and proper safeguards.
- Legal obligation – This legal basis applies when processing is necessary to comply with a legal obligation. It is likely to be more relevant in auditing or testing phases, where laws or regulations may require specific measures to ensure fairness or accuracy in AI outputs.
- Public interest – Processing data under the legal basis of public interest is generally reserved for public authorities or organizations performing tasks that serve the broader public. For AI systems, this might be applicable in sectors like public health or law enforcement, where AI is used to fulfill governmental objectives. It is unlikely to be relevant for most private companies.
- Vital interest – The vital interest basis is rarely used but can be relevant during the deployment phase of an AI system, particularly in healthcare scenarios, such as AI used for medical diagnostics or emergency response systems.
Take a Risk-Based Approach
The new EU AI Act adopts a risk-based approach, meaning that the obligations imposed on AI developers and providers vary depending on the level of risk an AI system poses. This approach echoes the privacy-related obligations that already exist under regulations like the GDPR, where companies must conduct a Data Protection Impact Assessment (DPIA) in certain cases.
A DPIA is required when data processing is likely to result in a high risk to the rights and freedoms of individuals, particularly when new technologies are used, when automated decision-making is involved, or when large-scale processing of sensitive data takes place. Since AI is considered a new technology, the same principle likely applies to AI systems. Conducting a DPIA helps identify, assess, and mitigate risks associated with AI models, ensuring compliance and minimizing harm to individuals.
AI systems that use personal data (especially sensitive categories of data) often present higher risks, as they can affect individuals whose data was used in both the development and deployment stages of the AI model. It is also crucial for AI developers to understand that they may accidentally process personal data during the course of their work, particularly when it is difficult to separate personal data from other datasets.
Ensure Transparency
Under the EU AI Act, transparency is a fundamental requirement for AI systems. Users typically need to be informed when they are interacting with an AI system or when the content they are viewing is AI-generated. In addition, the GDPR mandates that data subjects must be informed of the purpose for which their personal data is collected and processed.
If personal data is not required for the operation of the AI system, it is highly recommended to go a step further in ensuring transparency and privacy: include an additional disclaimer advising users not to input any personal data when using the AI system. This extra precaution can significantly reduce the risk of unnecessary personal data processing and provide added protection for users’ privacy.
Comply with the Data Minimization Principle
The data minimization principle is crucial in AI development and deployment, ensuring that only the necessary personal data is collected and used. Collecting excessive or irrelevant data not only increases the risks of privacy violations but also creates unnecessary obligations for compliance under data protection laws.
Overly intrusive practices, such as using data from users’ private chats to train AI models, should be avoided unless absolutely necessary.
A great way to support data minimization is through federated learning. This technique allows AI models to be trained across multiple devices without centralizing the data, ensuring that only the necessary model updates are shared instead of the personal data itself. This helps reduce the amount of personal data collected while still enabling the AI to learn and improve its performance.
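To illustrate the idea, here is a minimal sketch of federated averaging (FedAvg) for a simple linear model. The model, data shapes, and learning rate are illustrative assumptions; real deployments typically add secure aggregation and other safeguards on top of this pattern.

```python
# Minimal sketch of federated averaging (FedAvg): each client computes a model
# update on its own device, and only the updated weights, never the raw
# personal data, are sent to the server for averaging.
import numpy as np


def local_update(global_weights: np.ndarray, X: np.ndarray, y: np.ndarray,
                 lr: float = 0.1) -> np.ndarray:
    """One gradient step of local linear-regression training; data stays on device."""
    predictions = X @ global_weights
    gradient = X.T @ (predictions - y) / len(y)
    return global_weights - lr * gradient


def federated_round(global_weights: np.ndarray, clients: list) -> np.ndarray:
    """Average the locally computed weights; only weights leave each client."""
    local_weights = [local_update(global_weights, X, y) for X, y in clients]
    return np.mean(local_weights, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four simulated clients, each holding its own (features, labels) data.
    clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
    weights = np.zeros(3)
    for _ in range(10):  # ten communication rounds
        weights = federated_round(weights, clients)
```

The key design choice is that only weight updates are exchanged: the raw records never leave the client device, which directly supports the data minimization principle described above.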
Enable Human Review of Automated Decision-Making
Under the GDPR, automated decision-making that significantly impacts individuals, such as decisions made by AI systems, must be carefully regulated. One of the key requirements is ensuring that individuals have the right to request human intervention in any decisions made solely by automated processes.
Besides GDPR requirements, human oversight in AI systems typically comes in two forms:
- Oversight of the AI program itself – Regular monitoring and assessment of the AI system to ensure that its outputs are accurate, unbiased, and in compliance with regulations.
- Bypassing or replacing the AI process – In certain cases, human decision-making can replace the AI system. While this isn’t always feasible, it is advisable whenever possible, especially for high-stakes decisions.
This oversight is particularly important when AI systems have the potential to lead to discrimination. The risk of discrimination is often heightened if the training data is unbalanced or reflects biased societal patterns.
Make Sure That Data Transfers Are Compliant
When transferring personal data, especially across borders, it’s essential to meet the stringent requirements set by the GDPR. One of the first steps is to determine the roles involved in data processing, as the legal obligations differ depending on whether you are acting as a data controller, data processor, or joint controller. AI developers and deployers can be both data controllers and data processors, depending on the situation.
Example 1: An AI company develops a customer service chatbot that collects and processes user queries and personal data directly from customers. The company determines what data is collected, how it will be used, and for what purpose, such as improving the chatbot’s responses and providing personalized recommendations. Since the AI developer is deciding the purpose and means of processing, they are acting as a data controller.
Example 2: A healthcare provider hires an AI company to develop a system that analyzes patient data for diagnostic purposes. The healthcare provider determines the purpose of processing (diagnosis), and the AI developer only processes the data following the instructions given by the healthcare provider. In this case, the AI developer is acting as a data processor, as they do not decide the purpose of the processing but only follow instructions.
Once roles are clearly defined, it’s critical to sign the necessary agreements, such as a Data Processing Agreement (DPA) or Joint Controllership Agreement. These agreements ensure that each party—whether they are processor or controller—understands their responsibilities for data protection and privacy.
If personal data is being transferred to a third country that may not provide adequate protection, additional safeguards are required. This often involves implementing Standard Contractual Clauses (SCCs), which are templates approved by the European Commission to ensure that data recipients in third countries provide adequate protection.
Additionally, in such cases, companies are also expected to conduct a Data Transfer Impact Assessment (DTIA). This assessment evaluates the risks involved with transferring data to certain regions and ensures that proper safeguards are in place.
Is Data Scraping via AI Allowed?
AI technologies can be highly effective tools for data scraping, i.e., gathering large amounts of data from publicly available sources. However, while scraping may seem straightforward, it presents significant legal challenges—especially under the GDPR. One of the main obstacles is obtaining consent from the data subjects whose information is being scraped, which is nearly impossible when dealing with vast datasets from public sources.
This leaves legitimate interest as the primary legal basis for such data processing. However, there has been growing debate about whether this legal basis can be relied upon in the context of data scraping for AI development. Recently, the Dutch Data Protection Authority (DPA) expressed skepticism, stating that “commercial interests” cannot qualify as a legitimate interest under the GDPR. If this stance is adopted by other supervisory authorities, AI developers may face difficulties using legitimate interest to justify data scraping, leaving them with few, if any, workable legal options to scrape data.
However, the European Commission has pushed back on this position, urging the Dutch DPA to reconsider. According to the Commission, commercial interests should be considered legitimate, provided that a proper balancing test is conducted to ensure that they do not override the fundamental rights and freedoms of data subjects.
Enable Continuous AI Governance
To ensure full compliance with AI and privacy regulations, continuous governance is essential. This involves an ongoing collaboration between the legal and tech teams. As AI technologies and legal frameworks are rapidly evolving, there are still many unresolved issues that require careful navigation. The legal team’s role is to stay on top of these developments, ensuring that the organization adheres to new laws and regulations, while the tech team adjusts and aligns AI development and deployment strategies accordingly.
A key aspect of this governance model is proactive monitoring and adaptation. As AI systems advance, they may introduce new risks or compliance challenges, especially regarding data usage and privacy concerns. Continuous governance enables the business to be flexible, mitigating risks before they become critical issues. Regular audits, impact assessments, and system reviews are essential tools to ensure that both the AI systems and the data they rely on remain compliant over time.