AI and company data: what to know about privacy and security before you start
Where your data ends up when you use an AI model, the differences between cloud and on-premise, and what you need to stay compliant.
The question that stalls most AI projects in small and medium-sized businesses is not a technical one. It is this: "But where does our data actually go?" It is a fair question. The answer is neither "everything is safe" nor "everything is dangerous": it depends on which service you use, how you configure it, and what data you put into it. This guide sets out what you need to understand before you start, with an eye on the GDPR.
The three questions that really matter
When you send text to a cloud AI model, that text leaves your network, travels encrypted to the provider's servers, is processed, and comes back as a response. Three questions really matter when it comes to what happens in between:
- Is my data used to train the model? This is the most common concern. The answer depends on the service and the plan: business-grade services (APIs and business plans) typically do not use customer content to train models by default, whereas free consumer services sometimes do. This must be verified in the provider's terms, because it changes everything.
- How long is it retained? Providers retain data for a period (for example, to prevent misuse) and then delete it. Some offer zero or reduced retention options for customers who need them. This is a point to clarify upfront, not after the fact.
- In which country is it processed? This matters for GDPR. Several providers and their cloud platform partners offer data residency options within the EU.
One firm principle: policies change and vary by plan. Do not rely on hearsay or an outdated guide. Read the terms of the specific service you are about to use, in their current version.
Cloud, private cloud, on-premise
"On-premise" means running the model on your own servers, in-house. For the most powerful models, this is rarely feasible for a small or medium-sized business today: it requires expensive hardware and specialist expertise. But it is not the only alternative to the public cloud. The real spectrum looks like this:
| Mode | Where the data is processed | Best for |
|---|---|---|
| Public cloud API | On the model provider's servers | Most small and medium-sized businesses. Fast to get started, pay-as-you-go pricing. |
| Model via your own cloud (AWS, Google Cloud, Azure) | Inside your cloud account, in a region you choose (including EU) | Organisations that already have a cloud infrastructure and data residency requirements. |
| Self-hosted open models | On your own servers or in your private cloud | Organisations with in-house expertise and highly sensitive data. Output quality is typically lower than that of frontier models. |
The most interesting middle ground for many businesses is the second option: the same models, but delivered through the cloud platform you already use, with the ability to choose the geographic processing region. You gain most of the control without having to manage your own hardware.
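If you take this route, one practical safeguard is to pin the processing region in your own code instead of relying on a console default. The sketch below assumes AWS-style region codes; the allowlist and the function name `require_eu_region` are illustrative, not part of any provider's SDK, and you would need to check with your provider which models are actually available in which region.

```python
# Illustrative allowlist of AWS-style region codes located in EU member states.
# Assumption: you maintain this list yourself against your cloud provider's documentation.
EU_REGIONS = frozenset({
    "eu-west-1",     # Ireland
    "eu-west-3",     # Paris
    "eu-central-1",  # Frankfurt
    "eu-north-1",    # Stockholm
    "eu-south-1",    # Milan
    "eu-south-2",    # Spain
})


def require_eu_region(region: str) -> str:
    """Return the normalised region if it is on the allowlist, otherwise raise."""
    normalised = region.strip().lower()
    if normalised not in EU_REGIONS:
        raise ValueError(f"Region {region!r} is not on the EU allowlist")
    return normalised
```

An explicit allowlist is safer than checking for an `eu-` prefix: `eu-west-2` is London and `eu-central-2` is Zurich, and neither is in the EU.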
No need to panic, just the key concepts
If you process personal data (names, email addresses, customer or employee data) using an AI service, the GDPR applies. There are only a few concepts you need to be clear on:
Who is who
Your company is the data controller: you decide why and how to process the data. The AI provider acts as the data processor: it processes the data on your behalf. This relationship must be governed by a Data Processing Addendum (DPA), which serious providers make available. Without a signed DPA, using the service with personal data is a compliance problem.
The four things to verify
- Legal basis. You need a lawful basis for processing that data (for example contract, consent, or legitimate interest). Using AI does not create a new legal basis: you must already have the right to process that data for that purpose.
- Data minimisation. Send the model only the data needed for the task, not your entire archive. This is a GDPR principle, and it is also plain common sense (and cheaper).
- Transfers outside the EU. If processing takes place outside the EU, appropriate safeguards are required, such as standard contractual clauses. Alternatively, an EU data residency option offered by the provider keeps the processing within the EU.
- Privacy notices and transparency. If you use AI in a way that affects individuals (customers, job applicants), this must be reflected in your privacy notices.
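In practice, data minimisation can be as simple as an explicit list of the fields a task is allowed to see. The following is a minimal sketch for a hypothetical support-ticket summary; the field names, `minimise`, and `build_prompt` are all invented for illustration.

```python
from typing import Any, Mapping

# Hypothetical field names for a support-ticket summarisation task.
ALLOWED_FIELDS = ("ticket_id", "product", "issue_description")


def minimise(record: Mapping[str, Any],
             allowed: tuple[str, ...] = ALLOWED_FIELDS) -> dict[str, Any]:
    """Keep only the fields the task needs; the rest is never sent."""
    return {key: record[key] for key in allowed if key in record}


def build_prompt(record: Mapping[str, Any]) -> str:
    """Build the prompt from the minimised record only."""
    slim = minimise(record)
    lines = [f"{key}: {value}" for key, value in slim.items()]
    return "Summarise this support ticket:\n" + "\n".join(lines)
```

An allowlist fails safe: a field added to the database later is excluded until someone decides it is needed. Note that free-text fields such as `issue_description` can still contain personal data, so field selection alone is not a guarantee.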
The golden rule of caution
Before sending anything to a cloud model, ask yourself: "Would I be comfortable if this text were, hypothetically, to leave my control?" For ordinary data the answer is usually yes, with a reputable provider and a DPA in place. For particularly sensitive data (special category data under Article 9 GDPR such as health or biometric data, critical trade secrets, credentials), caution says: do not send it to the public cloud at all, or anonymise it first.
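As a rough illustration of masking text before it leaves your systems, the sketch below replaces e-mail addresses and phone-like numbers with placeholders. The patterns and the function name `redact` are assumptions made for this example. Pattern-based masking catches common formats only and does not amount to true anonymisation: names, addresses, and context can still identify a person.

```python
import re

# Deliberately simple patterns: they catch common formats, not every case.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text
```

For special category data, a filter like this is a first line of defence at most; keeping the data out of the public cloud remains the cautious choice.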
The concrete risks (and the mundane ones)
Beyond formal privacy, there are practical security risks that are often more likely than abstract fears:
- Exposed API keys. The number one and most underestimated risk. A key that ends up in a public repository or an email is an open door to your budget. Treat it like a password: store it in a secrets manager, never in plain text in your code.
- Sensitive data in "test" prompts. During experiments it is easy to paste a real document containing customer data into a tool that has not yet been assessed. Establish early on what may be pasted, and where.
- Shadow AI. Employees use free AI tools without disclosure, pasting company documents into them. A ban alone does not work. What works is offering an approved tool and stating clearly what can and cannot be put into it.
- Blind trust in output. A different but real risk: treating an incorrect response as if it had been verified. This matters especially where the output feeds into decisions or official documents.
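For the first risk in this list, the usual pattern is to read the key from the environment at run time, where a secrets manager or deployment tool can inject it, and never to log it in full. A minimal sketch follows; the variable name `AI_PROVIDER_API_KEY` and both helper functions are hypothetical.

```python
import os


class MissingAPIKeyError(RuntimeError):
    """Raised when the expected environment variable is not set."""


def load_api_key(env_var: str = "AI_PROVIDER_API_KEY") -> str:
    """Read the key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise MissingAPIKeyError(f"Environment variable {env_var} is not set")
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Return a log-safe form of the key showing only its last characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Failing loudly when the key is missing is deliberate: it removes the temptation to add a hard-coded fallback "just for testing".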
Eight steps to start compliantly
- Choose a business-grade service, not a free consumer plan, if you are processing company data.
- Sign the DPA with the provider before processing any personal data.
- Verify data residency and activate the EU option if you need it.
- Clarify retention and training use by reading the current terms of the specific service.
- Define a simple internal policy: what can be pasted, what cannot, and in which tools. One page is enough.
- Secure your API keys and set spending limits.
- Update your privacy notices if your use of AI affects individuals.
- For your most sensitive data, consider a private cloud or anonymisation, or simply keep it out of AI entirely.
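If you want to track these steps per service, a small record is enough. The sketch below is a record-keeping aid under invented field names, not a compliance determination: it simply lists which of the eight steps are still open for one service and plan.

```python
from dataclasses import dataclass, fields


@dataclass
class ServiceAssessment:
    """Status of the eight steps for one AI service and plan (illustrative names)."""
    business_plan: bool
    dpa_signed: bool
    eu_residency_verified: bool
    terms_reviewed: bool            # retention and training use, current terms
    internal_policy_defined: bool
    keys_secured_and_limits_set: bool
    privacy_notices_updated: bool
    sensitive_data_decided: bool    # private cloud, anonymisation, or exclusion


def open_items(assessment: ServiceAssessment) -> list[str]:
    """Return the names of the steps that are not yet done, in checklist order."""
    return [f.name for f in fields(assessment) if not getattr(assessment, f.name)]
```

Steps that do not apply to you (for example, EU residency when you have no such requirement) can be marked as done once you have recorded that decision.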
In summary
With a business-grade provider, a signed DPA, the right data residency, and the principle of sending only what is needed, using AI on ordinary data can be done in full compliance. Highly sensitive data requires extra caution or stays out of the public cloud altogether. Policies change frequently: this guide provides the map, but the source of truth is the provider's current terms and, for complex cases, your privacy adviser or DPO.
Frequently asked questions
Is my data used to train the models?
On the API and business plans of reputable providers (Anthropic, OpenAI, Google) the answer is no by default: content is not used for training. On free consumer plans it sometimes is. The exact answer is in the terms of the specific service: read them before sending any company data.
Can I use AI with sensitive data (customers, employees)?
Ordinary personal data: yes, with a business provider, a signed DPA, and EU data residency. For special category data under Article 9 GDPR (health, biometric data, etc.) or critical trade secrets, caution says: anonymise first, or use a private cloud or self-hosted solution rather than the public cloud.
What is a DPA and why do I need one?
A DPA (Data Processing Addendum) is the contract that governs the relationship between you (the data controller) and the AI provider (the data processor). Without a signed DPA, using the service with personal data is formally a compliance problem. Reputable providers make the DPA available directly within their service console.
Does the data stay in the EU?
It depends on the configuration. Several providers offer an explicit EU data residency option, which must be activated. Alternatively, running the model through your own cloud (Amazon Bedrock, Google Vertex AI, Azure OpenAI) lets you choose the processing region, which is often simpler than calling the provider's API directly.
If I use the free version of ChatGPT, is it GDPR-compliant?
For purely personal use, this is not a concern: personal and household activity falls outside the scope of the GDPR. For business use with customer or employee data, no: the consumer plan does not offer a DPA, may use data for training, and does not guarantee EU residency. You should use a Business or Team plan (for internal assisted use) or the API (for integrations), with a signed DPA.