AI Security

AI training on company data

Consumer tiers may learn from what employees paste; enterprise terms change that. Where the line runs, and how to find which side you're on.

19% of Polish employees pasted sensitive data into AI tools (2025, ESET / DAGMA)

+$670K added breach cost where shadow AI runs high (2025, IBM)

On consumer tiers, pasted text can become training data governed by the vendor's terms, not yours.

what "training on company data" means in practice

Most large AI model vendors offer multiple tiers. Consumer or free tiers are typically funded partly through the data they collect. The terms of service for these tiers often state, in language that varies by vendor, that inputs may be used to improve models, to train AI systems, or for related research purposes.

Paid organizational tiers, usually sold as "enterprise" or "business" plans, typically include a data processing agreement (DPA) and a commitment that customer inputs are not used for training. The vendor processes your data to provide the service and no further.

The practical difference is this: an employee using a free-tier account sends company data to a vendor's infrastructure under terms that may allow permanent retention and use of that data for model training. An employee using an enterprise-tier account sends the same data under terms that rule out training use and define how long the data is retained.

In 2023, Samsung employees pasted source code and internal data into ChatGPT over a roughly three-week period. The incident prompted the company to restrict AI tool access internally. The case illustrates what happens when adoption of free-tier tools runs ahead of governance.

why employees default to personal and free-tier accounts

Free-tier AI tools are immediately available, require no procurement process, and in most cases perform the same core tasks as enterprise versions. An employee who wants to use an AI assistant has no reason to wait for IT to provision an enterprise account when they can start with a personal account in minutes.

The data handling difference between tiers is not surfaced prominently in the user experience. The tool works the same way regardless of which tier is in use. The distinction exists in the terms of service, which most users do not read, and in vendor documentation that most IT teams have not reviewed for every tool in use.

The result is that free-tier AI tool usage accumulates organically, driven by productivity and convenience, without the data handling implications being visible to either the employee or the organization.

what training on company data means for your organization

The exposure depends on what data was shared and what the vendor's terms permit.

Permanent data retention. Data used to train AI models is typically incorporated into model weights in a form that cannot be retrieved or deleted. If company data was used to train a vendor's model, it cannot be recalled. The GDPR right to erasure, for instance, cannot practically be applied to training data already incorporated into model weights. This is a known limitation that regulators are still working through.

GDPR compliance. Processing personal data requires a lawful basis, and handing it to a vendor that acts as a processor additionally requires a data processing agreement (GDPR Article 28). A consumer-tier AI tool that offers no DPA leaves that requirement unmet. The compliance obligation stays with the company using the tool, not with the vendor.

Intellectual property. Source code, proprietary algorithms, product documentation, and unreleased research that enters a training dataset may be reproduced in the model's outputs in response to queries from other users. The risk is difficult to quantify, but it is not zero.

Contractual obligations. Client contracts and NDAs frequently include data handling restrictions. Whether free-tier AI tool use constitutes a breach depends on what the specific contract says, but organizations using such tools with client data without reviewing their contractual obligations are taking on unquantified risk.

what works

The terms question can only be answered tool by tool, which makes the inventory the starting point: which AI tools are in use across the organization, including the personal and free-tier accounts that never crossed a procurement desk. For each tool identified, three facts settle the matter: whether the vendor uses inputs to train or improve models, whether an enterprise or business plan disables that behavior, and whether a DPA is on offer.
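The inventory described above can be kept as simple structured records. What follows is a minimal sketch in Python, not a prescribed format: the field names (`trains_on_inputs`, `enterprise_disables_training`, `dpa_available`), the `account_type` values, and the tool names are all hypothetical, chosen to mirror the three facts. A value of `None` marks a fact nobody has checked yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AIToolRecord:
    """One row of the AI tool inventory (all field names are illustrative)."""
    name: str
    account_type: str  # assumed values: "personal", "free", "enterprise"
    trains_on_inputs: Optional[bool] = None  # None = not yet checked
    enterprise_disables_training: Optional[bool] = None
    dpa_available: Optional[bool] = None


def open_questions(tool: AIToolRecord) -> list[str]:
    """Return the facts that still need an answer from the vendor's terms."""
    facts = {
        "trains_on_inputs": tool.trains_on_inputs,
        "enterprise_disables_training": tool.enterprise_disables_training,
        "dpa_available": tool.dpa_available,
    }
    return [label for label, value in facts.items() if value is None]


# Made-up example entries; real rows come from the organization's own survey.
inventory = [
    AIToolRecord("ExampleChat", "personal", trains_on_inputs=True),
    AIToolRecord("ExampleCoder", "enterprise", False, True, True),
]

for tool in inventory:
    print(tool.name, open_questions(tool))
```

A tool with an empty list of open questions is ready for a decision; anything else still needs someone to read the vendor's terms.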

For tools in regular use that touch company data, the enterprise upgrade is usually the most practical fix. It changes the legal position (training off, DPA in place, retention defined) without asking employees to change a single workflow, which is why it succeeds where bans fail. The distinction has to be explained, though, because the product looks and behaves identically on both tiers. One sentence in the AI acceptable use policy (AI tools handling work data run only on company-provisioned accounts), plus a line on why, carries most of the message.

Some tools end the analysis early. A vendor that offers no enterprise terms and no DPA cannot support compliant handling of company data, and for those tools the realistic options narrow to use without company data or no use at all.
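The decision path in this section can be written down as a small triage function. This is a sketch under stated assumptions, not a compliance rule: the `Action` labels and parameter names are hypothetical, and the function treats a vendor missing either enterprise terms or a DPA as unable to support company data, which is a stricter reading than some organizations may adopt.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep as provisioned"
    UPGRADE = "move to enterprise plan"
    NO_COMPANY_DATA = "use without company data, or not at all"


def triage(on_enterprise_plan: bool,
           enterprise_terms_offered: bool,
           dpa_offered: bool) -> Action:
    """Map the facts about one tool to a recommended action."""
    # Assumption: without both enterprise terms and a DPA,
    # compliant handling of company data is not possible.
    if not (enterprise_terms_offered and dpa_offered):
        return Action.NO_COMPANY_DATA
    # Terms exist, but staff are still on personal or free accounts.
    if not on_enterprise_plan:
        return Action.UPGRADE
    return Action.KEEP


# A free-tier tool whose vendor sells an enterprise plan with a DPA.
print(triage(on_enterprise_plan=False,
             enterprise_terms_offered=True,
             dpa_offered=True).value)
```

Checking the vendor's offer before the current plan keeps the early exit first: if no compliant terms exist, the account type in use does not matter.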

let's start with a conversation

Most first conversations start with not quite knowing what you have or where to begin. That's normal, and it's exactly where we're useful.

Tell us what prompted this. An upcoming audit, an incident, a client's security questionnaire, or just a sense that things have gotten messy.

We'll take it from there

Julian Machowski

Head of Technical Sales
+48 783 762 997
julian@unshadowit.com
