Productivity gains from ChatGPT, Copilot consumer editions, and other public SaaS AI often arrive before security finishes its review. Uploading proprietary designs, customer contracts, trade secrets, or unreleased financials sends that material into a vendor's trust boundary—where training opt-outs, retention windows, and privacy policy language describe intent, not who can read prompts and outputs while models run on shared infrastructure.
Accidental exposure is routine: a pasted source snippet, HR spreadsheet, or M&A diligence memo in a chat window. Shadow AI compounds it when employees use personal accounts because the approved tool is slow or blocked. Breach disclosure, subpoena, or a competitor's discovery can surface content you assumed was ephemeral. Enterprise agreements and no-training clauses reduce contractual risk; they do not stop platform operators or co-located tenants from reaching data in memory during inference, and they do not stop legal process from reaching whatever the provider retains.
If your organization operates under export controls, healthcare privacy rules, financial supervision, or defense-adjacent contract language, the gap between "we have an enterprise agreement" and "we can explain who saw this data during inference" is where audits stall and incidents begin.
The encryption gap: data in use
Confidential computing addresses a gap standard cloud security leaves open: data and code are exposed in plaintext while models train and infer. For AI workloads, that gap is wider than a single database table—prompts, retrieved chunks, vector embeddings, fine-tuned weights, and agent state all pass through the compute path.
Encryption at rest and in transit remains necessary. It is also insufficient on its own. During inference, inputs are decrypted, tokenized, and loaded into GPU memory; RAG retrieval merges sensitive corpus text into the context window; outputs are assembled before they are encrypted again. At every step in that path, data exists in use—visible to anything inside the trust boundary of the host, hypervisor, and privileged administrators unless hardware isolation limits who can observe memory.
Compliance reviewers in regulated environments ask a direct question: who can read GPU memory during inference? TLS and disk encryption do not answer it. Platform operators, co-located tenants, compromised host kernels, and insider access belong in the threat model, not among the edge cases, for buyers who must defend data residency, enclave requirements, and audit scope.
What AI exposes in a single request
A single inference call can surface more than one query string:
- User prompts and conversation history
- RAG index content retrieved into the context window
- Vector embeddings and adapter weights loaded for that session
- Agent state accumulated across tool calls
A breach while data is in use can therefore leak the indexed corpus, not just one query. That is why "we don't train on your data" clauses, while important, do not close the architectural gap.
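The categories listed above can be made concrete with a small inventory. The following is a minimal sketch, not a description of any vendor's request format: the `InferenceRequest` type, its field names, and the `plaintext_footprint` helper are all hypothetical, chosen only to show how much more than the prompt sits in memory for one call.

```python
from dataclasses import dataclass, field


@dataclass
class InferenceRequest:
    """Hypothetical view of what one request holds in plaintext memory."""
    prompt: str
    history: list[str] = field(default_factory=list)           # prior conversation turns
    retrieved_chunks: list[str] = field(default_factory=list)  # RAG text merged into the context
    adapter_ids: list[str] = field(default_factory=list)       # fine-tuned adapters loaded for the session
    agent_state: dict[str, str] = field(default_factory=dict)  # accumulated across tool calls


def plaintext_footprint(req: InferenceRequest) -> dict[str, int]:
    """Rough size per category: characters for text, a count for adapters."""
    return {
        "prompt": len(req.prompt),
        "history": sum(len(turn) for turn in req.history),
        "retrieved_chunks": sum(len(chunk) for chunk in req.retrieved_chunks),
        "adapters_loaded": len(req.adapter_ids),
        "agent_state": sum(len(k) + len(v) for k, v in req.agent_state.items()),
    }
```

In a typical RAG deployment the `retrieved_chunks` entry is likely to dominate, which is the practical meaning of "leak the corpus, not just one query."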
Legal exposure in practice
Recent litigation shows AI and cloud providers can be compelled to retain or disclose client data—not as a policy debate, but through preservation orders, discovery, and privilege rulings in active cases.
The New York Times v. OpenAI (S.D.N.Y., May 2025)
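Magistrate Judge Ona T. Wang ordered OpenAI to preserve ChatGPT output logs that would normally be deleted, covering consumer chats, Temporary Chat sessions, and API traffic not under a Zero Data Retention agreement. OpenAI has stated that ChatGPT Enterprise, ChatGPT Edu, and Zero Data Retention API customers were outside the order's scope, so the exposure fell on consumer and standard API tiers, including business traffic sent through those tiers. The order illustrates that vendor retention policies and customer expectations can diverge sharply once litigation begins.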
United States v. Heppner (S.D.N.Y., Feb. 2026)
A defendant used Anthropic's consumer Claude to draft material for counsel. The court held attorney–client privilege and work-product protection did not apply: Anthropic's terms permit collection, training use, and disclosure to third parties and government, so submitting confidential text counted as disclosure to a third party. The matter led to AI-specific protective orders restricting uploads unless the provider contractually bans training and limits third-party disclosure.
These are not abstract compliance hypotheticals. They are active dockets where AI-generated and AI-processed content became discoverable evidence—and where privilege failed because the provider's terms and architecture placed client material outside the legal boundary the parties assumed.
Beyond AI-native vendors
The pattern is broader than AI-native vendors. Hyperscaler cloud providers (Microsoft, AWS, Google) respond to subpoenas and warrants when they hold keys or operational access; their transparency reports document the volume. Under the CLOUD Act, U.S. authorities can compel U.S.-based providers to produce data in their possession, custody, or control regardless of where it is stored. Content processed or generated by AI is electronically stored information subject to eDiscovery like any other business record.
Encryption and trusted execution environments raise technical barriers, but they do not eliminate legal process when a provider retains access or can be ordered to produce logs. Private AI, on-prem enclaves, and contracts that narrowly define retention, training, and disclosure—matched to your threat model—reduce exposure before litigation, not after.
What confidential computing changes
Hardware-based trusted execution environments (TEEs)—confidential VMs on CPU paired with GPU trusted execution where workloads require it—run code and data inside an attested enclave the host OS and hypervisor cannot read or modify. On-prem and air-gapped deployments push the same boundary entirely inside your facility.
Remote attestation closes a gap slide decks cannot: before decryption keys, model weights, or corpus data enter the enclave, a remote verifier checks cryptographic evidence that CPU and GPU firmware, measurements, and policy match a known-good state. Security and audit teams get proof—not marketing—that the environment is genuine.
Private AI architecture still requires governance: who may prompt which model, what corpora are indexed, where logs are retained, and which outputs require human review. Confidential computing shrinks the trust boundary; it does not replace policy, logging, or identity controls.
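Two of the governance questions above, who may prompt which model and which outputs require human review, lend themselves to a small policy check. The sketch below is illustrative only: the role names, model names, and the `authorize` function are hypothetical, and a production system would source roles from an identity provider and log each decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    allowed_models: dict[str, frozenset[str]]  # role -> models that role may prompt
    review_required: frozenset[str]            # models whose outputs need human review


# Placeholder policy for illustration; names are assumptions, not a product config.
POLICY = Policy(
    allowed_models={
        "engineer": frozenset({"code-assistant"}),
        "counsel": frozenset({"contracts-rag"}),
    },
    review_required=frozenset({"contracts-rag"}),
)


def authorize(role: str, model: str, policy: Policy = POLICY) -> dict[str, bool]:
    """Deny by default; flag review only for requests that are allowed."""
    allowed = model in policy.allowed_models.get(role, frozenset())
    return {
        "allowed": allowed,
        "needs_review": allowed and model in policy.review_required,
    }
```

Unknown roles and unlisted models are denied by default, which keeps the policy layer consistent with the fail-closed posture of the enclave beneath it.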
Honest limits on TEE isolation
Hardware TEEs materially raise the bar, but no isolation is perfect. Side-channel research against shared accelerators remains an active field. Application bugs, availability attacks, and network exposure sit outside the enclave. OWCER designs for defense in depth—realistic threat models, logging, and governance—not absolute security marketing.
When to choose private AI over public SaaS
Public SaaS LLMs may meet productivity goals, but they rarely satisfy policy when prompts, RAG corpora, and model weights must stay protected in use—not only at rest and in transit. Consider private AI architecture when:
- Contract or regulatory language requires data residency, enclave deployment, or air-gapped operation
- Auditors ask who can read memory during inference—not just whether data is encrypted on disk
- Legal counsel needs confidence that privileged material will not become third-party disclosure through a consumer AI account
- Self-hosted GPU clusters or private endpoints exist but lack workflow integration, champion networks, and governance prerequisites
No-training clauses and enterprise agreements document vendor intent. They do not remove the architectural reality that shared infrastructure, operator access, and legal process can expose workload memory unless hardware isolation shrinks the trust boundary.
Primary sources: court filings and reporting on United States v. Heppner (S.D.N.Y. 2026) and the OpenAI/New York Times preservation order (S.D.N.Y. 2025).
For architecture, landing zones, confidential computing, and activation paths matched to your boundary requirements, see Private AI, on-prem or cloud. To assess readiness and prioritize use cases before scale, start with the AI Activation Assessment.
Contact OWCER to design private AI with guardrails your security and compliance teams can defend—not just deploy.














