6 Jun 2026

Deploying agents in your own VPC

The Context team

The first question a serious enterprise asks about an AI vendor is where the data goes. For most AI products, the honest answer is "our cloud," and that is a non-starter the moment the work touches sensitive IP, customer records, or a regulated process.

So we built the platform to deploy somewhere else: entirely inside the customer's own cloud account. The data does not go anywhere. The platform comes to it.

Three layers, all inside the VPC

Everything runs in the customer's VPC, across multiple availability zones. There are three layers, and none of them sits in our cloud.

The control plane comprises the web app, the agent runtime (the durable workflow engine and the actor cluster from an earlier post), the authorization layer, observability, and admin tooling. The agent sandboxes are per-session pods on a dedicated, isolated node pool, running in the isolation modes covered earlier. The integration surface consists of the connectors to the customer's systems, plus the per-sandbox proxy that every outbound call flows through.

All three are deployed into the customer's account, on their Kubernetes, with their networking. We do not host a copy and forward requests to it. The software runs where the data already is.

Under the hood, the shape is ordinary cloud infrastructure, which is the point. The components sit in private subnets spread across availability zones, behind a private load balancer. A Kubernetes cluster runs the application services on one node pool and the live actor cluster on a separate pool, with the sandbox node pool kept apart from both. A relational database and an in-memory store hold state. Secrets are pulled from the customer's own secret store through an operator at runtime rather than baked into images. There is nothing exotic to audit; it looks like any other well-run workload in the account.
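An operator can check the layout described above mechanically. Below is a minimal sketch of such a check under a simplified, hypothetical model of the deployment. `Subnet`, `NodePool`, `check_topology`, and the two-zone minimum are illustrative assumptions, not part of the platform. The function flags public subnets, node pools confined to a single zone, and any sandbox pool that shares nodes with other workloads.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of the topology described above. The class
# names, workload labels, and the two-zone minimum are illustrative assumptions.

@dataclass(frozen=True)
class Subnet:
    name: str
    zone: str
    private: bool

@dataclass
class NodePool:
    name: str
    workloads: set[str]
    subnets: list[Subnet]

def check_topology(pools: list[NodePool], min_zones: int = 2) -> list[str]:
    """Return a list of problems; an empty list means the layout passes."""
    problems = []
    for pool in pools:
        public = [s.name for s in pool.subnets if not s.private]
        if public:
            problems.append(f"{pool.name}: public subnets {public}")
        zones = {s.zone for s in pool.subnets}
        if len(zones) < min_zones:
            problems.append(f"{pool.name}: spans {len(zones)} zone(s), want >= {min_zones}")
        # Sandboxes get a node pool of their own, shared with nothing else.
        if "sandbox" in pool.workloads and pool.workloads != {"sandbox"}:
            others = sorted(pool.workloads - {"sandbox"})
            problems.append(f"{pool.name}: sandboxes co-scheduled with {others}")
    return problems
```

Three pools (application services, actor cluster, sandboxes), each spread over two private subnets in different zones, return an empty list. A single pool mixing sandboxes with the web app in one zone is reported twice.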

The spread across availability zones is not only about where the data sits; it is also about staying up. The relational store runs multi-AZ with automatic failover, the node pools span zones, and the durable session state from an earlier post replays across pod and node failures. A problem in one zone degrades the deployment rather than taking it down, and the failover happens inside the customer's account, on the customer's infrastructure.
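Replay is what lets a session survive losing its pod. Below is a minimal event-sourcing sketch of that idea. It assumes session state is rebuilt by folding over an ordered log of events kept in durable storage. The event types, `SessionState`, and the in-memory list standing in for the durable store are hypothetical. This illustrates the general technique, not the workflow engine's actual mechanism.

```python
import json
from dataclasses import dataclass, field

# Event-sourcing sketch: state is never stored directly, only derived by
# replaying events. The event names and fields are illustrative assumptions.

@dataclass
class SessionState:
    step: int = 0
    messages: list[str] = field(default_factory=list)
    done: bool = False

def apply(state: SessionState, event: dict) -> SessionState:
    kind = event["type"]
    if kind == "message":
        state.messages.append(event["text"])
        state.step += 1
    elif kind == "completed":
        state.done = True
    else:
        raise ValueError(f"unknown event type: {kind}")
    return state

def replay(log: list[str]) -> SessionState:
    """Rebuild state from serialized events, e.g. on a fresh pod after a failure."""
    state = SessionState()
    for line in log:
        state = apply(state, json.loads(line))
    return state
```

If the pod holding a session dies after two `message` events are logged, a replacement pod in another zone calls `replay` on the same log. It arrives at the same step count and message history without the original process.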

Figure: the customer cloud account (a VPC spanning multiple availability zones) holds four blocks: the control plane (web app, agent runtime, authorization, admin console); the sandbox node pool (per-session sandboxes, dedicated and isolated); data and state (Postgres, the Drive, trace and audit stores); and the integration surface (connectors to your systems, and the per-sandbox proxy every outbound call flows through). A private endpoint links the account to the chosen model provider or internal LLM gateway, reached over a private path, never the public internet.

Every layer runs inside the customer's VPC across availability zones. Inference leaves only over a private endpoint to the customer's chosen model.

The backing services are yours

The stores the platform depends on are created in the customer's account, not handed over from ours. The object storage behind the Drive, the relational database, the secret store: all provisioned in the customer's cloud, under the customer's controls.

That means customer-managed encryption keys, and access through scoped IAM roles rather than long-lived static credentials sitting in a config file. Rotating a key, tightening a role, or revoking access is something the customer does in their own console, on their own schedule, without involving us. The data is not just resident in their account. It is governed by their account.
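Scoped roles beat static credentials because every grant is narrow and expires on its own. Below is a hypothetical sketch of that difference. In a real deployment, the cloud provider's own IAM issues and enforces such credentials. `ScopedCredential`, `issue`, `authorize`, the prefix-matching rule, and the 15-minute default lifetime are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Illustrative only: the cloud provider's IAM does this for real. Field names,
# the prefix rule, and the default TTL are assumptions for the sketch.

@dataclass(frozen=True)
class ScopedCredential:
    role: str
    actions: frozenset
    resource_prefix: str
    expires_at: datetime

def issue(role: str, actions: set, resource_prefix: str,
          ttl: timedelta = timedelta(minutes=15),
          now: Optional[datetime] = None) -> ScopedCredential:
    now = now or datetime.now(timezone.utc)
    return ScopedCredential(role, frozenset(actions), resource_prefix, now + ttl)

def authorize(cred: ScopedCredential, action: str, resource: str,
              now: Optional[datetime] = None) -> bool:
    # All three must hold: not expired, action granted, resource in scope.
    now = now or datetime.now(timezone.utc)
    return (now < cred.expires_at
            and action in cred.actions
            and resource.startswith(cred.resource_prefix))
```

A read-only credential scoped to one workspace's Drive prefix can read a file there for a few minutes. It cannot write, cannot reach another workspace, and stops working after it expires, with no config file left holding a key that must be rotated by hand.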

Inference never touches the public internet

The model is the one piece people assume has to be a call out to someone's API. It does not. Inference routes through the sandbox proxy to the customer's chosen model provider over a private path: a managed model service reached through a VPC endpoint, an on-prem endpoint, or the customer's own internal LLM gateway.

The prompt and the response travel inside the customer's network boundary. A customer who already runs an inference proxy can route through it. Model choice, sensitivity policy, and retention are all workspace-scoped, and every inference call is recorded, prompt and response, in a trace store that also lives in the VPC. The model is reachable, and the traffic to it never leaves the perimeter.
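The private path itself comes from the network layer: VPC endpoints and routing. A proxy can still refuse to forward anything that would leave the perimeter, as a second guard. The sketch below shows such a check, assuming the workspace config lists approved inference hosts and the proxy has already resolved the destination. `allow_inference_call` and the hostnames are hypothetical. The check relies only on Python's standard `ipaddress` and `urllib.parse` modules.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical egress guard for the sandbox proxy. The approved-host list and
# function name are assumptions; the network-level private path remains the
# primary control.

def allow_inference_call(url: str, resolved: list[str], approved_hosts: set[str]) -> bool:
    host = urlsplit(url).hostname
    if host is None or host not in approved_hosts or not resolved:
        return False
    for raw in resolved:
        addr = ipaddress.ip_address(raw)
        # Private ranges only; loopback and link-local (which includes the
        # 169.254.169.254 cloud metadata address) are refused explicitly.
        if not addr.is_private or addr.is_loopback or addr.is_link_local:
            return False
    return True
```

Checking the resolved addresses, not just the hostname, matters. An approved name that suddenly resolves to a public IP is refused rather than trusted.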

The routing is policy, set by the customer. A workspace can pin a single model or restrict to an allowlist, and a sensitivity policy can send the most sensitive work to an in-VPC or on-prem model while routing everything else to a managed one. Each call is attributed to the workspace, the user, and the task, so an after-the-fact review can say exactly which model saw which data.
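The three policy levers (pinning, allowlisting, and sensitivity-based routing) compose naturally into one decision function that also emits the attribution record. Below is a sketch under stated assumptions. `WorkspacePolicy`, `CallRecord`, `route`, the `"high"` sensitivity label, and the precedence (a pin wins, then sensitivity, then the requested or default model) are all hypothetical choices, not the platform's documented behavior.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical routing policy. Model names, sensitivity labels, and the
# precedence order below are illustrative assumptions.

@dataclass(frozen=True)
class WorkspacePolicy:
    pinned_model: Optional[str] = None
    allowlist: frozenset = frozenset()       # empty means no allowlist
    sensitive_model: Optional[str] = None    # e.g. an in-VPC or on-prem model
    default_model: Optional[str] = None      # e.g. a managed model service

@dataclass(frozen=True)
class CallRecord:
    workspace: str
    user: str
    task: str
    model: str

def route(policy: WorkspacePolicy, workspace: str, user: str, task: str,
          sensitivity: str, requested: Optional[str] = None) -> CallRecord:
    if policy.pinned_model:
        model = policy.pinned_model
    elif sensitivity == "high":
        model = policy.sensitive_model
    else:
        model = requested or policy.default_model
    if model is None:
        raise ValueError("no model configured for this request")
    if policy.allowlist and model not in policy.allowlist:
        raise PermissionError(f"model {model!r} is not on the workspace allowlist")
    # The record ties this call to workspace, user, and task for later review.
    return CallRecord(workspace, user, task, model)
```

With an on-prem model marked as the sensitive target, a `"high"` task goes there while routine work goes to the managed default. Any other requested model is rejected by the allowlist before a prompt is sent.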

The data does not leave, and there is no phone-home

This is the property that makes the whole thing viable for sensitive work. All customer data stays inside the VPC: prompts, agent outputs, files, traces, the audit log, even the credentials. There is no phone-home, no license-verification call, no telemetry back to us.

The learning that makes the platform better (the traces, corrections, and rubrics) compounds inside the customer's perimeter, under their governance, not in our cloud. That is a deliberate design choice, not only a compliance one: the most sensitive and most valuable signal a deployment produces is exactly the signal that must not leave. The intelligence stays where the data lives.

Audit and telemetry go to your stack

Every action produces two records: an entry in an append-only audit log and a trace of what the model saw and produced. Both live in the VPC. Neither is something the customer has to take on faith, because both export into the tools they already run: distributed traces through OpenTelemetry, structured logs into whatever SIEM they use, metrics into Prometheus.
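"Append-only" is easy to claim and harder to check. Hash chaining is one common way to make such a log tamper-evident. Each entry commits to the hash of the one before it, so rewriting history shows up on verification. The sketch below assumes that technique for illustration. `AuditLog` and the record fields are hypothetical, and the source does not say this is how the platform's audit log is built.

```python
import hashlib
import json

# Tamper-evident append-only log via hash chaining (a common technique,
# assumed here for illustration, not the platform's documented design).

GENESIS = "0" * 64

def _digest(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

class AuditLog:
    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        h = _digest(prev, record)
        self.entries.append({"record": record, "prev": prev, "hash": h})
        return h

    def verify(self) -> bool:
        # Editing or removing any entry before the tail breaks the chain.
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev or _digest(prev, e["record"]) != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

A chain alone cannot detect the last few entries being silently dropped. Shipping each new hash to the customer's SIEM as a structured log line anchors the head of the chain outside the log itself.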

Retention is configurable per workspace, resource type, and data classification, with legal hold for anything that must be preserved. The customer watches the platform with the same dashboards and the same alerting they point at everything else, which is the only way oversight actually happens in practice.
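Retention that varies along three axes needs a rule for which setting applies. A legal hold must override all of them. Below is a sketch of one plausible resolution scheme. It assumes rules keyed by `(workspace, resource_type, classification)` with `"*"` wildcards, where the most specific match wins. The function names, keys, and durations are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical retention resolution: most specific matching rule wins; on a
# tie, the rule inserted first wins. A legal hold blocks deletion regardless.

def resolve_retention(rules: dict[tuple[str, str, str], timedelta],
                      workspace: str, resource_type: str,
                      classification: str) -> timedelta:
    best, best_score = None, -1
    for key, ttl in rules.items():
        pairs = zip(key, (workspace, resource_type, classification))
        if all(pattern in ("*", value) for pattern, value in pairs):
            score = sum(pattern != "*" for pattern in key)
            if score > best_score:
                best, best_score = ttl, score
    if best is None:
        raise LookupError("no retention rule matches")
    return best

def may_delete(created_at: datetime, now: datetime, ttl: timedelta,
               legal_hold: bool) -> bool:
    # A legal hold always wins over the retention clock.
    return not legal_hold and now - created_at >= ttl
```

With a one-year default, a 90-day rule for traces, and a 30-day rule for restricted traces in one workspace, each record gets the narrowest rule that fits it. A held record is never eligible for deletion.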

The customer runs it; we do not have the keys

A detail that matters for the threat model: we do not have access to the running cluster. All operations (deploys, configuration changes, troubleshooting) are performed by the customer's own administrators, through an admin console that runs in-cluster as a pod.

Updates flow through a gated pipeline the customer drives, from a development channel to beta to stable, with rollback. We ship software; the customer decides when and whether to run it. There is no standing path from our side into a customer's deployment, for the simple reason that we did not build one. You cannot misuse access you were never given.

Before any release is applied, preflight checks validate the cluster, and a release can be rolled back if it misbehaves. A separate emergency channel exists outside the normal pipeline for security-critical fixes, so an urgent patch does not have to wait on the full promotion sequence. The customer still initiates all of it. The pipeline just makes the common case safe and the urgent case fast, and nothing applies that the customer did not choose to apply.
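The update rules above (promotion through channels, preflight gating, rollback, and an emergency bypass that is still customer-initiated) fit in a small state machine. Below is a hypothetical sketch. `Release`, `Cluster`, the `track` setting (the least mature channel a cluster accepts), and the rule that emergency releases skip promotion but not preflight are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model of a customer-driven release flow. Nothing here runs on
# its own: apply() and rollback() stand in for actions the customer's
# administrators take.

CHANNELS = ["dev", "beta", "stable"]

@dataclass
class Release:
    version: str
    channel: str = "dev"
    emergency: bool = False  # security-critical fix that may skip promotion

    def promote(self) -> None:
        i = CHANNELS.index(self.channel)
        if i == len(CHANNELS) - 1:
            raise ValueError(f"{self.version} is already on stable")
        self.channel = CHANNELS[i + 1]

@dataclass
class Cluster:
    running: str
    track: str = "stable"  # least mature channel this cluster accepts
    history: list[str] = field(default_factory=list)

    def apply(self, release: Release, preflight: list[Callable[[], bool]]) -> None:
        mature = CHANNELS.index(release.channel) >= CHANNELS.index(self.track)
        if not (mature or release.emergency):
            raise PermissionError(f"{release.version} is on {release.channel}, "
                                  f"cluster tracks {self.track}")
        # Preflight runs for every release, emergency or not.
        failed = [check.__name__ for check in preflight if not check()]
        if failed:
            raise RuntimeError(f"preflight failed: {failed}")
        self.history.append(self.running)
        self.running = release.version

    def rollback(self) -> None:
        if not self.history:
            raise RuntimeError("nothing to roll back to")
        self.running = self.history.pop()
```

A cluster tracking stable refuses a release still on dev. After promotion, the release applies if preflight passes, and rolls back to the previous version on request. An emergency fix skips the channel walk but still has to pass preflight.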

When something does need debugging, the in-cluster admin console generates a diagnostic bundle (cluster state, pod logs, and configuration) with customer data and PII automatically redacted. The customer reviews the contents before any of it is sent, and for environments that allow no transmission at all, the bundle can be exported to local storage instead. Even the act of asking for help does not open a channel out of the boundary.
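Redaction happens before the customer's review, so what they inspect is already what would leave. Below is a minimal sketch of that step. The patterns (emails, IPv4 addresses, bearer tokens) and the `SENSITIVE_KEYS` config keys are hypothetical. Real redaction would be driven by the platform's own data classification, not a handful of regexes. Cluster state is omitted for brevity.

```python
import re

# Illustrative redaction pass. The patterns and sensitive key names are
# assumptions; nothing in this sketch transmits data anywhere.

PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),
    (re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [TOKEN]"),
]

SENSITIVE_KEYS = {"customer_data", "api_key"}  # hypothetical config keys

def redact_text(text: str) -> str:
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

def build_bundle(pod_logs: dict[str, str], config: dict) -> dict:
    return {
        "logs": {pod: redact_text(log) for pod, log in pod_logs.items()},
        "config": {k: ("[REDACTED]" if k in SENSITIVE_KEYS else v)
                   for k, v in config.items()},
    }
```

The returned dictionary can be written to local storage with `json.dump` for review. Whether and where it goes next is then the customer's decision, matching the offline export path described above.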

Portable, within reason

It is worth being honest about the shape rather than waving at portability. The platform deploys on managed Kubernetes, and the Helm chart is portable across Kubernetes platforms, so the same deployment runs on the major clouds: AWS, Azure, and GCP. The infrastructure-as-code modules are cloud-specific where they have to be.

For environments that cannot have any egress at all, an air-gapped install is supported, with offline image bundles instead of a registry pull and no external connectivity of any kind. That is the right mode for the most stringent boundaries, and it is a deliberately separate path from the standard VPC deployment, not a checkbox on the same one.
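Without a registry pull, the installer has to trust the bundle it was handed. Checking every archive against a manifest of digests before loading is a natural first gate. The sketch below assumes a hypothetical bundle layout, in which a `manifest.json` maps each image archive's filename to its expected SHA-256. That layout is not the platform's actual format. A digest check proves integrity against the manifest only; authenticity would additionally need the manifest itself to be signed.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical offline-bundle check: manifest.json maps archive filenames to
# expected SHA-256 hex digests. No network access is used or needed.

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_bundle(bundle_dir: Path) -> list[str]:
    """Return filenames that are missing or whose digest does not match."""
    manifest = json.loads((bundle_dir / "manifest.json").read_text())
    bad = []
    for name, expected in manifest.items():
        archive = bundle_dir / name
        if not archive.is_file() or sha256_file(archive) != expected:
            bad.append(name)
    return bad
```

An empty result means every archive listed in the manifest is present and unmodified. Anything else stops the install before a single image is loaded.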

In-boundary is a qualification, not a feature

For a large class of buyers, in-boundary deployment is not a nice-to-have on a feature list. It is the gate that decides whether a conversation happens at all. An organization whose AI agents would touch regulated data or sensitive IP cannot evaluate a multi-tenant product, because the architecture disqualifies it before the demo.

Meeting the strictest deployment posture by default is what gets you into the room with those buyers. A product can be excellent and still never be seen if the only place it runs is someone else's cloud.

For the most regulated and defense-adjacent environments, the same in-boundary principle extends past the VPC to an on-prem appliance and fully air-gapped operation, where the customer maintains the entire compliance perimeter and the platform runs inside a boundary that is already authorized. The VPC deployment is the commercial-cloud answer. These are the answers for boundaries that commercial cloud cannot enter.

The platform comes to the data

"Where does the data go?" is the right first question, and for an AI platform that will touch the most sensitive work a company does, the only good answer is nowhere. The control plane, the sandboxes, the integrations, the traces, all of it runs inside the customer's own boundary, across their availability zones, reaching their chosen model over a private path, operated by their own administrators. The platform comes to the data. The data stays home.