Infrastructure is the layer most teams want to skip past on AI engagements, because the model and the retrieval feel more interesting. Skipping it is how engagements end with a working pipeline that legal will not sign off on, or with a working pipeline that costs three times what it should because no one set up private endpoints. This is the layer where most of the boring decisions live, and getting it right is what makes the rest of the stack possible.
What this layer covers
Compute, networking, identity, accounts, secrets, KMS, and the IaC that holds it all together. For a production AI workload, the specific questions are:
- Where does inference traffic terminate, and how does it leave the customer's network boundary (if at all)?
- Which AWS accounts hold what — and how is blast radius contained between them?
- Who owns the KMS keys that encrypt the document corpus, the index, the inference logs?
- How is the whole stack reproduced from source — including the Bedrock model configs, the OpenSearch domain, the VPC endpoint policy?
- What happens at handoff, when the customer's team operates this without us?
Default reference architecture
This is the shape we arrive at on most engagements. We deviate when the customer has a strong existing pattern we should respect, but the default is the default for good reasons.
Account topology
Three accounts at a minimum, often more:
- A workload account for the production AI pipeline — Bedrock VPC endpoints, OpenSearch domain, Lambda or ECS for the orchestration layer, the customer-facing API gateway.
- A data account for the source corpus — S3 with the original documents, the parsed/chunked variants, and the lineage manifest. Permissioned through cross-account IAM roles, never shared SDK credentials.
- A platform / ops account for CI/CD, observability collection, log archival, and the audit trail. This is the account your security team should have unfettered read-only access to.
Multi-account is not optional once the engagement is real. A single account is fine for the spike. It is not fine for the production handoff — blast radius from a misconfigured IAM policy in a single account can be the entire customer relationship.
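A minimal sketch of the cross-account glue between the workload and data accounts, written as a plain policy document rather than any particular IaC construct. The account ID, the role name, and the function name are hypothetical placeholders, not values from a real engagement.

```typescript
// Trust policy for a read-only role that lives in the data account.
// Only one named role in the workload account is allowed to assume it.
interface TrustStatement {
  Effect: "Allow";
  Principal: { AWS: string };
  Action: "sts:AssumeRole";
}

interface TrustPolicy {
  Version: "2012-10-17";
  Statement: TrustStatement[];
}

function corpusReaderTrustPolicy(
  workloadAccountId: string,
  orchestrationRoleName: string,
): TrustPolicy {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        // A specific role ARN, not the whole account, keeps the blast radius small.
        Principal: {
          AWS: `arn:aws:iam::${workloadAccountId}:role/${orchestrationRoleName}`,
        },
        Action: "sts:AssumeRole",
      },
    ],
  };
}

// Hypothetical values for illustration only.
const trust = corpusReaderTrustPolicy("111111111111", "rag-orchestrator-reads-corpus");
console.log(JSON.stringify(trust, null, 2));
```

The orchestration layer then assumes this role for short-lived credentials when it reads the corpus, so no long-lived keys cross the account boundary.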
Network boundary
A Bedrock VPC endpoint in the workload account, in the region where the customer's data already lives. No traffic to bedrock-runtime.{region}.amazonaws.com crosses the public internet. The endpoint policy restricts which principals can call which models: it locks the production pipeline to a known model ID and forbids all others by default.
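As a sketch of what such an endpoint policy could look like, the function below builds the policy document as plain JSON. The model ID, role ARN, and function name are hypothetical; the two actions and the foundation-model ARN format are the standard Bedrock ones. Anything the policy does not explicitly allow is refused at the endpoint.

```typescript
interface EndpointStatement {
  Effect: "Allow";
  Principal: { AWS: string[] };
  Action: string[];
  Resource: string[];
}

interface EndpointPolicy {
  Version: "2012-10-17";
  Statement: EndpointStatement[];
}

// Allow only the named roles to invoke only the named model through the endpoint.
function bedrockEndpointPolicy(
  region: string,
  modelId: string,
  callerRoleArns: string[],
): EndpointPolicy {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { AWS: callerRoleArns },
        Action: ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
        // Foundation-model ARNs carry no account ID.
        Resource: [`arn:aws:bedrock:${region}::foundation-model/${modelId}`],
      },
    ],
  };
}

// Hypothetical model ID and role ARN for illustration only.
const endpointPolicy = bedrockEndpointPolicy("us-east-1", "example.model-v1", [
  "arn:aws:iam::111111111111:role/rag-orchestrator-invokes-model",
]);
console.log(JSON.stringify(endpointPolicy, null, 2));
```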
OpenSearch Serverless or a managed OpenSearch cluster (depending on scale and reserved-capacity needs) is reachable only from the same VPC, through a similar private endpoint. The pipeline never sees a public DNS name for the index. We ship VPC flow logs to the platform account so the security team can trace any anomalous egress.
If your AI pipeline can be reproduced by anyone who has a Bedrock API key on a laptop, you have not architected an infrastructure layer. You have a developer-mode prototype. The boundary is the point.
KMS and key ownership
Customer-managed KMS keys for: the S3 buckets holding the corpus, the OpenSearch domain, the inference log group, the secrets that hold any third-party tokens. The key policies restrict kms:Decrypt to the workload account's service roles only. The customer's security team can rotate or revoke any of these keys without us in the loop. That is the test of "data lives in your accounts" being real and not marketing.
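A simplified sketch of a key policy with that shape. The role ARNs, statement IDs, and function name are hypothetical, and a real key policy would typically carry more statements (for example, data-key generation for the roles that write encrypted data). The point is the split: the customer's security role administers the key, and only the workload service roles can decrypt.

```typescript
interface KeyPolicyStatement {
  Sid: string;
  Effect: "Allow";
  Principal: { AWS: string[] };
  Action: string[];
  Resource: "*"; // In a key policy, "*" refers to the key the policy is attached to.
}

interface KeyPolicy {
  Version: "2012-10-17";
  Statement: KeyPolicyStatement[];
}

function corpusKeyPolicy(
  securityAdminRoleArn: string,
  workloadServiceRoleArns: string[],
): KeyPolicy {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        // The customer's security team can manage, rotate, or retire the key.
        Sid: "CustomerSecurityTeamAdministersKey",
        Effect: "Allow",
        Principal: { AWS: [securityAdminRoleArn] },
        Action: [
          "kms:DescribeKey",
          "kms:EnableKeyRotation",
          "kms:DisableKey",
          "kms:PutKeyPolicy",
          "kms:ScheduleKeyDeletion",
        ],
        Resource: "*",
      },
      {
        // Decrypt is granted to the workload service roles and to nobody else.
        Sid: "WorkloadServiceRolesDecrypt",
        Effect: "Allow",
        Principal: { AWS: workloadServiceRoleArns },
        Action: ["kms:Decrypt"],
        Resource: "*",
      },
    ],
  };
}

// Hypothetical ARNs for illustration only.
const keyPolicy = corpusKeyPolicy(
  "arn:aws:iam::333333333333:role/customer-security-key-admin",
  ["arn:aws:iam::111111111111:role/rag-orchestrator-reads-corpus"],
);
console.log(JSON.stringify(keyPolicy, null, 2));
```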
On engagements with regulated data (HIPAA, PCI, government), we additionally pin the KMS keys to specific regions and the inference endpoints to specific availability zones — so the residency guarantee can be audited from the resource policy, not just from a written promise.
IaC pattern
CDK in TypeScript, by default. Sometimes the customer mandates Terraform or Pulumi — fine, we will adapt. The substance is the same regardless of the tool. Every resource on the diagram above is in source control. Every parameter — model ID, embedding model ID, chunking strategy version, retrieval index name — is in source control. The deployment is reproducible from a clean account.
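One way to keep those parameters in source control is a single typed object that the deployment reads, checked before anything is deployed. This is a sketch only: the field names mirror the parameters listed above, while the values and the helper name are hypothetical.

```typescript
// Every tunable of the pipeline lives in one reviewed, versioned object.
interface PipelineParameters {
  readonly modelId: string;
  readonly embeddingModelId: string;
  readonly chunkingStrategyVersion: string;
  readonly retrievalIndexName: string;
}

// Hypothetical values for illustration only.
const production: PipelineParameters = {
  modelId: "example.model-v1",
  embeddingModelId: "example.embedding-v1",
  chunkingStrategyVersion: "2024-01-a",
  retrievalIndexName: "corpus-prod",
};

// Returns the names of parameters that are missing or blank, so a deploy
// can fail fast instead of falling back to an unreviewed default.
function missingParameters(p: Partial<PipelineParameters>): string[] {
  const required: (keyof PipelineParameters)[] = [
    "modelId",
    "embeddingModelId",
    "chunkingStrategyVersion",
    "retrievalIndexName",
  ];
  return required.filter((key) => {
    const value = p[key];
    return value === undefined || value.trim() === "";
  });
}

console.log(missingParameters(production));
console.log(missingParameters({ modelId: "example.model-v1" }));
```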
A specific pattern we like:
Sprintsail primitives for the application-facing pieces (the API, the workers, the cache), composed alongside raw CDK or Terraform for the Bedrock + OpenSearch + KMS + VPC endpoint resources. Sprintsail's nine primitives cover ~80% of the application surface and let us move the workload between AWS and the Sprintsail Runtime with sail migrate; the AI-specific resources are bespoke per engagement.
Build vs. buy at this layer
Default: buy. The cloud is the canonical commodity. AWS is our default, and we are extremely comfortable there; we use Azure or GCP when the customer's strategic relationship makes that the right call. Nobody is going to differentiate on rack management, and the customer's compliance posture almost always already includes the cloud provider.
Things to build at this layer:
- The IaC. Source-available, your team owns it. We will write it; you will run it.
- The KMS policy. Boilerplate, but the boilerplate is yours. Don't copy a vendor's example into production.
- The cross-account IAM glue. Roles, policies, the assumption chain. Each one named for what it does, not "lambda-execution-role-1."
Things to buy at this layer:
- Bedrock for inference. Hosting your own model on EC2 GPUs to save a few dollars per million tokens is a sinkhole at engagement scale. The exceptions are real but rare: regulatory residency that no vendor meets, model fine-tunes that beat hosted models on a domain-specific eval, or workloads at a scale where the GPU TCO actually wins.
- OpenSearch Serverless for the vector + lexical index. The team-month it would take to operate your own Elasticsearch cluster goes into retrieval quality instead.
- KMS as the encryption substrate. Building your own envelope-encryption scheme is a category mistake.
The five mistakes we see
1. Single account, single region, "we will fix it later"
Later does not come. The right time to set up multi-account is at the start of the spike, because every decision after that (IAM, KMS, logging) is shaped by the account boundary. Retrofitting multi-account onto a working system is six weeks of pure overhead that ships zero new capability.
2. Bedrock calls over the public internet
Default Bedrock endpoints are public. Forgetting to set up the VPC endpoint and the endpoint policy means your inference traffic leaves the customer's network, even if the rest of the architecture is private. We have audited deployments where the architecture diagram showed Bedrock-on-VPC and the actual traffic went through the public DNS. Always test with aws bedrock-runtime invoke-model from a host with no internet egress. If the call works, the endpoint is in place. If it cannot connect or times out, the diagram lies.
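A complementary check, sketched under one assumption: private DNS is enabled on the interface endpoint, so the regional Bedrock hostname resolves to private addresses inside the VPC. The function names are hypothetical, and the resolver is passed in so the logic can be exercised without a network. This does not replace the invoke-model test; it only shows where the hostname points.

```typescript
// True for the RFC 1918 private IPv4 ranges: 10/8, 172.16/12, 192.168/16.
function isPrivateIPv4(ip: string): boolean {
  const octets = ip.split(".");
  if (octets.length !== 4 || !octets.every((o) => /^\d{1,3}$/.test(o))) {
    return false;
  }
  const nums = octets.map(Number);
  if (nums.some((n) => n > 255)) {
    return false;
  }
  const a = nums[0] ?? -1;
  const b = nums[1] ?? -1;
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// Any function that maps a hostname to its IPv4 addresses.
type Resolver = (hostname: string) => Promise<string[]>;

// Resolves the regional runtime hostname and reports whether every
// address it returns is private.
async function bedrockResolvesPrivately(
  region: string,
  resolve: Resolver,
): Promise<boolean> {
  const addresses = await resolve(`bedrock-runtime.${region}.amazonaws.com`);
  return addresses.length > 0 && addresses.every(isPrivateIPv4);
}
```

On a host inside the VPC, the resolver can be Node's resolve4 from the dns/promises module. A public address in the result suggests traffic is not going through the endpoint.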
3. Inference log group with no retention or KMS
A CloudWatch log group created with default settings for Bedrock invocation logs has no encryption at rest beyond the AWS-owned default key, and it retains events indefinitely. That is a compliance fire and a cost fire at the same time. Set the log group's KMS key to a customer-managed key, set retention based on what the audit trail actually needs (90 days or 365 days, not "forever"), and route a copy of any flagged inferences to a longer-term store with a different key.
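A small audit helper can make this check repeatable. The sketch below assumes log group summaries in the shape returned by the CloudWatch Logs DescribeLogGroups API (logGroupName, retentionInDays, kmsKeyId); the function name, the example log group, and the finding messages are hypothetical.

```typescript
// Subset of the fields DescribeLogGroups returns for each log group.
interface LogGroupSummary {
  logGroupName: string;
  retentionInDays?: number; // absent means events never expire
  kmsKeyId?: string; // absent means no customer-managed key is associated
}

// Returns one finding per problem; an empty array means the group passes.
function auditLogGroup(group: LogGroupSummary, maxRetentionDays: number): string[] {
  const findings: string[] = [];
  if (group.retentionInDays === undefined) {
    findings.push(`${group.logGroupName}: no retention set, events are kept forever`);
  } else if (group.retentionInDays > maxRetentionDays) {
    findings.push(
      `${group.logGroupName}: retention ${group.retentionInDays}d exceeds ${maxRetentionDays}d`,
    );
  }
  if (!group.kmsKeyId) {
    findings.push(`${group.logGroupName}: no customer-managed KMS key`);
  }
  return findings;
}

// Hypothetical log group created with defaults: both checks should fire.
console.log(auditLogGroup({ logGroupName: "/example/bedrock/invocations" }, 365));
```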
4. Secrets in the Lambda environment
The anti-pattern: plain environment variables in the orchestration Lambda holding the third-party API tokens, the Slack webhook, the Anthropic key for fallback. Use Secrets Manager (or Parameter Store with SecureString) and rotate the values. The orchestration code resolves them on cold start, with KMS calls auditable in CloudTrail.
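A minimal sketch of the resolve-on-cold-start pattern. The fetcher is injected, so the caching logic stays independent of any SDK; in practice it would wrap a Secrets Manager or Parameter Store call. All names here are hypothetical.

```typescript
// Any function that fetches a secret value by ID, e.g. a thin SDK wrapper.
type SecretFetcher = (secretId: string) => Promise<string>;

// Returns a getter that fetches each secret at most once per execution
// environment. The cache holds the promise, so concurrent callers during
// a cold start share a single fetch.
function createSecretCache(fetchSecret: SecretFetcher): (secretId: string) => Promise<string> {
  const cache = new Map<string, Promise<string>>();
  return (secretId: string): Promise<string> => {
    let pending = cache.get(secretId);
    if (pending === undefined) {
      pending = fetchSecret(secretId).catch((err: unknown) => {
        // Do not cache failures; the next call retries the fetch.
        cache.delete(secretId);
        throw err;
      });
      cache.set(secretId, pending);
    }
    return pending;
  };
}

// Usage with a stand-in fetcher; a real one would call the secrets service.
const getSecret = createSecretCache(async (id) => `value-of-${id}`);
getSecret("example/third-party-token").then((value) => console.log(value.length > 0));
```

Because the cache lives as long as the execution environment, a rotated secret is only picked up on the next cold start. If rotation needs to take effect sooner, a time-to-live on cache entries would be a reasonable extension.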
5. The handoff that wasn't
"We deployed the stack" is not a handoff. A handoff means the customer's team can teardown and redeploy the entire stack to a new account in an afternoon — because the IaC is theirs, the IAM trust chain is documented, the KMS key replacement procedure is written down, and the on-call runbook covers the boring outages (a region Bedrock outage, an OpenSearch domain reaching its index limit). If they cannot do that, you have not handed off; you have just stopped showing up.
How it connects to the other layers
Layer 01 shapes everything above it. The retrieval layer only works if OpenSearch is reachable from the orchestration runtime and not from the public internet. The data layer only works if the S3 buckets are KMS-encrypted with the right key policy. The governance layer (auditing, prompt-injection defenses, refusal logging) only works if the inference traffic is captured and stored where the customer's auditor can read it.
Build it first. Build it right. The rest of the stack follows.
Related: the retrieval layer reference architecture, the tooling catalog (what we use at each layer and why), and the Quantum Leap initiative page (the full nine-layer footprint).