Building a Federal AWS Environment with Terraform & DevSecOps
Pyramid Systems
01 July 2025
Building an AWS environment for federal use is not a generic cloud-engineering exercise. It is a regulated systems-engineering exercise that happens to use AWS as the substrate. The constraints — NIST 800-53, FedRAMP, agency-specific overlays, IronBank-class container hardening — shape every architectural choice.
This post documents the playbook from a federal AWS engagement that Pyramid Systems delivered before AWS Control Tower was viable for the workload. As a result, the multi-account landing zone, the security baseline, the configuration governance, and the DevSecOps pipelines were all built in Terraform from the ground up. It covers what we chose, what we got wrong, and what we changed when the strategy stopped scaling.
If you are a CTO, platform engineering lead, or DevSecOps architect inheriting or designing a federal AWS environment, the patterns here are reusable — even if Control Tower is now part of your stack. The hard parts of federal cloud are not the AWS services. They are the boundaries, the evidence trail, and the operating model.
Why Federal AWS Is Different
Commercial AWS is permissive by default. Federal AWS — GovCloud or commercial-with-FedRAMP-overlay — is restrictive by default. That single inversion drives almost every design choice.
The constraints that matter most in practice:
Region and partition boundaries. GovCloud is a separate partition with its own IAM and its own service availability timeline. A pattern that works in commercial AWS may depend on a service or feature that GovCloud does not offer yet, or that arrives there quarters later.
Identity boundaries. Federation typically flows through an agency identity provider (often a CAC/PIV-backed IdP). Local IAM users are minimized or eliminated. Principals assume roles across accounts through controlled jump paths.
Baseline compliance. NIST 800-53 controls map to specific AWS Config rules, specific AMI builds, specific encryption requirements, specific logging destinations, and specific retention durations. The baseline is not optional, and it is auditable.
Network egress posture. Default-deny outbound is the norm. Allowlisted egress through inspection VPCs, transit gateways, and proxy points is the design pattern — not the exception.
Evidence preservation. CloudTrail, Config, VPC Flow Logs, GuardDuty findings, and KMS key usage are not just operational telemetry — they are evidence. They have to be tamper-resistant and retained on schedule.
None of this is impossible. All of it has to be designed into the environment as code, not bolted on as a runbook.
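The partition boundary is visible in every ARN: commercial regions use the aws partition, while the GovCloud regions (us-gov-west-1 and us-gov-east-1) use aws-us-gov. A hard-coded arn:aws: prefix is one of the most common things that breaks when a module moves into GovCloud. The sketch below is a minimal illustration, assuming pipeline tooling written in Python; partition_for_region and build_arn are hypothetical helpers, not AWS SDK functions. Inside Terraform itself, the aws_partition data source fills the same role.

```python
"""Partition-aware ARN construction (hypothetical helpers, not an AWS SDK API)."""

# Region-prefix to partition mapping. Every other region falls back to "aws".
_PARTITION_PREFIXES = {
    "us-gov-": "aws-us-gov",
    "cn-": "aws-cn",
}


def partition_for_region(region: str) -> str:
    """Return the AWS partition name that owns a region code."""
    for prefix, partition in _PARTITION_PREFIXES.items():
        if region.startswith(prefix):
            return partition
    return "aws"


def build_arn(service: str, region: str, account_id: str, resource: str,
              *, global_service: bool = False) -> str:
    """Build an ARN in the partition that owns `region`.

    Global services such as IAM leave the region field empty, but the
    partition still comes from the region the workload runs in.
    """
    if len(account_id) != 12 or not account_id.isdigit():
        raise ValueError(f"account id must be 12 digits, got {account_id!r}")
    partition = partition_for_region(region)
    arn_region = "" if global_service else region
    return f"arn:{partition}:{service}:{arn_region}:{account_id}:{resource}"


if __name__ == "__main__":
    # Same role name in two partitions: only the partition segment changes.
    print(build_arn("iam", "us-east-1", "111122223333", "role/Auditor", global_service=True))
    print(build_arn("iam", "us-gov-west-1", "111122223333", "role/Auditor", global_service=True))
```

Deriving the partition from the region, rather than passing both as separate inputs, removes one way for the two to drift out of sync.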
When Control Tower Wasn't Available
AWS Control Tower automates much of the multi-account landing-zone pattern in commercial AWS today. When the engagement began, Control Tower either was not available for the partition and account model we needed, or did not yet cover the controls and customizations the workload required. We had to build the equivalent ourselves.
The pieces we owned end-to-end in Terraform:
Account vending. A pipeline that created new AWS accounts under the management account, applied baseline guardrails, configured logging destinations, and registered the new account with the rest of the platform.
Service Control Policies. Hierarchical SCPs at the OU level enforced what an account could not do: preventing privilege escalation, enforcing region pinning, and blocking dangerous IAM patterns regardless of in-account admin rights.
Centralized logging. One log-archive account collected CloudTrail and Config data from every member account into write-once S3 storage protected by Object Lock, with KMS-key separation so account owners could not delete their own evidence.
Network landing zone. A shared transit-gateway hub with inspection VPCs, default-deny egress, and per-workload spoke VPCs provisioned through the same Terraform modules so all accounts behaved consistently.
Identity landing zone. Federation through an agency IdP, AWS SSO (now AWS IAM Identity Center) permission sets mapped to mission roles, and break-glass procedures documented and tested.
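The OU-level guardrails are plain JSON policy documents, so they can be generated, reviewed, and tested like any other code before Terraform attaches them (for example through the AWS provider's aws_organizations_policy resource). The sketch below shows two of them: region pinning via the aws:RequestedRegion condition key, and a deny on actions that would let an in-account admin stop evidence collection or leave the organization. The exemption list for global services is illustrative and deliberately short; a real policy should start from AWS's published region-restriction example and usually adds a condition exempting a break-glass role. SCPs also never restrict the management account itself.

```python
"""Generate a region-pinning Service Control Policy (illustrative sketch)."""
import json

# Illustrative, deliberately short exemption list for global services.
# Review against AWS's current region-restriction example before real use.
GLOBAL_SERVICE_ACTIONS = ["iam:*", "organizations:*", "sts:*", "support:*"]

# Actions that would let an in-account admin tamper with evidence
# or escape the organization's guardrails.
PROTECTED_ACTIONS = [
    "cloudtrail:StopLogging",
    "cloudtrail:DeleteTrail",
    "config:StopConfigurationRecorder",
    "organizations:LeaveOrganization",
]


def region_pinning_scp(allowed_regions: list[str]) -> dict:
    """Return an SCP document that denies requests outside allowed_regions."""
    if not allowed_regions:
        raise ValueError("at least one region must be allowed")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny everything except global services when the request
                # targets a region outside the allowlist.
                "Sid": "DenyOutsideAllowedRegions",
                "Effect": "Deny",
                "NotAction": GLOBAL_SERVICE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": sorted(allowed_regions)}
                },
            },
            {
                # Deny evidence tampering in every region.
                "Sid": "DenyEvidenceTampering",
                "Effect": "Deny",
                "Action": PROTECTED_ACTIONS,
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(region_pinning_scp(["us-gov-west-1", "us-gov-east-1"]), indent=2))
```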
The cost of building this yourself is real: months of engineering, ongoing maintenance, and the risk that an AWS service-team change shifts the ground under you. The benefit, when you have to operate without the managed service, is that you control the controls — every guardrail is inspectable, every audit question has a code answer.
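For the log-archive account, the tamper resistance comes from S3 Object Lock. The sketch below builds the ObjectLockConfiguration structure that S3's PutObjectLockConfiguration API accepts and models the one property auditors care about: whether a locked object version can be deleted before its retention period ends. The 365-day value is a placeholder, not a figure from any control; the real retention comes from the agency's records schedule. The function names are hypothetical.

```python
"""Object Lock settings for a log-archive bucket (illustrative sketch)."""
from datetime import datetime, timedelta, timezone

VALID_MODES = {"GOVERNANCE", "COMPLIANCE"}


def object_lock_configuration(mode: str, retention_days: int) -> dict:
    """Return the ObjectLockConfiguration shape used by PutObjectLockConfiguration."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": retention_days}},
    }


def can_delete_version(written_at: datetime, now: datetime, mode: str,
                       retention_days: int, has_bypass_permission: bool = False) -> bool:
    """Model whether a locked object version can be deleted yet.

    COMPLIANCE: nobody, including the root user, can delete before retention ends.
    GOVERNANCE: principals with s3:BypassGovernanceRetention can override the lock.
    """
    retain_until = written_at + timedelta(days=retention_days)
    if now >= retain_until:
        return True
    return mode == "GOVERNANCE" and has_bypass_permission


if __name__ == "__main__":
    written = datetime(2025, 1, 1, tzinfo=timezone.utc)
    ten_days_later = written + timedelta(days=10)
    print(object_lock_configuration("COMPLIANCE", 365))
    # Even a principal with the bypass permission cannot delete in COMPLIANCE mode.
    print(can_delete_version(written, ten_days_later, "COMPLIANCE", 365, has_bypass_permission=True))
```

COMPLIANCE mode is the stricter choice for evidence: not even the account root user can shorten retention or delete a locked version early, whereas GOVERNANCE mode lets suitably privileged principals override the lock.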
How Our Terraform Strategy Evolved
The Terraform strategy went through three phases, and the lessons from each shaped how we structure infrastructure-as-code for federal workloads today.
Phase 1: Monorepo. Everything in a single repository. One state per environment. Simple cross-references, easy “walk the graph” comprehension, one CODEOWNERS file. The problem: a plan over the full graph took 20+ minutes, blast radius was global, and any change required coordination with every team. Velocity declined as the platform grew.
Phase 2: Micro-repos. Every module in its own repository. Per-team CI/CD, fast plans, isolated blast radius. The problem: cross-module references became fragile, semver discipline across dozens of repos was operationally expensive, and an incident response that touched three modules now touched three repositories, three pipelines, and three audit trails. Coordination cost outweighed velocity gains.
Phase 3: Module-per-stack with shared module registry. One repository per deployable stack (network, identity, logging, workload-A, workload-B). Common modules pinned by version from a shared registry. Each stack ships through its own pipeline with its own approvals and its own audit trail, but the underlying primitives — VPCs, KMS keys, IAM roles, logging destinations — come from a single versioned source of truth. Typical deployment time settled around 5 minutes per stack, and blast radius matched the stack boundary.
The principle that survived all three phases: blast radius equals boundary equals audit unit. If three things share a Terraform state, they share an outage, a change-review, and an audit. Federal compliance pushes you toward smaller, more explicit boundaries — the IaC layout should match.
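One way to keep the shared registry the single source of truth is to fail a stack's pipeline when any registry module is not pinned to an exact version. Terraform's dependency lock file records provider versions only, not module versions, so the pin in the module block is what makes a stack's plan reproducible. The sketch below assumes the module blocks have already been parsed into a dictionary by some config parser (hypothetical here); unpinned_modules is a hypothetical name and the registry hostname is a placeholder.

```python
"""Flag registry modules that are not pinned to an exact version (sketch)."""
import re

# An exact pin: "1.4.2" or "= 1.4.2". Ranges such as "~> 2.0" or ">= 1.0" do not
# match, and neither do pre-release versions, which this sketch treats as unpinned.
EXACT_PIN = re.compile(r"^=?\s*\d+\.\d+\.\d+$")


def unpinned_modules(stack_modules: dict[str, dict[str, str]]) -> list[str]:
    """Return module names whose registry version is not an exact pin.

    stack_modules maps module name -> {"source": ..., "version": ...}.
    Local-path sources ("./..." or "../...") are skipped because they are
    versioned with the stack's own repository.
    """
    problems = []
    for name, block in sorted(stack_modules.items()):
        source = block.get("source", "")
        if source.startswith(("./", "../")):
            continue
        version = block.get("version", "").strip()
        if not EXACT_PIN.match(version):
            problems.append(name)
    return problems


if __name__ == "__main__":
    example_stack = {
        "vpc": {"source": "registry.example.com/platform/vpc/aws", "version": "3.2.1"},
        "kms": {"source": "registry.example.com/platform/kms/aws", "version": "~> 2.0"},
        "logging": {"source": "registry.example.com/platform/logging/aws"},
        "tags": {"source": "./modules/tags"},
    }
    print(unpinned_modules(example_stack))
```

Loose constraints such as "~> 2.0" are convenient, but they let a registry release change a stack's plan without a commit to that stack, which breaks the rule that blast radius equals audit unit.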
DevSecOps Integration
DevSecOps in a federal context is not a separate function bolted onto the platform. It is the way the platform ships every change. The same pipelines that provisioned infrastructure also enforced policy and proved compliance.
What ran inside every infrastructure pipeline:
Static analysis on Terraform. tfsec or Checkov scanned Terraform code and plans against the baseline policy: encryption required, public S3 forbidden, IMDSv1 forbidden, default security groups locked down. Findings broke the build, not the audit.
Container image scanning. Hardened base images (IronBank-style, NIST 800-53 mapped) entered the build through a signed registry. Application images inherited from those bases and were re-scanned at build time. Critical CVEs blocked promotion.
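Policy-as-code at apply time. Open Policy Agent evaluated each Terraform plan against policy, so a plan that violated a guardrail was rejected before reaching apply. Service Control Policies backed this up at runtime by denying the underlying API calls even if a change bypassed the pipeline.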
Artifact signing and provenance. Built artifacts (container images, Lambda packages, configuration bundles) were signed. Pipelines verified signatures at deploy. Provenance metadata (commit SHA, builder identity, build time) was attached and retained.
Configuration drift detection. A scheduled job re-ran Terraform plans against deployed state. Drift outside an approved change window paged a human owner.
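tfsec and Checkov ship their own rule sets, and the sketch below is not their API. It is a hand-rolled illustration of the same idea applied to the machine-readable plan that terraform show -json produces, checking two of the baseline rules named above: EBS volumes must be encrypted, and EC2 instances must require IMDSv2 (http_tokens set to "required"). It fails closed: if a value is missing from the planned state, the resource is flagged rather than waved through. check_plan is a hypothetical name.

```python
"""Check a Terraform JSON plan against two baseline rules (illustrative sketch)."""


def _first_block(after: dict, name: str) -> dict:
    """Nested blocks appear in plan JSON as lists; return the first, or {}."""
    blocks = after.get(name) or []
    return blocks[0] if blocks else {}


def check_plan(plan: dict) -> list[str]:
    """Return human-readable violations for resources being created or updated."""
    violations = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        after = rc["change"].get("after")
        if after is None or not ({"create", "update"} & set(actions)):
            continue  # deletes, reads, and no-ops are not checked
        addr, rtype = rc["address"], rc["type"]
        # Unknown or unset values are absent from "after", so they fail the check.
        if rtype == "aws_ebs_volume" and after.get("encrypted") is not True:
            violations.append(f"{addr}: EBS volume must be encrypted")
        if rtype == "aws_instance":
            tokens = _first_block(after, "metadata_options").get("http_tokens")
            if tokens != "required":
                violations.append(f"{addr}: IMDSv2 required (http_tokens={tokens!r})")
    return violations
```

In a pipeline, the input is the parsed output of terraform show -json plan.tfplan, and a non-empty violation list fails the job before apply.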
The audit benefit: when an assessor asks “how do you ensure encryption is enabled on all storage?”, the answer is “the Terraform policy that blocks unencrypted storage is at this path, the most recent enforcement run is this report, here is the pipeline that ran it.” Evidence is a query, not an interview.
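Answering an assessor with a query depends on each build leaving a record behind. The sketch below captures the provenance fields named in the pipeline list (commit SHA, builder identity, build time) alongside a SHA-256 digest, and refuses to deploy bytes that no longer match. It covers digest pinning only: a production pipeline would also sign the record and artifact with a signing tool (Sigstore's cosign is one common choice) and verify that signature at deploy. Provenance, record_provenance, and verify_before_deploy are hypothetical names.

```python
"""Provenance record and digest check for a built artifact (illustrative sketch)."""
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    artifact: str
    sha256: str
    commit_sha: str
    builder: str
    built_at: str  # ISO-8601 UTC timestamp


def record_provenance(artifact: str, content: bytes, commit_sha: str, builder: str) -> Provenance:
    """Capture the digest and build metadata for one artifact."""
    return Provenance(
        artifact=artifact,
        sha256=hashlib.sha256(content).hexdigest(),
        commit_sha=commit_sha,
        builder=builder,
        built_at=datetime.now(timezone.utc).isoformat(),
    )


def verify_before_deploy(record: Provenance, content: bytes) -> bool:
    """Refuse to deploy bytes whose digest differs from the recorded one."""
    digest = hashlib.sha256(content).hexdigest()
    return hmac.compare_digest(digest, record.sha256)


if __name__ == "__main__":
    bundle = b"example configuration bundle"
    rec = record_provenance("config-bundle.tar", bundle, "0123abcd", "ci-runner@pipeline")
    # The JSON form is what gets retained as evidence alongside the artifact.
    print(json.dumps(asdict(rec), indent=2))
    print(verify_before_deploy(rec, bundle))
```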
Six Lessons Learned
From the engagement, six lessons translate beyond the specific stack:
1. Account boundaries are political, not just technical. Where the account lines are drawn determines who can do what, who pays for what, and who is responsible when something breaks. Negotiate boundaries with program offices before you provision — redrawing them post-launch is painful.
2. Modules are products. Internal Terraform modules need versioning, changelogs, a deprecation policy, and an owner who knows who consumes them. Treat them like a library, because that is what they are. Modules without owners become technical debt.
3. Drift is a leading indicator. Drift is rarely a Terraform problem. It is a process problem — someone made a change outside the pipeline and either could not wait for the pipeline or did not trust it. Address the process, not the symptom.
4. Golden images age fast. CIS-hardened AMIs and IronBank-class containers need a refresh cadence (monthly is reasonable). Without one, the security posture you certified at go-live diverges from production within a quarter.
5. Pipelines need owners. Every pipeline needs a named owner with both the authority and the time to maintain it. Unowned pipelines accumulate workarounds, accumulate failures, and eventually accumulate exemptions that defeat the security model.
6. Audit traceability has to be a side effect of the work. If the audit trail is constructed from interviews and screenshots after the fact, you have already lost. The artifacts that prove compliance — logs, plans, scan results, signed deploys — have to be produced by the normal pipeline, retained on schedule, and queryable by the assessor without a human in the loop.
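The drift-detection job from the pipeline list and lesson 3 both rest on one primitive: terraform plan -detailed-exitcode, which exits 0 when there are no changes, 1 on error, and 2 when the plan contains changes. The sketch below runs that plan for one stack and maps the result to an action, paging only when drift appears outside an approved change window. The window check assumes a window that does not cross midnight, and classify_drift and run_drift_check are hypothetical names.

```python
"""Scheduled drift check built on terraform plan -detailed-exitcode (sketch)."""
import subprocess
from datetime import datetime, time


def classify_drift(exit_code: int, now: datetime, window: tuple[time, time]) -> str:
    """Map a `terraform plan -detailed-exitcode` result to an action.

    Exit codes: 0 = no changes, 1 = error, 2 = changes present.
    Assumes the change window does not cross midnight.
    """
    if exit_code == 0:
        return "clean"
    if exit_code == 1:
        return "page: plan failed"
    if exit_code == 2:
        start, end = window
        if start <= now.time() < end:
            return "expected: change window open"
        return "page: drift outside change window"
    raise ValueError(f"unexpected exit code {exit_code}")


def run_drift_check(stack_dir: str) -> int:
    """Run a non-interactive plan in stack_dir and return its exit code."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=stack_dir,
        capture_output=True,
        text=True,
    )
    return result.returncode
```

Treating exit code 1 as a page rather than a pass matters: a stack whose plan cannot run is a stack whose drift status is unknown.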
Conclusion
Building a federal AWS environment in Terraform — before Control Tower could carry the load — is a stress test of an engineering team's ability to balance compliance, velocity, and blast-radius discipline. The right answer is not more tooling. It is fewer, smaller, better-bounded units of infrastructure, each shipped through pipelines that produce evidence as a by-product of normal operation.
The patterns above — module-per-stack, baseline as code, policy and provenance in the pipeline, drift as a leading indicator, audit traceability as a side effect — outlive any individual AWS service. They are how Pyramid Systems delivers federal cloud today, and they are how we evaluate platform health for the agencies we partner with.
FAQ
What is AWS Control Tower and why did Pyramid build a custom alternative?
AWS Control Tower is a managed service that automates multi-account AWS environment setup with built-in governance. When this engagement began, Control Tower was either unavailable for the partition and account model required, or did not yet cover the controls the workload demanded. Pyramid built the equivalent in Terraform: account vending, hierarchical Service Control Policies, centralized tamper-resistant logging, a shared network landing zone, and federated identity through the agency IdP.
Is monorepo or micro-repo better for Terraform at federal scale?
Neither, in isolation. Pyramid's strategy evolved through monorepo (simple references, slow CI/CD), then micro-repos (fast deployments, coordination overhead), and settled on a balanced module-per-stack pattern: one repository per deployable stack, with common primitives pinned by version from a shared module registry. Typical deployment time landed around 5 minutes per stack, with blast radius matching the stack boundary.
What compliance baselines does this federal AWS environment meet?
The environment uses CIS-hardened AMIs aligned to NIST 800-53, IronBank-style hardened containers, continuous configuration monitoring via AWS Config, centralized tamper-resistant logging in a dedicated log-archive account, and policy-as-code enforcement inside the Terraform pipelines. Evidence is produced as a by-product of normal operation rather than reconstructed at audit time.
How long does a Terraform deployment take in this setup?
After moving from monorepo to a balanced module-per-stack structure with shared versioned modules, typical deployment times dropped to approximately 5 minutes per stack. The reduction came from smaller plan graphs, better-bounded blast radius, and per-stack pipelines that no longer waited on unrelated changes.
How does DevSecOps integrate with Terraform on federal AWS?
Static analysis (tfsec/Checkov), container image scanning against hardened base images, policy-as-code at apply time, artifact signing with provenance, and scheduled drift detection all run inside the same pipelines that provision infrastructure. Security gates and infrastructure changes share a single audit trail, so assessors can query the evidence directly rather than interview the engineers who produced it.