
amazon eks setup · production guide · 2026

Amazon EKS setup — a production-ready cluster, not a demo, step by step.

Q: How long does it take to set up an Amazon EKS cluster?

Creating a working cluster takes about 15–20 minutes with eksctl or Terraform — that part is genuinely quick. Making it production-ready is the real timeline: networking sized against IP exhaustion, the data-plane choice (Karpenter/Fargate/node groups), pod-level IAM with IRSA or Pod Identity, ingress via the AWS Load Balancer Controller, observability, security hardening, GitOps delivery, and cost controls. As a focused build that is commonly 2–6 weeks depending on depth and how many environments you need. The runnable checklist in this guide is how you know when it's actually done.

Q: Should I use eksctl, Terraform/OpenTofu, or EKS Auto Mode to create the cluster?

Use eksctl to prototype and learn — one command gives you a working cluster fast, but it manages its own CloudFormation stacks separate from the rest of your infrastructure. Use Terraform or OpenTofu (the open fork; Terraform itself is now BSL-licensed) for production, with the standard terraform-aws-modules/eks module, so the cluster lives in one state alongside the rest of your AWS infra and every change is reviewable. Consider EKS Auto Mode if you don't have a platform engineer — AWS runs the data plane (compute, scaling, add-on lifecycle) for a modest premium, leaving you far less to operate. CDK, Pulumi, and CloudFormation are fine alternatives if your team already standardizes on them.

Q: Managed node groups vs Fargate vs Karpenter — which should I use?

In 2026, most production clusters default to Karpenter for general workloads: it launches right-sized EC2 per pending pod in seconds and consolidates waste, typically cutting compute cost 30–50% versus static groups when tuned with a Spot strategy. Keep a small managed node group for cluster-critical add-ons that need guaranteed capacity. Reach for Fargate selectively — spiky workloads, strong per-pod isolation, or eliminating node operations entirely — accepting its steady-state premium and limits (no DaemonSets, no GPU, slower start). They combine; pick a primary model deliberately rather than ending up with static node groups by default.

Q: How do I give pods AWS permissions safely in EKS?

Use IAM Roles for Service Accounts (IRSA) or EKS Pod Identity — never the node instance role and never static AWS keys in a Secret or environment variable. IRSA maps a Kubernetes service account to an IAM role via an OIDC provider; Pod Identity is the newer, simpler option where you install one add-on and create service-account-to-role associations through the EKS API (easier across many clusters). Both give each workload least-privilege, keyless AWS access. Pair them with namespace-scoped RBAC and secrets pulled at runtime from AWS Secrets Manager via the External Secrets Operator. This is table stakes for SOC 2, HIPAA, ISO 27001, and PCI.

Q: What is the most common EKS setup mistake?

VPC CNI IP exhaustion. The default Amazon VPC CNI gives every pod a real VPC IP, so subnets sized for a few nodes run out of addresses as the cluster grows, and pods stop scheduling with "insufficient IP addresses." Prevent it up front with generous CIDR planning, a secondary CIDR where the primary range is tight, and prefix delegation (the CNI attaches /28 prefixes of 16 addresses to each node's network interfaces, which raises the per-node pod limit; every pod still consumes one VPC IP, so the subnets themselves must be large enough). Other frequent ones: pods on the node IAM role instead of IRSA/Pod Identity, one ALB per service inflating the bill, static node groups with no Karpenter consolidation, and hand-run kubectl apply with no GitOps or clean rollback.

Q: How do I expose services running on EKS to the internet?

Install the AWS Load Balancer Controller. It provisions an Application Load Balancer (ALB) for HTTP/HTTPS via Kubernetes Ingress, or a Network Load Balancer (NLB) for raw TCP/UDP via Service. Configure TLS termination with ACM certificates, add AWS WAF where needed, and — importantly — consolidate many services behind a shared ALB using Ingress grouping rather than letting each service spin up its own load balancer, which is a common and avoidable cost leak. For internal-only services, the controller can provision internal load balancers in your private subnets.

Q: What does Amazon EKS cost?

The fixed cost is small: roughly $0.10/hour (~$73/month) per cluster for the managed control plane, plus an extended-support surcharge if you run an older Kubernetes version past standard support. Everything else is variable — the EC2 or Fargate compute, load balancers, NAT gateway data processing, EBS volumes, and observability — and that is where bills balloon. The biggest savings come from Karpenter consolidation and Spot for fault-tolerant workloads, right-sized requests and limits, consolidated ingress, and per-namespace cost visibility (Kubecost or AWS split-cost allocation). An untuned cluster commonly wastes 30–50% of its compute.

Q: Can CloudRoute just build the EKS cluster for me — and is it really $0?

Yes. CloudRoute matches you within 24 hours to a vetted AWS partner with real EKS production experience who builds a production-ready cluster to the checklist in this guide and hands it to you as infrastructure-as-code (Terraform/OpenTofu, GitOps manifests, runbooks) that you own. For credit-eligible companies — typically institutionally funded or running a qualifying workload — the build is frequently AWS-funded: the partner is paid through AWS partner programs and your AWS usage during the build is covered by Activate credits, so your out-of-pocket is $0 or low cost. The honest limit: that applies to credit-eligible engagements. If you don't qualify, it's still a vetted-partner referral and you pay the partner directly — and we tell you which case applies up front.

Spinning up an Amazon EKS cluster takes one command. Making it production-ready — networking that won't exhaust IPs, autoscaling that's actually cheap, pod-level IAM with no static keys, ingress that doesn't spawn a load balancer per service, observability your on-call can use, and a security posture that passes an audit — is the real work. This guide walks the whole path: how to create the cluster (eksctl, Terraform/OpenTofu, or EKS Auto Mode), the data-plane choice (managed node groups vs Fargate vs Karpenter), networking, identity, add-ons, and a checklist to call it done. Then how CloudRoute matches you to a vetted partner who builds it — often AWS-funded, so you pay $0.


cluster up in ~20 min · production-ready in 2–6 wks · matched within < 24h · cost if credit-eligible: $0 or low cost

TL;DR

Creating an EKS cluster is the easy 20 minutes; production-readiness is the weeks that follow. The work that matters: pick a creation tool you can reproduce (eksctl for speed, Terraform/OpenTofu for real infra-as-code, or EKS Auto Mode to let AWS run the data plane), choose your compute model (managed node groups, Fargate, or — increasingly the default — Karpenter), size the VPC and CNI so you never hit IP exhaustion, give pods least-privilege AWS access with IRSA or EKS Pod Identity, put ingress behind the AWS Load Balancer Controller, and wire observability, security, and cost controls before you take traffic.
The single highest-leverage decision is the data plane. Managed node groups are predictable but static; Fargate removes nodes entirely (great for spiky or isolation-sensitive workloads, premium-priced at steady state); Karpenter — now AWS's recommended node autoscaler — provisions right-sized EC2 in seconds, consolidates waste, and is where most of the cost savings live. Most production clusters in 2026 run Karpenter for general workloads and reach for Fargate selectively.
You can build all of this yourself with this guide — or have a vetted AWS partner do it. CloudRoute matches you to one in under 24 hours, scoped to a production-ready cluster you own as code (Terraform/OpenTofu, GitOps manifests, runbooks). For credit-eligible companies the build is frequently AWS-funded — the partner is paid through AWS programs and your AWS usage is credit-covered — so you pay $0 or low cost. For everyone else it's a vetted referral that skips the hiring slog.

step 1 — create the cluster

I. Creating the cluster: eksctl vs Terraform vs EKS Auto Mode

There are three sane ways to create an Amazon EKS cluster in 2026, and the right one depends on whether you want speed, reproducible infrastructure-as-code, or to hand the whole data plane to AWS. The mistake to avoid is clicking through the console once and never being able to recreate what you built.

Whatever you choose, the goal is the same: a cluster definition that lives in a repository, is reviewable, and can be rebuilt from scratch in a different account or region without tribal knowledge. The console is fine for learning EKS; it is not how you run it. Below are the three reproducible paths and when each fits.

eksctl — fastest to a working cluster

eksctl is the official CLI for EKS. A single `eksctl create cluster` (or, better, a checked-in `ClusterConfig` YAML) provisions the control plane, a VPC, subnets across multiple Availability Zones, and a starter node group in roughly 15–20 minutes. It is the quickest way to a working cluster and ideal for proofs of concept, learning, and teams that don't yet have a Terraform practice. The trade-off: eksctl owns its own CloudFormation stacks, so if the rest of your infrastructure is in Terraform, you end up with two state systems and drift between them. Use eksctl to learn and prototype; graduate to Terraform when EKS becomes load-bearing.

Terraform / OpenTofu — the production default

For anything you intend to run in production, define the cluster as code with Terraform (HashiCorp, now BSL-licensed) or OpenTofu (the open-source fork — a drop-in for most modules and the safer license choice for many teams). The community `terraform-aws-modules/eks` module is the de-facto standard: it wires the control plane, node groups or Fargate profiles, the VPC CNI and other add-ons, access entries, and IRSA in one reviewable configuration that lives beside the rest of your AWS infrastructure in one state. AWS CDK, Pulumi, and raw CloudFormation are all viable alternatives if your team already standardizes on them. The point is one IaC tool, one state, one source of truth — so the cluster is reproducible and every change goes through review.

EKS Auto Mode — let AWS run the data plane

EKS Auto Mode (GA since late 2024) is the newest option and changes the calculus for smaller teams. With Auto Mode, AWS manages compute provisioning (Karpenter is built in and operated by AWS), core networking, load balancing, and add-on lifecycle for you — you declare workloads and AWS handles the nodes, scaling, patching, and much of the operational glue underneath. You still define the cluster in eksctl or Terraform, but you operate far less of it. For a team without a dedicated platform engineer, Auto Mode trades a modest cost premium for dramatically less to run, and is often the right call. Teams that need fine-grained control over node configuration, custom AMIs, or specialized scheduling may prefer to run the data plane themselves.

the practical default

For most teams in 2026: prototype on eksctl, run production on Terraform/OpenTofu with the standard EKS module, and seriously consider EKS Auto Mode if you don't have a platform engineer to operate the data plane. Whatever you pick, pin the Kubernetes version explicitly and decide your upgrade cadence on day one — EKS supports each version for a defined window, and clusters that drift onto unsupported versions are the most painful to rescue later.

step 2 — the data plane

II. The data plane: managed node groups vs Fargate vs Karpenter

This is the decision that most shapes your cluster's cost, scaling behaviour, and operational burden. EKS gives you a managed control plane; how the worker capacity behaves underneath is your choice — and in 2026 the choice is essentially node groups, Fargate, Karpenter, or a deliberate mix.

These are not mutually exclusive — a real cluster often runs Karpenter for general workloads, a small managed node group for cluster-critical add-ons that need stable capacity, and Fargate for a handful of isolation-sensitive jobs. But you should choose a primary model on purpose rather than ending up with one by accident. The three options, and the full comparison table further down, lay out the trade-offs.

Managed node groups — predictable EC2 capacity

A managed node group is a set of EC2 instances of a chosen type that EKS provisions and lifecycle-manages (drains and replaces them on upgrades). They are predictable and simple to reason about, which is why they remain a sensible home for cluster-critical components that should always have capacity. The downside is that they are relatively static: you pick instance types and min/max sizes up front, capacity arrives slowly when you scale, and right-sizing across diverse workloads is a manual, ongoing chore. On their own, node groups are where clusters quietly waste money.

AWS Fargate — serverless pods, no nodes to manage

Fargate runs each pod on its own right-sized, AWS-managed micro-VM — there are no EC2 instances for you to patch, scale, or secure. You assign pods to Fargate via a Fargate profile (namespace + label selectors). It shines for spiky or unpredictable workloads, for strong workload isolation (each pod is its own VM), and for teams that want to eliminate node operations entirely. The trade-offs are real: a higher per-vCPU/GB price than EC2 at steady state, no DaemonSets, no GPU or privileged workloads, and slightly slower pod start. Use Fargate selectively — batch jobs, bursty services, isolation-sensitive workloads — rather than as the whole cluster.
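
As a rough illustration of the profile rule described above (namespace plus label selectors), the sketch below models the matching logic in Python. The function name, the example profile, and the pod data are hypothetical; it shows only the selector rule, not how EKS itself schedules pods.

```python
from typing import Any, Dict, List


def matches_fargate_profile(pod_namespace: str,
                            pod_labels: Dict[str, str],
                            selectors: List[Dict[str, Any]]) -> bool:
    """Return True if any selector matches the pod's namespace and all of its labels."""
    for selector in selectors:
        if selector["namespace"] != pod_namespace:
            continue
        required = selector.get("labels", {})
        # Every label named in the selector must be on the pod with the same value.
        if all(pod_labels.get(key) == value for key, value in required.items()):
            return True
    return False


# Assumed example profile: pods in the "jobs" namespace labelled compute=fargate.
profile_selectors = [{"namespace": "jobs", "labels": {"compute": "fargate"}}]

print(matches_fargate_profile("jobs", {"compute": "fargate", "app": "report"}, profile_selectors))
print(matches_fargate_profile("jobs", {"app": "report"}, profile_selectors))
print(matches_fargate_profile("web", {"compute": "fargate"}, profile_selectors))
```

Only the first pod matches; the other two would stay on EC2 capacity. Keeping selectors this narrow is what makes "use Fargate selectively" enforceable rather than a convention.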

Karpenter — right-sized nodes in seconds (the 2026 default)

Karpenter is AWS's open-source node autoscaler and is now recommended over the older Cluster Autoscaler for most clusters. Instead of scaling fixed node groups, Karpenter watches for unschedulable pods and launches the optimal EC2 instance type and size to fit them — in seconds, not minutes — then consolidates underused nodes by repacking pods onto fewer instances. You configure NodePools and EC2NodeClasses to express what capacity is allowed (instance families, Spot vs On-Demand, AZs), and Karpenter does the bin-packing. This is where most of the cost savings from a good build come from: getting consolidation, a sensible Spot/On-Demand split for fault-tolerant workloads, and interruption handling right routinely cuts compute spend 30–50% versus static node groups.
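
To show why bin-packing matters, here is a toy first-fit-decreasing packer in Python. It is a teaching sketch, not Karpenter's actual algorithm: the pod CPU requests and the 4 vCPU node size are made-up numbers, and real provisioning also weighs memory, instance types, zones, and price.

```python
from typing import List


def first_fit_decreasing(pod_cpu_requests: List[float], node_cpu: float) -> List[List[float]]:
    """Pack pod CPU requests onto equally sized nodes using a simple greedy heuristic."""
    nodes: List[List[float]] = []
    for request in sorted(pod_cpu_requests, reverse=True):
        if request > node_cpu:
            raise ValueError(f"a pod requesting {request} vCPU cannot fit a {node_cpu} vCPU node")
        for node in nodes:
            # Place the pod on the first node that still has room for it.
            if sum(node) + request <= node_cpu:
                node.append(request)
                break
        else:
            # No existing node had room, so open a new one.
            nodes.append([request])
    return nodes


# Hypothetical workload: CPU requests (in vCPU) for nine pods.
requests = [2.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.25, 0.25, 2.0]
packed = first_fit_decreasing(requests, node_cpu=4.0)
print(len(packed), "nodes:", packed)
```

The nine pods request 8 vCPU in total and fit on two fully used 4 vCPU nodes. A static group sized by guesswork would usually run more nodes than that, which is the waste consolidation removes.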

step 3 — networking

III. VPC, subnets, and the Amazon VPC CNI

EKS networking is where the most common production failure hides. The default Amazon VPC CNI gives every pod a real VPC IP address — powerful, because pods are first-class on your network and reach RDS, ElastiCache, and other resources natively — but it makes IP exhaustion a genuine outage mode when subnets are sized for a handful of nodes and then the cluster grows.

Plan the network before you create the cluster, not after the first "insufficient IP addresses" page. The core decisions: a VPC with both private and public subnets spread across at least two (ideally three) Availability Zones; worker nodes and pods in the private subnets; load balancers in the public subnets; and CIDR ranges sized generously for the pod density you expect — because with the VPC CNI, every pod consumes a VPC IP.

Two techniques keep a growing cluster out of IP trouble. Prefix delegation has the VPC CNI attach /28 prefixes (16 addresses each) to a node's network interfaces instead of individual IPs, which lifts the per-node pod limit; it does not reduce the number of addresses pods consume, so it has to be paired with enough address space. That space comes from generously sized subnet CIDRs or, when the primary range is tight, a secondary CIDR added to the VPC. For clusters that will grow, both are effectively mandatory. When you need Kubernetes NetworkPolicy for east-west segmentation, you either enable the VPC CNI's built-in network-policy support or swap in a CNI like Cilium; security-group-per-pod is available when specific workloads need their own security groups. Getting subnet sizing and prefix delegation right on day one is the difference between a cluster that scales quietly and one that falls over under load.

the IP-exhaustion trap

The most common EKS rescue call is "pods stopped scheduling with insufficient IP addresses." It is almost always undersized subnets plus the VPC CNI handing every pod a real IP. The fix — CIDR planning and prefix delegation — is cheap and fast before it pages you, and disruptive after. Size the network for where the cluster is going, not where it starts.
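
The sizing arithmetic behind this section can be sketched in a few lines of Python. The per-node formula follows AWS's published max-pods calculation. The example CIDRs and the m5.large figures (3 network interfaces, 10 IPv4 addresses each) are illustrative inputs, and the cap of 110 is AWS's recommended ceiling for smaller instances. Treat it as a planning aid under those assumptions, not a substitute for checking your own instance types.

```python
import ipaddress
from typing import Optional

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four addresses and the last one in each subnet


def usable_ips(cidr: str) -> int:
    """Addresses in a subnet that nodes and pods can actually use."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def max_pods(enis: int, ips_per_eni: int, prefix_delegation: bool = False,
             cap: Optional[int] = None) -> int:
    """Per-node pod ceiling from the VPC CNI's network-interface arithmetic."""
    slots = enis * (ips_per_eni - 1)                         # each interface keeps its primary IP
    addresses = slots * 16 if prefix_delegation else slots   # a /28 prefix holds 16 addresses
    ceiling = addresses + 2                                  # +2 for host-network pods
    return min(ceiling, cap) if cap is not None else ceiling


print("usable in a /24:", usable_ips("10.0.0.0/24"))
print("usable in a /19:", usable_ips("10.0.0.0/19"))
print("m5.large, no prefix delegation:", max_pods(3, 10))
print("m5.large, prefix delegation, capped:", max_pods(3, 10, prefix_delegation=True, cap=110))
```

A /24 leaves 251 usable addresses, which a handful of densely packed nodes can exhaust. Prefix delegation raises an m5.large from 29 pods to the capped 110, but every one of those pods still takes an address from the subnet.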

step 4 — identity & access

IV. IRSA and EKS Pod Identity: give pods AWS access without static keys

Kubernetes runs its own permission system (RBAC) on top of AWS IAM, and the two have to be reconciled. The single most important security decision at setup is how pods obtain AWS permissions — and the answer in 2026 is never "the node's IAM role" and never "static keys in an env var."

There are two correct mechanisms, and you should use one of them from day one. IAM Roles for Service Accounts (IRSA) maps a Kubernetes service account to an IAM role via an OIDC provider, so each workload assumes exactly the AWS permissions it needs: no long-lived credentials, scoped per service account. EKS Pod Identity is the newer, simpler alternative: instead of registering an OIDC provider for each cluster and referencing it in every role's trust policy, you install the Pod Identity Agent add-on on the cluster and create associations between service accounts and IAM roles through the EKS API, which is easier to manage across many clusters. Both deliver the same outcome: least-privilege, keyless AWS access per pod.

The anti-patterns this replaces are exactly the ones that fail audits: pods inheriting the node instance role (so every pod gets whatever the node can do — wildly over-privileged), or AWS access keys baked into a Secret or environment variable (leakable, long-lived, and a finding waiting to happen). For SOC 2, ISO 27001, HIPAA, or PCI, IRSA or Pod Identity plus namespace-scoped RBAC is table stakes. Pair it with secrets pulled at runtime from AWS Secrets Manager via the External Secrets Operator rather than committed into the cluster.
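
The practical difference between the two mechanisms shows up in the IAM role's trust policy. The Python sketch below builds both shapes as plain dictionaries. The account ID, OIDC provider ID, namespace, and service-account name are placeholders, and the helper names are hypothetical.

```python
import json
from typing import Any, Dict


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> Dict[str, Any]:
    """Trust policy that lets one service account assume the role through the cluster's OIDC provider."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                # Scope the role to exactly one namespace and service account.
                f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                f"{oidc_provider}:aud": "sts.amazonaws.com",
            }},
        }],
    }


def pod_identity_trust_policy() -> Dict[str, Any]:
    """Trust policy for EKS Pod Identity: the same document works for every cluster."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "pods.eks.amazonaws.com"},
            "Action": ["sts:AssumeRole", "sts:TagSession"],
        }],
    }


# Placeholder values for illustration only.
irsa = irsa_trust_policy("111122223333", "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
                         "payments", "payments-api")
print(json.dumps(irsa, indent=2))
print(json.dumps(pod_identity_trust_policy(), indent=2))
```

The IRSA policy names a specific cluster's OIDC provider, so it changes whenever a cluster is added or rebuilt. The Pod Identity policy does not, because the service-account binding lives in the EKS association instead.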

Which to choose

For a new cluster, EKS Pod Identity is usually the simpler choice — less per-cluster setup and easier to operate at scale, especially across multiple clusters. IRSA remains the right call when you need its broader compatibility or already have it wired across your estate. Either is correct; what matters is that you use one of them and never the node role or static keys.

step 5 — ingress & add-ons

V. Ingress, the AWS Load Balancer Controller, and the add-ons you actually need

Getting traffic into the cluster, and choosing the right set of cluster add-ons, is the step where clusters either stay lean or accrete cost and complexity. The AWS-native ingress pattern is well-trodden; the add-on list is where discipline matters.

For ingress, install the AWS Load Balancer Controller. It provisions an Application Load Balancer (ALB) for HTTP/HTTPS Ingress resources or a Network Load Balancer (NLB) for raw TCP/UDP Services. This is where TLS termination with ACM certificates, AWS WAF, and host/path-based routing get configured. The expensive mistake — and a frequent line item on inflated bills — is letting every service create its own load balancer; instead, consolidate many services behind a shared ALB using Ingress grouping. Done right, ingress is one or a few load balancers, not dozens.
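
Ingress grouping comes down to one annotation shared across Ingress resources. The sketch below generates two Ingress manifests that would join the same ALB. The service names, hostnames, and group name are invented for illustration, and a real setup would also add the TLS and WAF settings mentioned above.

```python
import json
from typing import Any, Dict


def grouped_ingress(name: str, namespace: str, host: str, service: str, port: int,
                    group: str = "shared-public") -> Dict[str, Any]:
    """Build an Ingress that joins a shared ALB through the controller's group.name annotation."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                # Ingresses with the same group.name are merged onto one ALB.
                "alb.ingress.kubernetes.io/group.name": group,
                "alb.ingress.kubernetes.io/scheme": "internet-facing",
                "alb.ingress.kubernetes.io/target-type": "ip",
            },
        },
        "spec": {
            "ingressClassName": "alb",
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }


# Hypothetical services in different namespaces sharing one load balancer.
api = grouped_ingress("api", "payments", "api.example.com", "payments-api", 8080)
web = grouped_ingress("web", "storefront", "www.example.com", "storefront-web", 3000)
print(json.dumps([api, web], indent=2))
```

Both manifests carry the same group name, so the controller would route them through one ALB by host rather than creating two. The JSON output can be applied with kubectl like any YAML manifest.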

On add-ons, EKS manages the core ones (the VPC CNI, CoreDNS, kube-proxy, and the EBS/EFS CSI drivers) as managed add-ons you can version and upgrade cleanly — prefer those over hand-installed equivalents that drift. Beyond the core, the typically-needed set is: the AWS Load Balancer Controller (ingress), Karpenter (autoscaling, unless on Auto Mode), the External Secrets Operator (secrets from Secrets Manager), cert-manager or ACM for certificates, a metrics pipeline, and a GitOps controller. Resist installing more than you will operate — every add-on is something to keep current and secure. EKS Auto Mode folds much of this list into the managed plane, which is precisely its appeal.

step 6 — observability & hardening

VI. Observability and security hardening

A cluster you cannot see and cannot defend is not production-ready, however well it scales. These two workstreams turn a running cluster into one you can operate on-call and put in front of an auditor.

Observability means metrics, logs, and traces wired before you take traffic — not bolted on after the first incident. The common stacks: Amazon Managed Service for Prometheus with Amazon Managed Grafana, CloudWatch Container Insights, or Datadog; a node-level log pipeline (Fluent Bit) shipping to CloudWatch Logs or your log store; and OpenTelemetry for distributed traces. The deliverable that matters is not dashboards for their own sake but alerts that fire on symptoms users actually feel — latency, error rate, saturation tied to your SLOs — rather than on every CPU spike, so the on-call rotation isn't buried in noise while real incidents slip through.

Security hardening checklist

The baseline hardening every production EKS cluster should have: pod permissions via IRSA or Pod Identity (covered above), namespace-scoped RBAC with no blanket cluster-admin, a private API server endpoint (or tightly restricted public access), encrypted secrets and EBS volumes with KMS, control-plane audit logging enabled and shipped to CloudWatch, Pod Security Standards (restricted profile) enforced, NetworkPolicies for east-west segmentation, image scanning in the pipeline with provenance you trust, and a managed upgrade cadence so the cluster never runs an unsupported Kubernetes version. None of these are exotic; together they are the gap between a demo cluster and one that passes SOC 2 or HIPAA.

Reliability primitives

Resilience comes from small, unglamorous additions: liveness and readiness probes on every workload, PodDisruptionBudgets so a routine node recycle or upgrade doesn't take a service down, topology spread across Availability Zones, sensible resource requests and limits so the scheduler can bin-pack without OOM kills, and Multi-AZ everything. With these in place, a node failure is a non-event; without them, it's an incident.
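
Two of these primitives are small enough to show in full. The Python sketch below emits a PodDisruptionBudget and a zone-level topology spread constraint as dictionaries. The app name, namespace, and the choice of `minAvailable: 2` are assumptions for the example; pick values that match your replica counts.

```python
import json
from typing import Any, Dict


def pod_disruption_budget(app: str, namespace: str, min_available: int) -> Dict[str, Any]:
    """PDB that keeps at least min_available pods running during voluntary disruptions."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app}-pdb", "namespace": namespace},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app}},
        },
    }


def zone_spread_constraint(app: str, max_skew: int = 1) -> Dict[str, Any]:
    """Entry for a pod spec's topologySpreadConstraints that balances pods across zones."""
    return {
        "maxSkew": max_skew,
        "topologyKey": "topology.kubernetes.io/zone",
        "whenUnsatisfiable": "ScheduleAnyway",  # prefer balance, but never block scheduling
        "labelSelector": {"matchLabels": {"app": app}},
    }


# Hypothetical workload with three or more replicas.
pdb = pod_disruption_budget("checkout", "shop", min_available=2)
spread = zone_spread_constraint("checkout")
print(json.dumps(pdb, indent=2))
print(json.dumps(spread, indent=2))
```

With the budget in place, a node drain evicts pods only while two replicas stay up. The spread constraint makes it unlikely that those replicas all sit in the zone that fails.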

alert on symptoms, not noise

The fastest way to make an on-call rotation quit is to page them on raw CPU and memory. Alert on what users feel — latency, error rate, saturation — tied to SLOs, and let dashboards carry the rest. A cluster with clean symptom-based alerting and disruption budgets survives node failures without anyone waking up; that is the bar for "production-ready," not "it's running."
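
One common way to turn "alert on what users feel" into a rule is an error-budget burn rate. The sketch below is a minimal version in Python. The 14.4 threshold and the long-window/short-window pairing are a widely used convention for a 30-day SLO, assumed here rather than taken from this guide, and the sample error ratios are invented.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("an SLO target of 100% leaves no error budget")
    return error_ratio / budget


def should_page(long_window_error_ratio: float, short_window_error_ratio: float,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only if both the long and short windows burn faster than the threshold."""
    return (burn_rate(long_window_error_ratio, slo_target) >= threshold
            and burn_rate(short_window_error_ratio, slo_target) >= threshold)


SLO = 0.999  # assumed availability target

# 2% of requests failing in both windows: a sustained problem that users feel.
print(should_page(0.02, 0.02, SLO))
# 0.5% failing: worth a ticket, not a page.
print(should_page(0.005, 0.005, SLO))
# A brief spike that has already recovered in the short window.
print(should_page(0.02, 0.0005, SLO))
```

Only the first case pages. A raw CPU or memory threshold would have no way to tell these three situations apart.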

step 7 — cost

VII. What EKS costs, and where the bill leaks

EKS billing is the control-plane fee plus everything the data plane and surrounding services consume, and an untuned cluster wastes a large share of the latter. Knowing where the leaks are is half of controlling them.

The fixed part is small and predictable: AWS charges a per-hour fee for each EKS cluster's control plane (about $0.10/hour, roughly $73/month per cluster), and a higher hourly rate applies while a cluster runs a Kubernetes version that has moved from standard into extended support. Everything else is variable: the EC2 or Fargate compute your workloads run on, the load balancers, NAT gateway data processing, EBS volumes, and observability. The variable side is where bills balloon, and almost always for the same reasons.

Static node groups with no consolidation — Fixed instance types sized for peak, running half-empty at trough. Moving general workloads to Karpenter with consolidation and a Spot strategy for fault-tolerant services is typically the single biggest saving — often 30–50% of compute.
Oversized (or missing) requests and limits — Requests set too high reserve capacity nobody uses; missing requests make bin-packing impossible. Right-sizing from real usage data shrinks the node count directly.
One load balancer per service — Every standalone ALB/NLB is a recurring charge. Consolidating behind shared ALBs via Ingress grouping cuts a surprising line item.
NAT gateway and cross-AZ data — Chatty workloads and per-AZ NAT gateways quietly accumulate data-processing charges. VPC endpoints for AWS services and AZ-aware routing help.
No per-team cost visibility — If nobody can see what a namespace costs, nothing gets optimized. Kubecost or AWS split-cost allocation makes spend visible per team so it actually gets managed.
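
The arithmetic behind these figures is simple enough to check by hand or in a few lines of Python. The control-plane rate and the 30–50% range come from this guide; the 730-hour month and the $10,000 compute bill are assumptions chosen only to make the example concrete.

```python
HOURS_PER_MONTH = 730            # assumed average month length in hours
CONTROL_PLANE_HOURLY_USD = 0.10  # per-cluster control-plane rate quoted above


def control_plane_monthly(clusters: int = 1) -> float:
    """Fixed monthly control-plane cost for a number of clusters."""
    return round(clusters * CONTROL_PLANE_HOURLY_USD * HOURS_PER_MONTH, 2)


def compute_after_saving(monthly_compute_usd: float, saving_fraction: float) -> float:
    """Monthly compute spend after applying a fractional saving between 0 and 1."""
    if not 0.0 <= saving_fraction <= 1.0:
        raise ValueError("saving_fraction must be between 0 and 1")
    return round(monthly_compute_usd * (1.0 - saving_fraction), 2)


compute = 10_000.0  # hypothetical untuned monthly compute bill in USD
print("control plane, 1 cluster:", control_plane_monthly(1))
print("control plane, 3 clusters:", control_plane_monthly(3))
print("compute after 30% saving:", compute_after_saving(compute, 0.30))
print("compute after 50% saving:", compute_after_saving(compute, 0.50))
```

On these numbers the control plane is under 1% of the bill, while the tuning range is worth $3,000 to $5,000 a month. That is why the leaks listed above deserve more attention than the cluster fee.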

the honest gate

VIII. Before you build: do you even need EKS, or would ECS do?

The most useful thing to settle before standing up EKS is whether you need Kubernetes at all. A large share of teams reaching for EKS would ship faster and operate more cheaply on Amazon ECS with Fargate or on App Runner — and the right time to learn that is before the cluster exists, not after.

EKS earns its complexity when you have many services across multiple teams and want a shared internal platform, when you need portability or a real multi-cloud story, when you run workloads Kubernetes handles distinctly better (GPU scheduling, complex stateful systems, service mesh, rich operators), or when you already have Kubernetes expertise on the team. If none of those are true — a handful of containerized services, a small team, AWS-only is fine, and you'd rather spend engineering hours on product than on operating a platform — ECS with Fargate gives you serverless containers with a fraction of the operational surface, and App Runner is simpler still for straightforward web services.

This isn't an argument against EKS; it's an argument for choosing it on purpose. The images and CI you build on ECS transfer to EKS later, so starting simple is rarely a dead end. If your honest answer to "why Kubernetes?" is "it's the standard" or "we might need it later," that's a signal to start on ECS + Fargate and graduate when a concrete need appears. If your answer is "we have N teams and M services and need a platform with these specific capabilities," then read on and build EKS well. The companion ECS-vs-EKS and Amazon ECS setup guides go deeper on the decision and the simpler path.

call it done

IX. The production-readiness checklist

A cluster is "production-ready" when it can take real traffic, survive a node failure without paging anyone, deploy reversibly, and pass an audit. Run this checklist before you cut over — if you can't tick an item, that's the remaining work, not a nice-to-have.

Defined as reproducible code — The cluster lives in Terraform/OpenTofu (or eksctl config / CDK), in one state, reviewable, and rebuildable in a clean account. No console-only resources.
Kubernetes version pinned with an upgrade cadence — Version is explicit, on a supported release, with a documented plan to upgrade before support ends. Core add-ons are EKS-managed, not hand-installed.
Multi-AZ across the board — Nodes, pods (topology spread), and any stateful data span at least two Availability Zones. A single-AZ failure is a non-event.
Autoscaling that consolidates — Karpenter (or Cluster Autoscaler) for nodes with consolidation enabled, plus HPA/KEDA for pods. A deliberate Spot/On-Demand split for fault-tolerant workloads.
Network sized against IP exhaustion — Private/public subnets across AZs, CIDRs sized for target pod density, prefix delegation enabled, and NetworkPolicy where east-west segmentation is required.
Pod-level IAM with no static keys — IRSA or EKS Pod Identity for every workload that touches AWS, namespace-scoped RBAC, and secrets pulled from Secrets Manager via External Secrets — never the node role or env-var keys.
Consolidated ingress with TLS — AWS Load Balancer Controller, shared ALBs via Ingress grouping, ACM certificates, and WAF where needed — not one load balancer per service.
Observability with symptom-based alerts — Metrics, logs, and traces flowing (Managed Prometheus/Grafana, CloudWatch, or Datadog) with alerts on latency/error-rate/saturation tied to SLOs.
Security hardening complete — Private/restricted API endpoint, KMS-encrypted secrets and volumes, control-plane audit logging shipped, Pod Security Standards enforced, image scanning in CI.
Reliability primitives in place — Liveness/readiness probes, PodDisruptionBudgets, resource requests and limits, and quotas — so node recycles and upgrades don't cause incidents.
GitOps delivery with rollback — Argo CD or Flux reconciling from Git as the source of truth, image scanning in the pipeline, and automated rollback (Argo Rollouts) for bad deploys.
Cost controls and visibility — Right-sized requests, Karpenter consolidation, Savings Plans for the steady-state baseline, and per-namespace cost visibility (Kubecost or split-cost allocation).

who builds it

Every item above is achievable in-house with time and a platform engineer who has run EKS before. If you don't have one — or want it done right the first time without a hiring search — CloudRoute matches you to a vetted AWS partner who delivers exactly this list as infrastructure-as-code you own, and for credit-eligible companies the build is often AWS-funded so you pay $0. See the Kubernetes consulting path, the $100K AWS credits route, and the startup engagement detail.

the data-plane decision

Managed node groups vs Fargate vs Karpenter — side by side

The compute model is the choice that most shapes cost, scaling, and operational burden. These aren't mutually exclusive — many clusters run Karpenter for general workloads, a small node group for critical add-ons, and Fargate selectively — but pick a primary on purpose.

Variable	Managed node groups	AWS Fargate	Karpenter
What it is	EKS-managed sets of EC2 instances of chosen types	Serverless pods — each on its own AWS-managed micro-VM, no nodes	Autoscaler that launches right-sized EC2 per pending pod, then consolidates
Best for	Stable, cluster-critical capacity (core add-ons); predictable baseload	Spiky/unpredictable load, strong per-pod isolation, eliminating node ops	General production workloads wanting cost-efficient, fast, dynamic capacity
Scaling	Static: fixed instance types, slow to add capacity	Per-pod: AWS provisions capacity for each pod, with a slower start than on a warm node	Seconds to provision; bin-packs and consolidates continuously
Ops burden	Medium — you pick types, patch via managed updates, right-size manually	Lowest — no nodes to patch, scale, or secure	Low–medium — configure NodePools; Karpenter does the rest
Cost shape	Pay for running instances; wastes money half-empty if untuned	Premium per-vCPU/GB at steady state; great for bursty/idle-heavy	Lowest for variable load — Spot + consolidation often cut compute 30–50%
Watch out for	Quiet waste; slow capacity; manual right-sizing across workloads	No DaemonSets, no GPU/privileged, slower start, steady-state premium	Needs interruption handling for Spot; NodePool/limits must be set sensibly

The 2026 default for most production clusters: Karpenter for general workloads (with a Spot strategy for fault-tolerant services), a small managed node group for cluster-critical add-ons that need guaranteed capacity, and Fargate reached for selectively where per-pod isolation or zero node-ops matters. EKS Auto Mode runs Karpenter-style provisioning for you if you'd rather not operate the data plane at all.


a recent match

A greenfield production EKS build — anonymized

inquiry · series-a b2b saas, remote (US-East)

Series-A B2B SaaS, ~20 engineers, moving off a single large EC2 + docker-compose box, already on AWS at ~$6K/month

Situation: The product had outgrown a hand-managed EC2 host running containers via docker-compose — no autoscaling, no clean deploys, a looming SOC 2 audit, and a growing services count (12 and rising) across two product teams. They'd decided they genuinely needed Kubernetes for the multi-team platform story, but had no one in-house who had stood up production EKS, and a contractor's earlier proof-of-concept cluster had IP-exhaustion issues and pods running on the node IAM role. They wanted a cluster they could own, not a black box.

What CloudRoute did: Routed within 16 hours to a US-East partner with EKS production references and a containers specialization. Discovery confirmed EKS was the right call for their service count and team structure. Over ~5 weeks the partner built it in Terraform/OpenTofu: a multi-AZ VPC with prefix delegation sized against IP exhaustion, Karpenter for general workloads with a Spot split plus a small managed node group for core add-ons, EKS Pod Identity with namespace-scoped RBAC (no static keys), the AWS Load Balancer Controller with consolidated ALBs and ACM TLS, Managed Prometheus + Grafana with symptom-based alerts, Argo CD GitOps with automated rollback, Pod Security Standards and control-plane audit logging for the audit, and Kubecost for per-team visibility — all handed over as IaC with runbooks.

Outcome: Production cutover on schedule against the full readiness checklist. Deploys went from manual SSH to reviewable GitOps with one-click rollback; a node failure during week 4 was a non-event thanks to PDBs and topology spread. The SOC 2 IAM and logging gaps were closed by design. Because the company was credit-eligible, the engagement was AWS-funded and the AWS usage during the build was credit-covered — the customer paid $0 to the partner, and CloudRoute's commission came from the partner's AWS engagement funding.

build window: ~5 weeks · founder/eng time: ~14 hours · deploys: manual → GitOps · audit gaps: closed · cost to customer: $0

faq

Common questions

How long does it take to set up an Amazon EKS cluster?

Creating a working cluster takes about 15–20 minutes with eksctl or Terraform — that part is genuinely quick. Making it production-ready is the real timeline: networking sized against IP exhaustion, the data-plane choice (Karpenter/Fargate/node groups), pod-level IAM with IRSA or Pod Identity, ingress via the AWS Load Balancer Controller, observability, security hardening, GitOps delivery, and cost controls. As a focused build that is commonly 2–6 weeks depending on depth and how many environments you need. The runnable checklist in this guide is how you know when it's actually done.

Should I use eksctl, Terraform/OpenTofu, or EKS Auto Mode to create the cluster?

Use eksctl to prototype and learn — one command gives you a working cluster fast, but it manages its own CloudFormation stacks separate from the rest of your infrastructure. Use Terraform or OpenTofu (the open fork; Terraform itself is now BSL-licensed) for production, with the standard terraform-aws-modules/eks module, so the cluster lives in one state alongside the rest of your AWS infra and every change is reviewable. Consider EKS Auto Mode if you don't have a platform engineer — AWS runs the data plane (compute, scaling, add-on lifecycle) for a modest premium, leaving you far less to operate. CDK, Pulumi, and CloudFormation are fine alternatives if your team already standardizes on them.

Managed node groups vs Fargate vs Karpenter — which should I use?

In 2026, most production clusters default to Karpenter for general workloads: it launches right-sized EC2 instances for pending pods in seconds and consolidates underused nodes, typically cutting compute cost 30–50% versus static node groups when tuned with a Spot strategy. Keep a small managed node group for cluster-critical add-ons that need guaranteed capacity. Reach for Fargate selectively (spiky workloads, strong per-pod isolation, or eliminating node operations entirely), accepting its steady-state premium and limits (no DaemonSets, no GPU, slower start). The three models can be combined; pick a primary one deliberately rather than ending up with static node groups by default.

How do I give pods AWS permissions safely in EKS?

Use IAM Roles for Service Accounts (IRSA) or EKS Pod Identity — never the node instance role and never static AWS keys in a Secret or environment variable. IRSA maps a Kubernetes service account to an IAM role via an OIDC provider; Pod Identity is the newer, simpler option where you install one add-on and create service-account-to-role associations through the EKS API (easier across many clusters). Both give each workload least-privilege, keyless AWS access. Pair them with namespace-scoped RBAC and secrets pulled at runtime from AWS Secrets Manager via the External Secrets Operator. This is table stakes for SOC 2, HIPAA, ISO 27001, and PCI.
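To make the Pod Identity path concrete, here is a minimal sketch of its two pieces: the trust policy the IAM role needs, and the arguments for one service-account-to-role association. The cluster name `prod`, namespace `payments`, service account `api`, and the role ARN are hypothetical. The association dict is shaped for boto3's `create_pod_identity_association` call on the EKS client, but nothing here calls AWS.

```python
import json


def pod_identity_trust_policy() -> dict:
    """Trust policy letting the EKS Pod Identity service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "pods.eks.amazonaws.com"},
                "Action": ["sts:AssumeRole", "sts:TagSession"],
            }
        ],
    }


def association_request(cluster: str, namespace: str,
                        service_account: str, role_arn: str) -> dict:
    """Keyword arguments for one service-account-to-role association."""
    return {
        "clusterName": cluster,
        "namespace": namespace,
        "serviceAccount": service_account,
        "roleArn": role_arn,
    }


if __name__ == "__main__":
    print(json.dumps(pod_identity_trust_policy(), indent=2))
    # Hypothetical values; pass the dict as **kwargs to the boto3 EKS client.
    print(association_request(
        "prod", "payments", "api",
        "arn:aws:iam::111122223333:role/payments-api",
    ))
```

Unlike IRSA, the trust policy contains no cluster-specific OIDC provider, so the same role definition can be reused across clusters.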

What is the most common EKS setup mistake?

VPC CNI IP exhaustion. The default Amazon VPC CNI gives every pod a real VPC IP, so subnets sized for a few nodes run out of addresses as the cluster grows, and pods stop scheduling with "insufficient IP addresses." Prevent it up front with generous CIDR planning and prefix delegation (the CNI attaches /28 prefixes of 16 addresses each to a node's network interfaces instead of single IPs, so each node can host far more pods). Other frequent mistakes: pods on the node IAM role instead of IRSA/Pod Identity, one ALB per service inflating the bill, static node groups with no Karpenter consolidation, and hand-run kubectl apply with no GitOps or clean rollback.
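The arithmetic behind this is easy to check. The sketch below computes usable addresses in a subnet (AWS reserves 5 per subnet) and the VPC CNI pod ceiling per node, with and without prefix delegation. The m5.large figures (3 network interfaces, 10 IPv4 addresses each) and the cap of 110 pods for smaller instances are taken as assumptions for illustration; look up the limits for your own instance types.

```python
import ipaddress
from typing import Optional

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 addresses in every subnet
ADDRESSES_PER_PREFIX = 16    # a /28 prefix holds 16 addresses


def usable_ips(cidr: str) -> int:
    """Addresses in a subnet that pods and nodes can actually use."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def max_pods(enis: int, ips_per_eni: int, prefix_delegation: bool = False,
             cap: Optional[int] = None) -> int:
    """VPC CNI pod ceiling for one node.

    Each interface keeps one address for itself; the +2 covers the
    host-network pods (aws-node and kube-proxy) that need no pod IP.
    """
    slots = enis * (ips_per_eni - 1)
    per_slot = ADDRESSES_PER_PREFIX if prefix_delegation else 1
    pods = slots * per_slot + 2
    return min(pods, cap) if cap is not None else pods


if __name__ == "__main__":
    print(usable_ips("10.0.1.0/24"))                        # 251
    print(usable_ips("10.0.0.0/19"))                        # 8187
    print(max_pods(3, 10))                                  # 29
    print(max_pods(3, 10, prefix_delegation=True))          # 434
    print(max_pods(3, 10, prefix_delegation=True, cap=110)) # 110
```

Prefix delegation needs free contiguous /28 blocks in the subnet, so a heavily fragmented subnet can still fail to allocate. That is one more reason to plan the CIDR ranges generously from the start.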

How do I expose services running on EKS to the internet?

Install the AWS Load Balancer Controller. It provisions an Application Load Balancer (ALB) for HTTP/HTTPS via a Kubernetes Ingress, or a Network Load Balancer (NLB) for raw TCP/UDP via a Service of type LoadBalancer. Configure TLS termination with ACM certificates, add AWS WAF where needed, and, importantly, consolidate many services behind a shared ALB using Ingress grouping rather than letting each service spin up its own load balancer, which is a common and avoidable cost leak. For internal-only services, the controller can provision internal load balancers in your private subnets.
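Ingress grouping comes down to one shared annotation. As a sketch, the function below generates Ingress manifests (as JSON-serializable dicts) that all carry the same `alb.ingress.kubernetes.io/group.name`, so the controller merges them onto one ALB. The service names, hostnames, group name `shared-public`, and certificate ARN are hypothetical placeholders.

```python
import json


def grouped_ingress(service: str, host: str, group: str,
                    cert_arn: str, port: int = 80) -> dict:
    """Ingress for one service that joins a shared ALB via group.name."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": service,
            "annotations": {
                # Ingresses with the same group name share one ALB.
                "alb.ingress.kubernetes.io/group.name": group,
                "alb.ingress.kubernetes.io/scheme": "internet-facing",
                "alb.ingress.kubernetes.io/target-type": "ip",
                "alb.ingress.kubernetes.io/listen-ports": '[{"HTTPS": 443}]',
                "alb.ingress.kubernetes.io/certificate-arn": cert_arn,
            },
        },
        "spec": {
            "ingressClassName": "alb",
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {
                                        "name": service,
                                        "port": {"number": port},
                                    }
                                },
                            }
                        ]
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    cert = "arn:aws:acm:us-east-1:111122223333:certificate/example"  # hypothetical
    for svc in ("api", "web"):
        manifest = grouped_ingress(svc, f"{svc}.example.com", "shared-public", cert)
        print(json.dumps(manifest, indent=2))
```

Both Ingresses here resolve to a single load balancer with host-based rules, instead of one ALB per service.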

What does Amazon EKS cost?

The fixed cost is small: roughly $0.10/hour (~$73/month) per cluster for the managed control plane, plus an extended-support surcharge if you run an older Kubernetes version past standard support. Everything else is variable — the EC2 or Fargate compute, load balancers, NAT gateway data processing, EBS volumes, and observability — and that is where bills balloon. The biggest savings come from Karpenter consolidation and Spot for fault-tolerant workloads, right-sized requests and limits, consolidated ingress, and per-namespace cost visibility (Kubecost or AWS split-cost allocation). An untuned cluster commonly wastes 30–50% of its compute.
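The fixed-cost figure and the waste range are simple to reproduce. This sketch uses the $0.10/hour control-plane rate quoted above with a 730-hour average month, and applies the 30–50% range to a hypothetical $4,000/month compute bill. That compute figure is an illustrative assumption, not a benchmark.

```python
HOURS_PER_MONTH = 730        # average month: 8760 hours / 12
CONTROL_PLANE_HOURLY = 0.10  # USD per cluster-hour, as quoted above


def control_plane_monthly(clusters: int = 1) -> float:
    """Fixed monthly control-plane cost in USD for a number of clusters."""
    return round(clusters * CONTROL_PLANE_HOURLY * HOURS_PER_MONTH, 2)


def waste_range(compute_monthly: float, low: float = 0.30,
                high: float = 0.50) -> tuple:
    """Low and high estimate of monthly compute wasted by an untuned cluster."""
    return (round(compute_monthly * low, 2), round(compute_monthly * high, 2))


if __name__ == "__main__":
    print(control_plane_monthly())    # 73.0
    print(control_plane_monthly(3))   # 219.0 for dev, staging, and prod clusters
    print(waste_range(4000))          # (1200.0, 2000.0)
```

At that scale the potential waste is well over ten times the control-plane fee, which is why tuning effort belongs on the variable side of the bill.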

Can CloudRoute just build the EKS cluster for me — and is it really $0?

Yes. CloudRoute matches you within 24 hours to a vetted AWS partner with real EKS production experience who builds a production-ready cluster to the checklist in this guide and hands it to you as infrastructure-as-code (Terraform/OpenTofu, GitOps manifests, runbooks) that you own. For credit-eligible companies, typically those that are institutionally funded or running a qualifying workload, the build is frequently AWS-funded: the partner is paid through AWS partner programs and your AWS usage during the build is covered by Activate credits, so your out-of-pocket cost is $0 or low. The honest limit: that applies only to credit-eligible engagements. If you don't qualify, it's still a vetted-partner referral and you pay the partner directly, and we tell you which case applies up front.

Get a production-ready EKS cluster — built right, owned by you.

CloudRoute matches you to a vetted AWS partner who stands up EKS to the full readiness checklist — networking, Karpenter, IRSA/Pod Identity, ingress, observability, security, and cost — handed over as infrastructure-as-code. Credit-eligible companies often pay $0. No hiring search, no black box.

Get matched with an EKS partner →

→ see the startup engagement detail

matched within: < 24h

production-ready in: 2–6 wks

cost if credit-eligible: $0