amazon ecs setup · 2026 reference + production build

Amazon ECS setup — from first task to a production service, and who builds it.

A production Amazon ECS service is a task definition behind an Application Load Balancer, running on Fargate (or EC2), in private subnets with awsvpc networking, autoscaling on real signals, secrets pulled from Secrets Manager, logs in CloudWatch, and a deploy strategy — rolling or blue-green via CodeDeploy — that can roll back. This page walks the real decisions (Fargate vs EC2, task sizing, ALB, autoscaling, networking, secrets, deploys, observability, cost), gives you a production checklist, makes the honest ECS-vs-EKS call, and shows how a vetted AWS partner builds it for you — often AWS-funded if you qualify for credits.

launch types: Fargate / EC2
typical production setup: 1–3 wks
nodes to manage on Fargate: 0
cost if credit-eligible: $0
TL;DR
  • Amazon ECS is AWS's native container orchestrator: you package your app as a container image, describe how to run it in a task definition, and run it as a long-lived service behind a load balancer. The core objects are the cluster, the task definition, the service, and the launch type — Fargate (serverless, no nodes) or EC2 (you run the instances).
  • For most teams in 2026 the default is ECS on Fargate, in private subnets with awsvpc networking, fronted by an Application Load Balancer, autoscaling on CPU/memory or request count, with secrets injected from AWS Secrets Manager, logs and metrics in CloudWatch, and a rolling or blue-green (CodeDeploy) deploy. Choose EC2 launch type when you need GPUs, very large or specialized instances, the lowest steady-state cost at scale, or daemon workloads on every host.
  • You can stand this up yourself — this page is the map — or CloudRoute can route you to a vetted AWS partner who builds the production ECS platform (networking, task definitions, services, ALB, autoscaling, secrets, deploys, observability, IaC) for you. For credit-eligible companies the engagement is often AWS-funded, so the customer pays $0; otherwise it is a vetted-partner referral that skips the hiring and vetting slog.
the model

I. What Amazon ECS actually is: clusters, tasks, services, launch types

ECS (Elastic Container Service) is AWS's own container orchestrator. It is simpler than Kubernetes by design: there is no control plane for you to run, no cluster to upgrade, and a small set of objects to understand. Before you set anything up, it pays to know the four nouns that everything else hangs off.

A container image is your app packaged once (a Dockerfile build) and stored in a registry — almost always Amazon ECR. A task definition is the blueprint for running that image: which image and tag, how much CPU and memory, environment variables and secret references, the port it listens on, the log configuration, the IAM roles it assumes, and (on Fargate) the platform settings. A task is one running instance of that blueprint — one or more containers scheduled together. A service keeps a desired number of tasks running, replaces them when they die, registers them with a load balancer, and orchestrates deployments when you ship a new version. A cluster is the logical boundary the services run in.

The launch type is the one decision that colours everything else. With Fargate, AWS runs the compute for you — you specify CPU and memory per task and AWS schedules it on capacity you never see, patch, or scale. With the EC2 launch type, you run a fleet of EC2 instances (an Auto Scaling group, registered to the cluster via a capacity provider) and ECS bin-packs tasks onto them; you own the host OS, the AMI, and the instance scaling. Most teams start on Fargate because it removes an entire category of operational work; some move specific workloads to EC2 for cost or hardware reasons. Section IV is the full comparison.

The honest framing for this page: ECS itself is genuinely easy to get running — a single task behind a load balancer is an afternoon. What is not trivial is the production envelope around it: private networking, least-privilege IAM, secrets that are never baked into images, autoscaling that reacts to the right signal, a deploy strategy that can roll back, health checks that actually catch a bad release, observability you can debug an incident with, and all of it defined as code rather than clicked together once. That envelope is the work — and it is exactly what a good AWS partner hands you in a week or two.

the core objects

II. Task definitions and services: sizing, roles, and what to get right

The task definition and the service are where most of the real configuration lives. Get the sizing, the two IAM roles, and the health checks right here and the rest of the setup is comparatively mechanical.

A task definition is versioned — every change creates a new revision, and the service points at a revision. This is a feature: a deploy is really "move the service from task-def revision N to N+1," and a rollback is "point it back at N." Treat task definitions as immutable artifacts the same way you treat the image: define them in code (Terraform/OpenTofu/CDK/CloudFormation), not by hand-editing in the console.

Right-sizing CPU and memory

On Fargate you pick a CPU/memory combination from a fixed matrix (for example 0.25 vCPU / 0.5 GB at the small end, up to 16 vCPU / 120 GB at the large end); task CPU and memory must be a valid pair. On EC2 you set CPU and memory reservations per container and ECS bin-packs them onto instances, so you can oversubscribe more flexibly. Start conservative, then size from real data: ECS publishes service-level CPU and memory utilization to CloudWatch, and Container Insights adds task- and container-level detail. Most teams over-provision early; right-sizing after a week of real traffic is one of the cheapest wins available.
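To make the "valid pair" rule concrete, here is a minimal sketch of a size check. The matrix below reflects the Fargate task sizes as commonly documented (CPU in units where 1024 = 1 vCPU, memory in MiB); treat it as an assumption and verify it against the current AWS documentation. The function names are hypothetical.

```python
def _gb_steps(start_gb, stop_gb, step_gb=1):
    """Memory options in MiB from start_gb to stop_gb inclusive."""
    return {gb * 1024 for gb in range(start_gb, stop_gb + 1, step_gb)}


# Assumed Fargate matrix: CPU units -> allowed memory values in MiB.
FARGATE_MEMORY_BY_CPU = {
    256: {512, 1024, 2048},
    512: _gb_steps(1, 4),
    1024: _gb_steps(2, 8),
    2048: _gb_steps(4, 16),
    4096: _gb_steps(8, 30),
    8192: _gb_steps(16, 60, 4),
    16384: _gb_steps(32, 120, 8),
}


def is_valid_fargate_size(cpu_units, memory_mib):
    """True if the CPU/memory combination is in the assumed matrix."""
    return memory_mib in FARGATE_MEMORY_BY_CPU.get(cpu_units, set())


def smallest_fargate_size(min_cpu_units, min_memory_mib):
    """Smallest CPU tier (then smallest memory) that covers both minimums."""
    for cpu in sorted(FARGATE_MEMORY_BY_CPU):
        if cpu < min_cpu_units:
            continue
        fits = [m for m in FARGATE_MEMORY_BY_CPU[cpu] if m >= min_memory_mib]
        if fits:
            return cpu, min(fits)
    return None  # nothing in the matrix is large enough
```

A check like this is handy in CI: it rejects an invalid pair before a deploy fails at task registration.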

The two IAM roles (do not confuse them)

Every ECS task has up to two distinct roles, and conflating them is the single most common setup mistake. The task execution role is used by the ECS agent / Fargate to start the task — pulling the image from ECR, fetching secret values from Secrets Manager/SSM at launch, and writing logs to CloudWatch. The task role is the identity your application code assumes at runtime to call other AWS services (S3, DynamoDB, SQS, etc.). Scope both to least privilege: the execution role needs only ECR pull, the specific secret ARNs, and the log group; the task role needs only the application's actual AWS calls. Never give either an admin policy.
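As a rough illustration of keeping the two roles apart, the sketch below builds a Fargate task definition and a least-privilege execution-role policy as plain dictionaries. The field names follow the ECS task-definition and IAM policy formats; the account ID, ARNs, container name, port, and function names are all placeholders invented for the example.

```python
ACCOUNT = "123456789012"  # placeholder account ID
REGION = "us-east-1"      # placeholder region


def build_task_definition(image_uri, secret_arn, log_group):
    """Task definition with separate execution and task roles."""
    return {
        "family": "api",
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "512",
        "memory": "1024",
        # Used by ECS/Fargate to pull the image, fetch secrets, write logs.
        "executionRoleArn": f"arn:aws:iam::{ACCOUNT}:role/api-execution",
        # Assumed by the application code at runtime (S3, DynamoDB, ...).
        "taskRoleArn": f"arn:aws:iam::{ACCOUNT}:role/api-task",
        "containerDefinitions": [{
            "name": "api",
            "image": image_uri,
            "essential": True,
            "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
            "secrets": [{"name": "DB_PASSWORD", "valueFrom": secret_arn}],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": log_group,
                    "awslogs-region": REGION,
                    "awslogs-stream-prefix": "api",
                },
            },
        }],
    }


def build_execution_role_policy(repo_arn, secret_arn, log_group_arn):
    """Execution role: ECR pull, one secret, one log group, nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # GetAuthorizationToken does not support resource scoping.
            {"Effect": "Allow", "Action": ["ecr:GetAuthorizationToken"],
             "Resource": "*"},
            {"Effect": "Allow",
             "Action": ["ecr:BatchCheckLayerAvailability",
                        "ecr:GetDownloadUrlForLayer",
                        "ecr:BatchGetImage"],
             "Resource": repo_arn},
            {"Effect": "Allow", "Action": ["secretsmanager:GetSecretValue"],
             "Resource": secret_arn},
            {"Effect": "Allow",
             "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
             "Resource": f"{log_group_arn}:*"},
        ],
    }
```

The task role would get its own, separate policy listing only the application's runtime calls; it is deliberately absent here to keep the boundary visible.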

Health checks and the service's job

The service is what makes ECS production-grade: it maintains the desired task count, replaces unhealthy tasks, and during a deploy spins up new-revision tasks before draining old ones. Define a container health check (or rely on the ALB target-group health check) so ECS and the load balancer agree on what "healthy" means, and set a sensible healthCheckGracePeriodSeconds so a slow-starting app is not killed before it is ready. Set minimumHealthyPercent and maximumPercent deliberately — they govern how many tasks must stay up and how many extra can spin up during a rolling deploy. Getting these wrong is how a deploy briefly takes the service below capacity.
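The effect of minimumHealthyPercent and maximumPercent is easier to reason about as arithmetic. The sketch below assumes the rounding described in the ECS documentation as I understand it (the lower bound rounds up, the upper bound rounds down); the function name and the returned keys are hypothetical.

```python
def rolling_deploy_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Task-count bounds the scheduler must respect during a rolling deploy."""
    # Lower bound: ceil(desired * min% / 100), using integer arithmetic.
    min_running = -(-desired_count * minimum_healthy_percent // 100)
    # Upper bound: floor(desired * max% / 100).
    max_running = desired_count * maximum_percent // 100
    can_start_first = max_running > desired_count  # room to add new tasks
    can_stop_first = min_running < desired_count   # room to stop old tasks
    return {
        "min_running": min_running,
        "max_running": max_running,
        "can_start_first": can_start_first,
        "can_stop_first": can_stop_first,
        # With no room in either direction the deploy has nowhere to go.
        "stuck": not (can_start_first or can_stop_first),
    }
```

For example, 4 tasks at 100/200 never drop below 4 and may burst to 8, while 1 task at 100/100 leaves no room to start or stop anything.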

networking + ingress

III. awsvpc networking and the Application Load Balancer

How tasks get their network identity and how traffic reaches them is the part that, done wrong, either exposes you or makes everything mysteriously unreachable. On modern ECS there is really one networking mode that matters, and one front door that matters.

In awsvpc mode (the only option on Fargate, and the recommended mode on EC2) each task gets its own elastic network interface and its own private IP inside your VPC, with its own security group. This is the model you want: a task is a first-class citizen on the network, you control its inbound and outbound rules at the task level, and there is none of the port-juggling that older bridge/host modes required. Run tasks in private subnets with no public IP. Give them outbound internet access (for calling external APIs) via a NAT gateway, and, better for cost and security, reach AWS services through VPC endpoints: interface endpoints (PrivateLink) for ECR, Secrets Manager, and CloudWatch Logs, plus a gateway endpoint for S3, so image pulls and secret fetches never leave the AWS network.

The front door for an HTTP/HTTPS service is an Application Load Balancer. The ALB lives in public subnets; your tasks live in private subnets; the ALB's security group is the only thing allowed to reach the task security group on the app port. ECS integrates natively: the service registers and deregisters task IPs with an ALB target group automatically as tasks come and go, and the target group's health check is what gates whether a task receives traffic. Terminate TLS at the ALB with an ACM certificate, route by host or path with listener rules, and put AWS WAF in front if you need request filtering. For non-HTTP TCP or UDP workloads (game servers, gRPC at L4, databases) use a Network Load Balancer instead; for most web and API services the ALB is the right choice.
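The "only the ALB may reach the tasks" rule can be sketched as a security-group ingress request plus a small guard. The dictionary follows the shape of the EC2 AuthorizeSecurityGroupIngress parameters; the function names and the group IDs used with them are hypothetical.

```python
def alb_to_task_ingress(task_sg_id, alb_sg_id, app_port):
    """Ingress on the task SG that admits only the ALB's SG on the app port."""
    return {
        "GroupId": task_sg_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": app_port,
            "ToPort": app_port,
            # Source is a security group, not a CIDR range.
            "UserIdGroupPairs": [{
                "GroupId": alb_sg_id,
                "Description": "ALB to tasks on the app port",
            }],
        }],
    }


def open_to_world(ip_permissions):
    """True if any rule admits all IPv4 or all IPv6 addresses."""
    for rule in ip_permissions:
        if any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", [])):
            return True
        if any(r.get("CidrIpv6") == "::/0" for r in rule.get("Ipv6Ranges", [])):
            return True
    return False
```

Running a guard like open_to_world against task security groups in CI is one cheap way to catch the "broad security groups" failure mode before it ships.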

A few specifics that save incidents: enable connection draining (deregistration delay) on the target group so in-flight requests finish before a task is killed during a deploy or scale-in; right-size the health-check thresholds so a brief blip does not flap tasks out of service; and if you run many services, consider ECS Service Connect (or AWS Cloud Map service discovery) for clean service-to-service communication inside the cluster instead of routing everything through the ALB.

the launch-type call

IV. Fargate vs EC2 launch type: how to actually choose

This is the decision people agonize over and usually overthink. The short version: default to Fargate, and move specific workloads to EC2 only when a concrete reason appears. Here is the reasoning, and the full comparison is in the table below.

Choose Fargate when you want to stop thinking about servers. There are no instances to patch, no AMIs to maintain, no node Auto Scaling group to tune, and no bin-packing to reason about: you pay per task for the vCPU and memory you request, billed per second. For the overwhelming majority of web services, APIs, workers, and scheduled jobs at startup and scale-up size, this is the right answer; the slightly higher per-unit compute price buys back a large amount of engineering time and removes a whole class of operational risk. Fargate also supports Spot for fault-tolerant workloads, which claws back much of the cost gap.

Choose the EC2 launch type when you have a concrete reason: you need GPUs or specialized hardware (ML inference/training) that Fargate does not offer; you need very large instance types or particular CPU architectures and want full control of the host; you run at high, steady utilization where reserved or Spot EC2 capacity that you bin-pack densely is meaningfully cheaper than per-task Fargate; you need daemon workloads (a log shipper or agent on every host); or you have host-level requirements (custom kernel parameters, specific networking, privileged access) that Fargate's managed model does not permit. With EC2 you trade operational simplicity for control and, at scale and high utilization, lower cost — but you now own instance patching, scaling, and capacity planning.

The pragmatic pattern many teams land on is both: Fargate as the default for stateless services and bursty/spiky workloads, and an EC2 capacity provider for the specific workloads that justify it (GPU jobs, a few always-on high-utilization services). ECS supports mixing launch types and capacity providers in the same cluster, so this is not an all-or-nothing decision. Start on Fargate; let real cost and hardware needs — not theory — pull individual workloads onto EC2 later.

scaling on signals

V. Service autoscaling: scaling tasks (and, on EC2, the cluster)

Autoscaling on ECS happens at two layers, and the distinction matters. Service autoscaling adjusts how many tasks run; cluster/capacity scaling (EC2 only) adjusts how much underlying compute exists for those tasks to land on. On Fargate you only deal with the first.

Service Auto Scaling changes the desired task count of a service via Application Auto Scaling, on a policy you choose. Target tracking is the default and the right starting point: pick a metric and a target (for example "keep average CPU at 60%," "keep average memory at 70%," or — often the best for request-driven services — "keep ALB requests-per-target at N"), and ECS adds or removes tasks to hold that target. Step scaling reacts to CloudWatch alarm thresholds in defined increments for more bespoke behaviour, and scheduled scaling pre-warms capacity for known traffic patterns (business-hours ramps, a daily batch window, a marketing event). Always set sensible minimum and maximum task counts, and tune cooldowns so the service does not thrash.
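A target-tracking policy on ALB requests per target might look like the sketch below. The dictionary follows the Application Auto Scaling PutScalingPolicy parameter shape; the cooldown values, names, and the helper that estimates the resulting task count are illustrative assumptions, not tuned recommendations.

```python
import math


def request_tracking_policy(cluster, service, alb_resource_label,
                            target_requests_per_task):
    """Target-tracking policy keyed on ALB request count per target."""
    return {
        "PolicyName": f"{service}-requests-per-target",
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_requests_per_task),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                # Identifies the load balancer and target group to measure.
                "ResourceLabel": alb_resource_label,
            },
            "ScaleOutCooldown": 60,   # assumed: react quickly to load
            "ScaleInCooldown": 300,   # assumed: scale in slowly to avoid thrash
        },
    }


def estimated_task_count(total_requests, target_per_task, min_tasks, max_tasks):
    """Roughly where target tracking settles, clamped to the min/max bounds."""
    needed = math.ceil(total_requests / target_per_task)
    return max(min_tasks, min(max_tasks, needed))
```

The estimate is only a back-of-envelope model: 4,500 requests against a target of 1,000 per task suggests 5 tasks, and the min/max bounds always win.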

On the EC2 launch type you also need the underlying instances to scale, or tasks will sit in PENDING with nowhere to run. The modern mechanism is a capacity provider with managed scaling: ECS watches how much capacity your desired tasks need and scales the EC2 Auto Scaling group up and down to match a target utilization you set, including scaling in to zero spare capacity when idle. (The older approach of hand-wiring CloudWatch alarms on cluster reservation metrics to Auto Scaling group policies is superseded by capacity providers for new builds.) On Fargate this entire layer disappears: there is no cluster to scale, so service autoscaling is all you configure. That deletion of an entire scaling concern is a big part of why Fargate is the default recommendation for teams without dedicated infra staff.
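Managed scaling is usually described through a reservation metric: instances needed divided by instances running, times 100, held at the target capacity you configure. The sketch below works through that arithmetic under that assumption; the function names are hypothetical and the edge cases (for example zero running instances) are deliberately left out.

```python
def capacity_provider_reservation(instances_needed, instances_running):
    """Reservation metric: needed / running * 100 (running must be > 0)."""
    if instances_running <= 0:
        raise ValueError("this sketch does not model an empty fleet")
    return instances_needed / instances_running * 100


def instances_for_target(instances_needed, target_capacity_percent):
    """Fleet size at which the reservation metric meets the target."""
    if not 0 < target_capacity_percent <= 100:
        raise ValueError("target capacity must be in (0, 100]")
    # ceil(needed * 100 / target) using integer arithmetic.
    return -(-instances_needed * 100 // target_capacity_percent)
```

With a target of 100 the fleet matches exactly what the tasks need; a target of 80 keeps roughly a quarter more instances than needed as warm headroom.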

A practical note on cold starts and headroom: target-tracking reacts to load, it does not predict it, so for spiky traffic give yourself a little static headroom (a higher minimum task count, or scheduled scaling ahead of known spikes) rather than relying on the autoscaler to catch a sudden surge in time. The combination of request-per-target target tracking plus a sane minimum is what keeps latency flat during traffic bursts.

secrets + config

VI. Secrets and configuration done the right way

Secrets are where homegrown ECS setups are most often wrong: API keys and database passwords end up in plaintext environment variables baked into the image or the task definition. ECS gives you a clean, native way to avoid that entirely.

Store application secrets in AWS Secrets Manager (or in SSM Parameter Store as SecureString for simpler, non-rotating config), encrypted with KMS. In the task definition, reference them under secrets by ARN rather than putting values in environment; at task launch, the task execution role fetches the values and injects them as environment variables into the container, so the plaintext never lives in your image, your repo, your IaC state in cleartext, or the task-definition JSON. Scope the execution role to exactly the secret ARNs that task needs and nothing broader. Secrets Manager also handles rotation (for example database credentials). Note that injected values are read once at task launch, so running tasks pick up a rotated value only when they are replaced (for example by forcing a new deployment) or when the application fetches the secret at runtime.
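One way to enforce the "secrets, not environment" rule is a small lint over the task-definition JSON. This is a heuristic sketch: the name pattern and the function name are assumptions made for the example, and a name-based check will miss secrets hiding behind innocent variable names.

```python
import re

# Assumed heuristic: variable names that usually hold credentials.
SUSPICIOUS_NAME = re.compile(
    r"(password|passwd|secret|token|api[_-]?key|private[_-]?key)",
    re.IGNORECASE,
)


def plaintext_secret_findings(task_definition):
    """Return (container, variable) pairs that look like plaintext secrets."""
    findings = []
    for container in task_definition.get("containerDefinitions", []):
        # Only the plaintext "environment" list is checked; entries under
        # "secrets" are references resolved at launch and are fine.
        for env in container.get("environment", []):
            name = env.get("name", "")
            if SUSPICIOUS_NAME.search(name):
                findings.append((container.get("name"), name))
    return findings
```

Failing the pipeline on any finding keeps a plaintext DB_PASSWORD from reaching a registered revision in the first place.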

For non-secret configuration (feature flags, environment names, tunables) plain environment entries or Parameter Store String values are fine. The discipline that matters: never echo secret values in logs, never bake them into the container image, and keep the boundary clear — the execution role pulls secrets to start the task, and the application's own task role governs what AWS APIs the running code may call. For image pulls and secret fetches, prefer VPC endpoints (PrivateLink) to ECR, Secrets Manager, and SSM so those calls stay on the AWS network even from private subnets.

shipping safely

VII. Deployments: rolling vs blue-green (CodeDeploy), and rollback

How a new task-definition revision reaches production is where outages are prevented or caused. ECS gives you a built-in rolling deploy and, via CodeDeploy, a true blue-green with instant rollback. Choosing between them is mostly about how much risk a single release carries.

The rolling update is ECS's native, default deployment. The service starts tasks on the new revision and drains tasks on the old one a few at a time, governed by minimumHealthyPercent and maximumPercent, registering new tasks with the ALB target group and deregistering old ones as it goes. It needs no extra services, costs nothing extra, and is the right default for lower environments and lower-risk services. Its limit: during the roll, both versions serve live traffic, so a bad release reaches some users before health checks catch it, which is why good health checks and connection draining matter so much here. ECS also supports a deployment circuit breaker which, with its rollback option enabled, automatically rolls a failed rolling deploy back to the last known-good revision if the new tasks never reach a healthy steady state. Turn it on.
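In configuration terms, the rolling deploy and its circuit breaker live in the service's deploymentConfiguration. A minimal sketch follows, using the ECS field names; the default percentages and the function name are assumptions chosen for illustration.

```python
def rolling_deployment_configuration(minimum_healthy_percent=100,
                                     maximum_percent=200):
    """deploymentConfiguration for an ECS service using rolling updates."""
    if maximum_percent < 100 or minimum_healthy_percent > maximum_percent:
        raise ValueError("percentages leave no valid task-count range")
    return {
        "deploymentCircuitBreaker": {
            "enable": True,    # stop a deploy whose tasks never stabilise
            "rollback": True,  # and return to the last good revision
        },
        "minimumHealthyPercent": minimum_healthy_percent,
        "maximumPercent": maximum_percent,
    }
```

Enabling the breaker without the rollback flag only halts the failed deploy, so both flags are set together here.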

Blue-green via AWS CodeDeploy stands up the new revision as a separate ("green") task set alongside the live ("blue") one, lets you validate green against a test listener, then shifts production traffic at the ALB — all at once, or as a canary (a small percentage first, then the rest) or linear ramp. Rollback is effectively instant because blue is still running: if a CloudWatch alarm trips during or just after the shift, CodeDeploy reverts traffic to blue automatically. This is the pattern for production services where a bad release is expensive: you get pre-shift validation, a controlled traffic shift, automated alarm-based rollback, and a clean previous version held warm. The cost is briefly running two task sets and the extra moving part of CodeDeploy. A common, sensible split is rolling (with the circuit breaker on) for dev/staging and CodeDeploy blue-green or canary for production.
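The difference between a canary and a linear shift is easiest to see as a schedule of (minute, percent of traffic on green). The sketch below only models the arithmetic; CodeDeploy ships predefined configurations of this kind (for example CodeDeployDefault.ECSCanary10Percent5Minutes), and the function names here are hypothetical.

```python
def canary_schedule(first_percent, wait_minutes):
    """Shift a small slice first, then everything after the wait."""
    if not 0 < first_percent < 100:
        raise ValueError("first_percent must be between 0 and 100")
    return [(0, first_percent), (wait_minutes, 100)]


def linear_schedule(step_percent, interval_minutes):
    """Shift equal steps at a fixed interval until all traffic is on green."""
    if step_percent <= 0:
        raise ValueError("step_percent must be positive")
    schedule, shifted, minute = [], 0, 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)  # never exceed 100%
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule
```

An alarm that trips at any point in the schedule sends traffic back to blue, so the earlier steps bound how many users a bad release can reach.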

Because every deploy is just a pointer to an immutable, SHA-tagged image and a task-definition revision, rollback is unambiguous: redeploy the previous revision (or, with blue-green, flip back to blue). The asterisk, as always, is database migrations — make them backward-compatible with expand-then-contract (add the new column, deploy code that writes both, backfill, remove the old later) so the previous task revision still runs against the new schema. Test the rollback path before you need it; a rollback you have never run is a hope, not a plan.
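The expand-then-contract idea can be demonstrated end to end with an in-memory SQLite database. This is a toy sketch: the users table, its columns, and the function names are invented for the example, and a real migration would run through your migration tool against your actual database.

```python
import sqlite3


def old_release_insert(conn, name):
    """Previous task revision: knows only the original column."""
    conn.execute("INSERT INTO users (full_name) VALUES (?)", (name,))


def new_release_insert(conn, name):
    """New task revision: writes both the old and the new column."""
    conn.execute(
        "INSERT INTO users (full_name, display_name) VALUES (?, ?)",
        (name, name),
    )


def expand(conn):
    """Expand step: add a nullable column so old code keeps working."""
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def backfill(conn):
    """Copy existing values into the new column where it is still empty."""
    conn.execute(
        "UPDATE users SET display_name = full_name WHERE display_name IS NULL"
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)"
    )
    old_release_insert(conn, "Ada")    # before the migration
    expand(conn)
    new_release_insert(conn, "Grace")  # new revision is live
    old_release_insert(conn, "Linus")  # rollback: old revision still works
    backfill(conn)
    return conn.execute(
        "SELECT full_name, display_name FROM users ORDER BY id"
    ).fetchall()
```

The contract step (dropping full_name) is intentionally missing: it belongs in a later release, once no revision you might roll back to still reads the old column.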

ecs deployment strategies · trade-offs
Strategy | How traffic shifts | Rollback | Extra cost | How it runs
Rolling (ECS native) | Replace a few tasks at a time behind the ALB | Auto via deployment circuit breaker → last good revision | None | Built into the ECS service
Blue-green (CodeDeploy) | Validate green, shift at the ALB (all-at-once / canary / linear) | Instant: revert to blue, auto on CloudWatch alarm | Two task sets briefly | ECS + CodeDeploy deployment controller
Canary (CodeDeploy) | Small % first, watch alarms, then the rest | Auto on alarm; only a slice exposed | Small (extra task set) | CodeDeploy traffic-shifting config
Default pattern: rolling with the deployment circuit breaker enabled for lower environments; CodeDeploy blue-green or canary for production, with the traffic shift gated on CloudWatch alarms and a post-deploy smoke test.
see it + pay for it

VIII. Logging, observability, and cost

A service you cannot observe is a service you cannot operate, and a service whose cost you do not understand is one that surprises you on the invoice. ECS gives you native answers for both; the trick is turning them on deliberately rather than discovering the gaps during an incident.

Logging: configure the awslogs log driver (or FireLens/Fluent Bit for routing to a third party) so container stdout/stderr lands in CloudWatch Logs, one log group per service, with a retention policy set (unbounded retention is a silent cost leak). Metrics and tracing: turn on CloudWatch Container Insights for per-task and per-service CPU, memory, network, and task-count metrics, and instrument the application with AWS X-Ray or OpenTelemetry for distributed traces. Teams already on Datadog, Grafana, or Prometheus typically ship ECS metrics/logs there via the OpenTelemetry collector or a sidecar — fine, as long as something is watching. The non-negotiable: alarms on the signals that matter (error rate, p99 latency, unhealthy host/target count, task restarts) wired to the deploy rollback and to on-call, so the system tells you it is unhealthy instead of a customer doing it.
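Two of the items above, log retention and an alarm on unhealthy targets, reduce to small request payloads. The sketch below builds them as dictionaries shaped like the CloudWatch Logs PutRetentionPolicy and CloudWatch PutMetricAlarm parameters. The list of allowed retention values is my understanding of the API and should be checked against current documentation; names, thresholds, and ARNs are placeholders.

```python
# Assumed set of retention periods (days) accepted by CloudWatch Logs.
ALLOWED_RETENTION_DAYS = {
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
}


def log_retention_request(log_group, days=30):
    """Retention policy so a log group does not grow without bound."""
    if days not in ALLOWED_RETENTION_DAYS:
        raise ValueError(f"{days} is not an accepted retention period")
    return {"logGroupName": log_group, "retentionInDays": days}


def unhealthy_targets_alarm(service, target_group, load_balancer, sns_topic_arn):
    """Alarm that fires when the ALB reports any unhealthy target."""
    return {
        "AlarmName": f"{service}-unhealthy-targets",
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "UnHealthyHostCount",
        "Dimensions": [
            {"Name": "TargetGroup", "Value": target_group},
            {"Name": "LoadBalancer", "Value": load_balancer},
        ],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 3,  # assumed: tolerate a brief blip
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],  # on-call notification target
    }
```

The same alarm can also be listed on a CodeDeploy deployment group so it gates the traffic shift as well as paging on-call.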

Cost: on Fargate you pay per task for requested vCPU and memory, per second, so cost scales directly with how many tasks run and how big each is, which makes right-sizing task CPU/memory and tuning autoscaling the two biggest levers. Fargate Spot can cut compute cost substantially for interruption-tolerant workloads (workers, batch, stateless services with graceful draining). On EC2, cost is driven by your instances, so the levers are dense bin-packing, Reserved Instances / Savings Plans for steady-state capacity, and Spot for the fault-tolerant portion; at high, steady utilization this can undercut Fargate, which is one of the main reasons to choose EC2. Across both, the recurring savings come from right-sized tasks, autoscaling that scales in as well as out, log retention limits, NAT-vs-PrivateLink choices for egress, and killing the over-provisioned defaults nobody revisited. A good build sets these correctly from day one. Compute Savings Plans cover Fargate too, so even the serverless path has a committed-use discount.
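Because Fargate bills on requested vCPU and memory, the effect of right-sizing is simple arithmetic. The sketch below takes the hourly rates as parameters on purpose: the numbers used in the usage note are made-up round figures, not AWS prices, so substitute the current rates for your region.

```python
def monthly_fargate_cost(task_count, vcpu, memory_gb,
                         vcpu_hour_rate, gb_hour_rate, hours=730):
    """Approximate monthly cost of tasks that run continuously.

    vcpu_hour_rate and gb_hour_rate are caller-supplied; no real
    prices are built into this function.
    """
    per_task_hour = vcpu * vcpu_hour_rate + memory_gb * gb_hour_rate
    return task_count * per_task_hour * hours


def right_sizing_saving(task_count, before, after, vcpu_hour_rate, gb_hour_rate):
    """Monthly saving from moving each task from `before` to `after` size.

    `before` and `after` are (vcpu, memory_gb) tuples.
    """
    old = monthly_fargate_cost(task_count, *before, vcpu_hour_rate, gb_hour_rate)
    new = monthly_fargate_cost(task_count, *after, vcpu_hour_rate, gb_hour_rate)
    return old - new
```

With illustrative rates of 0.04 per vCPU-hour and 0.004 per GB-hour, four always-on tasks at 1 vCPU / 2 GB cost about 140 a month, and halving each task's size halves that, which is why right-sizing is listed first among the levers.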

before you call it done

IX. The production-readiness checklist

A task running behind a load balancer is a demo. A production ECS service is the list below. None of it is exotic; all of it is skipped under deadline pressure, and all of it is cheaper to do up front than to retrofit after the first incident.

Run this before you put real traffic on an ECS service:

  • Private subnets + awsvpc + tight security groups — Tasks in private subnets, no public IPs, each task SG reachable only from the ALB SG on the app port. Egress via NAT or, better, VPC endpoints to ECR/S3/Secrets Manager/Logs.
  • Two least-privilege IAM roles — Execution role scoped to ECR pull + the exact secret ARNs + the log group; task role scoped to only the AWS calls the app actually makes. No admin policies on either.
  • Secrets from Secrets Manager / SSM, never in the image — Referenced by ARN in the task definition under secrets, injected at launch, rotation enabled where relevant. No plaintext in environment, image, or repo.
  • ALB health checks + grace period + draining — Target-group and/or container health checks aligned, a sane healthCheckGracePeriodSeconds for slow starts, and deregistration delay so in-flight requests finish during deploys and scale-in.
  • Service autoscaling on the right signal — Target tracking on CPU/memory or (often best) ALB requests-per-target, sensible min/max task counts, cooldowns tuned, plus a little headroom for spikes. On EC2, a capacity provider with managed scaling.
  • A deploy strategy with automatic rollback — Rolling with the deployment circuit breaker on for lower envs; CodeDeploy blue-green or canary gated on CloudWatch alarms for production. Rollback path tested, not assumed.
  • Logging + metrics + tracing + alarms — awslogs to CloudWatch with retention set, Container Insights on, X-Ray/OTel traces, and alarms on error rate, latency, unhealthy targets, and task restarts wired to on-call.
  • Multi-AZ and right-sized tasks — Tasks spread across at least two Availability Zones, desired count high enough to survive an AZ loss, CPU/memory sized from real utilization data rather than guesses.
  • Everything as code — Cluster, task definitions, services, ALB, autoscaling, IAM, and networking in Terraform/OpenTofu/CDK/CloudFormation — reviewed on PRs, applied on merge, never click-opsed into existence.
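For the multi-AZ item in the checklist above, "desired count high enough to survive an AZ loss" can be worked out directly. The sketch assumes tasks are spread as evenly as possible across zones and that the busiest zone is the one lost; the function names are hypothetical.

```python
def desired_count_for_az_loss(required_tasks, az_count):
    """Smallest desired count that still leaves required_tasks after one AZ fails."""
    if az_count < 2:
        raise ValueError("surviving an AZ loss needs at least two AZs")
    # ceil(required * az_count / (az_count - 1)) using integer arithmetic.
    return -(-required_tasks * az_count // (az_count - 1))


def surviving_tasks(desired_count, az_count):
    """Tasks left after losing the most heavily loaded AZ of an even spread."""
    largest_zone = -(-desired_count // az_count)  # ceil(desired / az_count)
    return desired_count - largest_zone
```

If 4 tasks are needed to carry peak load, two AZs call for a desired count of 8 while three AZs call for 6, which is one practical argument for spreading across three zones.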
the one rule

Promote an immutable, SHA-tagged image through every environment and define every ECS object as code. A deploy then becomes "point the service at a new task-definition revision," a rollback becomes "point it back," and there is no console drift to debug at 3 a.m. If you take one thing from this page, take this.

the honest call

X. ECS vs EKS: which one should you actually run?

The most common question right after "how do I set up ECS" is "should it be ECS or EKS (managed Kubernetes)?" The honest answer for most teams is ECS — and it is worth being clear about why, and about the cases where EKS genuinely wins.

Default to ECS when your goal is to run containers on AWS with the least operational overhead. There is no control plane to manage or upgrade, no Kubernetes version treadmill, far fewer moving parts, deep native integration with ALB/IAM/CloudWatch/Secrets Manager/CodeDeploy, and — on Fargate — no nodes at all. For a startup or scaleup running web services, APIs, workers, and jobs, ECS gets you to production faster and keeps you there with a smaller surface area to secure and operate. It is the pragmatic choice precisely because it does less.

Choose EKS when you have a concrete Kubernetes-shaped reason: you need the Kubernetes ecosystem (Helm, operators, CRDs, a specific controller or service mesh), you want portability across clouds or a consistent platform with on-prem, you are building an internal developer platform on Kubernetes primitives, or your team already has deep Kubernetes expertise and tooling. EKS is excellent and fully managed at the control-plane level — but you still own node groups (or use Fargate/Karpenter), cluster upgrades, add-ons, and a much larger configuration and security surface. That power is worth it when you will use it, and pure overhead when you will not.

The blunt version: do not adopt Kubernetes to run a handful of containers. If you cannot name the Kubernetes feature you need, ECS is the right answer, and you can revisit EKS the day a real requirement (a specific operator, multi-cloud, a platform play) actually appears. We have a dedicated ECS-vs-EKS breakdown and an EKS setup guide linked below if you want to go deeper on the decision before committing.

get it built

XI. Have a partner build your production ECS platform, often AWS-funded

You can build everything above yourself; this page is the map. But most teams searching "amazon ecs setup" do not actually want to spend three weeks becoming ECS experts — they want a production-grade container platform shipped correctly so the team can get back to the product. That is what CloudRoute routes you to.

CloudRoute matches you to a vetted AWS partner who builds the ECS platform end to end: the VPC and awsvpc networking, the cluster and capacity providers (Fargate, EC2, or both), task definitions and services sized from real data, the ALB and target groups, service autoscaling on the right signal, least-privilege IAM, secrets via Secrets Manager, the rolling or CodeDeploy blue-green/canary deploy with tested rollback, CloudWatch logging/Container Insights/X-Ray observability, and all of it as infrastructure-as-code in your repo. You get the work done by people who do this for a living, without running a hiring loop or vetting agencies yourself.

The commercial part, stated honestly: for credit-eligible companies, the partner engagement is frequently AWS-funded. The partner is paid through AWS partner-funding programs and your AWS usage during the build is covered by credits, so the customer pays $0 or close to it. If you are not credit-eligible, it is a straightforward vetted-partner referral: you still skip the hiring-and-vetting slog, you just pay the partner for the engagement directly. CloudRoute is paid a commission by the partner, not by you. We tell you which bucket you are in up front; we do not pretend everything is free.

If you also want the AWS credits themselves — which is what funds the engagement — that runs in parallel. See the AWS credits routes (the $100K Activate Portfolio tier is the common one for funded startups) and the startup persona page below; the ECS build and the credit application are typically filed by the same partner in the same week.

what you actually hand over

Repo access + which AWS account(s) + your container(s) and what they need (ports, secrets, dependencies) + Fargate or EC2 preference + how hands-on you want to stay. The partner returns a production ECS service — networking, ALB, autoscaling, secrets, safe deploys, observability, and the IaC in your repo — with a rollback you have watched work. For credit-eligible companies, often at $0.

launch type, side by side

Fargate vs EC2 launch type on Amazon ECS

The launch-type decision compared on the axes that actually drive it. The honest default for most teams is Fargate; the table makes clear exactly when an EC2 capacity provider earns its keep.

Dimension | Fargate | EC2 launch type
Who runs the compute | AWS: no instances you see | You: an EC2 Auto Scaling group you own
Ops burden on you | None (no patching, no AMIs, no node scaling) | You patch the OS/AMI, scale and bin-pack instances
Pricing model | Per task: requested vCPU + memory, per second | Per EC2 instance (On-Demand / RI / Savings Plan / Spot)
Cost sweet spot | Variable, spiky, or low-utilization workloads | High, steady utilization with dense bin-packing
Cheaper-at-scale option | Fargate Spot for interruption-tolerant tasks | Spot + Reserved/Savings Plans on the fleet
GPUs / specialized hardware | Not available | Yes: GPU and specialized instance types
Daemon workloads (per-host agents) | No (no hosts to run them on) | Yes (DAEMON scheduling on every instance)
Host-level control | None (managed runtime) | Full (kernel params, custom AMI, privileged needs)
Networking mode | awsvpc only (task-level ENI + SG) | awsvpc (recommended); bridge/host also possible
Scaling layers to manage | One: service autoscaling only | Two: service autoscaling + capacity-provider scaling
Best fit | Most teams; the default to start on | GPU/ML, very large instances, steady high-utilization, daemons
Default to Fargate; move specific workloads to an EC2 capacity provider when a concrete reason appears (GPUs, host control, or lower cost at steady high utilization). ECS lets you mix both in one cluster, so it is not all-or-nothing.
a recent match

From a single hand-run container to a production ECS service — anonymized

inquiry · seed-stage b2b saas, 11 engineers, on AWS
Seed-stage B2B SaaS, 11 engineers, running one container on ECS by hand

Situation: Already on AWS with a containerized API limping along on a single ECS service the founding engineer had clicked together: tasks in public subnets with broad security groups, a database password and a third-party API key sitting in plaintext environment variables in the task definition, no autoscaling, no real health checks, deploys done by editing the service in the console, and no rollback when a release went bad. A recent deploy had dropped the service below capacity mid-roll and caused a 25-minute partial outage. They had no in-house DevOps hire and could not justify one yet — and they were raising and qualified for AWS credits.

What CloudRoute did: CloudRoute routed them within a day to a US-based AWS partner with an ECS/Fargate track record. The partner rebuilt the platform as code in Terraform: tasks moved to private subnets with awsvpc and tight per-service security groups behind an ALB, the database and API secrets moved into Secrets Manager (referenced by ARN, injected at launch via a least-privilege execution role), a separate least-privilege task role for the app's S3/DynamoDB calls, service autoscaling on ALB requests-per-target across two AZs, CodeDeploy blue-green deploys gated on CloudWatch alarms with automatic rollback, and CloudWatch Container Insights + X-Ray with alarms wired to on-call. They left it on Fargate (no GPU or steady-high-utilization reason to go EC2) and filed the AWS Activate Portfolio credit application in the same week.

Outcome: Production-grade ECS service live in under three weeks. Deploys went from a console edit and crossed fingers to a gated blue-green shift with automatic rollback on a failed health check; zero plaintext secrets remained in task definitions; the service now survives an AZ loss and scales on real request load. Because the company was credit-eligible, the engagement was AWS-funded and the customer paid $0; CloudRoute was paid by the partner.

build window: < 3 weeks · plaintext secrets removed: 100% · prod deploys: CodeDeploy blue-green, auto-rollback · cost to customer: $0 (credit-eligible)

faq

Common questions

Should I use Fargate or the EC2 launch type for ECS?
Default to Fargate. It removes servers entirely — no instances to patch, no AMIs, no node scaling, no bin-packing — and you pay per task for the vCPU and memory you request, per second. For the large majority of web services, APIs, workers, and jobs at startup and scaleup scale, that is the right call, and Fargate Spot recovers much of the cost gap for interruption-tolerant work. Choose the EC2 launch type only when you have a concrete reason: GPUs or specialized hardware, very large or specific instance types, daemon workloads on every host, host-level control Fargate does not allow, or high steady-state utilization where densely bin-packed Reserved/Spot EC2 is meaningfully cheaper. ECS lets you mix both in one cluster, so you can start on Fargate and move specific workloads to EC2 later.
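As a rough illustration of mixing capacity on the Fargate side, the sketch below shows a capacity provider strategy that keeps a small on-demand base and weights the remainder toward Fargate Spot. The base and weight numbers and the approximate_split helper are assumptions made for this example, and ECS's real task distribution may differ slightly from this arithmetic.

```python
# Capacity provider strategy in the shape ECS accepts on a service.
# The base/weight numbers are illustrative assumptions, not recommendations.
strategy = [
    {"capacityProvider": "FARGATE", "base": 2, "weight": 1},
    {"capacityProvider": "FARGATE_SPOT", "weight": 3},
]


def approximate_split(total_tasks, strategy):
    """Estimate how many tasks land on each provider (hypothetical helper).

    The base is satisfied first, then the rest is divided by weight.
    """
    counts = {}
    remaining = total_tasks
    for item in strategy:
        base = min(item.get("base", 0), remaining)
        counts[item["capacityProvider"]] = base
        remaining -= base

    total_weight = sum(item.get("weight", 0) for item in strategy)
    if remaining == 0 or total_weight == 0:
        return counts

    # Integer share per provider, then hand leftovers to the largest remainders.
    shares = []
    for item in strategy:
        numerator = remaining * item.get("weight", 0)
        counts[item["capacityProvider"]] += numerator // total_weight
        shares.append((numerator % total_weight, item["capacityProvider"]))
    leftover = total_tasks - sum(counts.values())
    for _, name in sorted(shares, reverse=True)[:leftover]:
        counts[name] += 1
    return counts
```

With ten tasks this estimate keeps four on regular Fargate and six on Fargate Spot, so an interruption wave on Spot never takes the whole service down.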
What is the difference between a task definition, a task, and a service in ECS?
A task definition is the versioned blueprint for running your container(s): image and tag, CPU and memory, ports, environment variables and secret references, log configuration, and the IAM roles. A task is one running instance of that blueprint. A service keeps a desired number of tasks running, replaces unhealthy ones, registers them with a load balancer, and orchestrates deployments when you ship a new task-definition revision. The cluster is the logical boundary they run in. In practice a deploy is "move the service from task-def revision N to N+1," and a rollback is "point it back at N."
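To make the blueprint concrete, here is a minimal sketch of a Fargate task definition as the dictionary you might pass to boto3's ecs.register_task_definition, plus the request that moves a service between revisions. The family name "api", the account ID, the image URI, the role names, and the deploy_request helper are all placeholders assumed for illustration.

```python
# Hypothetical names throughout: "api", the account ID, and the image URI are placeholders.
task_definition = {
    "family": "api",
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",
    "cpu": "512",      # 0.5 vCPU
    "memory": "1024",  # MiB
    "executionRoleArn": "arn:aws:iam::123456789012:role/api-execution",
    "taskRoleArn": "arn:aws:iam::123456789012:role/api-task",
    "containerDefinitions": [{
        "name": "api",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/api:1.4.2",
        "essential": True,
        "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/ecs/api",
                "awslogs-region": "us-east-1",
                "awslogs-stream-prefix": "api",
            },
        },
    }],
}


def deploy_request(cluster, service, family, revision):
    """Arguments for ecs.update_service: point the service at family:revision."""
    return {
        "cluster": cluster,
        "service": service,
        "taskDefinition": f"{family}:{revision}",
    }


# A deploy moves the service from revision 7 to 8; a rollback points it back at 7.
deploy = deploy_request("prod", "api", "api", 8)
rollback = deploy_request("prod", "api", "api", 7)
```

Because a rollback is just another update to an earlier revision, keeping old revisions registered costs nothing and keeps the way back open.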
What are the two IAM roles an ECS task uses, and why does it matter?
The task execution role is used to start the task: pulling the image from ECR, fetching secret values from Secrets Manager/SSM at launch, and writing logs to CloudWatch. The task role is the identity your application code assumes at runtime to call other AWS services like S3, DynamoDB, or SQS. Confusing the two is one of the most common ECS setup mistakes. Scope both to least privilege: the execution role needs only ECR pull, the specific secret ARNs, and the log group; the task role needs only the AWS calls the app actually makes. Never attach an admin policy to either.
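The split between the two roles can be sketched as two separate IAM policy documents. This is a minimal example under assumptions: the function names are hypothetical, the ARNs are supplied by the caller, and the S3/DynamoDB actions on the task role stand in for whatever your application really calls.

```python
import json

# Both roles are assumed by the ECS tasks service principal.
ECS_TASKS_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ecs-tasks.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def execution_role_policy(repo_arn, secret_arns, log_group_arn):
    """Launch-time permissions only: pull image, read named secrets, write logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # GetAuthorizationToken does not support resource-level scoping.
            {"Effect": "Allow", "Action": "ecr:GetAuthorizationToken", "Resource": "*"},
            {"Effect": "Allow",
             "Action": ["ecr:BatchCheckLayerAvailability",
                        "ecr:GetDownloadUrlForLayer",
                        "ecr:BatchGetImage"],
             "Resource": repo_arn},
            # With a customer managed KMS key, kms:Decrypt on that key is also needed.
            {"Effect": "Allow", "Action": "secretsmanager:GetSecretValue",
             "Resource": list(secret_arns)},
            {"Effect": "Allow",
             "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
             "Resource": f"{log_group_arn}:*"},
        ],
    }


def task_role_policy(bucket_arn, table_arn):
    """Runtime permissions only: the calls the app itself makes (assumed here)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
             "Resource": f"{bucket_arn}/*"},
            {"Effect": "Allow",
             "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
             "Resource": table_arn},
        ],
    }


def policy_actions(policy):
    """Flatten every action named in a policy document."""
    actions = []
    for statement in policy["Statement"]:
        action = statement["Action"]
        actions.extend([action] if isinstance(action, str) else action)
    return actions
```

Keeping the two documents in separate functions makes it hard to leak a runtime permission into the execution role, or a secret-read permission into the task role, by accident.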
How should secrets be handled in ECS task definitions?
Store secrets in AWS Secrets Manager (or SSM Parameter Store as SecureString for simpler config), encrypted with KMS, and reference them by ARN in the task definition under "secrets" rather than putting plaintext in "environment." At launch, the task execution role fetches the values and injects them as environment variables, so the plaintext never lives in your image, repo, or the task-definition JSON. Scope the execution role to exactly those secret ARNs, enable rotation in Secrets Manager where relevant, and never echo secret values in logs. Use VPC endpoints to Secrets Manager and ECR so those fetches stay on the AWS network from private subnets.
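A small sketch of what that looks like in a container definition, together with a lint check you could run in CI. The secret names, ARNs, and the plaintext_secret_names helper are hypothetical, and the name pattern is a heuristic that will miss secrets with unusual names.

```python
import re

# Container definition fragment: plain config in "environment", secrets by reference.
container = {
    "name": "api",
    "environment": [{"name": "LOG_LEVEL", "value": "info"}],
    "secrets": [
        {"name": "DATABASE_PASSWORD",
         "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/api/db-password-AbCdEf"},
        {"name": "PAYMENTS_API_KEY",
         "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/api/payments-key"},
    ],
}

# Heuristic only: variable names that usually indicate a secret.
SUSPICIOUS = re.compile(r"(PASSWORD|SECRET|TOKEN|API_?KEY)", re.IGNORECASE)


def plaintext_secret_names(task_definition):
    """Return (container, variable) pairs that look like secrets in 'environment'."""
    flagged = []
    for item in task_definition.get("containerDefinitions", []):
        for entry in item.get("environment", []):
            if SUSPICIOUS.search(entry["name"]):
                flagged.append((item["name"], entry["name"]))
    return flagged


good = {"containerDefinitions": [container]}
bad = {"containerDefinitions": [{
    "name": "api",
    "environment": [
        {"name": "DB_PASSWORD", "value": "hunter2"},
        {"name": "LOG_LEVEL", "value": "info"},
    ],
}]}
```

Running the check against every task definition before it is registered catches the plaintext-password case before it reaches the console or the repo history.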
How do I connect an Application Load Balancer to an ECS service?
Run the ALB in public subnets and the tasks in private subnets, and allow only the ALB security group to reach the task security group on the app port. The ECS service integrates natively with an ALB target group: as tasks start and stop, the service automatically registers and deregisters their private IPs, and the target-group health check gates whether each task receives traffic. Terminate TLS at the ALB with an ACM certificate, route by host or path with listener rules, enable connection draining (deregistration delay) so in-flight requests finish during deploys and scale-in, and optionally put AWS WAF in front. For non-HTTP L4 workloads use a Network Load Balancer instead.
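The wiring described above can be sketched as three request payloads: the security group rule, the service definition, and the target group's draining setting. The helper names, the desired count of 2, and the timing values are assumptions for illustration; the dictionaries follow the shapes of boto3's ec2.authorize_security_group_ingress, ecs.create_service, and elbv2.modify_target_group_attributes.

```python
def alb_to_task_ingress(task_sg_id, alb_sg_id, app_port):
    """Allow only the ALB security group to reach the tasks on the app port."""
    return {
        "GroupId": task_sg_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": app_port,
            "ToPort": app_port,
            # Source is a security group, not a CIDR range.
            "UserIdGroupPairs": [{"GroupId": alb_sg_id}],
        }],
    }


def service_request(cluster, name, task_definition, target_group_arn,
                    private_subnets, task_sg_id, app_port):
    """Fargate service in private subnets, registered with an ALB target group."""
    return {
        "cluster": cluster,
        "serviceName": name,
        "taskDefinition": task_definition,
        "desiredCount": 2,  # assumed starting point
        "launchType": "FARGATE",
        "networkConfiguration": {"awsvpcConfiguration": {
            "subnets": list(private_subnets),
            "securityGroups": [task_sg_id],
            "assignPublicIp": "DISABLED",
        }},
        # The target group must use target type "ip" for awsvpc tasks.
        "loadBalancers": [{
            "targetGroupArn": target_group_arn,
            "containerName": name,
            "containerPort": app_port,
        }],
        "healthCheckGracePeriodSeconds": 60,  # assumed warm-up allowance
    }


def draining_attributes(target_group_arn, seconds=30):
    """Deregistration delay so in-flight requests finish before a task is removed."""
    return {
        "TargetGroupArn": target_group_arn,
        "Attributes": [{"Key": "deregistration_delay.timeout_seconds",
                        "Value": str(seconds)}],
    }
```

Note that the ingress rule names the ALB's security group as its source, so nothing else in the VPC can reach the tasks even if it shares a subnet with them.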
How does autoscaling work on ECS?
There are two layers. Service Auto Scaling changes how many tasks a service runs, via Application Auto Scaling: target tracking is the default (keep CPU, memory, or ALB requests-per-target at a chosen value), with step scaling and scheduled scaling for more control. Always set sensible min/max task counts and cooldowns. On the EC2 launch type you also need the instances to scale, handled by a capacity provider with managed scaling that grows and shrinks the EC2 Auto Scaling group to fit your tasks. On Fargate that second layer does not exist (you still have a cluster, but there are no instances in it to scale), so service autoscaling is all you configure, which is a big reason Fargate is the default for teams without dedicated infra staff.
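A minimal sketch of the service-level layer: the two Application Auto Scaling payloads for target tracking on ALB requests-per-target, plus a simplified estimate of where the task count would settle. The helper names, cooldown values, and the proportional estimate are assumptions; the real target tracking algorithm is managed by AWS and is not reproduced here.

```python
import math


def scalable_target(cluster, service, min_tasks, max_tasks):
    """Arguments for application-autoscaling register_scalable_target."""
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }


def request_count_policy(cluster, service, resource_label, target_per_task):
    """Arguments for application-autoscaling put_scaling_policy (target tracking)."""
    return {
        "PolicyName": f"{service}-requests-per-target",
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_per_task),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                # Format: app/<alb-name>/<id>/targetgroup/<tg-name>/<id>
                "ResourceLabel": resource_label,
            },
            "ScaleOutCooldown": 60,   # assumed: react quickly to load
            "ScaleInCooldown": 300,   # assumed: shrink cautiously
        },
    }


def estimate_desired(current_tasks, metric_per_task, target_per_task,
                     min_tasks, max_tasks):
    """Simplified proportional estimate, clamped to the min/max bounds."""
    wanted = math.ceil(current_tasks * metric_per_task / target_per_task)
    return max(min_tasks, min(max_tasks, wanted))
```

For example, four tasks each seeing 1,500 requests against a target of 1,000 would settle near six tasks under this estimate, and the min/max bounds stop both runaway scale-out and scaling to a single point of failure.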
What is the difference between rolling and blue-green deployments on ECS?
A rolling update is ECS's native default: it replaces tasks a few at a time behind the ALB, governed by minimum/maximum healthy percent, and can roll back automatically via the deployment circuit breaker if the new tasks never go healthy. Both versions serve traffic during the roll, so health checks matter. Blue-green via AWS CodeDeploy stands up the new revision as a separate task set, lets you validate it, then shifts production traffic at the ALB all-at-once or as a canary/linear ramp, with instant, automatic rollback to the old version if a CloudWatch alarm trips. Use rolling (circuit breaker on) for lower environments and CodeDeploy blue-green or canary for production where a bad release is expensive.
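To see how the healthy-percent settings bound a rolling update, here is a small sketch with the service deployment configuration and the resulting task-count limits. The rolling_bounds helper is hypothetical; the rounding follows the ECS documentation (minimum rounded up, maximum rounded down), and the percentages are example values.

```python
# Deployment settings in the shape ECS accepts on a service.
deployment_configuration = {
    "minimumHealthyPercent": 100,
    "maximumPercent": 200,
    "deploymentCircuitBreaker": {"enable": True, "rollback": True},
}


def rolling_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (fewest running tasks allowed, most tasks allowed) during a roll."""
    # Ceiling division for the lower bound, floor division for the upper bound.
    lower = -(-desired_count * minimum_healthy_percent // 100)
    upper = desired_count * maximum_percent // 100
    return lower, upper


# 4 tasks at 100/200: capacity never dips below 4 and may surge to 8.
full_capacity = rolling_bounds(4, 100, 200)
# 4 tasks at 50/200: the roll is allowed to drop to 2 running tasks.
reduced_capacity = rolling_bounds(4, 50, 200)
```

A minimum healthy percent below 100 is what lets a roll take the service under its desired capacity, so production services that cannot absorb that dip usually keep it at 100 and let the surge go above.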
Should I use ECS or EKS?
For most teams, ECS. It has no control plane to manage or upgrade, far fewer moving parts, deep native AWS integration, and on Fargate no nodes at all — so you reach production faster with a smaller surface to secure and operate. Choose EKS when you have a concrete Kubernetes reason: you need the Kubernetes ecosystem (Helm, operators, CRDs, a service mesh), portability across clouds or with on-prem, an internal developer platform built on Kubernetes, or existing deep Kubernetes expertise. The blunt rule: do not adopt Kubernetes to run a handful of containers; if you cannot name the Kubernetes feature you need, ECS is the answer.
Can CloudRoute set up Amazon ECS for us, and what does it cost?
Yes. CloudRoute routes you to a vetted AWS partner who builds the whole production ECS platform: VPC and awsvpc networking, cluster and capacity providers (Fargate, EC2, or both), task definitions and services, the ALB, service autoscaling, least-privilege IAM, secrets in Secrets Manager, rolling or CodeDeploy blue-green/canary deploys with tested rollback, CloudWatch and X-Ray observability, and all of it as infrastructure-as-code in your repo. For credit-eligible companies the engagement is often AWS-funded, so the customer pays $0 or a low cost; for everyone else it is a vetted-partner referral and you pay the partner directly. CloudRoute is paid a commission by the partner, never by you, and we tell you up front which case applies.

Want a production Amazon ECS service without hiring for it?

CloudRoute routes you to a vetted AWS partner who builds your ECS platform end to end — networking, ALB, autoscaling, secrets, safe deploys, observability, and IaC. For credit-eligible companies it is often AWS-funded — customer pays $0. Otherwise, a clean vetted-partner referral.

matched within: < 24h
typical build: 1–3 weeks
cost if credit-eligible: $0