The senior-engineer playbook for deploying Django on AWS and moving a production app there: the target architecture (gunicorn/uvicorn on ECS Fargate or App Runner, RDS/Aurora PostgreSQL, ElastiCache for Celery and cache, S3 + CloudFront for static and media via django-storages, an ALB out front), how to Dockerize the app, settings and secrets into Secrets Manager / SSM, Celery workers, running migrations and collectstatic in CI/CD, the database cutover with DMS, and the gotchas that bite Django teams. A MAP-funded AWS partner can run the whole thing — often at little-to-no cost to you.
Django apps arrive at AWS from two directions: teams outgrowing a PaaS (Heroku, Render, DigitalOcean App Platform) where the bill and the platform limits have stopped making sense, and teams on a single VPS or on-prem box that need real high-availability, managed Postgres, and room to scale. The destination is the same; the reasons rhyme.
The first reason is cost at scale. A PaaS charges a flat premium for abstracting the infrastructure — fine for a v1, expensive once you run multiple web processes, Celery workers, a managed Postgres plan, and a managed Redis add-on around the clock. On AWS you buy the underlying services (Fargate or EC2, RDS/Aurora, ElastiCache) at list price, then bend the bill down with a Compute Savings Plan and right-sizing. For a mid-size Django app the move commonly lands 30–60% lower, and the gap widens as you add workers.
The second reason is the architecture Django apps actually need. A real deployment is not one process — it is a WSGI/ASGI web tier, a Celery worker tier, usually a Celery beat scheduler, Postgres, a Redis broker/cache, and somewhere to serve static and media. A PaaS can do all of this, but coordinating it — private networking between web and database, connection pooling, scaling each tier independently, blue/green deploys — is exactly where its abstraction starts to fight you. ECS on Fargate models each tier as its own service with its own scaling policy, inside your VPC.
The third reason is the surrounding requirements: VPC isolation, IAM, private subnets for the database, SOC 2 / HIPAA controls, multi-AZ failover, and integration with the rest of an AWS estate (S3, SQS, SES, CloudFront, KMS). These are first-class on AWS and bolted-on or absent elsewhere — frequently the deciding factor for a company in security review or selling into the enterprise, ahead of cost.
The honest counterpoint: AWS asks you to own more than a PaaS does — a landing zone, networking, IAM, container images, deploy pipelines, observability. That operational surface is precisely why most teams do not run this migration alone. The rest of this page is the architecture and the plan; the CloudRoute angle is that a vetted AWS partner runs it for you, and AWS often funds it through MAP.
A mid-size Django app commonly runs $900–$2,500/month on a PaaS once you count multiple web processes, two or three Celery workers, a managed Postgres plan, managed Redis, and log/metrics add-ons. The equivalent on AWS — Fargate (web + workers) + RDS/Aurora + ElastiCache + S3/CloudFront + CloudWatch, with a Compute Savings Plan — commonly lands at $450–$1,300/month. That recovered margin is what pays for the migration (and with MAP, AWS pays for it instead).
There is no single "AWS version of a PaaS dyno." There are two sensible compute targets, and the right one depends on how much control you want versus how close to a push-to-deploy experience you want to stay. Everything else — database, broker, cache, static, media — maps almost one-to-one.
For compute, the two realistic targets are AWS App Runner and Amazon ECS on Fargate. App Runner is the closest thing AWS has to a PaaS: point it at a container image (or source repo) and it builds, deploys, serves HTTPS, and autoscales on request volume — the fastest path for a single straightforward web service. ECS on Fargate is the workhorse: serverless containers with full control over the VPC, subnets, security groups, task sizing, per-service autoscaling, an Application Load Balancer, and clean separation of the web tier from Celery workers and beat. Teams migrating a real Django app almost always want Fargate, because Celery and independent tier scaling are exactly what App Runner abstracts away. (EKS is overkill for a single Django app unless you already run Kubernetes.)
The web tier runs your project under a production WSGI or ASGI server. For a conventional synchronous app that means gunicorn with a sensible worker count (a common starting point is `2 × vCPU + 1`, tuned by load testing). For an app using async views, Django Channels, or other ASGI features, you run uvicorn, typically as gunicorn with the `uvicorn.workers.UvicornWorker` worker class (newer uvicorn releases deprecate that module in favor of `uvicorn_worker.UvicornWorker` from the separate `uvicorn-worker` package), so async views and WebSockets work as intended. Either way the container exposes a port, the ALB terminates TLS with an ACM certificate and routes to the ECS service, and its health checks hit a lightweight endpoint.
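Gunicorn reads its configuration from a Python file (passed with `-c gunicorn.conf.py`), so the worker-count rule above can live in code next to the app. The sketch below is illustrative, not a drop-in file: `TASK_VCPUS` and `GUNICORN_WORKER_CLASS` are hypothetical environment variables you would set in the ECS task definition, while `WEB_CONCURRENCY` is the variable gunicorn itself already honors.

```python
# gunicorn.conf.py: a minimal sketch, not a drop-in file.
# Assumes the ECS task definition sets a hypothetical TASK_VCPUS env var
# (e.g. "1" for a 1 vCPU task); WEB_CONCURRENCY, if set, overrides it.
import os


def default_workers(vcpus: float) -> int:
    """Starting point from the text: 2 x vCPU + 1 (tune with load tests)."""
    return max(1, int(2 * vcpus + 1))


# Bind on all interfaces so the ALB target group can reach the container port.
bind = "0.0.0.0:8000"

# An explicit WEB_CONCURRENCY wins; otherwise derive from the task's vCPU size.
workers = int(
    os.environ.get("WEB_CONCURRENCY")
    or default_workers(float(os.environ.get("TASK_VCPUS", "1")))
)

# Sync apps keep gunicorn's default "sync" class. ASGI apps set
# GUNICORN_WORKER_CLASS=uvicorn.workers.UvicornWorker
# (or uvicorn_worker.UvicornWorker with the uvicorn-worker package).
worker_class = os.environ.get("GUNICORN_WORKER_CLASS", "sync")

# A worker silent for longer than this many seconds is killed and restarted.
timeout = 30

# Log to stdout/stderr so the ECS awslogs driver ships everything to CloudWatch.
accesslog = "-"
errorlog = "-"
```

Deriving the count from the task size rather than the host CPU count keeps the number tied to what you actually pay for in Fargate; the load-testing step still decides the final value.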
For the database, Postgres becomes Amazon RDS for PostgreSQL (the like-for-like managed Postgres) or Aurora PostgreSQL (AWS's cloud-native engine — a little more per hour, but faster failover, autoscaling storage, up to 15 read replicas, and Serverless v2 capacity scaling). Because Django's ORM speaks PostgreSQL either way, your models, migrations, and queries do not change — you point `DATABASES` at the new endpoint. Put the database in private subnets, reachable only from the application security group.
Celery is the piece a generic "deploy a web app" guide forgets. It needs a broker and usually a result backend; on AWS that is Amazon ElastiCache for Redis (OSS/Valkey), and the same cluster doubles as Django's cache backend. Each worker pool runs as its own ECS Fargate service (`celery -A proj worker`), and Celery beat (the periodic scheduler) runs as a single-instance service (`celery -A proj beat`) so schedules don't double-fire. Some teams use Amazon SQS as the broker instead of Redis to drop a stateful component; Celery supports both, and the choice is a real architectural decision a partner makes with you.
Static and media files are the other Django-specific concern. Static assets (CSS/JS/admin) are gathered by `collectstatic` and served from Amazon S3 behind Amazon CloudFront via django-storages (configured through `STORAGES` on Django 4.2+, or `STATICFILES_STORAGE` on older versions); small apps can instead serve static from the container with WhiteNoise and skip S3 for static. User-uploaded media must go to S3 via django-storages regardless: container filesystems are ephemeral and there are multiple tasks, so writing uploads to local disk silently breaks the moment a second container starts. CloudFront fronts both with a CDN and TLS. Outbound email (a Django `EMAIL_BACKEND`) moves to Amazon SES or stays with your provider billed directly.
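A `STORAGES` setting covering both options might look like the sketch below. It assumes Django 4.2+ and django-storages 1.14+ (where the S3 backend lives at `storages.backends.s3.S3Storage`); the environment variable names `MEDIA_BUCKET`, `STATIC_BUCKET`, `CDN_DOMAIN` and `USE_WHITENOISE` are hypothetical. The WhiteNoise branch also needs WhiteNoise's middleware, which is not shown.

```python
# settings.py fragment: a minimal sketch assuming Django >= 4.2 (STORAGES),
# django-storages >= 1.14 and, optionally, WhiteNoise.
# MEDIA_BUCKET, STATIC_BUCKET, CDN_DOMAIN, USE_WHITENOISE are hypothetical env vars.
import os

MEDIA_BUCKET = os.environ.get("MEDIA_BUCKET", "example-media")
CDN_DOMAIN = os.environ.get("CDN_DOMAIN")  # e.g. a CloudFront domain; None = S3 URLs
USE_WHITENOISE = os.environ.get("USE_WHITENOISE", "false").lower() == "true"

STORAGES = {
    # User uploads always go to S3: container disks are ephemeral and per-task.
    "default": {
        "BACKEND": "storages.backends.s3.S3Storage",
        "OPTIONS": {
            "bucket_name": MEDIA_BUCKET,
            "location": "media",  # key prefix inside the bucket
            "custom_domain": CDN_DOMAIN,
            "file_overwrite": False,  # never silently replace an existing upload
        },
    },
}

if USE_WHITENOISE:
    # Small apps: collectstatic writes hashed files into the image; WhiteNoise serves them.
    STORAGES["staticfiles"] = {
        "BACKEND": "whitenoise.storage.CompressedManifestStaticFilesStorage",
    }
else:
    # collectstatic (run in CI) uploads to S3; CloudFront serves the files.
    STORAGES["staticfiles"] = {
        "BACKEND": "storages.backends.s3.S3Storage",
        "OPTIONS": {
            "bucket_name": os.environ.get("STATIC_BUCKET", MEDIA_BUCKET),
            "location": "static",
            "custom_domain": CDN_DOMAIN,
        },
    }
```

Keeping media on S3 in both branches reflects the rule above: static files can reasonably live in the image, uploads cannot.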
Choose App Runner for a single stateless Django web service when you want push-to-deploy simplicity and have little or no background work (or you run it elsewhere). Choose ECS on Fargate the moment you have Celery workers and beat, need control over VPC networking, want to scale the web and worker tiers independently, or need private connectivity to RDS/ElastiCache; it is the common landing spot for a production Django app. Skip EKS unless you already run Kubernetes.
The full row-by-row mapping (every Django building block, its AWS service, the cost direction, and the effort) lives in the comparison table further down. Two principles explain how to read it.
First, anything that speaks a standard protocol moves with a config change and no application rewrite: PostgreSQL (your `DATABASES` setting → RDS/Aurora endpoint), Redis (Celery broker + Django cache → ElastiCache endpoint), and S3 (media via django-storages). Your models, tasks, and views are untouched — you are changing connection strings and storage backends, not logic.
Second, anything PaaS- or host-proprietary needs a one-time translation into an AWS-native primitive: the process model (web/worker/beat) → separate ECS services, environment config and secrets → Secrets Manager + SSM Parameter Store, the release/deploy hook (where `migrate` and `collectstatic` run) → a CI/CD step, the build → a Dockerfile + ECR, and TLS/custom domains → Route 53 + ACM + CloudFront. Done once and documented, that translation is the work a migration partner does — the "Medium/High effort" rows in the table.
The single most hands-on task in a Django→AWS move is turning the app into a production container image and moving its configuration off the host and into AWS-managed config. It is mechanical, but it is where teams without container experience slow down — so here is exactly what it involves.
The Dockerfile is short and standard: a slim Python base (e.g. `python:3.12-slim`), OS build deps only if a wheel needs compiling, copy and install `requirements.txt` (or a Poetry/uv lockfile) into a clean layer, copy the project, switch to a non-root user, and set the start command to your WSGI/ASGI server: `gunicorn proj.wsgi:application --bind 0.0.0.0:8000 --workers N` for sync, or `gunicorn proj.asgi:application --bind 0.0.0.0:8000 -k uvicorn.workers.UvicornWorker` for async. The same image serves every tier: the web service runs gunicorn/uvicorn, the worker service overrides the command to `celery -A proj worker`, and beat overrides it to `celery -A proj beat`. Build once, run three ways: that is the ECS pattern.
Settings are the other translation, and Django's flat `settings.py` is the first thing to refactor. Drive everything from environment variables (django-environ, or `os.environ` with `python-dotenv` for local dev): `DEBUG`, `ALLOWED_HOSTS`, `DATABASE_URL`, `CELERY_BROKER_URL`, `CACHE_URL`, `SECRET_KEY`, API keys. On AWS those split by sensitivity: real secrets (`SECRET_KEY`, database credentials, API keys, signing keys) go in AWS Secrets Manager; non-sensitive config (feature flags, public URLs, log level, bucket name) goes in SSM Parameter Store. The ECS task definition's `secrets` block injects values from either store as environment variables (each entry's `valueFrom` points at a secret or parameter ARN), and its `environment` block holds plain literal values, so the container sees them exactly as it would locally and your code reads `os.environ` unchanged. The migration step is a one-time inventory, a sort into secret vs. config, and a load into Secrets Manager / Parameter Store, which a partner scripts so nothing is missed or baked into an image layer.
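The "inventory, sort, load" step can be sketched as below. This is an illustration under stated assumptions, not a partner's actual tooling: the inventory is a dict parsed from the old `.env` or PaaS config dump, `SECRET_NAMES` is a hypothetical hand-maintained list of sensitive names, and the `myapp/prod/` prefixes are placeholders. The AWS clients are passed in so the logic can be exercised without credentials.

```python
# load_config.py: a minimal sketch of the one-time inventory -> sort -> load step.
# Assumptions (hypothetical): the inventory dict comes from the old .env or PaaS
# config dump, SECRET_NAMES marks what is sensitive, and names use myapp/prod/ prefixes.

SECRET_NAMES = {"SECRET_KEY", "DATABASE_URL", "STRIPE_API_KEY"}


def split_config(inventory, secret_names=SECRET_NAMES):
    """Sort variables into (secrets, plain_config) by name."""
    secrets = {k: v for k, v in inventory.items() if k in secret_names}
    config = {k: v for k, v in inventory.items() if k not in secret_names}
    return secrets, config


def load(inventory, secretsmanager, ssm, sm_prefix="myapp/prod/", ssm_prefix="/myapp/prod/"):
    """Write secrets to Secrets Manager and plain config to SSM Parameter Store."""
    secrets, config = split_config(inventory)
    for name, value in sorted(secrets.items()):
        # One secret per variable keeps each ECS `secrets` entry a plain ARN reference.
        secretsmanager.create_secret(Name=sm_prefix + name, SecretString=value)
    for name, value in sorted(config.items()):
        ssm.put_parameter(Name=ssm_prefix + name, Value=value, Type="String", Overwrite=True)
    return sorted(secrets), sorted(config)
```

In a real run you would pass `boto3.client("secretsmanager")` and `boto3.client("ssm")`, then reference the resulting ARNs from the task definition's `secrets` block. Returning the sorted name lists gives you a checklist to diff against the task definition.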
Two production-settings details to fix while you are in there. Set `DEBUG=False` and `ALLOWED_HOSTS` to your real domain plus whatever Host header the ALB health check sends, which is typically the task's private IP unless the health-check path bypasses host validation (a misconfigured `ALLOWED_HOSTS` is the most common cause of a 400 on AWS; see the gotchas). And configure the database connection deliberately: enforce SSL to RDS/Aurora and use `CONN_MAX_AGE` carefully, because persistent connections across many autoscaled tasks can exhaust Postgres connections. That is why RDS Proxy or PgBouncer belongs in the design, not as an afterthought.
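Both details can be expressed in plain settings code. The sketch below assumes hypothetical variable names (`DJANGO_DEBUG`, `DJANGO_ALLOWED_HOSTS`, `DB_*`) injected by ECS; `CONN_HEALTH_CHECKS` requires Django 4.1 or later. As a rough budget, every gunicorn worker in every task can hold one connection, so 10 web tasks with 5 workers each already means up to 50 connections before any Celery workers are counted.

```python
# settings.py fragment: a minimal sketch. Env var names (DJANGO_DEBUG,
# DJANGO_ALLOWED_HOSTS, DB_*) are hypothetical; ECS injects them at runtime.
import os


def env_bool(name, default=False):
    return os.environ.get(name, str(default)).strip().lower() in {"1", "true", "yes"}


def env_list(name, default=""):
    return [item.strip() for item in os.environ.get(name, default).split(",") if item.strip()]


DEBUG = env_bool("DJANGO_DEBUG", False)  # never True in production

# Real domain(s) plus whatever Host the ALB health check sends (often the task's private IP).
ALLOWED_HOSTS = env_list("DJANGO_ALLOWED_HOSTS", "app.example.com")

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        # With RDS Proxy in front, HOST is the proxy endpoint, not the cluster endpoint.
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
        "NAME": os.environ.get("DB_NAME", "app"),
        "USER": os.environ.get("DB_USER", "app"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        # Refuse unencrypted connections to RDS/Aurora.
        "OPTIONS": {"sslmode": "require"},
        # Each worker in each task may keep one connection open this long (seconds);
        # tasks x workers must stay under the database (or proxy) connection budget.
        "CONN_MAX_AGE": int(os.environ.get("DB_CONN_MAX_AGE", "60")),
        # Django >= 4.1: check a reused connection before handing it to a request.
        "CONN_HEALTH_CHECKS": True,
    }
}
```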
On a single-server or single-dyno setup you can get away with running `migrate` and `collectstatic` when the app starts. On AWS, with multiple tasks starting in parallel behind an ALB, that pattern causes race conditions and flaky deploys. The fix is to make them explicit, ordered deploy steps.
Run `python manage.py collectstatic --noinput` at build time (or in the CI pipeline) so the image — or the S3 bucket, via django-storages — already has the gathered static assets before any container serves traffic. Baking collection into the build keeps the runtime container fast to start and means every task serves identical assets. If you serve static from S3/CloudFront, the pipeline runs `collectstatic` with the S3 backend so the files land in the bucket; with WhiteNoise they are collected into the image.
Run database migrations as a single one-off task per deploy — never in the web container's entrypoint, where N parallel tasks would each try to apply the same migration. The clean pattern on ECS is a dedicated `migrate` step in the pipeline: the CI job runs `aws ecs run-task` with the same image and the command overridden to `python manage.py migrate --noinput`, waits for it to succeed, and only then updates the web and worker services. CodePipeline + CodeBuild can orchestrate this, or you keep GitHub Actions running build → push to ECR → migrate task → update services. Either way the contract is: migrate first, then roll the services.
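The "migrate first, then roll the services" contract can be scripted with boto3's ECS client. The sketch below assumes hypothetical cluster, container, subnet, and security-group values, and takes the client as a parameter so a CI job (or a test) can inject it. It relies on `run_task`, the `tasks_stopped` waiter, `describe_tasks`, and `update_service`.

```python
# deploy_step.py: a minimal sketch with a boto3 ECS client injected.
# Cluster, service, container, subnet and security-group names are hypothetical.

MIGRATE_CMD = ["python", "manage.py", "migrate", "--noinput"]


def run_migrate(ecs, cluster, task_definition, container, subnets, security_groups):
    """Run migrate once as a one-off Fargate task; raise unless it exits 0."""
    resp = ecs.run_task(
        cluster=cluster,
        taskDefinition=task_definition,  # same image as web/worker/beat
        launchType="FARGATE",
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,  # must be allowed into RDS
                "assignPublicIp": "DISABLED",
            }
        },
        overrides={"containerOverrides": [{"name": container, "command": MIGRATE_CMD}]},
    )
    if resp.get("failures"):
        raise RuntimeError(f"run_task failed: {resp['failures']}")
    task_arn = resp["tasks"][0]["taskArn"]
    # Block until the task stops (the default waiter gives up after roughly 10 minutes).
    ecs.get_waiter("tasks_stopped").wait(cluster=cluster, tasks=[task_arn])
    task = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])["tasks"][0]
    exit_code = next(
        (c.get("exitCode") for c in task["containers"] if c["name"] == container), None
    )
    if exit_code != 0:
        raise RuntimeError(f"migrate exited with {exit_code}: {task.get('stoppedReason')}")
    return task_arn


def roll_services(ecs, cluster, rollouts):
    """Only after migrate succeeds: move each service to its new task definition, in order."""
    for service, task_definition in rollouts:  # e.g. web, then workers, then beat
        ecs.update_service(cluster=cluster, service=service, taskDefinition=task_definition)
```

Because `run_migrate` raises on any non-zero exit, a pipeline that calls it before `roll_services` never rolls the services onto a schema that failed to migrate. If the job should also block until each rollout settles, boto3 provides a `services_stable` waiter for ECS.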
This ordering is also what makes zero-downtime deploys safe. Use backward-compatible migrations (add columns/tables before the code that needs them; remove only after the old code is gone) so the brief window where old and new tasks coexist never sees a schema the running code cannot handle. ECS rolling updates (or blue/green via CodeDeploy) then replace tasks gradually behind the ALB with no downtime — a discipline that pays off well beyond the migration.
1. Build the image and run collectstatic (to S3 or into the image).
2. Push the image to ECR.
3. Run a one-off migrate task; wait for success.
4. Update the web service (rolling or blue/green).
5. Update the Celery worker and beat services.

Migrations are backward-compatible, so steps 4–5 are safe while old tasks drain.
Celery is where Django deployments differ most from a plain web app, and where a naive lift-and-shift goes wrong. The web tier and the background tier have different scaling, failure, and concurrency characteristics, so they get modeled as separate services.
The broker is the first decision. The simplest choice is Amazon ElastiCache for Redis (OSS/Valkey) as both the Celery broker and result backend, and the same cluster as Django's cache — one managed component covering three jobs, with a connection-string change from your current Redis. The alternative is Amazon SQS as the broker, which removes a stateful service and scales effortlessly, at the cost of some Redis-specific Celery features (and SQS is not a result backend, so you pair it with another store). For most teams ElastiCache is the like-for-like move; SQS is worth it when you want managed, near-infinite queue capacity and minimal ops.
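In settings terms, the two broker options differ by a few lines. The sketch assumes Celery is loaded with `app.config_from_object("django.conf:settings", namespace="CELERY")`, an ElastiCache cluster with cluster mode disabled (numbered Redis databases are unavailable in cluster mode), and hypothetical `REDIS_URL`, `CELERY_BROKER_KIND`, and `AWS_REGION` variables. With in-transit encryption enabled the URL scheme becomes `rediss://`, and the SQS transport needs Celery's `sqs` extra installed.

```python
# settings.py fragment: a minimal sketch of the two broker options.
# Assumes Celery reads settings with namespace="CELERY" and an ElastiCache cluster
# with cluster mode disabled. REDIS_URL, CELERY_BROKER_KIND, AWS_REGION are hypothetical.
import os

REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379")  # ElastiCache primary endpoint
BROKER_KIND = os.environ.get("CELERY_BROKER_KIND", "redis")

if BROKER_KIND == "sqs":
    # SQS broker: credentials come from the ECS task role, so the URL carries none.
    CELERY_BROKER_URL = "sqs://"
    CELERY_BROKER_TRANSPORT_OPTIONS = {"region": os.environ.get("AWS_REGION", "us-east-1")}
    # SQS cannot store results, so results (if you need them) stay in Redis.
    CELERY_RESULT_BACKEND = f"{REDIS_URL}/1"
else:
    # One ElastiCache cluster, three jobs, separated by logical database number.
    CELERY_BROKER_URL = f"{REDIS_URL}/0"
    CELERY_RESULT_BACKEND = f"{REDIS_URL}/1"

# Django's built-in Redis cache backend (Django >= 4.0) on the same cluster.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": f"{REDIS_URL}/2",
    }
}
```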
Each worker pool is its own ECS Fargate service running `celery -A proj worker -l info`, with concurrency and autoscaling tuned to the work: CPU-bound tasks want lower concurrency and scale on CPU; IO-bound tasks tolerate higher concurrency. Splitting queues by workload (e.g. a `default` and a `heavy` queue on separate services, each worker started with `-Q default` or `-Q heavy`) keeps a slow report job from starving fast user-facing tasks and lets each scale independently. Celery beat runs as a separate single-replica service so scheduled tasks fire exactly once; running two beat instances double-schedules everything, a classic post-migration bug.
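The queue split and the time limits mentioned in the next paragraph are both Celery settings. The sketch below uses the `CELERY_` namespace from Django settings; the task module paths (`reports.tasks.*`, `exports.tasks.*`) and the limit values are hypothetical. Celery's `task_routes` accepts glob patterns, so whole modules can be routed at once.

```python
# settings.py fragment (CELERY_ namespace): a minimal sketch.
# Task module paths and limit values are hypothetical; tune them to your workload.
CELERY_TASK_DEFAULT_QUEUE = "default"

# Slow, CPU-heavy work goes to its own queue so it cannot starve user-facing tasks.
CELERY_TASK_ROUTES = {
    "reports.tasks.*": {"queue": "heavy"},
    "exports.tasks.*": {"queue": "heavy"},
}

# The soft limit fires first (SoftTimeLimitExceeded inside the task) so it can clean up;
# at the hard limit the worker process running the task is killed.
CELERY_TASK_SOFT_TIME_LIMIT = 240  # seconds
CELERY_TASK_TIME_LIMIT = 300  # seconds

# Acknowledge after the task finishes, so a task interrupted by a stopped container
# is redelivered (tasks must therefore be idempotent).
CELERY_TASK_ACKS_LATE = True
CELERY_WORKER_PREFETCH_MULTIPLIER = 1  # do not let one worker hoard long tasks
```

Each ECS service then runs the same image with a different command, for example `celery -A proj worker -Q default -l info --concurrency 8` for the default pool, `celery -A proj worker -Q heavy -l info --concurrency 2` for the heavy pool, and `celery -A proj beat -l info` for a beat service whose desired count stays at 1. The concurrency numbers are placeholders.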
Observability for the worker tier matters as much as for web. Ship Celery logs to CloudWatch, alarm on queue depth (Redis list length or SQS `ApproximateNumberOfMessagesVisible`) and on task failure rates, and set task time limits and retries so a stuck task can't pin a worker forever. EventBridge Scheduler can replace Celery beat for simple cron jobs if you prefer AWS-native scheduling, though most teams keep beat through the migration and revisit later.
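SQS publishes `ApproximateNumberOfMessagesVisible` to CloudWatch on its own, but a Redis broker's queue length has no built-in metric. A small publisher run every minute (for example from EventBridge Scheduler) can fill that gap. The sketch assumes Celery's Redis layout of one list per queue name, no message priorities (priorities add extra keys), and injected redis-py and boto3 CloudWatch clients; the `Celery` namespace and `QueueDepth` metric name are hypothetical. It counts only messages still waiting, not tasks already reserved by workers.

```python
# queue_depth.py: a minimal sketch that publishes Redis broker queue depth as a
# CloudWatch custom metric for alarms and autoscaling. Assumes one Redis list per
# Celery queue and no priorities; the namespace and metric name are hypothetical.

def publish_queue_depth(redis_client, cloudwatch, queues=("default", "heavy")):
    # Messages waiting in each queue's list (tasks already reserved are not counted).
    depths = {q: int(redis_client.llen(q)) for q in queues}
    cloudwatch.put_metric_data(
        Namespace="Celery",
        MetricData=[
            {
                "MetricName": "QueueDepth",
                "Dimensions": [{"Name": "Queue", "Value": q}],
                "Value": depth,
                "Unit": "Count",
            }
            for q, depth in depths.items()
        ],
    )
    return depths
```

A CloudWatch alarm on `QueueDepth` per queue then covers the "alarm on queue depth" requirement, and the same metric can drive target-tracking autoscaling for the matching worker service.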
Everything else — containers, the landing zone, S3/CloudFront, Celery services — can be built and tested without touching production. The database cutover is the one moment that touches live data, where a plan earns its keep. Done right, user-visible downtime is zero to fifteen minutes.
The core technique is to keep the new RDS/Aurora Postgres continuously in sync with your current database while you test, so that at cutover you switch to an already-current copy rather than copying a cold database under time pressure. AWS Database Migration Service (DMS) does exactly this: a full load into RDS/Aurora, then change data capture (CDC) that streams every subsequent insert/update/delete in near-real time. Because both ends are PostgreSQL this is a homogeneous migration — no Schema Conversion Tool needed; Django's schema moves as-is (a `pg_dump --schema-only` restore, or let your migrations build it and DMS replicate the data). Very small databases can skip DMS for a single `pg_dump | pg_restore` inside the window; DMS earns its place once a cold dump/restore would blow past an acceptable window.
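A full-load-plus-CDC task can be created and started with boto3's DMS client, as in the sketch below. The endpoint and replication-instance ARNs, the task identifier, and the choice to replicate every table in the `public` schema are assumptions. CDC from a PostgreSQL source also requires the source to allow logical replication (`wal_level = logical`).

```python
# dms_task.py: a minimal sketch of the full-load + CDC task described above.
# The ARNs, the task identifier and the "public" schema selection are assumptions.
import json

TABLE_MAPPINGS = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-public",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}


def create_and_start(dms, source_arn, target_arn, instance_arn, task_id="django-pg-cutover"):
    """Create a full-load + CDC task, wait until it is ready, then start it."""
    task = dms.create_replication_task(
        ReplicationTaskIdentifier=task_id,
        SourceEndpointArn=source_arn,
        TargetEndpointArn=target_arn,
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load-and-cdc",  # bulk copy now, then stream changes until cutover
        TableMappings=json.dumps(TABLE_MAPPINGS),
    )["ReplicationTask"]
    arn = task["ReplicationTaskArn"]
    dms.get_waiter("replication_task_ready").wait(
        Filters=[{"Name": "replication-task-arn", "Values": [arn]}]
    )
    dms.start_replication_task(ReplicationTaskArn=arn, StartReplicationTaskType="start-replication")
    return arn
```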
With CDC running and the new stack smoke-tested against the synced database, the cutover is short and scripted: enable a maintenance page → let DMS drain the last few seconds → pause Celery beat and let workers finish in-flight tasks → flip `DATABASES` (and `CELERY_BROKER_URL`/`CACHE_URL`) to the AWS endpoints and switch traffic to the new ALB/App Runner service → update Route 53 (TTL lowered ahead of time) → verify writes land in RDS → exit maintenance. Because the data was already synced, the window is dominated by DNS propagation and verification, not data transfer — commonly 0–15 minutes during a low-traffic period.
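The Route 53 step is a single API call, and the same call pointed at the old target is the rollback. The sketch assumes the app is served from a CNAME such as `www.example.com` (an apex domain would need an alias A record instead), a hosted zone ID you supply, and an injected boto3 Route 53 client. It waits until Route 53 reports the change as `INSYNC`, which is separate from resolvers honoring the old TTL.

```python
# dns_switch.py: a minimal sketch of the Route 53 cutover step, reusable for rollback.
# Assumes a CNAME record (e.g. www.example.com); apex domains need an alias A record.

def point_cname(route53, zone_id, name, target, ttl=60):
    """UPSERT name -> target; call again with the old target to roll back."""
    resp = route53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={
            "Comment": f"cutover: {name} -> {target}",
            "Changes": [
                {
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": name,
                        "Type": "CNAME",
                        "TTL": ttl,  # keep it low until the cutover is declared done
                        "ResourceRecords": [{"Value": target}],
                    },
                }
            ],
        },
    )
    change_id = resp["ChangeInfo"]["Id"]
    # Wait until Route 53 reports the change INSYNC on its name servers.
    route53.get_waiter("resource_record_sets_changed").wait(Id=change_id)
    return change_id
```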
The rollback plan is non-negotiable and simple: keep the old environment running until you are confident, and if something is wrong, switch the database connection and DNS back. The clean window is before RDS has accepted production writes the old database hasn't seen; once such writes are committed, rolling back means reconciling them, so most teams keep the window short (minutes to hours) and watch closely. Lower DNS TTL to 60 seconds a day ahead so both the switch and the rollback are fast. One Django-specific note: DMS replicates table rows but should not be relied on to advance Postgres sequences, so after the load reset them (for example with the SQL that `manage.py sqlsequencereset <app_label>` prints) and confirm the values, or the first inserts collide on primary keys.
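For tables outside your Django apps, or to reset everything in one pass, the same style of statement that `sqlsequencereset` prints can be generated from a list of sequence-backed columns. The sketch below is illustrative: `DISCOVER_SQL` finds serial and identity columns in the `public` schema, and `setval_sql` builds one reset statement per column. The `shop_order` table in the usage example is hypothetical.

```python
# reset_sequences.py: a minimal sketch emitting the same style of statement
# `manage.py sqlsequencereset <app_label>` prints, for any table/column pair.

# Lists every base-table column backed by a sequence (serial or identity) in public.
DISCOVER_SQL = """
SELECT c.table_name, c.column_name
FROM information_schema.columns AS c
JOIN information_schema.tables AS t
  ON t.table_schema = c.table_schema AND t.table_name = c.table_name
WHERE c.table_schema = 'public'
  AND t.table_type = 'BASE TABLE'
  AND pg_get_serial_sequence(quote_ident(c.table_name::text), c.column_name::text) IS NOT NULL;
"""


def quote_ident(name):
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    return "'" + value.replace("'", "''") + "'"


def setval_sql(table, column="id"):
    """Move the sequence to MAX(column): next insert gets MAX + 1, or 1 if the table is empty."""
    t, c = quote_ident(table), quote_ident(column)
    return (
        f"SELECT setval(pg_get_serial_sequence({quote_literal(t)}, {quote_literal(column)}), "
        f"COALESCE(MAX({c}), 1), MAX({c}) IS NOT NULL) FROM {t};"
    )
```

Run `DISCOVER_SQL` against the new database, feed each `(table_name, column_name)` row to `setval_sql`, and execute the results while still in maintenance. For example, `setval_sql("shop_order")` yields `SELECT setval(pg_get_serial_sequence('"shop_order"', 'id'), COALESCE(MAX("id"), 1), MAX("id") IS NOT NULL) FROM "shop_order";`.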
A cold `pg_dump | pg_restore` makes your downtime window equal to export + transfer + import of the whole database: 20–60 minutes for 5GB, unacceptable at 50GB+. DMS with CDC moves the bulk load before the window and only drains the last few seconds during it, so downtime stays within the same 0–15 minute range regardless of database size. Validate row counts and re-sync Postgres sequences before you exit maintenance.
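The row-count check can be as small as the sketch below. It works with any DB-API 2.0 connections (in practice, psycopg connections to the old and new Postgres); the table list is yours to supply. Run it after DMS has drained and writes are paused, otherwise live traffic makes the counts drift.

```python
# compare_counts.py: a minimal sketch of the row-count check before exiting
# maintenance. Works with any DB-API 2.0 connections; you supply the table list.

def count_rows(conn, table):
    cur = conn.cursor()
    # Quote the identifier so mixed-case or reserved table names still work.
    cur.execute('SELECT COUNT(*) FROM "' + table.replace('"', '""') + '"')
    return cur.fetchone()[0]


def count_mismatches(source_conn, target_conn, tables):
    """Return {table: (source_count, target_count)} for every table that differs."""
    mismatches = {}
    for table in tables:
        src, dst = count_rows(source_conn, table), count_rows(target_conn, table)
        if src != dst:
            mismatches[table] = (src, dst)
    return mismatches
```

An empty result is the go signal; any entry names the table to investigate before DNS is left pointing at AWS.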
Here is the end-to-end sequence a partner runs, mapped to the AWS MAP phases (Assess → Mobilize → Migrate). For a typical Django app this is a 3–8 week project — most of it parallelizable, none of it touching production until the final cutover.
The definitive lookup: every Django building block, where it lands on AWS, the rough cost direction, and the engineering effort. "Trivial" rows are config or backend swaps; "Medium/High" rows are the real work — and the work a partner does for you.
| Django component | AWS service | What changes | Cost direction | Effort |
|---|---|---|---|---|
| WSGI web (gunicorn) | ECS/Fargate + ALB, or App Runner | Containerize; ALB terminates TLS | Down 30–60% | Medium |
| ASGI / async / Channels (uvicorn) | ECS/Fargate + ALB (uvicorn workers) | uvicorn worker class; ALB/WebSocket config | Down 30–60% | Medium |
| PostgreSQL database | RDS for PostgreSQL / Aurora PostgreSQL | DATABASES endpoint + cutover via DMS | Down 30–55% | High (cutover) |
| Celery broker + result backend | ElastiCache (Redis/Valkey) or Amazon SQS | CELERY_BROKER_URL; or SQS transport | Down 40–70% | Medium |
| Celery workers | ECS/Fargate service (per queue) | Worker process → its own service | Down 30–60% | Medium |
| Celery beat (scheduler) | ECS single-replica service, or EventBridge Scheduler | Run exactly one beat; or AWS cron | Near $0 | Low |
| Django cache framework | ElastiCache (Redis/Valkey) | CACHES backend → ElastiCache endpoint | Down 40–70% | Trivial |
| Static files (collectstatic) | S3 + CloudFront (django-storages) or WhiteNoise | STORAGES backend; collectstatic in CI | Down / neutral | Low |
| User media uploads | Amazon S3 (django-storages) | STORAGES default → S3 (DEFAULT_FILE_STORAGE before Django 4.2) | Down | Low |
| settings.py config | SSM Parameter Store | Read from env injected by ECS | Negligible | Low |
| SECRET_KEY + DB creds + API keys | AWS Secrets Manager | Inject as task secrets; never in image | Negligible | Low |
| Dependencies + build (pip/Poetry) | Dockerfile + ECR | Explicit container build | Negligible | Medium |
| Email backend (SMTP) | Amazon SES (or provider direct) | EMAIL_BACKEND / SMTP config | Down | Low |
| Custom domain + TLS | Route 53 + ACM + CloudFront | DNS + free cert on ALB/CloudFront | Down (free certs) | Low |
Situation: Margin pressure ahead of a Series-B raise plus a SOC 2 commitment that required VPC isolation and a private database — neither comfortable on the PaaS. The three-person team had no AWS or container experience and could not afford a multi-month DIY migration or a risky big-bang database cutover, especially with 80GB of Postgres and a large media bucket to move.
What CloudRoute did: Routed within 24 hours to an AWS Advanced-tier partner with a Django + Celery track record, who ran the MAP Assess phase (free) and filed the work as a MAP engagement. Target: ECS on Fargate (gunicorn web behind an ALB; two Celery worker services split by queue; a single-replica beat), Aurora PostgreSQL with RDS Proxy for pooling, ElastiCache for the Celery broker + Django cache, S3 + CloudFront for static and media via django-storages, settings parameterized with secrets in Secrets Manager + SSM. CI/CD via GitHub Actions: build → ECR → one-off migrate task → rolling ECS update, with collectstatic at build time. Postgres moved with AWS DMS (full load + CDC); the media files were synced to S3 with AWS DataSync ahead of cutover.
Outcome: Cutover ran in a Saturday-night window with ~13 minutes of write-downtime; DMS had Aurora fully in sync, so the switch was DNS + sequence re-sync + verification, not data transfer. Steady-state AWS bill landed at ~$1,050/month (a ~52% cut); with pooling through RDS Proxy, Postgres connection limits stopped being a constraint, and the worker tiers now scale independently of web. Project ran ~6 weeks. Because the workload qualified for MAP, AWS funded the assessment and credited the migration cost; out-of-pocket migration cost was effectively $0, and CloudRoute's commission was paid by the partner from MAP funding.
project length: ~6 weeks · cutover downtime: ~13 min · monthly spend: $2,200 → $1,050 (−52%) · migration cost to customer: ~$0 (MAP-funded)
CloudRoute routes you to a vetted AWS partner who plans and runs the whole Django→AWS migration — gunicorn/uvicorn on ECS Fargate or App Runner, RDS/Aurora Postgres, ElastiCache for Celery, S3 + CloudFront for static and media, the DMS cutover, and the cost optimization. Qualifying migrations are MAP-funded, so you capture the ongoing savings without paying the usual migration bill.