A Node.js app can run on AWS five different ways, and picking the wrong one is the single most expensive decision you will make. This is the senior-engineer playbook for deploying and migrating Node.js to AWS: when a long-running API belongs on App Runner or ECS/Fargate versus Lambda + API Gateway, RDS/Aurora vs DynamoDB, ElastiCache, WebSockets, secrets, Dockerizing, CI/CD, the database cutover, and the gotchas (cold starts, connection pooling with RDS Proxy). A MAP-funded AWS partner can run the whole thing — often at little-to-no cost to you.
Almost every bad Node.js-on-AWS architecture traces back to one mistake: forcing a long-running app into a serverless mold (or vice versa) because someone heard "Lambda is the AWS way to run Node." It is not. The honest first question is about your workload's shape, not the trend.
A long-running service benefits from staying warm and holding state between requests: a persistent Express/Fastify/NestJS HTTP API, a WebSocket server, a worker draining a queue, anything maintaining a database connection pool, an in-memory cache, or a long-lived gRPC/Kafka client. These want a container that boots once and serves many requests — on AWS, AWS App Runner or Amazon ECS on Fargate. The pool is reused, there is no per-request cold-start tax, and the cost model is predictable per running task.
An event-driven workload is the opposite: bursty, stateless, and idle much of the time — image-processing on S3 upload, a webhook receiver, a scheduled job, a Stripe/Twilio callback, a fan-out from SQS/EventBridge, an occasional low-traffic internal API. These map to AWS Lambda — you pay only while code runs, it scales from zero to thousands of concurrent executions automatically, and there is no idle server to pay for. Behind Amazon API Gateway, a Lambda becomes a fully-managed HTTP endpoint.
The trap is the middle. Teams take a steady-traffic Express API doing 50 requests/second, wrap it in `@vendia/serverless-express`, and put it on Lambda — and now they pay a cold-start tax on latency, fight connection-pool exhaustion against RDS, and pay *more* than a small always-on Fargate task would cost at that volume. Lambda is brilliant for spiky and idle; it is a poor fit for steady, latency-sensitive, connection-pooling synchronous APIs. And nothing forces a single answer: mature Node.js systems are frequently *hybrid* — the synchronous core on Fargate, the spiky asynchronous edges (webhooks, image processing, scheduled tasks, fan-out) on Lambda — which is exactly the allocation a migration partner produces first.
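To put rough numbers on the "pay more" claim, here is a back-of-envelope sketch. Every price is passed in as a parameter; the values used below are illustrative assumptions in the neighborhood of published us-east-1 list prices and may be out of date, so substitute current figures before relying on the result. The request duration, memory size, and task sizing are also hypothetical, and the load balancer, data transfer, and free tier are left out.

```typescript
// Back-of-envelope monthly cost for a steady synchronous API.
// Every price here is an ASSUMPTION for illustration; substitute current
// figures from the AWS pricing pages for your region.

interface LambdaPricing {
  perMillionRequests: number;   // USD per 1M Lambda invocations
  perGbSecond: number;          // USD per GB-second of compute
  apiGatewayPerMillion: number; // USD per 1M API Gateway requests
}

interface FargatePricing {
  perVcpuHour: number; // USD per vCPU-hour
  perGbHour: number;   // USD per GB-hour of memory
}

const SECONDS_PER_MONTH = 30 * 24 * 60 * 60; // 30-day month
const HOURS_PER_MONTH = 30 * 24;

function lambdaMonthlyCost(
  requestsPerSecond: number,
  avgDurationMs: number,
  memoryGb: number,
  pricing: LambdaPricing,
): number {
  const requests = requestsPerSecond * SECONDS_PER_MONTH;
  const gbSeconds = requests * (avgDurationMs / 1000) * memoryGb;
  const requestCost =
    (requests / 1e6) * (pricing.perMillionRequests + pricing.apiGatewayPerMillion);
  return requestCost + gbSeconds * pricing.perGbSecond;
}

function fargateMonthlyCost(
  taskCount: number,
  vcpuPerTask: number,
  memoryGbPerTask: number,
  pricing: FargatePricing,
): number {
  const hourlyPerTask =
    vcpuPerTask * pricing.perVcpuHour + memoryGbPerTask * pricing.perGbHour;
  return taskCount * HOURS_PER_MONTH * hourlyPerTask;
}

// Assumed inputs: 50 req/s, 100 ms average duration, 512 MB functions,
// versus two always-on tasks of 0.5 vCPU / 1 GB each.
const assumedLambdaPricing: LambdaPricing = {
  perMillionRequests: 0.2,
  perGbSecond: 0.0000166667,
  apiGatewayPerMillion: 1.0,
};
const assumedFargatePricing: FargatePricing = {
  perVcpuHour: 0.04048,
  perGbHour: 0.004445,
};

const lambdaCost = lambdaMonthlyCost(50, 100, 0.5, assumedLambdaPricing);
const fargateCost = fargateMonthlyCost(2, 0.5, 1, assumedFargatePricing);
```

Under these assumed inputs the Lambda + API Gateway total comes out several times higher than the two small Fargate tasks, although the Fargate side still needs a load balancer added on top. Change the duration or the traffic shape and the comparison can flip, which is the reason to run the numbers for your own workload.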
Ask: does a request benefit from a warm process with an open database connection pool? If yes (a typical REST/GraphQL API, a WebSocket server, a worker) → App Runner or ECS/Fargate. If the work is bursty, stateless, idle-heavy, or cron-like (webhooks, S3-triggered processing, scheduled jobs) → Lambda + API Gateway. Steady high-throughput synchronous API on Lambda is the most common over-engineering mistake — and the most common source of cold-start and connection-pool pain.
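The rule of thumb above can be written down as a small function, which is a handy way to force the conversation about workload shape. This is a minimal sketch: the `WorkloadShape` fields and the `chooseCompute` name are hypothetical, and real decisions weigh more factors than five booleans.

```typescript
type ComputeTarget = "App Runner" | "ECS/Fargate" | "Lambda + API Gateway";

// Hypothetical description of a workload; the field names are illustrative.
interface WorkloadShape {
  benefitsFromWarmProcess: boolean; // open DB pool, in-memory cache, long-lived clients
  hasWebSocketsOrWorkers: boolean;  // persistent sockets or queue workers
  burstyOrIdleHeavy: boolean;       // webhooks, S3-triggered jobs, cron-like tasks
  needsNetworkControl: boolean;     // VPC, security groups, private subnets
  wantsMinimalOps: boolean;         // a few stateless HTTP services, nothing else
}

function chooseCompute(w: WorkloadShape): ComputeTarget {
  const longRunning = w.benefitsFromWarmProcess || w.hasWebSocketsOrWorkers;

  // Bursty, stateless, idle-heavy work with nothing to keep warm goes to Lambda.
  if (!longRunning && w.burstyOrIdleHeavy) return "Lambda + API Gateway";

  // Simple stateless web services with no special networking fit App Runner.
  if (w.wantsMinimalOps && !w.hasWebSocketsOrWorkers && !w.needsNetworkControl) {
    return "App Runner";
  }

  // Everything else, including the "unsure" case, defaults to Fargate.
  return "ECS/Fargate";
}

// A steady REST API with a connection pool and VPC requirements.
const steadyApi = chooseCompute({
  benefitsFromWarmProcess: true,
  hasWebSocketsOrWorkers: false,
  burstyOrIdleHeavy: false,
  needsNetworkControl: true,
  wantsMinimalOps: false,
}); // "ECS/Fargate"

// A webhook receiver that is idle most of the day.
const webhookReceiver = chooseCompute({
  benefitsFromWarmProcess: false,
  hasWebSocketsOrWorkers: false,
  burstyOrIdleHeavy: true,
  needsNetworkControl: false,
  wantsMinimalOps: false,
}); // "Lambda + API Gateway"
```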
There are five technical ways to run Node on AWS (App Runner, ECS/Fargate, Lambda, ECS/EC2, EKS), but for a team migrating an existing app there are only three sensible ones. Here is what each is genuinely good at, in plain terms; the full side-by-side is in the comparison table below.
AWS App Runner is the closest thing AWS has to Heroku or Render for Node. Point it at a container image (or a GitHub repo for managed source builds) and it builds, deploys, serves HTTPS on a managed domain, and autoscales on request volume — the fastest path to "my Node API is live on AWS" with the least operational surface, and the right call for one or a handful of stateless web services where you do not want to own a VPC, a load balancer, and task definitions. The tradeoff is less control: VPC egress and private database connectivity are configurable but constrained, and very complex topologies outgrow it.
Amazon ECS on Fargate is the workhorse and the most common landing spot for production Node.js APIs. You get serverless containers (no servers to patch) with full control: VPC networking, security groups, an Application Load Balancer, fine-grained task CPU/memory, per-service autoscaling, and clean separation of an API service from a worker service. Your Express/Fastify/NestJS process boots once per task and serves many requests, so the pool is reused and there is no cold-start tax. This is where teams land when they have a real database, workers, WebSockets, or networking requirements — most teams migrating an established app.
AWS Lambda + Amazon API Gateway is the serverless target: per-millisecond billing, automatic scale from zero, zero idle cost. For Node it is ideal for spiky APIs, webhook handlers, scheduled jobs, S3/DynamoDB-stream triggers, and asynchronous fan-out — you write handler functions (or run a framework via an adapter, with caveats) and API Gateway provides the HTTP front door. The costs are specific: cold starts on latency-sensitive paths, a 15-minute execution ceiling, payload limits, and the database-connection problem at scale (see the gotchas); provisioned concurrency removes cold starts but also the scale-to-zero advantage. (Skip the other two targets when migrating — ECS on EC2 pays off only for specific cost/GPU/placement needs, and EKS is overkill for a single Node app unless you already run Kubernetes.)
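The database-connection problem is mostly arithmetic. Each concurrent Lambda execution environment opens its own connections, while a Fargate service opens one pool per task. The sketch below compares peak connection counts against a database limit; the function name, the limit of 200, and the concurrency figures are all hypothetical.

```typescript
interface ConnectionBudget {
  peakConnections: number; // connections opened at peak
  available: number;       // database limit minus reserved connections
  exhausted: boolean;      // true when peak demand exceeds what is available
}

function connectionBudget(
  concurrentProcesses: number,   // Fargate tasks or concurrent Lambda environments
  connectionsPerProcess: number, // pool size per task, or connections per environment
  maxConnections: number,        // the database's connection limit
  reservedConnections = 0,       // held back for admin and migration tooling
): ConnectionBudget {
  const peakConnections = concurrentProcesses * connectionsPerProcess;
  const available = maxConnections - reservedConnections;
  return { peakConnections, available, exhausted: peakConnections > available };
}

// Four Fargate tasks with a pool of 10 each, against an assumed limit of 200.
const fargateBudget = connectionBudget(4, 10, 200, 10);

// A burst of 500 concurrent Lambda environments holding one connection each.
const lambdaBudget = connectionBudget(500, 1, 200, 10);
```

With these assumed figures the Fargate service uses 40 of 190 available connections, while the Lambda burst asks for 500. That gap is what RDS Proxy is meant to absorb by multiplexing many client connections onto fewer database connections.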
App Runner — a few stateless HTTP services, git-push simplicity, minimal ops. ECS/Fargate — production API + workers, a real relational database, WebSockets, or VPC/networking control (the default for "we have a real app"). Lambda + API Gateway — spiky/idle-heavy/event-driven workloads, or the asynchronous edges of a long-running system. Unsure for a steady synchronous API? Choose Fargate — fewest sharp edges for Node.
The compute choice gets the attention; the data layer is where the migration risk actually lives. The good news for Node teams is that the mapping is mechanical and your ORM usually does not change.
If your Node app talks SQL — PostgreSQL or MySQL through Prisma, Drizzle, Sequelize, TypeORM, Knex, or raw `pg`/`mysql2` — it moves to Amazon RDS (the like-for-like managed database) or Amazon Aurora (AWS's cloud-native PostgreSQL/MySQL-compatible engine, with faster failover, autoscaling storage, up to 15 read replicas, and Serverless v2 capacity scaling). Because both speak the same wire protocol as your current database, schema, queries, and ORM models are unchanged — you swap the connection string and migrate the data. A common pattern is RDS first to minimize change at cutover, then evaluate Aurora once stable; Serverless v2 suits spiky workloads because it scales capacity with load.
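"Swap the connection string" can be as small as changing the host in `DATABASE_URL`. Here is a minimal sketch using the standard `URL` class; the function name and both hostnames are placeholders, and any TLS or pooling parameters your driver needs are out of scope.

```typescript
// Returns a copy of the connection URL pointed at a new host, leaving the
// credentials, database name, and query parameters untouched.
function retargetDatabaseUrl(currentUrl: string, newHost: string, newPort?: number): string {
  const url = new URL(currentUrl);
  url.hostname = newHost;
  if (newPort !== undefined) url.port = String(newPort);
  return url.toString();
}

// Both hostnames below are made up for illustration.
const retargetedUrl = retargetDatabaseUrl(
  "postgresql://app_user:s3cret@old-db.internal:5432/app?schema=public",
  "app-db.example.us-east-1.rds.amazonaws.com",
);
```

In practice the new value would be written to your secret store, not hard-coded; the point is only that the ORM never notices the difference.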
If your app is key-value or document-shaped and built for horizontal scale — heavy single-table access, very high write throughput, predictable millisecond latency at scale — Amazon DynamoDB is the AWS-native fit, and it pairs especially well with Lambda (it scales the same way functions do, and its HTTP connection model sidesteps connection exhaustion entirely). The honest caveat: DynamoDB is not a drop-in for a relational app. If you rely on joins, ad-hoc queries, and cross-table transactions, "migrate Postgres to DynamoDB" is a *refactor*, not a migration — so most teams keep relational data on RDS/Aurora and reach for DynamoDB only for specific high-scale tables or new services.
For Redis — and Node apps use it constantly via ioredis/node-redis for caching, BullMQ/Bee-Queue job queues, session stores, rate-limiters, and pub/sub — the target is Amazon ElastiCache (Redis OSS or Valkey). It speaks the same protocol, so your client code keeps working with a connection-string change and cache, queues, and rate-limiters move with effectively no rewrite. Two supporting pieces round out the layer: file uploads and user assets belong on Amazon S3 (mandatory, not optional — local-disk writes break the moment you run multiple tasks or invocations), and full-text search on a self-hosted Elasticsearch moves to Amazon OpenSearch Service. Both are easy to forget until something stateful breaks across instances, so both live on the migration checklist.
Real-time is where the long-running-vs-serverless fork gets sharp, because WebSockets are inherently long-lived connections and Lambda is inherently ephemeral. If your Node app uses Socket.IO, `ws`, or GraphQL subscriptions, this section decides your compute choice as much as the API does.
The straightforward path for a Node WebSocket server (Socket.IO, `ws`, `uWebSockets.js`) is to run it as a long-running container on ECS/Fargate behind an Application Load Balancer — the ALB handles WebSocket upgrades natively, holds the persistent connection, and the connection lives in your process exactly as today. This is the lowest-friction move because your real-time code does not change. The one addition at scale: across multiple Fargate tasks, Socket.IO needs a shared adapter (the Redis adapter backed by ElastiCache) so a message published on one task reaches clients connected to another — a well-trodden pattern, not a rewrite.
The serverless path is Amazon API Gateway WebSocket APIs: API Gateway manages the persistent connections and invokes a Lambda on connect/disconnect/message events, and you push messages back out via the API Gateway Management API. It scales to huge connection counts with no servers to run and excels at fan-out-style real-time (notifications, live dashboards) where each message is a discrete event. The tradeoff is a genuinely different programming model — your Socket.IO server does not "just run" here; you re-implement the connection lifecycle around API Gateway routes and a connection store (usually DynamoDB). So the honest recommendation: if you already have a working Socket.IO / `ws` server, keep it long-running on Fargate behind an ALB and add the Redis adapter (the cheapest, lowest-risk move most partners default to); reach for API Gateway WebSocket APIs only for greenfield real-time, zero idle cost, or connection counts large and spiky enough that running your own socket fleet is the bigger cost.
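To show how different the serverless model is, here is a minimal sketch of a handler built around the `$connect` and `$disconnect` routes and a connection store. The `ConnectionStore` interface, `PostToConnection` type, and `createHandler` name are hypothetical; in a real deployment the store would typically be a DynamoDB table and the sender would wrap the API Gateway Management API. Both are injected here so the sketch runs without the AWS SDK.

```typescript
// Only the event fields this sketch reads; the real event carries more.
interface WebSocketEvent {
  requestContext: { routeKey: string; connectionId: string };
  body?: string | null;
}

// Hypothetical store interface; typically backed by a DynamoDB table.
interface ConnectionStore {
  add(connectionId: string): Promise<void>;
  remove(connectionId: string): Promise<void>;
  list(): Promise<string[]>;
}

// Hypothetical sender; in production it would wrap the API Gateway
// Management API's PostToConnection call.
type PostToConnection = (connectionId: string, data: string) => Promise<void>;

function createHandler(store: ConnectionStore, post: PostToConnection) {
  return async (event: WebSocketEvent): Promise<{ statusCode: number }> => {
    const { routeKey, connectionId } = event.requestContext;

    if (routeKey === "$connect") {
      await store.add(connectionId);
    } else if (routeKey === "$disconnect") {
      await store.remove(connectionId);
    } else {
      // Any other route: relay the message body to every other connection.
      const message = event.body ?? "";
      const others = (await store.list()).filter((id) => id !== connectionId);
      await Promise.all(
        others.map(async (id) => {
          try {
            await post(id, message);
          } catch {
            // Simplification: treat any send failure as a stale connection.
            await store.remove(id);
          }
        }),
      );
    }
    return { statusCode: 200 };
  };
}

// In-memory store for local experiments only.
class InMemoryStore implements ConnectionStore {
  private ids = new Set<string>();
  async add(id: string): Promise<void> { this.ids.add(id); }
  async remove(id: string): Promise<void> { this.ids.delete(id); }
  async list(): Promise<string[]> { return Array.from(this.ids); }
}
```

Nothing in this handler holds a socket: the connection lives in API Gateway, and the function only reacts to events about it. That is why an existing Socket.IO server cannot be dropped in unchanged.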
On a single instance, Socket.IO "just works." The moment you autoscale to multiple Fargate tasks, a message emitted on task A will not reach a client connected to task B unless you add a shared adapter: the Socket.IO Redis adapter backed by ElastiCache, or Socket.IO's AWS SQS adapter. This is the #1 "real-time broke after we scaled" bug in Node migrations, so design the shared adapter in from day one, not after the incident.
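The failure mode is easy to reproduce without any infrastructure. The sketch below is a dependency-free simulation, not Socket.IO code: `SocketTask` stands in for one Fargate task and `SharedBus` stands in for the pub/sub channel that the Redis adapter provides. Both class names are hypothetical.

```typescript
// In-memory stand-in for the shared pub/sub channel (Redis in production).
class SharedBus {
  private listeners: Array<(message: string) => void> = [];

  subscribe(listener: (message: string) => void): void {
    this.listeners.push(listener);
  }

  publish(message: string): void {
    for (const listener of this.listeners) listener(message);
  }
}

// One simulated Fargate task holding its own list of connected clients.
class SocketTask {
  readonly received: string[] = [];
  private clients: string[] = [];
  private bus: SharedBus | undefined;

  constructor(bus?: SharedBus) {
    this.bus = bus;
    // With a bus, every task delivers whatever any task publishes.
    if (bus) bus.subscribe((message) => this.deliverLocally(message));
  }

  connect(clientId: string): void {
    this.clients.push(clientId);
  }

  broadcast(message: string): void {
    if (this.bus) this.bus.publish(message);
    else this.deliverLocally(message); // no adapter: only this task's clients
  }

  private deliverLocally(message: string): void {
    for (const clientId of this.clients) this.received.push(`${clientId}:${message}`);
  }
}

// Without a shared bus, a client on task B never sees task A's broadcast.
const isolatedA = new SocketTask();
const isolatedB = new SocketTask();
isolatedB.connect("bob");
isolatedA.broadcast("hello");

// With a shared bus, the same broadcast reaches task B's client.
const sharedBus = new SharedBus();
const sharedA = new SocketTask(sharedBus);
const sharedB = new SocketTask(sharedBus);
sharedB.connect("bob");
sharedA.broadcast("hello");
```

After this runs, `isolatedB.received` is empty while `sharedB.received` contains the message, which is the whole bug and the whole fix in miniature.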
For a long-running Node app, three mechanical tasks make up most of the hands-on work: writing a container image, moving environment variables to AWS-native secret stores, and wiring a deploy pipeline. None is hard; all are easy to do badly, which is why they live on a checklist.
Dockerizing a Node app is usually a ~12-line Dockerfile, and the detail worth getting right is the multi-stage build: a build stage runs `npm ci` and `npm run build` (for TypeScript/Next/Nest), then a slim runtime stage (`node:22-alpine` or distroless) copies only the production dependencies (a `node_modules` installed or pruned with `--omit=dev`) and the built output, and runs as non-root. The result is a small image with a low attack surface. Set `NODE_ENV=production`, expose the listen port, and make the start command your real entrypoint (`node dist/main.js`), not `npm start`, so the Node process receives shutdown signals directly instead of through an npm wrapper. Build in CI and push to Amazon ECR; App Runner can build from source instead, but a Dockerfile is the portable choice most partners standardize on so the same image runs in Fargate, App Runner, or locally.
Environment variables are the second translation. A Node app reads a flat `process.env` bag — `DATABASE_URL`, `REDIS_URL`, JWT secrets, API keys, feature flags. On AWS you split by sensitivity: real secrets into AWS Secrets Manager (which can auto-rotate RDS credentials), non-sensitive config into SSM Parameter Store. Both inject into the ECS task definition (or App Runner/Lambda config) so the container sees them as ordinary environment variables — `process.env.DATABASE_URL` works exactly as before and your code does not change. The migration step is a one-time export, a sort into secret-vs-config, and a scripted load so nothing leaks into an image layer or a committed `.env`.
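The sort into secret-vs-config is a good candidate for a script. The sketch below classifies variable names and builds the `secrets` array of an ECS container definition, where each entry maps an environment variable `name` to a `valueFrom` ARN. The naming heuristic, the function names, and the placeholder ARNs are all assumptions; treat the output as a first pass to review by hand.

```typescript
type Store = "secretsManager" | "parameterStore";

interface TaskSecretRef {
  name: string;      // environment variable name seen by the container
  valueFrom: string; // ARN of the secret or parameter
}

// Hypothetical naming heuristic; review the result by hand before loading.
// Connection URLs count as secrets because they usually embed credentials.
const SENSITIVE_NAME = /(SECRET|TOKEN|PASSWORD|API_KEY|_URL$)/i;

function classify(name: string): Store {
  return SENSITIVE_NAME.test(name) ? "secretsManager" : "parameterStore";
}

// Builds the `secrets` array of an ECS container definition. `arnFor` is a
// caller-supplied lookup, since real ARNs only exist once values are loaded.
function toTaskSecrets(
  names: string[],
  arnFor: (store: Store, name: string) => string,
): TaskSecretRef[] {
  return names.map((name) => ({ name, valueFrom: arnFor(classify(name), name) }));
}

// Example with placeholder ARNs (the account ID and paths are made up).
const taskSecrets = toTaskSecrets(
  ["DATABASE_URL", "JWT_SECRET", "LOG_LEVEL", "FEATURE_NEW_CHECKOUT"],
  (store, name) =>
    store === "secretsManager"
      ? `arn:aws:secretsmanager:us-east-1:123456789012:secret:myapp/${name}`
      : `arn:aws:ssm:us-east-1:123456789012:parameter/myapp/${name}`,
);
```

Because the container still receives plain environment variables, application code that reads `process.env.DATABASE_URL` needs no change.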
CI/CD is the third piece. The simplest robust pipeline keeps your existing GitHub Actions and adds three steps: build + test, build and push the image to ECR, then update the ECS service (or trigger an App Runner deploy) for a health-checked, zero-downtime rollout. Teams wanting everything in-AWS use CodePipeline + CodeBuild + CodeDeploy (which adds blue/green deploys for ECS out of the box); a Lambda app uses AWS SAM, the Serverless Framework, or AWS CDK. The principle is identical: build an immutable artifact, push it, roll it out with health checks and automatic rollback.
Everything above can be built and tested without touching production. The database cutover is the one moment that touches live data — where a plan earns its keep. Done right, user-visible downtime is zero to fifteen minutes.
The core technique is to keep the new RDS/Aurora database continuously in sync with your current one while you test, so that at cutover you switch to an already-current copy rather than copying a cold database under time pressure. AWS Database Migration Service (DMS) does exactly this: a full load into RDS/Aurora, then a switch into change data capture (CDC) mode that streams every subsequent change in near-real time. When source and target are the same engine — PostgreSQL → RDS for PostgreSQL, or MySQL → Aurora MySQL — this is a homogeneous migration: no Schema Conversion Tool needed, schema moves as-is. If you are also changing engines, the AWS Schema Conversion Tool (SCT) translates the schema/stored-logic first, and that heterogeneous case is where the real effort lives.
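The reason full load plus CDC keeps the window short is visible in a toy model. The sketch below is not how DMS is implemented; it only illustrates the idea with two hypothetical classes: the target bulk-copies once, then replays only the changes it has not yet seen, so the final drain is proportional to the unreplicated tail and not to the size of the database.

```typescript
type Change =
  | { op: "upsert"; id: number; value: string }
  | { op: "delete"; id: number };

// Toy source database that records every write in an ordered change log.
class SourceDb {
  readonly rows = new Map<number, string>();
  readonly log: Change[] = [];

  write(change: Change): void {
    if (change.op === "upsert") this.rows.set(change.id, change.value);
    else this.rows.delete(change.id);
    this.log.push(change);
  }
}

// Toy target: bulk-copies once, then replays only changes it has not seen.
class TargetDb {
  readonly rows = new Map<number, string>();
  private logPosition = 0;

  fullLoad(source: SourceDb): void {
    source.rows.forEach((value, id) => this.rows.set(id, value));
    this.logPosition = source.log.length;
  }

  // Applies pending changes and returns how many were replayed.
  drain(source: SourceDb): number {
    const pending = source.log.slice(this.logPosition);
    for (const change of pending) {
      if (change.op === "upsert") this.rows.set(change.id, change.value);
      else this.rows.delete(change.id);
    }
    this.logPosition = source.log.length;
    return pending.length;
  }
}

const sourceDb = new SourceDb();
for (let id = 1; id <= 1000; id++) {
  sourceDb.write({ op: "upsert", id, value: `row-${id}` });
}

const targetDb = new TargetDb();
targetDb.fullLoad(sourceDb);                                  // bulk copy, before the window
sourceDb.write({ op: "upsert", id: 7, value: "edited" });     // production traffic continues
sourceDb.write({ op: "delete", id: 8 });
targetDb.drain(sourceDb);                                     // ongoing change capture
sourceDb.write({ op: "upsert", id: 1001, value: "last write" });
const replayedAtCutover = targetDb.drain(sourceDb);           // final drain inside the window
```

Here the final drain replays a single change even though the source holds a thousand rows, which mirrors why the cutover step stays short as the database grows.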
With CDC running and the new Node stack smoke-tested against the synced database, the cutover is short and scripted: enter maintenance (or read-only) mode → let DMS drain the last few seconds → flip `DATABASE_URL` to the RDS/Aurora endpoint and switch traffic to the new Fargate service (or the new API Gateway stage) → update the Route 53 record (TTL lowered ahead of time) → verify writes land in RDS → exit maintenance. Because the data was already synced, the window is dominated by DNS propagation and verification, not data transfer — commonly 0–15 minutes during a low-traffic period, regardless of database size.
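"Short and scripted" is worth taking literally. A minimal sketch of a runbook runner follows: steps execute in order, and if one fails, the completed steps that define an `undo` are reversed newest-first. The `CutoverStep` shape and `runCutover` name are hypothetical, and the step bodies are placeholders for your own tooling.

```typescript
interface CutoverStep {
  name: string;
  run: () => Promise<void>;
  undo?: () => Promise<void>; // how to reverse this step during rollback
}

interface CutoverResult {
  ok: boolean;
  completed: string[];
  rolledBack: string[];
  failedAt?: string;
}

async function runCutover(steps: CutoverStep[]): Promise<CutoverResult> {
  const completed: CutoverStep[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step);
    } catch {
      // Undo completed steps in reverse order, newest first.
      const rolledBack: string[] = [];
      for (const done of completed.slice().reverse()) {
        if (done.undo) {
          await done.undo();
          rolledBack.push(done.name);
        }
      }
      return {
        ok: false,
        completed: completed.map((s) => s.name),
        rolledBack,
        failedAt: step.name,
      };
    }
  }
  return { ok: true, completed: completed.map((s) => s.name), rolledBack: [] };
}

// Step bodies are placeholders; real ones would call your own tooling.
const noop = async (): Promise<void> => {};
const cutoverPlan: CutoverStep[] = [
  { name: "enter maintenance mode", run: noop, undo: noop },
  { name: "drain DMS", run: noop },
  { name: "flip DATABASE_URL and switch traffic", run: noop, undo: noop },
  { name: "update Route 53 record", run: noop, undo: noop },
  { name: "verify writes land in RDS", run: noop },
  { name: "exit maintenance mode", run: noop },
];
```

Writing the undo next to each step keeps the rollback plan from drifting out of date, and rehearsing `runCutover(cutoverPlan)` against a staging copy is a cheap way to find the step that fails before the real window.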
The rollback plan is non-negotiable and simple: keep the old environment running until you are confident, and if something is wrong, switch `DATABASE_URL` and DNS back. Rollback is only clean while RDS has not yet accepted production writes that the old database never saw; once such writes are committed, rolling back means reconciling them into the old database. Most teams therefore keep the decision window short (minutes to hours) and watch closely. Lowering DNS TTL to ~60 seconds a day ahead makes both the switch and the rollback fast. For a DynamoDB target the mechanics differ (export/import plus dual-writes, or a Glue/custom backfill), which is one more reason a relational-to-DynamoDB move is a refactor, not a cutover.
A cold `pg_dump | pg_restore` (or `mysqldump`) makes your downtime window equal to the export + transfer + import time of the whole database: 20–60 minutes for 5 GB, unacceptable at 50 GB+. DMS with CDC moves the bulk load before the window and only drains the last few seconds during it, so downtime stays at minutes, not hours, regardless of database size. Tiny databases can still skip DMS for a single dump/restore inside the window.
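The dump/restore window is simple division once you know your throughput. In the sketch below the three throughput figures are assumptions picked for illustration, not measurements; measure your own export, transfer, and import rates before planning around the result.

```typescript
interface Throughput {
  exportMBps: number;   // dump speed from the source database
  transferMBps: number; // network copy speed to AWS
  importMBps: number;   // restore speed into the target database
}

// Estimated downtime in minutes when the three phases run back to back.
function dumpRestoreMinutes(sizeGb: number, t: Throughput): number {
  const sizeMb = sizeGb * 1024;
  const seconds =
    sizeMb / t.exportMBps + sizeMb / t.transferMBps + sizeMb / t.importMBps;
  return seconds / 60;
}

// Assumed rates: 10 MB/s export, 20 MB/s transfer, 5 MB/s import.
const assumedRates: Throughput = { exportMBps: 10, transferMBps: 20, importMBps: 5 };

const minutesFor5Gb = dumpRestoreMinutes(5, assumedRates);
const minutesFor50Gb = dumpRestoreMinutes(50, assumedRates);
```

At these assumed rates a 5 GB database takes roughly half an hour and a 50 GB database roughly five hours, because the window scales linearly with size. That linear growth is what the CDC approach removes.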
Here is the end-to-end sequence a partner runs, mapped to the AWS MAP phases (Assess → Mobilize → Migrate). For a typical Node.js app this is a 2–6 week project — most of it parallelizable, none of it touching production until the final cutover.
Most Node.js→AWS migrations that go sideways do so for a small, repeatable set of reasons — and several are specific to how Node and AWS interact. Naming them up front is the cheapest insurance there is.
The definitive three-way comparison for running a Node.js app on AWS. Pick the column that matches your workload shape — most production APIs land on Fargate, the simplest web services on App Runner, and spiky/event-driven workloads on Lambda. (Numbers are representative 2026 ranges.)
| Dimension | App Runner | ECS / Fargate | Lambda + API Gateway |
|---|---|---|---|
| Best for | Stateless web services, simplest ops | Production APIs, workers, WebSockets | Spiky, event-driven, async edges |
| Process model | Long-running container (warm) | Long-running container (warm) | Ephemeral per-invocation function |
| Cold starts | None (scales from low, not zero by default) | None | Yes: 200 ms–2 s+ unless provisioned concurrency is used |
| Scale to zero / idle cost | Scales low; minimal idle cost | Pay per running task (no scale-to-zero) | True scale-to-zero, $0 idle |
| Connection pooling to RDS | Reused per instance | Reused per task (best for pools) | Needs RDS Proxy at scale (one or more conns per concurrent execution environment) |
| WebSockets | Limited | Yes (ALB + Redis adapter) | Via API Gateway WebSocket (different model) |
| Max execution / request | No hard limit | No hard limit | 15 min; ~6 MB sync payload |
| Networking control | Constrained (VPC egress configurable) | Full (VPC, SG, ALB, private subnets) | Full (VPC-attachable) |
| Ops surface | Lowest (Heroku-like) | Medium (tasks, ALB, autoscaling) | Low (no servers; more moving parts) |
| Cost at steady high traffic | Low–moderate | Lowest for steady load (+ Savings Plan) | Can be highest at sustained volume |
| Cost at spiky / low traffic | Low | Pay for idle tasks | Lowest (pay per ms) |
| Migration effort from existing app | Low | Medium (Dockerfile + task defs) | Medium–High (handler/adapter rework) |
Situation: They had outgrown the single VM but had no AWS or container experience, and a previous "let's just put it on Lambda" spike had stalled on cold starts and Prisma connection-pool errors against Postgres. They needed a resilient, autoscaling setup without a multi-month DIY effort or a risky big-bang database move — and they wanted real-time to survive horizontal scaling.
What CloudRoute did: Routed within 24 hours to an AWS Advanced-tier partner with a Node/TypeScript track record, who ran the MAP Assess phase (free) and filed the work as a MAP engagement. Target: the NestJS API and the Socket.IO server as two ECS/Fargate services behind an ALB (long-running, warm — not Lambda), Aurora PostgreSQL with RDS Proxy for Prisma pooling, ElastiCache for Redis (BullMQ + sessions + the Socket.IO Redis adapter for cross-task messaging), env vars split into Secrets Manager + Parameter Store, multi-stage Dockerfiles, and GitHub Actions → ECR → ECS rolling deploys. Two spiky edges — image thumbnailing on S3 upload and a nightly export — moved to Lambda. Postgres migrated via AWS DMS (full load + CDC).
Outcome: Cutover ran in a Saturday-night window with ~9 minutes of write-downtime — DMS had Aurora fully in sync, so the switch was DNS + verification, not data transfer. Real-time survived autoscaling (the Redis adapter fixed the multi-instance problem), the Prisma connection errors disappeared behind RDS Proxy, and steady-state AWS spend landed at ~$780/month with autoscaling headroom the single VM never had. Project ran ~4 weeks. Because the workload qualified for MAP, AWS funded the assessment and credited the migration cost — out-of-pocket cost was effectively $0, and CloudRoute's commission was paid by the partner from MAP funding.
project length: ~4 weeks · cutover downtime: ~9 min · architecture: single VM → Fargate (API + Socket.IO) + Aurora + RDS Proxy + ElastiCache + 2 Lambdas · migration cost to customer: ~$0 (MAP-funded)
CloudRoute routes you to a vetted AWS partner who chooses the right compute target (App Runner / Fargate / Lambda), stands up RDS/Aurora or DynamoDB + ElastiCache, Dockerizes, wires CI/CD, and runs the DMS cutover with RDS Proxy and the WebSocket fixes baked in. Qualifying migrations are MAP-funded, so you get a resilient, autoscaling Node deployment without paying the usual migration bill.