AWS gives you three modern load balancers — Application (ALB), Network (NLB), and Gateway (GWLB) — plus the deprecated Classic one you should be migrating off. They share one mental model (listeners and rules point at target groups; target groups health-check their targets) but solve different problems. This page explains when each is the right tool, the concepts that decide your design — health checks, TLS termination, sticky sessions, cross-zone — how the LB ties into Auto Scaling, WAF, and CloudFront, what LCU pricing really costs, and the misconfigurations behind most "works in staging, 502s in prod" incidents.
AWS markets all of this as "Elastic Load Balancing" (ELB), but ELB is a family, not a product — four members, three of them current. Picking the wrong one is the most expensive decision in this space because it is the hardest to reverse: the LB type is baked into your listeners, target-group types, and often your DNS.
Application Load Balancer (ALB) operates at Layer 7 — it understands HTTP, HTTPS, HTTP/2, gRPC, and WebSockets. Because it reads the request, it can route on hostname, URL path, HTTP headers, query strings, source IP, and method. It terminates TLS, integrates with AWS WAF and Cognito, and supports weighted target groups for blue/green and canary releases. For the large majority of web apps, APIs, and microservices the ALB is the right default, and the rest of this page assumes it unless stated otherwise.
Network Load Balancer (NLB) operates at Layer 4 — it forwards TCP and UDP flows without parsing them. That makes it extremely fast (single-digit-millisecond added latency), capable of millions of connections per second, and able to preserve the client source IP. It can hand out static IPs and Elastic IPs (one per Availability Zone), which is why teams reach for it when a customer needs to allowlist a fixed address. It can terminate TLS or pass it straight through to the backend.
Gateway Load Balancer (GWLB) is the specialist: it transparently inserts third-party network appliances — next-gen firewalls, IDS/IPS, deep packet inspection — into your traffic path using the GENEVE protocol on port 6081. You almost certainly do not need it unless you are building a centralised security-inspection VPC or running a vendor appliance like Palo Alto, Fortinet, or Check Point in AWS. If you have to ask whether you need GWLB, you do not.
Classic Load Balancer (CLB) is the original, and it is on its way out. AWS classes it as a previous-generation load balancer, the EC2-Classic network it was designed for has already been retired, and CLB lacks essentially every modern feature (no host/path routing, no WAF, no HTTP/2, no target groups). If you still run one, schedule its migration to an ALB (for HTTP) or NLB (for TCP) as a backlog item rather than an emergency. New designs should never start on Classic.
Speaks HTTP and you want routing, TLS, and WAF → ALB. Raw TCP/UDP, lowest latency, static IPs, or TLS passthrough → NLB. Inserting a firewall/IDS appliance into the packet path → GWLB. Anything currently on Classic → plan its migration.
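The decision rules above are simple enough to write down as a function. The following is a minimal sketch, not an AWS API: the `Workload` fields and the returned labels are names invented for illustration, and the ordering of the checks is one reasonable reading of the rules. A Classic Load Balancer is not a separate branch, since its migration target is whatever this function returns for the traffic it carries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Traits of the traffic. Field names are illustrative, not an AWS schema."""
    speaks_http: bool = True
    needs_static_ip: bool = False
    needs_tls_passthrough: bool = False
    inserts_inline_appliance: bool = False


def choose_load_balancer(w: Workload) -> str:
    # Inline firewall / IDS appliances are the only GWLB use case.
    if w.inserts_inline_appliance:
        return "GWLB"
    # Raw TCP/UDP or TLS passthrough cannot sit behind an ALB.
    if not w.speaks_http or w.needs_tls_passthrough:
        return "NLB"
    # HTTP plus a fixed address: keep L7 routing, add static IPs in front.
    if w.needs_static_ip:
        return "NLB in front of ALB"
    return "ALB"


if __name__ == "__main__":
    print(choose_load_balancer(Workload()))
    print(choose_load_balancer(Workload(speaks_http=False)))
```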
Once you internalise this four-part model, every AWS load balancer looks the same and the documentation stops being confusing. The differences between ALB, NLB, and GWLB are just differences in what each part is allowed to do.
A listener is the front door: it binds to a protocol and port — say HTTPS on 443 — and waits for connections. An ALB listener speaks HTTP/HTTPS; an NLB listener speaks TCP, UDP, TCP_UDP, or TLS. A load balancer can have several (commonly 80 and 443, with 80 doing nothing but redirecting to 443).
A rule (ALB only) decides what to do with each request. Rules evaluate in priority order, match on conditions (host header, path pattern, HTTP header, HTTP method, query string, source IP), then take an action: forward to a target group, redirect, return a fixed response, or authenticate. The default rule is the catch-all at the bottom. NLB has no rules; a listener forwards straight to one target group.
A target group is the pool of backends plus its routing config. Its target type is what people get wrong: instance (EC2 instances), ip (raw IPs — required for Fargate, on-prem over Direct Connect/VPN, or fine-grained control), lambda (an ALB can invoke a function as a target), or alb (an NLB can forward to an ALB, combining static IPs with L7 routing). One target group belongs to exactly one VPC.
A health check is the target group continuously probing each registered target and sending live traffic only to those passing. You set the protocol, path (for HTTP), healthy/unhealthy thresholds, interval, timeout, and — critically — the success matcher (which status codes count as healthy). A target that flaps, or never goes healthy at all, is behind a large share of load-balancing incidents, which is why the next section treats health checks as their own topic.
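To make the target-group knobs concrete, here is a hedged sketch that only builds a dictionary shaped like the keyword arguments of boto3's `elbv2` `create_target_group` call. It does not contact AWS. The function name, the port, the `/healthz` path, and every numeric value are illustrative assumptions to tune for your own service.

```python
def fargate_target_group_params(name: str, vpc_id: str, port: int = 8080) -> dict:
    """Build create_target_group keyword arguments for a Fargate-style service.

    All values are example choices, not recommendations from AWS.
    """
    return {
        "Name": name,
        "Protocol": "HTTP",
        "Port": port,
        "VpcId": vpc_id,                    # a target group lives in one VPC
        "TargetType": "ip",                 # Fargate tasks register by IP
        "HealthCheckProtocol": "HTTP",
        "HealthCheckPath": "/healthz",      # assumed lightweight endpoint
        "HealthCheckIntervalSeconds": 15,
        "HealthCheckTimeoutSeconds": 5,     # must be shorter than the interval
        "HealthyThresholdCount": 3,
        "UnhealthyThresholdCount": 3,
        "Matcher": {"HttpCode": "200,204"},  # status codes that count as healthy
    }


if __name__ == "__main__":
    # "vpc-0123example" is a placeholder, not a real VPC ID.
    print(fargate_target_group_params("orders-api", "vpc-0123example"))
```

With credentials configured, the result could be passed along as `boto3.client("elbv2").create_target_group(**params)`.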
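Put together: a client hits the load balancer's DNS name (resolved to LB nodes spread across your subnets), the listener accepts, an ALB evaluates rules top-down while an NLB forwards directly, and the chosen target group picks a healthy target: round-robin by default on an ALB, flow-hash on an NLB. If the target group has no registered targets, the ALB returns a 503 and the NLB cannot complete the connection. If targets are registered but every one of them is failing its health check, the load balancer fails open and routes to all of them anyway, which usually surfaces as 502s and 504s from the broken backends. Either way, an error from the ALB almost always means "your health check or target registration is failing," not "the ALB is broken."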
If you remember one operational thing from this page, make it this: the load balancer is rarely what broke — the health check, the security group, or the target-group wiring is. Here is the short list of what actually goes wrong and how to read the symptom.
A health check has more knobs than people expect, and the defaults are not always right. The success matcher defaults to 200 for HTTP target groups, so if your health endpoint returns 302 or 204, every target is marked unhealthy and the errors that follow have no obvious cause. The path defaults to /, which often hits your full app stack (and its database) on every probe; a lightweight /healthz checking only liveness is faster and cheaper. Interval and thresholds trade detection speed against flapping: a 5-second interval with an unhealthy threshold of 2 catches a dead target in about 10 seconds, but it also pulls a healthy target out of rotation after a single long GC pause.
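How the matcher and the thresholds interact is easier to see in a toy model. The sketch below is a simplification written for this page, not AWS's implementation: it ignores the initial registration state and treats a timed-out probe as `None`. The function name and parameters are hypothetical.

```python
def simulate_health(status_codes, matcher=frozenset({200}),
                    healthy_threshold=2, unhealthy_threshold=2,
                    initially_healthy=True):
    """Return the target's health (True/False) after each probe.

    status_codes: one entry per probe; None stands for a timeout.
    A state change needs N consecutive probes that disagree with the
    current state, where N is the relevant threshold.
    """
    healthy = initially_healthy
    streak = 0  # consecutive probes contradicting the current state
    history = []
    for code in status_codes:
        passed = code in matcher
        if passed == healthy:
            streak = 0
        else:
            streak += 1
            needed = healthy_threshold if passed else unhealthy_threshold
            if streak >= needed:
                healthy = passed
                streak = 0
        history.append(healthy)
    return history


if __name__ == "__main__":
    # A health endpoint that redirects fails the default 200-only matcher.
    print(simulate_health([302, 302, 302]))
    # Two probes lost to a GC pause: out of rotation at threshold 2 ...
    print(simulate_health([200, None, None, 200, 200]))
    # ... but tolerated at threshold 3.
    print(simulate_health([200, None, None, 200, 200], unhealthy_threshold=3))
```

In this model, a lower unhealthy threshold buys faster detection at the cost of ejecting targets on short stalls, and the healthy threshold then delays their return.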
The meta-lesson: when something is wrong, read the target-group health status first and the security-group rules second. Those two screens explain the large majority of load-balancing incidents, and both are visible in the console seconds after an alert fires.
The ALB is where most teams spend their time, so it is worth going one level deeper on the features that change your architecture rather than just your config file.
Host- and path-based routing is the ALB's superpower. A single ALB can serve api.example.com, app.example.com, and admin.example.com to three different target groups, and within one host route /v1/* to one service and /v2/* to another. That is what lets a small team run many microservices behind one load balancer instead of one per service. Rules evaluate by priority; keep specific patterns above broad catch-alls.
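Priority-ordered matching can be modelled in a few lines. This is an illustrative sketch only: the `Rule` class, the `route` function, and the target-group names are invented here, and `fnmatchcase` is used as a stand-in for the ALB's `*` and `?` wildcards (it also accepts `[...]` classes, which the ALB does not define).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int                 # lower number is evaluated first
    target_group: str
    host: Optional[str] = None    # None means "any host"
    path: Optional[str] = None    # None means "any path"

    def matches(self, host: str, path: str) -> bool:
        # Host names are compared case-insensitively; paths are case-sensitive.
        host_ok = self.host is None or fnmatchcase(host.lower(), self.host.lower())
        path_ok = self.path is None or fnmatchcase(path, self.path)
        return host_ok and path_ok


def route(rules, host: str, path: str, default_target_group: str) -> str:
    """Return the target group of the first matching rule, else the default."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(host, path):
            return rule.target_group
    return default_target_group


# Specific pattern (host + path) sits above the broader host-only rule.
RULES = [
    Rule(priority=10, host="api.example.com", path="/v2/*", target_group="api-v2"),
    Rule(priority=20, host="api.example.com", target_group="api-v1"),
    Rule(priority=30, host="admin.example.com", target_group="admin"),
]

if __name__ == "__main__":
    print(route(RULES, "api.example.com", "/v2/users", "web"))
    print(route(RULES, "app.example.com", "/", "web"))
```

Swapping priorities 10 and 20 would send every `api.example.com` request to `api-v1`, which is the failure mode that "specific above broad" guards against.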
TLS termination means the ALB holds the certificate (issued free and auto-renewing via AWS Certificate Manager), decrypts inbound HTTPS, and forwards plaintext HTTP to your targets inside the VPC — or re-encrypts to the backend if compliance needs end-to-end encryption. ACM plus the ALB removes virtually all certificate toil: no manual renewals, no expiry pages. Use Server Name Indication (SNI) to serve many certificates from one HTTPS listener.
Sticky sessions pin a client to the same target with a cookie (the load-balancer-generated AWSALB cookie or an application cookie). Stickiness is sometimes needed for legacy stateful apps, but treat it as a smell: it undermines even distribution and complicates deploys, since a target you want to drain still has clients pinned to it. The durable fix is to make the app stateless and push session state to ElastiCache/Redis or DynamoDB.
Weighted target groups let one rule send, say, 95% of traffic to the current version and 5% to a new one — the foundation for canary and blue/green releases with no extra tooling. With health checks and CloudWatch alarms you shift weight gradually and roll back instantly by zeroing the new group's weight. It is the cleanest native way to do progressive delivery on AWS for HTTP services.
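As a rough illustration of the 95/5 split, the function below builds a forward action in the shape the `elbv2` API expects for weighted target groups. It only constructs the dictionary; the function name is hypothetical and the ARNs in the example are placeholders.

```python
def weighted_forward_action(stable_arn: str, canary_arn: str, canary_percent: int) -> dict:
    """Build a forward action that splits traffic between two target groups."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": stable_arn, "Weight": 100 - canary_percent},
                {"TargetGroupArn": canary_arn, "Weight": canary_percent},
            ]
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs, for illustration only.
    action = weighted_forward_action("arn:stable-placeholder", "arn:canary-placeholder", 5)
    print(action)
    # Rolling back is the same call with the canary weight set to zero.
    print(weighted_forward_action("arn:stable-placeholder", "arn:canary-placeholder", 0))
```

The resulting action could be supplied as `DefaultActions=[action]` to `modify_listener` or as `Actions=[action]` to `modify_rule`, stepping `canary_percent` up as alarms stay quiet.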
Cross-zone load balancing spreads requests evenly across targets in all enabled AZs, not just the one the LB node lives in. On the ALB it is on by default and free. On the NLB it is off by default, and turning it on can incur inter-AZ data transfer charges. Leaving NLB cross-zone off when your targets are unevenly distributed across AZs is a classic cause of "one set of instances is on fire while the others idle."
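The imbalance is plain arithmetic. The sketch below assumes, as a simplification, that DNS spreads clients evenly across the load balancer nodes in every AZ that has targets; the function name is made up for this example.

```python
def per_target_share(targets_per_az: dict, cross_zone: bool) -> dict:
    """Fraction of total traffic that ONE target in each AZ receives.

    Assumes each AZ's LB node gets an equal slice of incoming traffic.
    """
    active = {az: n for az, n in targets_per_az.items() if n > 0}
    if not active:
        raise ValueError("at least one AZ must have targets")
    if cross_zone:
        # Every node can reach every target, so all targets share equally.
        total = sum(active.values())
        return {az: 1 / total for az in active}
    # Each node only feeds the targets in its own AZ.
    slice_per_az = 1 / len(active)
    return {az: slice_per_az / n for az, n in active.items()}


if __name__ == "__main__":
    fleet = {"az-a": 2, "az-b": 8}  # deliberately uneven
    print(per_target_share(fleet, cross_zone=False))  # az-a targets run 4x hotter
    print(per_target_share(fleet, cross_zone=True))   # everyone gets 10%
```

With two targets in one AZ and eight in another, each of the two carries 25% of all traffic while each of the eight carries 6.25%, until cross-zone is enabled or the fleet is rebalanced.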
The ALB is the default, but defaulting to it blindly is itself a misconfiguration. Three categories of workload genuinely belong on an NLB, and a fourth on a GWLB.
Reach for the NLB when latency is the product. Because it never parses the payload, an NLB adds only single-digit milliseconds and sustains extreme connection volumes. Real-time gaming backends, MQTT/IoT fleets, latency-sensitive trading-adjacent systems, and gRPC streaming that does not need L7 routing all benefit. If you measure p99 in single-digit milliseconds, the ALB's request parsing is overhead you can shed.
Reach for the NLB when you need a stable IP. The NLB can be assigned static IPs and Elastic IPs — one per AZ. Enterprise customers who must allowlist your service by IP, and partners with rigid firewall change processes, push you toward an NLB. An ALB's IPs are not stable; if a third party must pin an address, that is an NLB job — or an NLB-in-front-of-ALB sandwich to keep both static IPs and L7 routing.
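For the static-IP case, this sketch builds arguments shaped like boto3's `elbv2` `create_load_balancer` call, pairing one subnet with one Elastic IP allocation per AZ. It makes no AWS call, the function name is hypothetical, and the subnet and allocation IDs are placeholders.

```python
def static_ip_nlb_params(name: str, subnet_to_eip: dict) -> dict:
    """Build create_load_balancer keyword arguments for an NLB with Elastic IPs.

    subnet_to_eip maps a subnet ID to an Elastic IP allocation ID,
    with one subnet (and so one address) per Availability Zone.
    """
    if not subnet_to_eip:
        raise ValueError("need at least one subnet/EIP pair")
    return {
        "Name": name,
        "Type": "network",
        "Scheme": "internet-facing",
        "IpAddressType": "ipv4",
        "SubnetMappings": [
            {"SubnetId": subnet_id, "AllocationId": allocation_id}
            for subnet_id, allocation_id in sorted(subnet_to_eip.items())
        ],
    }


if __name__ == "__main__":
    # Placeholder IDs, not real resources.
    print(static_ip_nlb_params("partner-ingress", {
        "subnet-aaa": "eipalloc-111",
        "subnet-bbb": "eipalloc-222",
    }))
```

The Elastic IPs in the mapping are the addresses a customer would allowlist, so they should be allocated once and kept for the life of the integration.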
Reach for the NLB for non-HTTP protocols and TLS passthrough. Databases, message brokers, SMTP, custom binary protocols, and UDP services (DNS, QUIC, syslog, game traffic) cannot live behind an ALB at all — it only speaks the HTTP family. The NLB also supports true TLS passthrough, handing encrypted bytes to your backend so the certificate and decryption stay on your instances, which some compliance regimes require.
GWLB is for inserting security appliances. If your security team mandates that all traffic flow through a vendor firewall or IDS/IPS, GWLB plus a GWLB endpoint transparently steers packets through a fleet of those appliances and back, with health checking and horizontal scaling for the fleet. It is a network-architecture building block, not an application load balancer, and usually lives in a dedicated inspection VPC governed by Transit Gateway routing.
A load balancer is one layer in a stack. Its value depends heavily on what sits behind it (the scaling layer) and what sits in front of it (the edge and security layer).
Auto Scaling behind the LB. The standard pattern is an Auto Scaling group registered to a target group: as it scales out, new instances auto-register and receive traffic only once they pass the health check; as it scales in, instances are deregistered with a draining period so in-flight requests finish. Critically, point the group at the target group's health check (ELB health-check type), not just the EC2 status check — otherwise an instance whose app has crashed but whose VM is fine stays "in service" and keeps black-holing requests. For containers, ECS and EKS register tasks/pods to target groups automatically (via the AWS Load Balancer Controller on EKS), so scaling the service scales the backend pool.
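The two settings this pattern depends on are small. The sketch below builds keyword arguments shaped like boto3's `autoscaling` `update_auto_scaling_group` and `elbv2` `modify_target_group_attributes` calls without invoking them. The function names, the 120-second grace period, and the 30-second drain are example assumptions.

```python
def asg_elb_health_params(asg_name: str, grace_seconds: int = 120) -> dict:
    """Make the Auto Scaling group trust the load balancer's health verdict."""
    return {
        "AutoScalingGroupName": asg_name,
        "HealthCheckType": "ELB",                 # instead of EC2 status checks only
        "HealthCheckGracePeriod": grace_seconds,  # time allowed for the app to boot
    }


def draining_params(target_group_arn: str, delay_seconds: int = 30) -> dict:
    """Set how long a deregistering target may finish in-flight requests."""
    return {
        "TargetGroupArn": target_group_arn,
        "Attributes": [
            # Attribute values are strings in this API.
            {"Key": "deregistration_delay.timeout_seconds", "Value": str(delay_seconds)},
        ],
    }


if __name__ == "__main__":
    print(asg_elb_health_params("web-asg"))
    print(draining_params("arn:target-group-placeholder"))
```

A grace period shorter than the application's real start-up time can cause a replace loop, so it is worth measuring boot time before settling on a value.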
AWS WAF in front of the ALB. WAF attaches directly to an ALB (and to CloudFront and API Gateway) and filters requests before they reach your targets: managed rule groups for the OWASP Top 10, rate-based rules against brute-force and basic DDoS, geo-blocking, and bot control. WAF cannot attach to an NLB, which is one more reason HTTP services want the ALB: it is the natural place to bolt on a web firewall. AWS Shield Standard guards against common network-layer DDoS for free; Shield Advanced is a paid tier that adds enhanced detection and mitigation, access to the Shield Response Team, and cost protection against DDoS-driven scaling charges.
CloudFront in front of everything. Putting the CloudFront CDN ahead of your ALB caches static assets at the edge, terminates TLS close to users, absorbs spikes, and shrinks the attack surface — you can lock the ALB's security group to accept traffic only from CloudFront's managed prefix list, making the origin effectively unreachable except through the CDN. For global, content-heavy, or spiky apps that is the difference between an origin that buckles under a launch and one that does not notice it. The full layered shape is CloudFront → WAF → ALB → Auto Scaling/ECS/EKS → health checks; you do not need all of it on day one, but knowing it tells you what to add, and in what order, as traffic and risk grow.
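The origin lock-down comes down to one security-group rule that references CloudFront's managed prefix list. The sketch below builds arguments shaped like boto3's `ec2` `describe_managed_prefix_lists` and `authorize_security_group_ingress` calls; it calls nothing, the function names are hypothetical, and the IDs in the example are placeholders. The prefix list ID differs per region, which is why it is looked up by name first.

```python
CLOUDFRONT_PREFIX_LIST_NAME = "com.amazonaws.global.cloudfront.origin-facing"


def prefix_list_lookup_params() -> dict:
    """Arguments for finding the CloudFront origin-facing prefix list by name."""
    return {
        "Filters": [
            {"Name": "prefix-list-name", "Values": [CLOUDFRONT_PREFIX_LIST_NAME]},
        ]
    }


def cloudfront_only_ingress_params(alb_security_group_id: str,
                                   prefix_list_id: str,
                                   port: int = 443) -> dict:
    """Arguments for an ingress rule that admits only CloudFront to the ALB."""
    return {
        "GroupId": alb_security_group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                "PrefixListIds": [
                    {"PrefixListId": prefix_list_id,
                     "Description": "CloudFront origin-facing only"},
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(prefix_list_lookup_params())
    # "sg-placeholder" and "pl-placeholder" are not real IDs.
    print(cloudfront_only_ingress_params("sg-placeholder", "pl-placeholder"))
```

The rule only locks the origin if the security group has no other inbound rule open to the internet on the same port.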
Load-balancer pricing confuses people because the headline hourly rate is tiny and the real cost hides in capacity units. Here is the honest version, with representative 2026 ranges — always confirm against the AWS pricing page for your region.
Every ALB and NLB costs a small hourly charge (representative: roughly $0.022–$0.027 per hour, ~$16–$20/month just to exist) plus Load Balancer Capacity Units (LCUs). An LCU bundles four dimensions at once: new connections per second, active (concurrent) connections, processed bytes, and — for the ALB only — rule evaluations per second. You are billed for the single highest of the four each hour, not the sum, so forecasting cost means figuring out which dimension your workload is bound by.
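The "highest dimension wins" rule can be turned into a back-of-the-envelope estimator. Everything numeric below is an assumption to check against the AWS pricing page for your region: the per-LCU capacities (25 new connections per second, 3,000 active connections, 1 GB per hour, 1,000 rule evaluations per second), the $0.008 LCU-hour rate, the $0.0225 hourly rate, and the 730-hour month. The function names are invented for this sketch.

```python
HOURS_PER_MONTH = 730  # assumed average month


def alb_lcus(new_conn_per_sec: float, active_conn: float,
             gb_per_hour: float, rule_evals_per_sec: float):
    """Return (dominant dimension, LCUs billed) for one hour of ALB traffic.

    Divisors are assumed per-LCU capacities; confirm them before relying on this.
    """
    dims = {
        "new_connections": new_conn_per_sec / 25,
        "active_connections": active_conn / 3000,
        "processed_bytes": gb_per_hour / 1.0,
        "rule_evaluations": rule_evals_per_sec / 1000,
    }
    dominant = max(dims, key=dims.get)  # billed on the max, not the sum
    return dominant, dims[dominant]


def monthly_cost(lcus: float, hourly_rate: float = 0.0225,
                 lcu_rate: float = 0.008) -> float:
    """Flat hourly charge plus LCU charge, over an assumed 730-hour month."""
    return (hourly_rate + lcus * lcu_rate) * HOURS_PER_MONTH


if __name__ == "__main__":
    dominant, lcus = alb_lcus(new_conn_per_sec=50, active_conn=2000,
                              gb_per_hour=3, rule_evals_per_sec=200)
    print(dominant, lcus)
    print(f"${monthly_cost(lcus):.2f} per month")
```

In this example the workload is processed-bytes-bound at 3 LCUs, which works out to roughly $34 a month under the assumed rates; halving connection churn would not change the bill, but caching or compressing responses would.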
For a typical small-to-mid web app — a few hundred requests per second, modest payloads — the LB usually lands around $20–$45/month all-in. Byte-heavy workloads (media, large API responses) are processed-bytes-bound and cost more as throughput climbs; connection-churny workloads (short-lived requests, aggressive health checks across many target groups) are connection- or rule-bound. NLBs meter LCUs slightly differently (bandwidth and new/active flows) but the same "pay for the dominant dimension" logic applies.
The surprise line item is data transfer, especially cross-AZ traffic. With NLB cross-zone load balancing enabled, traffic the LB sends to targets in another AZ is billed as inter-AZ transfer in both directions — at scale that can rival or exceed the LB's own charges. Chatty health checks (a 5-second interval times many targets times many target groups) also quietly accumulate connections and, on the ALB, rule evaluations. None of this is large for a small app, but it is what turns a "$30 load balancer" into a "$300 load balancer" once you run dozens of services across three AZs.
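A rough estimate of the cross-AZ line item, under loudly stated assumptions: targets are spread evenly across AZs so that a fraction `(n - 1) / n` of traffic crosses a zone boundary, and inter-AZ transfer costs $0.01 per GB in each direction. Both the rate and the function name are assumptions for illustration; check current pricing for your region.

```python
def cross_az_transfer_cost(monthly_gb: float, az_count: int,
                           rate_per_gb_each_way: float = 0.01) -> float:
    """Estimate monthly inter-AZ charges with cross-zone load balancing on.

    Assumes evenly spread targets, so (az_count - 1) / az_count of the
    traffic leaves the AZ it arrived in, billed once out and once in.
    """
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    crossing_fraction = (az_count - 1) / az_count
    return monthly_gb * crossing_fraction * rate_per_gb_each_way * 2


if __name__ == "__main__":
    # 50 TB a month through an NLB spanning three AZs.
    print(f"${cross_az_transfer_cost(50_000, 3):.2f} per month")
    # A single-AZ deployment has nothing to cross.
    print(f"${cross_az_transfer_cost(50_000, 1):.2f} per month")
```

Under these assumptions, 50 TB a month across three AZs adds several hundred dollars, which is the scale at which transfer starts to outweigh the load balancer's own charges.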
| Cost dimension | How it is billed | Who it bites | How to control it |
|---|---|---|---|
| Hourly LB charge | Flat per-hour per LB (~$16–$20/mo) | Everyone, always | Consolidate microservices behind one ALB via host/path rules |
| LCU — new connections | Per new conn/sec (billed if dominant) | Short-lived request churn, no keep-alive | Enable HTTP keep-alive; reuse connections |
| LCU — active connections | Per concurrent conn (billed if dominant) | Long-lived WebSockets / streaming | Right-size; expected for streaming workloads |
| LCU — processed bytes | Per GB through the LB (billed if dominant) | Media, large payloads, high throughput | Cache static assets on CloudFront; compress |
| LCU — rule evaluations (ALB) | Per eval/sec above the included quota | Many rules + high RPS | Prune dead rules; order specific rules first |
| Cross-AZ data transfer | Per GB inter-AZ, both directions | NLB cross-zone on; spread-out targets | Weigh cross-zone vs balance; keep chatty paths in-AZ |
Reading this page tells you what good looks like. Building it correctly the first time — listeners, target-group types, health-check matchers, draining, cross-zone, WAF, the CloudFront origin lock-down — is quick for someone who has done it fifty times and a multi-week yak-shave for someone doing it once. That is the gap CloudRoute closes.
CloudRoute is not an agency and does not do the work itself. It routes you to a vetted AWS partner who designs and implements the load-balancing and ingress layer for your specific stack — ECS, EKS, or plain EC2; single-region or multi-AZ; public or internal — with WAF and the CloudFront edge configured properly. You get the work delivered, not a pile of documentation and a wish of luck.
The part that makes this easy to say yes to: for credit-eligible companies the engagement is often substantially AWS-funded. The partner is paid through AWS partner-funding programs and your AWS consumption during the build is typically credit-covered, so the realistic cost is $0 or close to it. We are honest about the boundary — AWS-funded applies to credit-eligible engagements (typically institutionally-backed startups with a real AWS workload). If you do not qualify, it is still a vetted-partner referral that skips months of hiring and vetting a senior platform engineer, but it is a paid engagement, and we say so up front.
If you are also early enough to be chasing AWS credits, the two efforts compound: the same partner who designs your load balancing can file the credit application that funds it. Start with the $100K AWS credits path if funding is the priority, or jump straight to the startup track if you mainly want the infrastructure work done.
A real load-balancing engagement covers: the right LB type per service, listeners and TLS via ACM, target groups with the correct target type, health checks with sane paths and matchers, deregistration/draining tuned for zero-downtime deploys, cross-zone configured deliberately, Auto Scaling or ECS/EKS registration wired to ELB health, WAF rules in front, and (where it fits) a CloudFront origin locked to the CDN. Done once, correctly, it stops paging you.
The Classic Load Balancer is deprecated and omitted here on purpose — if you are choosing today, you are choosing between these three. Match the row that describes your traffic to the column, and the decision usually makes itself.
| Dimension | Application LB (ALB) | Network LB (NLB) | Gateway LB (GWLB) |
|---|---|---|---|
| OSI layer | Layer 7 (HTTP/HTTPS/gRPC/WS) | Layer 4 (TCP/UDP/TLS) | Layer 3 gateway (GENEVE) |
| Routing logic | Host, path, header, query, method, source IP | Flow hash (5-tuple); no content routing | Transparent steer through appliance fleet |
| Best for | Web apps, APIs, microservices | Low latency, high throughput, TCP/UDP | Inline firewall / IDS / IPS appliances |
| Latency added | Low (parses requests) | Ultra-low (single-digit ms) | Pass-through (appliance dominates) |
| TLS | Terminate (ACM) or re-encrypt | Terminate or true passthrough | N/A (appliance handles) |
| Static / Elastic IP | No (DNS name only) | Yes — one per AZ | Via endpoints |
| WAF attachable | Yes | No | No |
| Cross-zone default | On, free | Off (may incur inter-AZ cost) | Off |
| Sticky sessions | Yes (cookie-based) | Yes (source-IP, for TCP) | N/A |
| Typical verdict | The default — start here | When L7 is the wrong tool | Niche security architecture |
Situation: A single Classic Load Balancer left over from the first prototype, fronting a Fargate service via instance targets that no longer matched how Fargate works. Random 502s under any real load, no path-based routing (so every new service meant a new load balancer), TLS certs renewed by hand, and an upcoming enterprise customer who required a static IP to allowlist. The lone backend-leaning engineer had never set up an ALB, a target group, or WAF and did not want to learn it live in production during a launch.
What CloudRoute did: Routed within 20 hours to an EU-Central AWS partner with an ECS/EKS networking track record. The partner replaced the Classic LB with an ALB using ip target groups (correct for Fargate), moved host/path routing onto one ALB so three services shared it, switched certs to ACM auto-renewal, tuned health checks (dedicated /healthz, 200/204 matcher) and deregistration delay for zero-downtime deploys, fronted it with AWS WAF managed rules, and put an NLB-in-front-of-ALB in place to hand the enterprise customer a static Elastic IP. Auto Scaling was repointed to ELB health so crashed tasks drained automatically.
Outcome: 502s gone; deploys became zero-downtime. Three services now run behind one ALB instead of three load balancers, trimming both cost and operational surface. The enterprise customer got their static IP and signed. The whole engagement was credit-eligible — the partner was paid through AWS funding and the AWS consumption was credit-covered — so CloudRoute's commission came from the partner and the customer paid $0.
engagement window: ~3 weeks · founder time: ~5 hours · result: clean ALB/NLB ingress + WAF · cost to customer: $0
CloudRoute routes you to a vetted AWS partner who builds the ingress layer — ALB/NLB, target groups, health checks, autoscaling, WAF, CloudFront. Often AWS-funded for credit-eligible companies, so the customer pays $0.