A senior platform engineer's guide to migrating from Azure Kubernetes Service to Amazon EKS. The honest split: your manifests, Helm charts, and Kustomize bases move almost verbatim — Kubernetes is Kubernetes — while the work concentrates at five cluster edges that are cloud-specific: identity (Azure AD Workload Identity → IRSA or EKS Pod Identity), ingress (AGIC → AWS Load Balancer Controller), storage (Azure Disk/Files CSI → EBS/EFS CSI), registry (ACR → ECR), and autoscaling (Cluster Autoscaler → Karpenter). Plus image and secret migration, the CNI/networking model, and a parallel-cluster cutover with a tested rollback. The migration itself is usually MAP-funded — a vetted partner runs it at low-to-no cost.
People search "aks to eks" expecting a hard migration. The reassuring truth is that the Kubernetes layer is genuinely portable — both AKS and EKS run upstream, conformant Kubernetes, so the API objects your application depends on are identical. The migration is not a rewrite; it is a re-platforming of the cluster's cloud-specific edges around a workload definition that barely changes — and knowing which is which is the whole game.
Separate your repo into two buckets. The first is portable Kubernetes: Deployments, StatefulSets, DaemonSets, Jobs/CronJobs, Services, ConfigMaps, HPAs, PodDisruptionBudgets, NetworkPolicies, RBAC, and the bulk of your Helm charts and Kustomize bases. These reference images, ports, env vars, resource requests, and labels — none of it Azure-specific — so applied to EKS they create the same workloads. Expect ~80% of your YAML to fall here and move with no edits.
The second bucket is the cloud-coupled edges — the five places a manifest names an Azure construct or leans on an AKS add-on and therefore must change: the ServiceAccount annotations bound to Azure AD Workload Identity, the Ingress annotations driving AGIC, the StorageClass/PVC parameters provisioning Azure Disks/Files, the image references pointing at ACR, and the autoscaler that grows the node pool. This page is mostly about that second bucket, because that is where the genuine engineering — and the risk — lives.
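One way to make the two-bucket split concrete is to scan rendered manifests for Azure-specific markers. The sketch below is a minimal illustration, not a complete audit: `classify_manifest` is a hypothetical helper, and the marker list is an assumption drawn from the edges named above. The autoscaler edge lives in cluster configuration rather than in workload YAML, so it is not detected here.

```python
# Substrings that tie a manifest to an Azure construct, grouped by edge.
# The list is illustrative and should be extended for your own estate.
AZURE_MARKERS = {
    "identity": ("azure.workload.identity/",),
    "ingress": ("appgw.ingress.kubernetes.io/", "azure/application-gateway"),
    "storage": ("disk.csi.azure.com", "file.csi.azure.com"),
    "registry": (".azurecr.io/",),
}


def classify_manifest(text):
    """Return the cloud-coupled edges a manifest touches.

    An empty list means the manifest falls in the portable bucket.
    """
    return sorted(
        edge
        for edge, markers in AZURE_MARKERS.items()
        if any(marker in text for marker in markers)
    )


if __name__ == "__main__":
    sample = "image: myregistry.azurecr.io/app:1.0\n"
    print(classify_manifest(sample))
```

Running this over the output of `helm template` or `kustomize build` gives a rough per-file worklist for the second bucket.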
Against the 7 Rs, an AKS→EKS move is a rehost/replatform of the platform itself: you are not refactoring the application (the containers are already built and orchestrated), you are re-homing the cluster onto AWS primitives. That is why it is fast relative to a heterogeneous database migration — the unit of work is well understood and the manifests are the same on both ends. The rest of this page is the detail, edge by edge.
This is the edge that most often blocks go-live, and the one teams leave too late. On AKS, Azure AD Workload Identity federates a ServiceAccount to a managed identity via an OIDC trust, and the pod exchanges a token for Azure credentials. On EKS the equivalent is the same OIDC pattern pointed at AWS instead.
EKS gives you two mechanisms. IRSA (IAM Roles for Service Accounts) is the mature one: the cluster has an OIDC provider, you annotate a ServiceAccount with eks.amazonaws.com/role-arn, and its pods assume the IAM role via the SDK's web-identity token flow — almost one-to-one with the Azure AD Workload Identity model. EKS Pod Identity is the newer mechanism: an EKS-managed agent plus a Pod Identity Association removes the per-cluster OIDC-provider and trust-policy boilerplate, so role association is a simple API call. For greenfield EKS in 2026, Pod Identity is the lower-friction default; for tooling/Helm that already expects the IRSA annotation, IRSA is the compatibility-safe choice — many estates use both during transition.
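For the IRSA path, the shape of the trust relationship can be sketched as follows. This is a minimal example under stated assumptions: the function names are hypothetical, and the account ID, issuer URL, namespace, and role name in the usage lines are placeholders rather than values from any real cluster.

```python
import json


def irsa_trust_policy(account_id, oidc_issuer, namespace, service_account):
    """Build a trust policy that lets one ServiceAccount assume an IAM role
    through the cluster's OIDC provider (the IRSA pattern)."""
    issuer = oidc_issuer
    if issuer.startswith("https://"):
        issuer = issuer[len("https://"):]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Pin the role to exactly one namespace/ServiceAccount.
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


def service_account_annotation(account_id, role_name):
    """Annotation that binds a ServiceAccount to the IAM role."""
    return {"eks.amazonaws.com/role-arn": f"arn:aws:iam::{account_id}:role/{role_name}"}


if __name__ == "__main__":
    policy = irsa_trust_policy(
        "111122223333",
        "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
        "payments",
        "api",
    )
    print(json.dumps(policy, indent=2))
    print(service_account_annotation("111122223333", "payments-api"))
```

Pinning the `sub` condition to a single ServiceAccount is the part that mirrors the one-to-one federation in the Azure model; a wildcard there would let any pod in the cluster assume the role.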
The work is bounded and tedious: for every workload that talked to Azure (Key Vault, Blob, a database via managed identity), inventory the permissions it actually used, author a least-privilege IAM policy, create the role, and either annotate the ServiceAccount (IRSA) or create a Pod Identity Association. The biggest mistake is recreating Azure's broad managed-identity grants as one wide IAM role — it passes a casual review and fails a real audit. The migration is the moment to tighten, not to port the over-grants forward.
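A coarse lint can catch the wide-role mistake before review. The sketch below assumes IAM policy documents are available as parsed JSON; `overbroad_statements` is a hypothetical helper and its heuristic is deliberately simple. Some read-only actions legitimately require `"Resource": "*"`, so treat hits as prompts for a closer look rather than as failures.

```python
def _as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return value if isinstance(value, list) else [value]


def overbroad_statements(policy):
    """Return Allow statements that use a wildcard action or resource."""
    flagged = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        wide_action = any(a == "*" or a.endswith(":*") for a in actions)
        wide_resource = "*" in resources
        if wide_action or wide_resource:
            flagged.append(stmt)
    return flagged


if __name__ == "__main__":
    ported = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
    }
    print(len(overbroad_statements(ported)))
```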
This also re-homes node identity: EKS nodes run under a node IAM role, and the managed add-ons (VPC CNI, EBS CSI, etc.) each want their own IRSA/Pod-Identity role. Stand these up in the landing zone so the cluster is functional before any application pod lands.
IRSA / Pod Identity and the per-workload IAM role mapping should be ready in the pilot cluster before the first real workload moves. Identity left to cutover week becomes the thing every other workstream waits on — pods crash-loop on AccessDenied, and you debug IAM under a clock. Inventory every Azure managed-identity grant early; it is the single highest-leverage prep task in an AKS→EKS migration.
On AKS, north-south traffic typically arrives through the Application Gateway Ingress Controller (AGIC) — which programs an Azure Application Gateway from your Ingress resources — or through ingress-nginx behind a LoadBalancer Service. On EKS the like-for-like is the AWS Load Balancer Controller, which provisions ALBs for Ingress and NLBs for Service type LoadBalancer.
This is an annotation rewrite, not a redesign. Your Ingress objects keep their rules, paths, and backends; the controller-specific annotation namespace changes. AGIC annotations (appgw.ingress.kubernetes.io/*) become alb.ingress.kubernetes.io/* annotations covering listen ports, health-check paths, SSL policy, and target-type (ip for VPC-CNI pod IPs vs. instance), and spec.ingressClassName switches to the ALB controller's IngressClass. TLS shifts from Application Gateway listener certs (often Key Vault-sourced) to ACM certificates referenced by ARN. If you ran ingress-nginx on AKS, the simplest path is to keep it on EKS fronted by an NLB (a near-zero rewrite) and adopt the ALB controller later for native WAF/ALB features.
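The annotation rewrite can be sketched as a small translation pass. This is an illustrative subset, not a full mapping: `translate_annotations` is a hypothetical helper, only four AGIC annotations are handled, and anything it does not recognize is returned for manual review instead of being guessed at. The certificate ARN in the usage lines is a placeholder.

```python
AGIC = "appgw.ingress.kubernetes.io/"
ALB = "alb.ingress.kubernetes.io/"


def translate_annotations(annotations, cert_arn=None):
    """Translate a subset of AGIC annotations to ALB controller annotations.

    Returns (translated, needs_review). Non-AGIC annotations pass through.
    """
    out = {ALB + "target-type": "ip"}  # assumes VPC CNI pod IPs are routable
    review = []
    for key, value in annotations.items():
        if key == "kubernetes.io/ingress.class":
            continue  # replaced by spec.ingressClassName on the Ingress
        if not key.startswith(AGIC):
            out[key] = value
            continue
        name = key[len(AGIC):]
        if name == "health-probe-path":
            out[ALB + "healthcheck-path"] = value
        elif name == "backend-protocol":
            out[ALB + "backend-protocol"] = value.upper()
        elif name == "ssl-redirect":
            if value == "true":
                out[ALB + "listen-ports"] = '[{"HTTP": 80}, {"HTTPS": 443}]'
                out[ALB + "ssl-redirect"] = "443"
        elif name == "use-private-ip":
            out[ALB + "scheme"] = "internal" if value == "true" else "internet-facing"
        else:
            review.append(key)  # no safe automatic equivalent assumed
    if cert_arn:
        out[ALB + "certificate-arn"] = cert_arn
    return out, review


if __name__ == "__main__":
    src = {AGIC + "health-probe-path": "/healthz", AGIC + "ssl-redirect": "true"}
    translated, todo = translate_annotations(
        src, cert_arn="arn:aws:acm:us-east-1:111122223333:certificate/example"
    )
    print(translated)
    print(todo)
```

Returning the unhandled keys separately is the design choice that matters: timeouts, affinity, and draining settings differ enough between the two load balancers that a silent one-to-one rename would hide real behavior changes.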
Validate two behaviors explicitly. Target type: with VPC-CNI giving every pod a routable IP, target-type: ip lets the ALB target pods directly (skipping the node-port hop) — usually what you want, but it ties target health to pod readiness, so get your readiness probes honest. Health checks and timeouts: Application Gateway and ALB have different default probe intervals, deregistration delays, and idle timeouts, so a service healthy on AGIC can flap on an ALB until you align the health-check path and grace periods. Re-test every external endpoint before shifting production traffic.
DNS is the lever that makes cutover safe (section VII): publish the ALB/NLB behind a Route 53 record and shift weight gradually rather than repointing everything at once. WAF/rate-limiting policies on Application Gateway map to AWS WAF web ACLs on the ALB — re-author the rules; they do not translate automatically.
Stateless workloads carry zero storage baggage. The moment you have PersistentVolumeClaims, you have two distinct problems: re-pointing the StorageClass at an AWS CSI driver (easy) and physically moving the data inside the volume (the part that actually takes time, because a PV is cloud-specific and does not migrate by reapplying YAML).
The driver mapping is clean. Azure Disk CSI → Amazon EBS CSI for ReadWriteOnce block volumes (databases-in-cluster, write-ahead logs, single-writer state). Azure Files CSI → Amazon EFS CSI for ReadWriteMany shared file storage. You create new StorageClasses referencing ebs.csi.aws.com or efs.csi.aws.com, map the performance tier (Azure Premium SSD → EBS gp3/io2, setting IOPS/throughput explicitly on gp3), and update PVC storageClassName references. The EBS/EFS CSI drivers themselves want IRSA/Pod-Identity roles — another reason identity (section II) comes first.
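As a rough illustration of the StorageClass side, the sketch below builds an EBS CSI StorageClass as a plain dictionary that can be serialized to YAML or JSON. `ebs_storage_class` is a hypothetical helper; the defaults shown are the gp3 baseline (3,000 IOPS, 125 MiB/s), and the right numbers for a given workload have to come from measuring the Azure disk it replaces.

```python
import json


def ebs_storage_class(name, volume_type="gp3", iops=3000, throughput=125):
    """Build a StorageClass for the EBS CSI driver with explicit performance."""
    if volume_type not in ("gp3", "io2"):
        raise ValueError(f"unsupported volume type in this sketch: {volume_type}")
    params = {"type": volume_type, "iops": str(iops), "encrypted": "true"}
    if volume_type == "gp3":
        # Throughput is only configurable on gp3.
        params["throughput"] = str(throughput)
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",
        "parameters": params,
        # Delay provisioning until the pod is scheduled so the volume is
        # created in the same Availability Zone as the node.
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
    }


if __name__ == "__main__":
    print(json.dumps(ebs_storage_class("fast-gp3", iops=6000, throughput=250), indent=2))
```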
The data is the work. A PersistentVolume bound to an Azure Disk is an Azure resource and cannot be reattached to EKS — you provision a fresh EBS/EFS-backed PVC and copy the bytes, choosing the method by data type. For in-cluster datastores (PostgreSQL, MySQL, MongoDB), use the application's own replication or dump/restore, not a block copy. For opaque file volumes, use a sync job (an rsync/rclone pod reading from the Azure side over the parallel-run link), Velero backup/restore (which also moves the Kubernetes objects), or AWS DataSync for large file sets landing in EFS.
Honest verdict: stateless-only clusters skip this section and migrate in days; stateful clusters spend most of their budget here, with each cutover a small maintenance window (quiesce writes, final sync/replicate, switch the app to the EKS volume, verify). Strongly consider externalizing in-cluster databases to managed RDS/Aurora/DocumentDB while you are touching the storage layer — the broader Azure→AWS estate migration usually does.
Your images live in Azure Container Registry; your pods pull from it. On EKS they pull from Amazon Elastic Container Registry. This edge is the most mechanical of the five — but the manifest and secret rewiring around it is where small mistakes cause crash-loops, so it is worth doing deliberately.
The registry. Create ECR repositories, then move the images. The brute-force path is docker pull/push, but for many repos and tags use crane or skopeo to copy registry-to-registry without a local Docker daemon, which is faster and scriptable across all tags. For ongoing builds, repoint CI to push to ECR directly; for a transition window, ECR pull-through cache lazily mirrors upstream images. Turn on ECR image scanning while you are there: basic scanning carries no additional charge, while enhanced scanning is billed through Amazon Inspector.
The manifests. Image references change from myregistry.azurecr.io/app:tag to <account>.dkr.ecr.<region>.amazonaws.com/app:tag. If your charts templatize the registry, this is a one-line values change per environment; if references are hard-coded across manifests, introduce that indirection now. Crucially, once the node role grants ECR pull permission, you delete any imagePullSecrets your AKS manifests carried for ACR: EKS nodes authenticate to ECR via their IAM role, so the pull-secret is a stale credential to remove.
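The reference rewrite is simple enough to script. The sketch below is a minimal example, assuming image references follow the `<registry>.azurecr.io/<repo>:<tag>` form used above; `to_ecr` and `crane_copy_command` are hypothetical helpers, and the account ID and region are placeholders. The target ECR repository is assumed to exist already.

```python
import re

# Matches "<registry>.azurecr.io/<repo>[:tag|@digest]".
ACR_REF = re.compile(r"^(?P<registry>[a-z0-9]+)\.azurecr\.io/(?P<rest>.+)$")


def to_ecr(image, account_id, region):
    """Rewrite an ACR image reference to its ECR equivalent.

    References that do not point at ACR are returned unchanged.
    """
    match = ACR_REF.match(image)
    if not match:
        return image
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{match['rest']}"


def crane_copy_command(image, account_id, region):
    """Argument list for copying one image registry-to-registry with crane."""
    return ["crane", "copy", image, to_ecr(image, account_id, region)]


if __name__ == "__main__":
    print(to_ecr("myregistry.azurecr.io/app:tag", "111122223333", "us-east-1"))
    print(" ".join(crane_copy_command("myregistry.azurecr.io/app:tag", "111122223333", "us-east-1")))
```

Using one function for both the copy plan and the manifest rewrite keeps the two in step, so a pod never references a tag that was not copied.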
The secrets. On AKS these often come from Key Vault via the Secrets Store CSI driver. On EKS the equivalent is AWS Secrets Manager or SSM Parameter Store via the AWS provider for the same CSI driver, or synced into native Secrets by External Secrets Operator. Re-create each secret in Secrets Manager/SSM, point the SecretProviderClass at the AWS provider, and grant the workload's IRSA/Pod-Identity role read access to exactly those secrets. Never carry secret values across in a manifest or git diff — re-create them and let the CSI/operator inject them at runtime, as on Azure.
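A SecretProviderClass for the AWS provider might look roughly like the structure built below. This is a sketch under assumptions: `aws_secret_provider_class` is a hypothetical helper, the secret names are placeholders, and the `objects` parameter is emitted as a JSON string on the understanding that the provider parses it as YAML (of which this JSON is a valid subset). Only secret names appear here, never values.

```python
import json


def aws_secret_provider_class(name, namespace, secret_names):
    """Build a SecretProviderClass that mounts Secrets Manager secrets."""
    objects = [
        {"objectName": secret, "objectType": "secretsmanager"}
        for secret in secret_names
    ]
    return {
        "apiVersion": "secrets-store.csi.x-k8s.io/v1",
        "kind": "SecretProviderClass",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "provider": "aws",
            "parameters": {"objects": json.dumps(objects)},
        },
    }


if __name__ == "__main__":
    spc = aws_secret_provider_class("api-secrets", "payments", ["payments/api/db-password"])
    print(json.dumps(spc, indent=2))
```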
The last edge is how the cluster grows; the substrate underneath it is the network. AKS scales node pools with the Cluster Autoscaler — EKS can too, but the AWS-native answer in 2026 is Karpenter, a genuinely different and better model — and underneath, the AWS VPC CNI changes how pod networking and IP allocation work.
Autoscaling. The Cluster Autoscaler grows and shrinks predefined node pools. Karpenter instead provisions right-sized EC2 capacity on demand from pending-pod requirements: you define NodePools and EC2NodeClasses (instance families, capacity type, AZs, limits), and it picks instances that fit the unscheduled pods, consolidates underutilized nodes, and handles Spot natively. You translate node-pool definitions into Karpenter NodePools/EC2NodeClasses — a small amount of new authoring that pays off in better bin-packing and lower cost. You can run the Cluster Autoscaler first for a faithful lift, but most teams go straight to Karpenter because the cost win is the reason to be on EKS.
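The node-pool translation can be sketched as below. Treat it as an assumption-laden illustration: the input dictionary is a hypothetical summary of an AKS node pool (not an Azure API shape), the output follows the Karpenter `karpenter.sh/v1` NodePool schema as understood at the time of writing, and field names should be checked against the Karpenter version you actually install. The referenced EC2NodeClass is assumed to exist separately.

```python
import json


def karpenter_node_pool(aks_pool, node_class="default"):
    """Translate a summarized AKS node pool into a Karpenter NodePool manifest."""
    capacity_types = ["spot", "on-demand"] if aks_pool.get("spot") else ["on-demand"]
    # Cap total CPU at what the old pool could reach at max_count nodes.
    cpu_limit = aks_pool["max_count"] * aks_pool["vcpus_per_node"]
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": aks_pool["name"]},
        "spec": {
            "template": {
                "spec": {
                    "requirements": [
                        {
                            "key": "karpenter.sh/capacity-type",
                            "operator": "In",
                            "values": capacity_types,
                        }
                    ],
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,
                    },
                }
            },
            "limits": {"cpu": str(cpu_limit)},
            "disruption": {
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": "1m",
            },
        },
    }


if __name__ == "__main__":
    pool = {"name": "general", "max_count": 10, "vcpus_per_node": 4, "spot": True}
    print(json.dumps(karpenter_node_pool(pool), indent=2))
```

Carrying the old pool's maximum forward as a CPU limit keeps the first EKS bill bounded by what AKS could have spent, which is a conservative starting point before tuning.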
Networking — the CNI. Understand this substrate difference before sizing the cluster. The default AWS VPC CNI gives every pod a real VPC IP — pods are first-class on the network (great for ALB target-type: ip and security-group-per-pod), but pod density is bounded by ENI/IP limits per instance type, so you must size subnet CIDRs to avoid exhausting addresses. If your AKS cluster used Azure CNI Overlay or kubenet, the IP-planning math differs; budget VPC/subnet CIDRs for your real pod count (or use VPC-CNI prefix delegation to raise per-node density). NetworkPolicies are enforceable via the VPC CNI's network-policy support or Calico — re-test them, because enforcement engines differ at the edges.
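The IP-planning arithmetic can be worked through in a few lines. The sketch below uses the commonly documented VPC CNI formula (ENIs times usable addresses per ENI, plus two for host-network pods) and the five addresses AWS reserves in every subnet. The function names are hypothetical, and the 110/250 pod ceiling reflects AWS's published guidance as understood here, so verify it against current documentation. The m5.large figures (3 ENIs, 10 IPv4 addresses each, 2 vCPUs) are used as the worked example.

```python
def address_limited_max_pods(enis, ips_per_eni, prefix_delegation=False):
    """Upper bound on pods per node imposed by ENI address capacity."""
    # One address per ENI is the ENI's own primary IP and is not given to pods.
    slots = enis * (ips_per_eni - 1)
    if prefix_delegation:
        slots *= 16  # each slot holds a /28 prefix (16 addresses)
    return slots + 2  # host-network pods do not consume a pod IP


def recommended_max_pods(enis, ips_per_eni, vcpus, prefix_delegation=False):
    """Apply the assumed per-node ceiling on top of the address limit."""
    ceiling = 110 if vcpus < 30 else 250
    return min(address_limited_max_pods(enis, ips_per_eni, prefix_delegation), ceiling)


def usable_subnet_ips(prefix_len):
    """Addresses available in a subnet after the five AWS reserves."""
    return 2 ** (32 - prefix_len) - 5


if __name__ == "__main__":
    print(address_limited_max_pods(3, 10))               # m5.large, secondary IPs
    print(recommended_max_pods(3, 10, vcpus=2, prefix_delegation=True))
    print(usable_subnet_ips(24))
```

For the m5.large case this gives 29 pods per node without prefix delegation, and a /24 subnet yields 251 usable addresses, so fewer than nine fully packed nodes exhaust it once node IPs are counted.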
Load balancing, DNS, observability. Service type LoadBalancer provisions an NLB via the AWS Load Balancer Controller. CoreDNS runs on both, so in-cluster discovery is unchanged; external DNS automation maps to ExternalDNS pointed at Route 53. Azure Monitor / Container Insights → CloudWatch Container Insights or a Prometheus/Grafana stack (Amazon Managed Service for Prometheus + Amazon Managed Grafana), which most teams keep portable across the move.
A good AKS→EKS cutover never flips a switch. You stand up EKS alongside the live AKS cluster, run both in parallel, shift traffic gradually with weighted DNS, and keep a tested rollback for every wave. Here is the shape of a typical 4–10 week migration — faster than a full-estate move because the unit of work (a cluster of mostly-portable manifests) is well bounded.
The principle is run-in-parallel, shift-by-weight, roll-back-by-revert: because the manifests are portable, the EKS cluster can be made functionally identical and validated under real traffic before it owns 100% of it.
Inventory the AKS estate — namespaces, workloads, PVCs, Ingress, secret sources, managed-identity grants, the add-ons in use (AGIC, Key Vault CSI, autoscaler), and external dependencies — and classify workloads as stateless (fast) vs. stateful (the critical path). Output a target EKS architecture (topology, node strategy, VPC/subnet CIDR plan, IRSA vs. Pod Identity) and a costed plan. This is the MAP "Assess" deliverable, typically AWS-funded.
Stand up EKS inside a proper AWS landing zone (multi-account org, correctly-sized VPC, guardrails, centralized logging), install the platform add-ons (VPC CNI, EBS/EFS CSI, AWS Load Balancer Controller, Karpenter, metrics-server), wire IRSA/Pod Identity and node roles, and connect the parallel-run path back to Azure (VPN or Direct Connect). Migrate one representative workload end-to-end to prove the toolchain and write the runbook. Also MAP-funded.
Move stateless namespaces first because they are reversible. Copy images to ECR, apply the (mostly unchanged) manifests with the five edges rewired, and validate each service against the new ALB/NLB. Bring EKS up behind a Route 53 weighted record alongside AKS and shift traffic 5% → 25% → 100% per service, watching CloudWatch and error rates. Rollback is a DNS change: revert the weight to AKS, and traffic returns as resolvers expire the cached record, so keep the TTL short (around 60 seconds) for the duration of the shift.
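The weighted shift can be expressed as a Route 53 change batch. The sketch below only builds the request body; it assumes simple weighted CNAME records (an ALB at a zone apex would need alias records instead, which use a different shape), and the hostnames are placeholders. `weighted_change_batch` is a hypothetical helper whose output matches the `ChangeBatch` argument of the Route 53 `ChangeResourceRecordSets` API.

```python
def weighted_change_batch(record_name, aks_target, eks_target, eks_percent, ttl=60):
    """Build a change batch that sends eks_percent of traffic to EKS.

    Route 53 splits traffic in proportion to weight / sum(weights), so
    weights of (100 - p) and p give EKS a p% share.
    """
    if not 0 <= eks_percent <= 100:
        raise ValueError("eks_percent must be between 0 and 100")

    def record(set_id, target, weight):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {
        "Comment": f"shift {eks_percent}% of {record_name} to EKS",
        "Changes": [
            record("aks", aks_target, 100 - eks_percent),
            record("eks", eks_target, eks_percent),
        ],
    }


if __name__ == "__main__":
    for step in (5, 25, 100):
        batch = weighted_change_batch(
            "api.example.com.", "aks-gw.example.net", "eks-alb.example.net", step
        )
        print(batch["Comment"])
```

Rollback is the same call with `eks_percent=0`, which is why the runbook can treat each step and its reversal as one parameterized operation.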
Stateful workloads cut over last and individually. Stand up the target (managed RDS/Aurora/DocumentDB if externalizing, or an EBS-backed StatefulSet if staying in-cluster), seed it, and keep it in sync via the application's replication. Each cutover is a short maintenance window: quiesce writes on Azure, drain the final sync, switch the connection to EKS, verify, resume. Keep the Azure source read-only as a rollback target for a defined window.
Run the full test suite against EKS, watch a billing cycle, let Karpenter consolidation and gp3 right-sizing settle the cost, then decommission the AKS cluster — which stops the Azure bill. MAP "Modernize" credits often fund this optimization (Spot via Karpenter, moving in-cluster state to managed services). Don't skip decommission: a half-running AKS cluster is pure cost.
Every wave has a tested rollback. Stateless workloads roll back by reverting the Route 53 weight to AKS, which takes effect within the record's TTL (seconds to a minute or two when the TTL is kept short). Stateful workloads roll back by keeping the Azure datastore read-only and re-pointing the connection string. The partner rehearses the runbook in the pilot, before the production window; a migration without a rehearsed rollback is a bet, not a plan.
These don't appear in the tidy service-mapping table but reliably cost a first-time team a day each. A partner who has done AKS→EKS before front-loads all of them; a first-timer discovers them one crash-loop at a time.
You can run an AKS→EKS migration in-house, and because the manifests are portable a strong platform team can. Most still shouldn't carry the whole thing alone — the edges (IRSA mapping, the AGIC→ALB rewrite, the stateful-data copy) are where time and incidents hide. The better question is who runs it and who pays.
The mechanism that makes this nearly free is the AWS Migration Acceleration Program (MAP), which runs in three phases — Assess (readiness + TCO, often fully funded), Mobilize (the EKS landing zone + pilot cluster, funded), and Migrate & Modernize (the production cutover, where AWS credits a meaningful share of the cost). The partner is paid through MAP and AWS engagement funding, and credits scale with workload size. Honest framing: MAP applies to qualifying migrations — typically tied to a committed level of post-migration AWS spend, not a blank check — but for a real AKS estate moving to EKS, the assessment is commonly free and a large share of the migration cost is covered. The mechanics match the broader Migration Acceleration Program path and tie into the AWS credits and POC-funding programs.
This is where CloudRoute fits. We don't run migrations — we route you to a vetted, MAP-eligible AWS partner, matched to your stack (Kubernetes/EKS, your specific add-ons, any in-cluster databases), region, and timeline. You get someone who has done the Azure AD Workload Identity → IRSA translation and the AGIC → ALB rewrite before, the migration is largely MAP-funded, and CloudRoute is paid by the partner — so the routing costs you nothing. The migration persona page walks through how an engagement runs end to end. If the Kubernetes move is part of a wider Azure exit, start at Azure to AWS; if you're on GKE, see GKE to EKS.
The condensed mapping for an AKS→EKS migration. The Kubernetes API objects carry over unchanged (first row); the remaining rows are the cloud-specific edges and add-ons that actually change. "Effort" is the honest verdict: mechanical, replatform, or re-engineer.
| AKS concept | EKS equivalent | Primary tool / approach | Effort |
|---|---|---|---|
| Manifests, Helm, Kustomize | Same (portable Kubernetes API) | kubectl / helm / kustomize apply — minimal edits | Mechanical |
| Azure AD Workload Identity | IRSA or EKS Pod Identity | OIDC trust + per-workload IAM roles/policies | Replatform → re-engineer |
| AGIC (App Gateway Ingress) | AWS Load Balancer Controller → ALB | Rewrite Ingress annotations; ACM for TLS | Replatform |
| ingress-nginx + Azure LB | ingress-nginx + NLB (or ALB controller) | Keep nginx behind an NLB; near-zero rewrite | Mechanical |
| Azure Disk CSI (RWO) | Amazon EBS CSI (gp3/io2) | New StorageClass + copy PV data (app/Velero) | Replatform (data copy is the work) |
| Azure Files CSI (RWX) | Amazon EFS CSI | New StorageClass + DataSync/rsync for data | Replatform |
| Azure Container Registry (ACR) | Amazon ECR | crane/skopeo copy or pull-through cache; drop imagePullSecrets | Mechanical |
| Key Vault (Secrets Store CSI) | Secrets Manager / SSM (CSI or ESO) | Re-create secrets + AWS provider SecretProviderClass | Replatform |
| Cluster Autoscaler + node pools | Karpenter (NodePools / EC2NodeClasses) | Translate node-pool config to Karpenter constraints | Replatform |
| Azure CNI / kubenet | AWS VPC CNI (pod = VPC IP) | Plan subnet CIDRs / prefix delegation; re-test NetworkPolicy | Re-engineer (IP planning) |
| Azure Monitor / Container Insights | CloudWatch Container Insights or AMP + Grafana | Re-home dashboards/alerts | Replatform |
Situation: Acquirer standardized on AWS and wanted the Kubernetes estate on EKS within the quarter. The platform team knew Kubernetes well but had never run an EKS migration: they were unsure about Azure AD Workload Identity → IRSA, nervous about the AGIC → ALB ingress rewrite across 18 external endpoints, and worried about moving the in-cluster PostgreSQL without a long downtime. The platform lead was at capacity, and nobody had sized the VPC for VPC-CNI pod density.
What CloudRoute did: Routed within 24 hours to a MAP-eligible partner with an EKS track record. MAP Assess (AWS-funded) produced the target EKS topology, a VPC/subnet CIDR plan, and an IRSA-plus-Pod-Identity decision in under a week. Mobilize built the EKS landing zone (VPC CNI, EBS/EFS CSI, AWS Load Balancer Controller, Karpenter), wired IRSA + node roles, and ran a pilot end-to-end behind a Direct Connect link. Images copied ACR→ECR with crane; the 24 services' manifests applied with the five edges rewired (AGIC→ALB, Key Vault CSI→Secrets Manager, imagePullSecrets dropped); stateless services shifted behind Route 53 weighting 5%→25%→100%; the in-cluster PostgreSQL was externalized to RDS for PostgreSQL via logical replication and cut over in a short window.
Outcome: Full cluster cutover in 7 weeks, zero unplanned downtime (the single database window was under 8 minutes). Karpenter + Spot dropped steady-state compute cost ~28% vs. the old AKS node pools. MAP funded Assess + Mobilize and credited ~45% of the migrate-and-modernize cost against a post-migration AWS commitment. AKS decommissioned in week 8, ending the Azure spend. CloudRoute was paid by the partner — the customer paid $0 for the routing.
timeline: 7 weeks · downtime: <8 min (1 db window) · compute saving: ~28% (Karpenter + Spot) · MAP-funded: ~45% of migrate cost · routing cost to customer: $0
CloudRoute routes you to a MAP-eligible AWS partner matched to your Kubernetes stack and region. Assessment is often free; the migration is largely MAP-funded on qualifying workloads. Customer pays $0 for the routing — no procurement, no discovery theater.