A migration fails in the gaps between steps — the dependency nobody mapped, the cutover with no rollback, the source you forgot to decommission while it kept billing. This is the actual end-to-end checklist a vetted partner runs: the pre-migration foundation, the per-wave loop you repeat for every workload, and the post-migration cleanup that turns a lift into real savings. It is the same runbook AWS MAP funds — so for qualifying migrations a partner executes it at little-to-no cost to you.
Most "AWS migration checklist" articles dump 40 bullet points in a flat list. That is not how a migration runs. A migration is a series of <strong>waves</strong>: you batch applications by how tightly they are coupled, migrate a wave end-to-end, validate it in production, then start the next. Low-dependency systems go first to build muscle and prove the landing zone; the crown-jewel systems go last, once the team has done it five times already.
That gives the three-phase shape this page follows. Phase 1 — Pre-migration is the foundation, built once before any workload moves, and every wave depends on it. Phase 2 — Per-wave is the loop: nine gates from assessment to a cutover with a tested rollback. Phase 3 — Post-migration is the cleanup teams under-fund: optimization, right-sizing, decommissioning the source, and knowledge transfer so the team can run the platform without the partner.
Throughout, treat each item as a literal checkbox with an owner and an exit criterion — "done" means a named person signed off against a defined condition, not "we talked about it." The comparison table near the end collapses the whole thing into a phase → tasks → owner grid you can lift straight into a tracker. This is the runbook, not a substitute for someone who has run it — which is why the CloudRoute model has a vetted partner execute it for you (MAP-funded where the migration qualifies), so the judgment calls are bought, not improvised.
Nothing moves until these seven are done, and they are sequenced — discovery feeds dependency mapping, which feeds the TCO, which feeds the MAP application; the landing zone, account structure, and security baseline are the platform every wave lands into. Rush this phase and you pay for it in every subsequent wave. Each item has an exit criterion — treat it as the definition of "checked."
You cannot migrate what you cannot see. Run AWS Application Discovery Service (feeding Migration Hub) to inventory every server, container, database, and store, capturing each asset's p95 utilization (not just average), OS/runtime versions, listening ports, and 2–4 weeks of network-connection data — the real traffic, not the architecture diagram someone drew in 2021. Exit criterion: a reconciled inventory where every asset is tagged with a candidate disposition under the 7 Rs — Retire, Retain, Rehost (lift-and-shift), Relocate (move the hypervisor, e.g. VMware), Repurchase (move to SaaS), Replatform (lift-tinker-shift), or Refactor. Most workloads land on Rehost or Replatform first; refactor selectively, later.
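To make the "p95, not just average" point concrete, here is a minimal sketch using only the Python standard library. The sample values are invented for illustration, and `p95` is a hypothetical helper, not part of any AWS tooling.

```python
from statistics import mean, quantiles


def p95(samples):
    """Return the 95th percentile of a list of utilization samples (percent)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return quantiles(samples, n=100, method="inclusive")[94]


# Hypothetical hourly CPU samples: mostly idle, with a short batch spike.
cpu_samples = [10.0] * 90 + [85.0] * 10

avg_cpu = mean(cpu_samples)  # 17.5: suggests a tiny instance would do
p95_cpu = p95(cpu_samples)   # 85.0: the load the target actually has to absorb

print(f"average={avg_cpu:.1f}%  p95={p95_cpu:.1f}%")
```

Sizing to the average here would starve the batch window, while sizing to the on-prem box's nominal capacity would carry the over-provisioning across.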
The highest-leverage, most-skipped item on the checklist. From the network-connection data, build a dependency graph — which service talks to which, on which port, how often — to surface the integrations nobody documented: the cron job on a forgotten host writing to prod, the internal API three teams depend on, the license server. These dependencies define your move groups: tightly-coupled systems must migrate in the same wave, or you create the "split-brain" anti-pattern with half a chatty system on AWS and half on-prem. Migration Hub Strategy Recommendations help, but the manual reconciliation with app owners is irreplaceable. Exit criterion: a signed-off wave plan where each application is assigned to a wave, every cross-wave dependency is identified, and there are no "we will figure that out at cutover" unknowns — the single document that prevents the most common day-of-cutover failure.
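One way to turn connection data into candidate move groups is to treat hosts as nodes and observed connections as edges, then take the connected components. The sketch below assumes a simplified `(source, destination, port)` record format and invented host names. A real plan would first filter out shared infrastructure (DNS, directory services, monitoring) that would otherwise link everything into one group, and would still need the manual reconciliation with app owners.

```python
from collections import defaultdict


def move_groups(connections):
    """Group hosts into candidate move groups (connected components)."""
    graph = defaultdict(set)
    for src, dst, _port in connections:
        # Coupling matters in both directions, so treat edges as undirected.
        graph[src].add(dst)
        graph[dst].add(src)

    seen, groups = set(), []
    for host in sorted(graph):
        if host in seen:
            continue
        stack, component = [host], set()
        while stack:  # iterative depth-first walk of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Hypothetical observed connections, including an undocumented cron host.
observed = [
    ("orders-app", "orders-db", 5432),
    ("cron-host", "orders-db", 5432),
    ("wiki", "wiki-db", 3306),
]
groups = move_groups(observed)
for index, group in enumerate(groups, start=1):
    print(f"move group {index}: {group}")
```

In this toy data the forgotten cron host lands in the same group as the orders database, which is exactly the kind of coupling that has to migrate in one wave.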
Model the current run-rate (hardware refresh, colo, current-cloud bill, licensing, ops labor) against the projected AWS run-rate using the AWS Pricing Calculator and the utilization data — size to real p95, not to the over-provisioned on-prem boxes, or you lift-and-shift the waste and conclude "AWS is expensive." Include the costs teams forget (egress, NAT/load-balancer hours, backup storage, the source-and-target parallel run during migration), then layer the savings levers: Savings Plans/RIs on the baseline, Graviton, and license elimination (Oracle/SQL Server → Aurora PostgreSQL via SCT + DMS). Exit criterion: a TCO model with a defensible projected monthly AWS spend (the number the MAP application is sized against) and a payback timeline the business has agreed to.
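The payback arithmetic behind that exit criterion can be sketched in a few lines. All figures below are hypothetical, and the model is deliberately simple: it assumes flat monthly run-rates and counts the AWS bill during the parallel run as an upfront cost, since the source is still being paid for in full over that period.

```python
def payback_months(current_monthly, aws_monthly, one_time_cost, parallel_months):
    """Months of steady-state savings needed to recover the migration outlay.

    Returns None when the projected AWS run-rate is not lower than today's.
    """
    monthly_saving = current_monthly - aws_monthly
    if monthly_saving <= 0:
        return None
    # During the parallel run you pay source and target at the same time,
    # so the target's bill for those months is an extra, one-off cost.
    upfront = one_time_cost + aws_monthly * parallel_months
    return upfront / monthly_saving


# Hypothetical inputs: $50k/month today, $35k/month projected on AWS,
# $50k of one-time migration cost, and a 2-month parallel run.
months = payback_months(50_000, 35_000, 50_000, 2)
print(f"payback after about {months:.1f} months of steady-state savings")
```

Levers such as Savings Plans or license elimination enter this model as a lower `aws_monthly`, and MAP credits as a lower `one_time_cost`.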
The AWS Migration Acceleration Program (MAP) funds a partner-run migration in three phases that mirror this checklist: Assess (the discovery/dependency/TCO work above, frequently funded, so the assessment costs little), Mobilize (landing zone + pilot wave), and Migrate & Modernize (the production waves). AWS credits a share of the cost that scales with migration size, filed by a partner through the AWS Partner Network. Honest framing: this applies to qualifying migrations, meaning those with a meaningful committed AWS spend after migration (sized against the TCO model above); smaller workloads that miss the bar get a de-risked, fixed-scope partner cutover instead, and either way the customer does not pay CloudRoute. Exit criterion: a MAP record filed by the partner (Assess phase opened) with the workload sized and projected post-migration spend documented. See the AWS credits pages for the funding mechanics end to end.
A landing zone is the secure, multi-account, governed environment every wave migrates into — never hand-build a single account (you will regret it the moment the second team needs isolation). Use AWS Control Tower (orchestrating AWS Organizations, centralized logging, guardrails, and account vending) to establish the network backbone (VPCs, multi-AZ subnets, a Transit Gateway topology, and the hybrid link — Direct Connect or Site-to-Site VPN — back to the source during migration) plus the centralized log/audit accounts. Exit criterion: a deployed landing zone with at least the core accounts (management, log archive, audit/security, shared-services networking) provisioned via infrastructure-as-code, and hybrid connectivity to the source up and tested.
Inside the landing zone, define the AWS Organizations OU and account structure before workloads arrive: separate accounts per environment (prod/staging/dev) and per workload, grouped into OUs with Service Control Policies enforcing org-wide rules. This buys a hard blast-radius boundary (a dev mistake cannot touch prod), clean per-team billing via cost-allocation tags and Cost Explorer, and clean access boundaries via IAM Identity Center. Retro-fitting it after you have crammed everything into one account is painful and sometimes requires re-migrating. Exit criterion: an OU/account map agreed with security and finance, accounts vended through Control Tower's account factory, and a tagging standard published (owner, environment, cost-center, and wave tags on every resource).
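A tagging standard only helps if it is checked. As a minimal sketch, the function below reports which required tags are missing from a resource; the four keys come from the standard described above, while the exact key spelling and the case-insensitive comparison are assumptions for illustration.

```python
# Required keys from the tagging standard; the exact spelling is an assumption.
REQUIRED_TAGS = {"owner", "environment", "cost-center", "wave"}


def missing_tags(resource_tags):
    """Return the required tag keys absent from a resource's tag dict."""
    present = {key.lower() for key in resource_tags}
    return sorted(REQUIRED_TAGS - present)


# Hypothetical resource tagged by hand before the standard was published.
tags = {"Owner": "payments-team", "environment": "prod"}
print(missing_tags(tags))
```

A check like this could run in a CI step against the IaC plan so untagged resources never reach an account.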
The security baseline is the set of org-wide controls on before the first workload arrives, so nothing migrates into ungoverned space: CloudTrail org-trail to the central log account, AWS Config with conformance packs, GuardDuty and Security Hub org-wide, encryption-by-default (KMS at rest, TLS in transit), and IAM Identity Center with no long-lived root or IAM-user keys. Encode it as guardrails (Control Tower controls + SCPs) so it cannot be silently disabled in a member account; this is also the foundation for any compliance posture (SOC 2, ISO 27001, HIPAA, PCI) — far cheaper now than retro-fitted during an audit. Exit criterion: baseline controls deployed org-wide and verified in Security Hub, with the central log account receiving CloudTrail/Config/VPC-flow logs from every member account.
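To illustrate what "cannot be silently disabled" means in practice, here is a hedged sketch of a Service Control Policy built as a Python dict and serialized to JSON. The action names are real IAM actions, but the policy itself is an illustrative assumption rather than the policy Control Tower deploys; a production version would normally add a condition exempting a designated administration role.

```python
import json


def baseline_protection_scp():
    """Return an SCP document denying actions that switch off logging and detection."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDisablingBaseline",  # hypothetical statement id
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                    "config:StopConfigurationRecorder",
                    "config:DeleteConfigurationRecorder",
                    "guardduty:DeleteDetector",
                ],
                "Resource": "*",
            }
        ],
    }


scp_json = json.dumps(baseline_protection_scp(), indent=2)
print(scp_json)
```

Attached at the OU level, a policy of this shape applies to every member account beneath it, including accounts vended later.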
Every per-wave gate in Phase 2 assumes the foundation exists: waves are defined by the dependency map, sized against the TCO, funded by the MAP record, landed into the landing zone and account structure, and governed by the security baseline. Skip straight to "lift the first server" and you inherit all seven gaps in every wave.
This is the loop, run in order for each wave then repeated: assessment, target design, infrastructure-as-code, data sync, test cutover, validation, production cutover, DNS, and the rollback plan. The discipline: do not advance until the current gate's exit criterion is met, and do not start wave N+1's cutover until wave N is validated in production. The rollback plan is listed last but written first — you do not begin a cutover you cannot reverse.
Re-confirm the 7-R disposition per application now that you know more than during global discovery — a system tagged "Rehost" often becomes "Replatform" once you see the managed-service equivalent (self-managed MySQL → RDS/Aurora) is barely more effort. Lock the runtime versions, data volume, downtime window, and success metrics. Exit criterion: a per-application disposition sheet with the cutover and rollback windows agreed with the business owner.
Design the target per workload. Rehost maps a VM to EC2 (right-sized to p95); Replatform swaps a component for a managed service — containers to ECS/EKS or App Runner, self-managed databases to RDS/Aurora, queues to SQS, caches to ElastiCache. Define networking, the data tier, autoscaling, the load-balancer and TLS, backups, and observability hooks. Exit criterion: a target architecture diagram and resource list per workload, reviewed against the AWS Well-Architected pillars.
Define the target as code — Terraform, CloudFormation, or the CDK — never by clicking in the console. IaC is what makes the test and production cutovers identical environments and lets you stand the target up, tear it down, and re-apply without drift; parameterize per-environment values so the same templates produce staging and prod. Exit criterion: the wave's target infrastructure provisioned in a non-prod account from version-controlled, peer-reviewed IaC, with a destroy/re-apply proven clean.
Choose the replication tool for what you are moving: Application Migration Service (MGN) for block-level continuous replication of whole servers (rehost); Database Migration Service (DMS) for databases, plus the Schema Conversion Tool (SCT) for heterogeneous moves (Oracle/SQL Server → Aurora PostgreSQL) where schema conversion is the genuinely hard part; DataSync or Transfer Family for bulk files. The downtime-minimizing pattern is continuous replication: bulk-copy first, then sync the delta via CDC (change data capture) so at cutover you flush only the last few seconds, not terabytes. Exit criterion: replication caught up, lag within the cutover window, and a verified data-integrity check (row counts / checksums) source-to-target.
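The row-count and checksum check in that exit criterion can be sketched as an order-independent fingerprint per table. This is an illustrative standalone approach, not how DMS validates data internally, and it assumes rows have already been fetched and normalized to comparable Python values (types and formatting often differ between engines).

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of row tuples.

    Per-row hashes are summed modulo 2**256, so row order does not matter
    and duplicate rows do not cancel each other out.
    """
    count, total = 0, 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        total = (total + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, total


# Hypothetical rows as fetched from source and target (different order).
source_rows = [(1, "alice", "2024-01-01"), (2, "bob", "2024-01-02")]
target_rows = [(2, "bob", "2024-01-02"), (1, "alice", "2024-01-01")]

in_sync = table_fingerprint(source_rows) == table_fingerprint(target_rows)
print("in sync" if in_sync else "MISMATCH: do not cut over")
```

For large tables the same idea is usually pushed down into SQL per key range, so that a mismatch points at a chunk instead of the whole table.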
Perform a non-disruptive test cutover into an isolated test subnet — MGN is built for exactly this, launching test instances from the replicated data without touching the source. Boot the app on AWS, point synthetic traffic at it, and run the full functional and performance suite. This is where you find the hardcoded source IP, the missing env var, the security-group rule you forgot — on a Tuesday afternoon, not in the 2 a.m. production window. Exit criterion: the app passes its full test suite on AWS, and the team has a timed, written cutover runbook produced from the test run (every command, in order, with who runs it).
Validation is the explicit gate between "it booted" and "it works." Define acceptance criteria up front: functional correctness, performance within SLO, data integrity, integration health (every dependency from the map responds), security posture (the workload inherits the baseline), and observability (alarms fire, dashboards populate). In the test cutover it is the rehearsal; right after the production cutover it is the real gate that decides go/rollback. Exit criterion: a written validation checklist with pass/fail per item and a named decision-owner who calls go-live or rollback against it.
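The gate can be expressed as a small decision function: every acceptance item must have a recorded result, and any failure means rollback. The item names mirror the criteria above; the function name and result format are assumptions for illustration.

```python
ACCEPTANCE_ITEMS = [
    "functional",
    "performance_slo",
    "data_integrity",
    "integrations",
    "security",
    "observability",
]


def go_or_rollback(results):
    """Return ("go", []) or ("rollback", failed_items) from pass/fail results."""
    missing = [item for item in ACCEPTANCE_ITEMS if item not in results]
    if missing:
        # An unchecked item is not a pass; refuse to decide without it.
        raise ValueError(f"no result recorded for: {missing}")
    failed = [item for item in ACCEPTANCE_ITEMS if not results[item]]
    return ("go", []) if not failed else ("rollback", failed)


# Hypothetical results right after a production cutover.
results = {item: True for item in ACCEPTANCE_ITEMS}
results["integrations"] = False
decision, failed = go_or_rollback(results)
print(decision, failed)
```

The named decision-owner still makes the call; a function like this only keeps the call tied to the written checklist.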
Execute the test-cutover runbook in the agreed window. The low-downtime sequence: freeze writes on the source (read-only / maintenance mode), let replication flush the final delta, run the data-integrity check, launch the production services on AWS from the caught-up replication, run validation, then take traffic. Stateless tiers can be near-zero-downtime; stateful databases usually need a short write-freeze window. Exit criterion: production workload live on AWS, validation passed, source still intact and reachable (do not decommission yet — that is Phase 3, after a soak period).
Traffic moves when DNS moves. Lower the TTL on the relevant records 24–48 hours before cutover so that resolvers have expired the old, long-lived answers by the time you switch: with a 300-second TTL, a cutover or a rollback takes effect within minutes, while a one-day TTL still in place at cutover strands cached clients for up to a day. Use Route 53 with health checks and prefer weighted records to shift traffic gradually (10% → 50% → 100%) so you can watch error rates and back out before all users are affected. Exit criterion: TTLs pre-lowered, the change executed and propagation confirmed, traffic flowing to AWS with healthy error rates, and TTLs restored once stable.
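As a sketch of the weighted shift, the function below builds the change batch for one stage of the rollout. The dict follows the `ChangeBatch` shape accepted by the Route 53 `ChangeResourceRecordSets` API (for example via boto3's `change_resource_record_sets`), but no API call is made here, and the host names, set identifiers, and the choice of CNAME records are assumptions for illustration.

```python
def weighted_shift_batch(record_name, source_target, aws_target, aws_percent, ttl=300):
    """Build a Route 53 ChangeBatch that sends aws_percent of traffic to AWS."""
    if not 0 <= aws_percent <= 100:
        raise ValueError("aws_percent must be between 0 and 100")

    def upsert(set_id, target, weight):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,  # distinguishes the weighted records
                "Weight": weight,         # Route 53 accepts 0-255
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {
        "Comment": f"cutover stage: {aws_percent}% to AWS",
        "Changes": [
            upsert("source", source_target, 100 - aws_percent),
            upsert("aws", aws_target, aws_percent),
        ],
    }


# Hypothetical names; each stage would be applied only after error rates look healthy.
for percent in (10, 50, 100):
    batch = weighted_shift_batch(
        "app.example.com.", "legacy.example.net.", "target.example.org.", percent
    )
    print(batch["Comment"])
```

Rolling back is the same call with a lower `aws_percent`, which is why the source record stays in place until the soak period ends.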
Every cutover needs a tested way back. Because the source is left intact and TTLs are low, rollback is usually "shift DNS weights back and unfreeze the source" — but only before the AWS side has accepted writes the source has not seen. Define the point of no return explicitly (the moment the AWS database accepts production writes that did not also land on the source) and the rollback procedure for both before and after it; for data-divergence cases it may require replaying the delta back via reverse replication. Exit criterion: a written, owner-assigned rollback runbook with the point-of-no-return defined and trigger conditions agreed before the cutover begins. No rollback plan, no cutover.
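The two rollback paths on either side of the point of no return can be written down as a lookup. The step wording and ordering below are illustrative assumptions; the real runbook would name commands and owners.

```python
def rollback_procedure(aws_has_unreplicated_writes, reverse_replication_tested):
    """Return the ordered rollback steps for the current cutover state."""
    if not aws_has_unreplicated_writes:
        # Before the point of no return: the source is still authoritative.
        return [
            "shift DNS weights back to source",
            "unfreeze writes on source",
        ]
    if reverse_replication_tested:
        # After it: data has diverged, so the delta must travel back first.
        return [
            "freeze writes on AWS",
            "replay delta to source via reverse replication",
            "verify data integrity source-to-target",
            "shift DNS weights back to source",
            "unfreeze writes on source",
        ]
    raise RuntimeError(
        "past the point of no return with no tested reverse path; "
        "this state should have blocked the cutover"
    )


steps = rollback_procedure(aws_has_unreplicated_writes=False,
                           reverse_replication_tested=False)
print(steps)
```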
For each wave: assess → design → IaC → sync → test cutover → validate → cutover → DNS → (rollback ready throughout). Validate wave N in production before you start wave N+1. Low-dependency waves first; crown jewels last.
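The gating discipline in that loop can be sketched as a small state tracker: gates pass strictly in order, the cutover is blocked without a signed-off rollback plan, and a wave cannot cut over until the previous wave is validated in production. The class and field names are hypothetical.

```python
GATES = ["assess", "design", "iac", "sync", "test_cutover", "validate", "cutover", "dns"]


class Wave:
    def __init__(self, name, rollback_signed_off=False):
        self.name = name
        self.rollback_signed_off = rollback_signed_off
        self.passed = []
        self.validated_in_production = False

    def pass_gate(self, gate, previous_wave=None):
        """Record a gate as passed, enforcing order and cutover preconditions."""
        if len(self.passed) == len(GATES):
            raise ValueError(f"{self.name}: all gates already passed")
        expected = GATES[len(self.passed)]
        if gate != expected:
            raise ValueError(f"{self.name}: expected '{expected}', got '{gate}'")
        if gate == "cutover":
            if not self.rollback_signed_off:
                raise ValueError(f"{self.name}: no rollback plan, no cutover")
            if previous_wave is not None and not previous_wave.validated_in_production:
                raise ValueError(f"{self.name}: {previous_wave.name} not yet validated")
        self.passed.append(gate)


wave1 = Wave("wave-1", rollback_signed_off=True)
for gate in GATES:
    wave1.pass_gate(gate)
wave1.validated_in_production = True  # set by the decision-owner after validation

wave2 = Wave("wave-2", rollback_signed_off=True)
for gate in GATES[:7]:  # everything up to and including cutover
    wave2.pass_gate(gate, previous_wave=wave1)
print(wave2.passed)
```

Note that wave 2 may run its earlier gates while wave 1 is still in flight; only the cutover is held back.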
Gate 4 (data sync) is where the wrong tool choice quietly costs you a week. The rule of thumb: <strong>MGN</strong> for whole servers (rehost), <strong>DMS + SCT</strong> for databases (especially heterogeneous), <strong>DataSync/Transfer Family</strong> for files and objects, and Migration Hub as the dashboard tracking it all. The table maps "what I am moving" to "which AWS service does it" — with the honest note on where each one bites.
| What you are moving | AWS tool | Mechanism | Where it bites |
|---|---|---|---|
| Whole VMs / physical servers (rehost) | Application Migration Service (MGN) | Agent-based, block-level continuous replication + test launch | Agent install access; bandwidth for initial sync |
| Homogeneous DB (e.g. Postgres → Postgres/RDS) | DMS | Full load + CDC (change data capture) | Large LOBs; long-running transactions during cutover |
| Heterogeneous DB (Oracle/SQL Server → Aurora) | DMS + Schema Conversion Tool (SCT) | Schema convert, then full load + CDC | Schema conversion of procs/triggers — the real time sink |
| Bulk files / NAS / object data | DataSync | Parallelized, scheduled, verified transfer | Throughput tuning; permissions/ACL mapping |
| Ongoing / SFTP file feeds | Transfer Family | Managed SFTP/FTPS/FTP into S3 | Re-pointing existing partners to the new endpoint |
| VMware estate (relocate) | VMware Cloud on AWS / Outposts | Hypervisor-level relocation, minimal re-architecture | Licensing/cost model; longer-term refactor still pending |
| Whole-program tracking | Migration Hub + Discovery Service | Inventory, dependency data, wave/status dashboard | Garbage in, garbage out — depends on discovery quality |
The migration is not done when the workload is live on AWS — it is done when the source is off, the bill is optimized, and the team can run the platform without the partner. There are four items here, and two of them (right-sizing and decommissioning the source) directly decide whether the migration saves money or just relocates the spend. This is also the phase teams most often leave half-finished.
A pure rehost gets you onto AWS but captures little of the upside. After a soak period, selectively modernize the workloads that justify it: move steady-state compute to Graviton (ARM) for price/performance, containerize onto ECS/EKS where it cuts ops load, replace self-managed components with managed services, and adopt managed scaling so you stop paying for idle capacity. This is the "Modernize" half of MAP's "Migrate & Modernize" — frequently part of the funded scope. Exit criterion: a prioritized modernization backlog with the high-value items scheduled and the rest explicitly deferred.
Lift-and-shift inherits the source's over-provisioning. With real AWS utilization data, right-size via AWS Compute Optimizer and Cost Explorer recommendations, then lock in the steady-state baseline with Savings Plans or Reserved Instances (commonly 30–60% off on-demand for committed usage) and schedule non-prod to shut down nights and weekends — typically where the biggest, fastest post-migration savings live. Exit criterion: right-sizing applied (or consciously rejected), Savings Plans/RIs purchased against the validated baseline, and a monthly cost-review cadence established. See the cost-optimization cluster for the deeper playbook.
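The reason non-prod scheduling pays off so quickly is simple arithmetic on hours per week. The sketch below assumes a hypothetical schedule of 12 hours a day on weekdays and counts compute hours only; storage and other always-on charges continue while instances are stopped.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def scheduled_saving_fraction(hours_per_day, days_per_week):
    """Fraction of weekly compute hours avoided by running only on a schedule."""
    running_hours = hours_per_day * days_per_week
    if not 0 <= running_hours <= HOURS_PER_WEEK:
        raise ValueError("schedule exceeds the hours in a week")
    return 1 - running_hours / HOURS_PER_WEEK


# Hypothetical schedule: dev/staging up 12 hours a day, Monday to Friday.
saving = scheduled_saving_fraction(12, 5)
print(f"compute hours avoided: {saving:.1%}")
```

At 60 running hours out of 168, roughly 64% of the on-demand compute hours for those environments drop off the bill before any commitment discount is applied.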
The single most expensive thing teams forget. Once the workload has soaked long enough to trust (typically 1–4 weeks past cutover, per risk), decommission the source: confirm it is truly idle (no straggler traffic in the logs), take a final archival snapshot for retention, then power off the old servers, terminate old-cloud resources, and cancel the colo/license/SaaS contracts. Until you do, you pay for source and target simultaneously — the migration costs double rather than saving anything. Exit criterion: source for each completed wave verifiably powered off / terminated, contracts cancelled, and the parallel-run cost off the bill — the gate that converts the TCO model from projection to reality.
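The "confirm it is truly idle" step can be checked mechanically from connection logs. This sketch assumes you have already reduced the logs to a last-seen timestamp per client; the host names, dates, and seven-day quiet period are illustrative assumptions.

```python
from datetime import datetime, timedelta


def stragglers(last_seen, now, quiet_period=timedelta(days=7)):
    """Return clients that connected to the source within the quiet period."""
    return sorted(
        client for client, seen_at in last_seen.items()
        if now - seen_at < quiet_period
    )


# Hypothetical last-seen times extracted from the source's connection logs.
now = datetime(2024, 6, 30, 12, 0)
last_seen = {
    "orders-app": datetime(2024, 6, 1, 9, 0),   # moved at cutover, now silent
    "cron-host": datetime(2024, 6, 29, 2, 0),   # still writing to the source
}
still_talking = stragglers(last_seen, now)
print(still_talking or "source is idle: safe to snapshot and power off")
```

Any name in the result is a dependency the wave plan missed, and it needs re-pointing before the source goes dark.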
A migration that leaves the team unable to operate the platform has not finished. Hand over the runbooks per workload, the documented IaC repos, the architecture diagrams, the observability/on-call setup, and the security/compliance posture, then run the team through an incident drill on the new platform. The CloudRoute partner's job is to make itself unnecessary — you should be able to run, scale, and troubleshoot the estate without them. Exit criterion: runbooks and IaC version-controlled in your repos, the team has independently deployed a change and handled a drill, and the MAP record is closed out.
This checklist is real work — weeks to months of it, with judgment calls that go expensively wrong when improvised. The honest question is not "what are the steps" but "who runs them, and who pays."
CloudRoute routes you to a vetted AWS Advanced or Premier tier partner matched to your source (Heroku, GCP, Azure, on-prem/VMware, Oracle, SAP) and stack. They run the entire checklist above — discovery, dependency mapping, the landing zone, every wave's cutover and rollback, the decommission, the handover.
The funding mechanism is the AWS Migration Acceleration Program (MAP). For qualifying migrations — those with a meaningful committed AWS spend after migration — AWS funds the Assess phase and credits a large share of Mobilize and Migrate. The partner is paid through MAP, which is why the migration can land at little-to-no cost to you; CloudRoute is paid a referral commission by the partner, never by you.
The honest caveat: MAP funding applies to qualifying migrations only. Smaller workloads that miss the committed-spend bar are not MAP-funded — in that case the engagement is a fixed-scope, de-risked cutover run by people who have done it before, still routed and vetted at $0 cost to you, just not AWS-funded. We tell you which bucket you are in before you commit, not after. Either way, you do not run this checklist alone from a blog post: you get someone who has run it many times, AWS funding it where it qualifies, and a rollback plan written before every cutover.
The whole checklist collapsed into a phase / tasks / owner grid. "Owner" is the role accountable for the gate; on a CloudRoute-routed engagement the partner runs the execution while your team owns the business-side sign-offs (windows, acceptance, decommission approval).
| Phase | Key tasks (the checkboxes) | Owner |
|---|---|---|
| Pre — Discovery | Inventory all servers/DBs/storage; capture utilization + 2–4 wks network data; tag each asset with a 7-R disposition | Partner (you: confirm scope) |
| Pre — Dependency map | Build dependency graph; define move-groups/waves; eliminate cross-wave unknowns | Partner + app owners |
| Pre — TCO | Model current vs AWS run-rate; size to p95; include parallel-run + egress; layer Savings Plans/Graviton/license savings | Partner + finance |
| Pre — MAP application | File MAP record; open Assess phase; size workload + projected post-migration spend | Partner (APN-filed) |
| Pre — Landing zone | Control Tower; core accounts; VPC/AZ network + Transit GW; hybrid link to source | Partner (you: security review) |
| Pre — Account structure | OU/account map per env + workload; SCPs; cost-allocation tags; IAM Identity Center | Partner + security + finance |
| Pre — Security baseline | CloudTrail/Config/GuardDuty/Security Hub org-wide; KMS encryption-by-default; guardrails as code | Partner (you: compliance sign-off) |
| Wave — Assess | Confirm 7-R per app; lock versions, data volume, downtime window, success metrics | Partner + business owner |
| Wave — Target design | Per-workload target architecture; Well-Architected review; networking/data/scaling/observability | Partner |
| Wave — IaC | Terraform/CloudFormation/CDK; provision in non-prod; peer review; prove destroy/re-apply | Partner |
| Wave — Data sync | MGN / DMS+SCT / DataSync; continuous replication + CDC; verify integrity; measure lag | Partner |
| Wave — Test cutover | Isolated test launch; full test suite on AWS; produce timed cutover runbook | Partner (you: UAT) |
| Wave — Validation | Acceptance criteria: functional, SLO, data integrity, integrations, security, observability | You (decision-owner) + partner |
| Wave — Cutover | Freeze writes; flush delta; integrity check; launch on AWS; validate; take traffic | Partner + you (go/no-go) |
| Wave — DNS | Pre-lower TTLs; Route 53 weighted shift 10→50→100%; confirm propagation + error rates | Partner |
| Wave — Rollback plan | Written before cutover; define point-of-no-return + triggers; reverse-replication path if needed | Partner + you (sign-off) |
| Post — Optimization | Graviton; containerize where it pays; managed services; modernization backlog | Partner (you: prioritize) |
| Post — Right-sizing | Compute Optimizer; downsize; Savings Plans/RIs on baseline; schedule non-prod off-hours | Partner + finance |
| Post — Decommission source | Soak; confirm idle; final archival snapshot; power off source; cancel colo/license/contracts | You (approve) + partner |
| Post — Knowledge transfer | Hand over runbooks + IaC; incident drill; close MAP record | Partner → you |
Situation: Colo lease renewal looming, hardware overdue for refresh, and a board mandate to "be on AWS before the next funding round." No internal migration experience, an undocumented dependency web between the order-management monolith and a dozen internal services, and an Oracle reporting database nobody wanted to keep paying license for. Biggest fear: a botched cutover taking down the order pipeline.
What CloudRoute did: Routed within 22 hours to a US-East Premier partner with on-prem + heterogeneous-DB track record. Partner opened a MAP Assess phase, ran Application Discovery Service for 3 weeks, and built the dependency map that surfaced two integrations the team did not know existed. Foundation (Control Tower landing zone, 4-account structure, security baseline) built in weeks 3–5. Four waves: (1) stateless internal tools — rehost via MGN; (2) the PostgreSQL estate — replatform to Aurora via DMS; (3) the order-management monolith + its mapped dependencies — rehost with a weighted Route 53 cutover and tested rollback; (4) the Oracle reporting DB — heterogeneous move to Aurora PostgreSQL via SCT + DMS (the schema conversion was the longest single gate). Every wave: test cutover, validation gate, rollback written first.
Outcome: All four waves live across ~5 months, zero unplanned downtime on the order pipeline (wave 3 cut over inside a 12-minute write-freeze window). Colo decommissioned and the Oracle license cancelled post-migration, removing the parallel-run double-spend and the license fee. MAP-funded: AWS credited the bulk of the migration cost against the committed post-migration spend, and the partner was paid through MAP. Steady-state AWS spend right-sized ~38% below the initial lift via Compute Optimizer + a 1-year Savings Plan. Customer paid $0 to CloudRoute.
waves: 4 · timeline: ~5 months · unplanned downtime: 0 · Oracle license: eliminated · MAP-funded · cost to customer (CloudRoute): $0
CloudRoute matches you to an AWS Advanced/Premier partner who runs the discovery, the landing zone, every wave's cutover and rollback, and the decommission. For qualifying migrations, AWS MAP funds it. You pay $0 to CloudRoute.