GitOps means Git is the single source of truth for what runs in your cluster, and an in-cluster agent continuously reconciles reality to match. On AWS that almost always means Argo CD or Flux running on Amazon EKS, with app-of-apps, sealed or external secrets, progressive delivery, and drift detection. This page walks through the model, the real tool decision, and the patterns that matter at scale, then explains how a vetted AWS partner stands it up for you, often AWS-funded if you qualify for credits.
GitOps is not "we keep our YAML in a repo." Plenty of teams do that and still deploy by running commands from a laptop or a CI runner. GitOps is a stricter operating model with four properties, and the properties are what deliver the benefits — auditability, recoverability, and a deploy process that converges instead of drifting.
The canonical definition (from OpenGitOps, the project of the CNCF GitOps working group) has four principles. Declarative: the entire desired state of the system is expressed declaratively (Kubernetes manifests, Helm values, Kustomize overlays), not as a sequence of imperative steps. Versioned and immutable: that desired state is stored in Git, so every change is a commit with an author, a timestamp, a diff, and a review; you can roll back to any previous state by reverting. Pulled automatically: software agents running inside the cluster pull the desired state from Git, rather than an external system pushing changes in. Continuously reconciled: those agents observe the live system, compare it to the desired state, and act to converge the two, on a loop that never stops, not just at deploy time.
The fourth property is the one people miss, and it is the one that matters most. In a push model, a CI pipeline runs kubectl apply (or helm upgrade) once, at the moment of deploy, and then walks away. If something changes the cluster afterwards — a panicked manual edit during an incident, a half-finished kubectl scale, an operator that mutated a resource — nothing pulls it back. The cluster has drifted from what anyone thinks is deployed. In a pull model, the reconciler is always watching: it notices the drift and either reports it or actively reverts it, depending on how you configure sync policy. Git is not just where you store config; it is the thing the cluster is continuously trying to become.
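The difference between reporting drift and reverting it can be sketched in a few lines. This is a toy model, not how Argo CD or Flux are implemented: state is a flat dictionary, and the names `diff_state`, `reconcile_once`, and the `shop-api` values are hypothetical.

```python
from typing import Any


def diff_state(desired: dict[str, Any], live: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {field: (desired_value, live_value)} for every field that differs."""
    keys = desired.keys() | live.keys()
    return {
        key: (desired.get(key), live.get(key))
        for key in sorted(keys)
        if desired.get(key) != live.get(key)
    }


def reconcile_once(desired: dict[str, Any], live: dict[str, Any], self_heal: bool):
    """One pass of the loop: detect drift, and revert it only if self_heal is on."""
    drift = diff_state(desired, live)
    if drift and self_heal:
        return dict(desired), drift  # converge the live state back to Git
    return live, drift               # report only; the live state is left alone


# Desired state as committed in Git (hypothetical values).
desired = {"image": "shop-api:1.4.2", "replicas": 3}
# Live state after a manual `kubectl scale` during an incident.
live = {"image": "shop-api:1.4.2", "replicas": 7}

live, drift = reconcile_once(desired, live, self_heal=False)
print("drift reported:", drift)

live, drift = reconcile_once(desired, live, self_heal=True)
print("converged:", live == desired)
```

A real reconciler runs this comparison repeatedly against the Kubernetes API; a push pipeline runs the equivalent of the apply step once and never compares again.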
The practical consequences are concrete. Your audit trail is git log — who changed what, when, and why, with a PR review attached. Disaster recovery becomes "point a fresh cluster at the same repo and let it reconcile" rather than a runbook of manual steps. Access control tightens: humans rarely need direct kubectl write access to production at all, because the path to change production runs through a pull request, not a kubeconfig. And the difference between "what we think is running" and "what is actually running" shrinks toward zero, because the system is engineered to keep them equal.
The honest framing for this page: GitOps is genuinely excellent for declarative, Kubernetes-shaped workloads, and it is the default operating model for serious EKS platforms in 2026. It is not free. It adds a control plane to run and a repo structure to design, and it is a poor fit for things that are not declarative or not Kubernetes (more on that below). The tooling — Argo CD, Flux — is mature and commoditized. What is not commoditized is the design: the repository topology, the secrets strategy, the promotion flow across environments, and the progressive-delivery setup. That design is the work, and it is exactly what a good AWS partner does for you in a couple of weeks.
Both Argo CD and Flux are CNCF graduated projects. Both implement real pull-based GitOps on Kubernetes. Both run perfectly well on Amazon EKS. Choosing between them is the first architectural decision, because it shapes your repo structure, your team's daily workflow, and how you do progressive delivery. The good news: there is no wrong answer here, only a better-fit answer for your team.
Argo CD is an application-centric, UI-first reconciler. You model your system as a set of Application resources, each pointing at a path in a Git repo and a destination cluster/namespace. Its web UI is its signature feature — a live, visual diff of desired-vs-live state, a topology view of every resource an app owns, sync status at a glance, and one-click manual sync or rollback. That console is genuinely useful for developers who are not Kubernetes experts, and it is a major reason Argo CD tends to win at product teams. Argo CD also brings the broader Argo ecosystem: Argo Rollouts for canary and blue-green, ApplicationSets for templating many apps across many clusters, and the app-of-apps pattern for managing the platform itself declaratively.
Flux is a leaner, more composable, GitOps-purist toolkit. Rather than one big application object plus a UI, Flux is a set of focused controllers — source-controller (fetches Git/Helm/OCI artifacts), kustomize-controller (applies Kustomize), helm-controller (manages Helm releases), notification-controller (alerts and webhooks), image-automation controllers (auto-bump image tags from a registry). It has no first-party UI by default (you observe it via flux CLI, your own dashboards, or a third-party UI like Weave GitOps / Capacitor). That minimalism is the point: fewer moving parts, a small attack surface, clean multi-tenancy, and image automation built in. Flux tends to win at platform teams running many clusters who want everything driven by controllers and CI, with no human clicking a sync button.
For progressive delivery the pairing differs. Argo CD's natural companion is Argo Rollouts (canary/blue-green via a drop-in Rollout workload). Flux's natural companion is Flagger (canary/blue-green/A-B driven by metrics, working with your ingress or service mesh). Both are excellent; you generally pick the one that matches your reconciler so you stay in one ecosystem.
| Dimension | Argo CD | Flux |
|---|---|---|
| CNCF status | Graduated | Graduated |
| Built-in web UI | Yes — strong, live diff + topology | No first-party UI (CLI + 3rd-party dashboards) |
| Mental model | Application objects + UI | Composable controllers (the Flux Toolkit) |
| Multi-app / fleet templating | ApplicationSets + app-of-apps | Kustomization tree + per-tenant repos |
| Image auto-update | Via Argo CD Image Updater (add-on) | Built in (image-automation controllers) |
| Progressive delivery | Argo Rollouts (canary / blue-green) | Flagger (canary / blue-green / A-B) |
| Best fit | Product teams wanting a console + visibility | Platform teams wanting minimal, fully-automated GitOps |
| Multi-tenancy | Projects + RBAC, UI-scoped | Namespaced reconcilers, repo-per-tenant |
The single biggest determinant of whether GitOps stays sane as you grow is the repository topology — how you split application code from deployment manifests, how you separate environments, and how you bootstrap the platform itself. Get this wrong and every new service or environment becomes a copy-paste sprawl; get it right and onboarding a new app is a small, reviewable diff.
Start with the cardinal rule: separate application source from deployment config. Your app code (the Dockerfile, the service) lives in its own repo and is built by CI into an immutable image pushed to Amazon ECR. A separate config repo (or a clearly separate area) holds the Kubernetes manifests that the GitOps controller reconciles. CI builds and pushes the image, then opens a small commit/PR to the config repo bumping the image tag; the GitOps controller sees that commit and rolls it out. This split is what keeps "build" and "deploy" cleanly decoupled and keeps the controller watching one source of truth.
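The tag-bump commit that CI opens against the config repo is a very small change. The sketch below assumes the config repo uses a Kustomize `images` entry (`name` / `newTag`) and models the parsed `kustomization.yaml` as a plain dictionary; `bump_image_tag` and the `shop-api` image name are hypothetical, and a real pipeline would load and write the YAML file and open the PR.

```python
import copy


def bump_image_tag(kustomization: dict, image_name: str, new_tag: str) -> dict:
    """Return a copy of the kustomization with image_name pinned to new_tag."""
    updated = copy.deepcopy(kustomization)
    for image in updated.setdefault("images", []):
        if image["name"] == image_name:
            image["newTag"] = new_tag
            break
    else:
        # No entry for this image yet: add one instead of failing.
        updated["images"].append({"name": image_name, "newTag": new_tag})
    return updated


# Parsed content of a hypothetical overlays/dev/kustomization.yaml.
dev = {
    "resources": ["../../base"],
    "images": [{"name": "shop-api", "newTag": "1.4.1"}],
}

bumped = bump_image_tag(dev, "shop-api", "1.4.2")
print(bumped["images"])
```

Because the function only touches the `images` entry, the resulting diff in the PR is one line, which is what keeps reviews fast.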
For environment separation, the two durable patterns are a directory-per-environment layout (Kustomize bases + per-env overlays for dev/staging/prod, often in branches or folders) and a cluster-per-environment layout (a separate EKS cluster per environment, each tracked by the reconciler). Kustomize overlays are the workhorse for keeping a single base while patching replica counts, resource limits, and config per environment, so you are not duplicating manifests. Promotion between environments is then just a change to the higher overlay — typically a PR that bumps the staging image tag to the one already validated in dev, then prod to the one validated in staging.
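To make the base-plus-overlay idea concrete, here is a simplified model of patching a shared base per environment. It is a plain recursive dictionary merge, which is an assumption made for illustration; Kustomize's real strategic-merge and JSON patches handle lists and deletions differently. All names and values are hypothetical.

```python
import copy


def apply_overlay(base: dict, patch: dict) -> dict:
    """Recursively merge patch into a copy of base; patch values win."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = value
    return merged


# One base shared by every environment.
base = {
    "replicas": 1,
    "resources": {"limits": {"cpu": "250m", "memory": "256Mi"}},
}

# Each overlay states only what differs from the base.
overlays = {
    "dev": {},
    "staging": {"replicas": 2},
    "prod": {"replicas": 4, "resources": {"limits": {"memory": "1Gi"}}},
}

for env, patch in overlays.items():
    print(env, apply_overlay(base, patch))
```

The point of the pattern is visible in the overlay sizes: prod is described by two overrides, not by a second copy of the manifest.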
App-of-apps is how you manage the GitOps platform with GitOps instead of bootstrapping it by hand. You define one root Argo CD Application — the "app of apps" — that points at a directory of child Application manifests. Each child manages a real workload or platform component (ingress controller, cert-manager, external-secrets, your services). Adding a new app to the platform becomes a single file in that directory and a PR; Argo CD reconciles the root, sees the new child, and starts managing it.
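A minimal sketch of the root-and-children shape, built as Python dictionaries and printed as JSON (which Kubernetes tooling accepts alongside YAML). The field names follow the Argo CD `Application` resource; the repository URL, paths, project, and namespaces are placeholders, and the `argo_application` helper is hypothetical.

```python
import json


def argo_application(name: str, repo_url: str, path: str, dest_namespace: str) -> dict:
    """Build an Argo CD Application manifest pointing at one path in a Git repo."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "path": path, "targetRevision": "HEAD"},
            "destination": {
                "server": "https://kubernetes.default.svc",  # the local cluster
                "namespace": dest_namespace,
            },
        },
    }


REPO = "https://git.example.invalid/platform/config.git"  # placeholder URL

# The root "app of apps": its path holds the child Application manifests.
root = argo_application("root", REPO, "apps", "argocd")

# A child that would live as a file under apps/ and manage one component.
cert_manager = argo_application("cert-manager", REPO, "platform/cert-manager", "cert-manager")

print(json.dumps(root, indent=2))
```

Onboarding a component then means adding one more child manifest under `apps/`; the root does not change.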
The payoff is that your entire platform — every controller, every add-on, every team's apps — is itself declarative and version-controlled. A fresh cluster bootstraps by installing Argo CD and pointing it at the root app; everything else cascades. At fleet scale, ApplicationSets generalize this: one generator templates many Applications across many clusters or namespaces (e.g. "deploy this stack to every cluster tagged prod"), which is how you avoid hand-writing an Application per cluster.
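The fan-out that an ApplicationSet cluster generator performs can be approximated as "select clusters by label, then render one Application per match". The sketch below is a toy model of that behavior, not the controller's code; it borrows the `{{name}}` and `{{server}}` placeholders that the cluster generator exposes, and the cluster list, URLs, and `expand_applicationset` are hypothetical.

```python
def expand_applicationset(template: dict, clusters: list[dict], match_labels: dict) -> list[dict]:
    """Render one application per cluster whose labels match the selector."""
    apps = []
    for cluster in clusters:
        labels = cluster.get("labels", {})
        if all(labels.get(key) == value for key, value in match_labels.items()):
            apps.append({
                "name": template["name"].replace("{{name}}", cluster["name"]),
                "path": template["path"],
                "server": template["server"].replace("{{server}}", cluster["server"]),
            })
    return apps


clusters = [
    {"name": "eu-prod", "server": "https://eu-prod.example.invalid", "labels": {"env": "prod"}},
    {"name": "us-prod", "server": "https://us-prod.example.invalid", "labels": {"env": "prod"}},
    {"name": "eu-staging", "server": "https://eu-staging.example.invalid", "labels": {"env": "staging"}},
]
template = {"name": "{{name}}-payments", "path": "stacks/payments", "server": "{{server}}"}

for app in expand_applicationset(template, clusters, {"env": "prod"}):
    print(app["name"], "->", app["server"])
```

Registering a third prod cluster with the same label would yield a third Application with no change to the template.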
Flux expresses the same idea through a tree of Kustomization resources. A bootstrap Kustomization points at an infra path that defines further Kustomizations (with dependsOn ordering so cert-manager comes up before things that need certificates, for example), which in turn point at per-app or per-tenant paths. The result is the same as app-of-apps: the platform manages itself, ordering is explicit, and onboarding is a small reviewable change. Multi-tenancy is typically modeled as a repo (or path) per tenant, each reconciled into its own namespace with scoped RBAC, so teams can self-serve without stepping on each other.
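The `dependsOn` ordering can be illustrated by building the Kustomization manifests and computing an order in which they could become ready. The manifest fields follow the Flux `Kustomization` API; the component names, paths, and interval are hypothetical. The topological sort is only a way to visualize the constraint: Flux itself does not compute a global order, each Kustomization simply waits until the ones it depends on are ready.

```python
from graphlib import TopologicalSorter

# Hypothetical platform layout: Kustomization name -> names it depends on.
DEPENDS_ON = {
    "cert-manager": [],
    "external-secrets": [],
    "ingress": ["cert-manager"],
    "apps": ["ingress", "external-secrets"],
}


def flux_kustomization(name: str, depends_on: list[str]) -> dict:
    """Build a Flux Kustomization manifest for one path of the config repo."""
    manifest = {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        "metadata": {"name": name, "namespace": "flux-system"},
        "spec": {
            "interval": "10m",
            "path": f"./clusters/prod/{name}",
            "prune": True,
            "sourceRef": {"kind": "GitRepository", "name": "flux-system"},
        },
    }
    if depends_on:
        manifest["spec"]["dependsOn"] = [{"name": dep} for dep in depends_on]
    return manifest


manifests = [flux_kustomization(name, deps) for name, deps in DEPENDS_ON.items()]

# One valid readiness order: every name appears after everything it depends on.
ready_order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(ready_order)
```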
The most common GitOps failure mode is not a tooling problem — it is a repo-structure problem. Teams put app code and deploy config in the same repo, trigger a reconcile on every app commit, and create a feedback loop (CI bumps the tag → reconciler deploys → image automation bumps again). Keep app source and deploy config separate, let CI open a tag-bump PR into the config repo, and the loop disappears. Designing this once, correctly, is most of what a GitOps implementation engagement actually buys you.
GitOps says "everything in Git." Secrets say "never put me in Git in plaintext." Reconciling those two is the single trickiest part of a GitOps setup, and getting it wrong is how credentials end up in a repo's history forever. There are two mainstream answers on AWS, and they solve the problem from opposite directions.
You cannot commit a plaintext Kubernetes Secret to Git — base64 is encoding, not encryption, so anyone with repo access reads it. The two durable patterns either (a) encrypt the secret so the ciphertext can safely live in Git, or (b) keep the secret out of Git entirely and have the cluster fetch it at runtime from a real secrets store.
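The "encoding, not encryption" point takes three lines to demonstrate. The password below is made up; the rest is the Python standard library.

```python
import base64

# What ends up in the `data` field of a Kubernetes Secret manifest.
encoded = base64.b64encode(b"s3cr3t-db-password").decode()
print(encoded)

# Anyone who can read the repo can reverse it; no key is involved.
decoded = base64.b64decode(encoded).decode()
print(decoded)
```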
Bitnami Sealed Secrets runs a controller in-cluster that holds a private key. You encrypt a secret locally with the matching public key (via the kubeseal CLI) into a SealedSecret custom resource, and that ciphertext is what you commit to Git. The controller is the only thing that can decrypt it, and it produces the real Secret in the cluster at reconcile time. It is simple, self-contained, and keeps the GitOps purity intact (the encrypted secret really does live in Git). The trade-off: you must back up and protect the controller's private key (lose it and you cannot decrypt; leak it and everything is exposed), rotation is more manual, and it scales less gracefully when many secrets are managed centrally.
The External Secrets Operator (ESO) is the more common choice on AWS at scale. You commit an ExternalSecret manifest that references a secret living in AWS Secrets Manager or AWS Systems Manager Parameter Store — no ciphertext in Git at all, just a pointer. ESO authenticates to AWS (ideally via IAM Roles for Service Accounts — IRSA — or EKS Pod Identity, so no static AWS keys), pulls the real value, and materializes a Kubernetes Secret in the cluster. The secret of record lives in a purpose-built store with its own audit log, rotation, and access policy; Git only ever holds a reference. This is usually the right default for teams already standardized on AWS Secrets Manager, and it composes cleanly with the rest of your IAM story.
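What "Git only holds a pointer" looks like in practice: the sketch below builds an `ExternalSecret` manifest as a dictionary. Field names follow the External Secrets Operator API; the `apiVersion` shown is an assumption that depends on the ESO release installed, and the store name, secret path, and property names are hypothetical.

```python
import json


def external_secret(name: str, namespace: str, store: str, remote_key: str, properties: list[str]) -> dict:
    """Build an ExternalSecret that maps properties of one remote secret to keys."""
    return {
        "apiVersion": "external-secrets.io/v1beta1",  # check against your ESO version
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",
            "secretStoreRef": {"name": store, "kind": "ClusterSecretStore"},
            "target": {"name": name},  # name of the Kubernetes Secret to create
            "data": [
                {"secretKey": prop, "remoteRef": {"key": remote_key, "property": prop}}
                for prop in properties
            ],
        },
    }


manifest = external_secret(
    name="shop-api-db",
    namespace="shop",
    store="aws-secrets-manager",
    remote_key="prod/shop-api/db",
    properties=["username", "password"],
)
print(json.dumps(manifest, indent=2))
```

Nothing in the printed manifest is sensitive: it names where the secret lives and which keys to expose, and the values stay in AWS Secrets Manager.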
A practical rule of thumb: choose External Secrets + AWS Secrets Manager when you want a single source of truth for secrets with native rotation and IAM-scoped access (most AWS-centric teams), and choose Sealed Secrets when you want a zero-dependency, fully-in-Git approach for a small footprint and are comfortable owning the key lifecycle. Either way, the non-negotiable is the same: no plaintext secret, and no long-lived AWS access key, ever touches the repo.
A plain GitOps sync replaces pods using Kubernetes' default rolling update — fine for low-stakes services, risky for anything where a bad version hurts. Progressive delivery layers a controlled, observable, automatically-abortable rollout on top of GitOps, so a new version is exposed to a slice of traffic, watched against real metrics, and rolled back automatically if it misbehaves — all still driven from Git.
With Argo CD the standard tool is Argo Rollouts. You replace a Deployment with a Rollout resource that declares a strategy: canary (shift 10% → 25% → 50% → 100% with pauses) or blue-green (stand up the new version alongside the old, then flip traffic). Between steps, Rollouts runs analysis: it queries a metrics provider (Amazon Managed Service for Prometheus, CloudWatch, Datadog) for your success criteria, such as error rate, p95 latency, or a custom business metric, and only proceeds if the new version is healthy. If the analysis fails, it automatically aborts and rolls back to the previous version. Because the Rollout spec lives in Git, the entire rollout policy is versioned and reviewable like everything else.
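The step-then-analyze logic can be simulated without a cluster. This is a toy model of a metric-gated canary, not Argo Rollouts itself: `run_canary`, the error-rate functions, and the 1% threshold are hypothetical stand-ins for a Rollout's steps and its analysis query.

```python
from typing import Callable


def run_canary(weights: list[int], error_rate_at: Callable[[int], float], max_error_rate: float) -> dict:
    """Advance through traffic weights; abort at the first unhealthy measurement."""
    observed = []
    for weight in weights:
        rate = error_rate_at(weight)  # stands in for a metrics-provider query
        observed.append((weight, rate))
        if rate > max_error_rate:
            # Analysis failed: send all traffic back to the stable version.
            return {"status": "aborted", "canary_weight": 0, "observed": observed}
    return {"status": "promoted", "canary_weight": weights[-1], "observed": observed}


STEPS = [10, 25, 50, 100]


def healthy(weight: int) -> float:
    return 0.002


def regressed(weight: int) -> float:
    # A regression that only shows up under more load.
    return 0.002 if weight < 50 else 0.08


print(run_canary(STEPS, healthy, max_error_rate=0.01)["status"])
print(run_canary(STEPS, regressed, max_error_rate=0.01))
```

In the second run the rollout stops at the 50% step, so the 100% step is never reached.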
With Flux the equivalent is Flagger, which drives the same canary / blue-green / A-B patterns by progressively shifting traffic (via your ingress controller or a service mesh) and gating each step on metrics. The mechanics differ but the principle is identical: never flip 100% of traffic to an unproven version, and let objective metrics — not a human watching a dashboard at 2am — decide whether to advance or abort.
On AWS the traffic-shifting layer is usually an ingress (the AWS Load Balancer Controller fronting an ALB), or a service mesh (Istio or Linkerd; AWS App Mesh is a legacy option, since AWS has announced its end of support for September 30, 2026) when you need finer-grained routing. Either way, progressive delivery is what turns "we deploy via GitOps" into "we deploy via GitOps and a bad release self-heals before customers notice." It is also one of the highest-leverage things a partner sets up, because wiring analysis to the right metrics, and choosing thresholds that catch real regressions without flapping, takes experience.
Two questions come up the moment GitOps works for one app on one cluster: how do I promote across environments and clusters cleanly, and can I manage the AWS infrastructure under the cluster — VPCs, RDS, IAM — the same GitOps way? Both have good answers, with one important honest caveat about where GitOps stops being the right tool.
Multi-cluster / multi-env. A single GitOps control plane can reconcile many clusters: Argo CD registers multiple destination clusters and uses ApplicationSets to fan a stack out across them; Flux runs a reconciler per cluster, each pointed at the right path of a shared repo. Environment promotion is modeled as a Git change — promote by bumping the image tag (or the chart version) in the next environment's overlay, gated by a pull-request approval. The clean version of this is a strict ladder: an image is built once, validated in dev, promoted (same digest) to staging, validated, then promoted to prod. Because promotion is a PR, you get review, audit, and an instant rollback (revert the commit) at every hop — and you are always shipping the same artifact, not rebuilding per environment.
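The strict ladder can be expressed as a small rule: an environment may only receive the digest currently running in the environment below it, and only after that digest has been validated there. The sketch below encodes that rule; `promote`, the `validated` set, and the digests are hypothetical, and in a real setup the returned state would become the content of a PR against the target overlay.

```python
LADDER = ["dev", "staging", "prod"]


def promote(deployed: dict[str, str], validated: set[tuple[str, str]], target: str) -> dict[str, str]:
    """Return the new env -> digest mapping after promoting into target."""
    index = LADDER.index(target)
    if index == 0:
        raise ValueError("dev is the first rung; CI deploys there directly")
    source = LADDER[index - 1]
    digest = deployed[source]
    if (source, digest) not in validated:
        raise ValueError(f"{digest} has not been validated in {source}")
    # Same artifact, next environment: no rebuild happens.
    return {**deployed, target: digest}


deployed = {"dev": "sha256:bbb", "staging": "sha256:aaa", "prod": "sha256:aaa"}
validated = {("dev", "sha256:bbb")}

deployed = promote(deployed, validated, "staging")
print(deployed)

try:
    promote(deployed, validated, "prod")
except ValueError as err:
    print("blocked:", err)
```

The second call is refused because the new digest has reached staging but has not yet been validated there, which is the skip the ladder is meant to prevent.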
GitOps for infrastructure. You can extend the model below the cluster in two ways. The first is to keep using your IaC tool — Terraform (now BSL-licensed) or OpenTofu (the open fork) or AWS CDK — and drive it from a Git-triggered controller so infra changes flow through pull requests like everything else (tools such as the Terraform/Tofu Controller for Flux, or Atlantis, give you plan-on-PR / apply-on-merge). The second is Crossplane, which lets you provision and reconcile AWS resources (an RDS instance, an S3 bucket, an SQS queue, IAM roles) as Kubernetes custom resources — so the same Argo CD or Flux reconciler that manages your apps also manages your cloud infrastructure, continuously correcting drift the same way it does for workloads. Crossplane is powerful for platform teams building a true internal platform; Terraform/OpenTofu remains the pragmatic default for most teams who already have state and modules.
The honest caveat: GitOps reconciliation shines for declarative, idempotent resources. Some infrastructure operations are genuinely stateful and dangerous to "continuously reconcile" — a database engine-version upgrade, a destructive migration, anything with a real-world side effect that should happen exactly once with a human in the loop. For those, the right pattern is plan-and-approve (Terraform/OpenTofu with a gated apply), not an always-on controller that might re-apply something irreversible. A mature setup uses GitOps reconciliation for the 90% that is safely declarative and gated IaC pipelines for the stateful 10%. Knowing which is which is exactly the kind of judgment a vetted partner brings.
The whole promise of GitOps is that the live system matches Git. That promise is only worth something if you can see when it does not, and get told fast when something has drifted, failed to sync, or been changed out-of-band. Observability is not an afterthought here; it is how you trust the model.
Sync and drift visibility. Both reconcilers expose health and sync state. Argo CD's UI and API show, per application, whether it is Synced or OutOfSync (drifted) and Healthy or Degraded, with a live diff of exactly which fields differ between Git and the cluster. Flux exposes the same through flux get / Kustomization status conditions and Prometheus metrics. You wire these into alerting (Argo CD notifications, Flux's notification-controller) so an OutOfSync or failed reconcile pings Slack or PagerDuty rather than sitting unnoticed — including the case where someone hand-edited production and the reconciler is reporting (or reverting) the drift.
Sync policy is a real decision. Manual sync (a human approves each apply) gives maximum control and is common for production early on; automated sync converges without intervention. With automated sync you also choose self-heal (the reconciler actively reverts any out-of-band change back to Git) and prune (resources removed from Git are deleted from the cluster). Self-heal is the strongest expression of "Git is the source of truth" — it makes manual cluster edits literally not stick — but you turn it on deliberately, per environment, once you trust the pipeline.
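A per-environment policy and its effect can be sketched as follows. The dictionary shape mirrors the `syncPolicy.automated` block of an Argo CD Application (`selfHeal`, `prune`), where leaving `automated` out means manual sync; the environment choices and the `planned_actions` helper are hypothetical and simplify what the controller does.

```python
def sync_policy(automated: bool, self_heal: bool = False, prune: bool = False) -> dict:
    """Build a syncPolicy block; an empty dict models manual sync."""
    if not automated:
        return {}
    return {"automated": {"selfHeal": self_heal, "prune": prune}}


def planned_actions(policy: dict, drifted: set[str], orphaned: set[str]) -> list[str]:
    """drifted: edited out-of-band. orphaned: removed from Git, still in the cluster."""
    automated = policy.get("automated")
    if automated is None:
        return [f"report {name}" for name in sorted(drifted | orphaned)]
    actions = []
    for name in sorted(drifted):
        actions.append(f"revert {name}" if automated["selfHeal"] else f"report {name}")
    for name in sorted(orphaned):
        actions.append(f"delete {name}" if automated["prune"] else f"report {name}")
    return actions


POLICY_BY_ENV = {
    "dev": sync_policy(True, self_heal=True, prune=True),
    "staging": sync_policy(True, self_heal=True),
    "prod": sync_policy(False),  # manual until the pipeline is trusted
}

for env, policy in POLICY_BY_ENV.items():
    print(env, planned_actions(policy, {"deploy/shop-api"}, {"cm/old-flags"}))
```

The same out-of-band edit is reverted in dev and staging but only reported in prod, which is the per-environment choice the paragraph describes.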
Workload observability still applies. Underneath GitOps you run the normal AWS-native stack — Amazon Managed Service for Prometheus + Amazon Managed Grafana (or CloudWatch Container Insights, or Datadog) for metrics, dashboards, and the SLO signals that progressive-delivery analysis depends on. GitOps tells you whether the cluster matches Git; your observability stack tells you whether what is running is actually healthy. You need both, and they reinforce each other: the same Prometheus metrics that power your dashboards are what gate your canary rollouts.
GitOps is excellent, not universal. The most useful thing this page can do is tell you honestly when it pays for itself and when it is ceremony you do not need yet — and then how to get it implemented without hiring a platform team.
GitOps is clearly worth it when you are already on Kubernetes (EKS) and running more than a couple of services; when more than one or two people deploy and you need an audit trail and review on every change; when you run multiple environments or clusters and want clean, reviewable promotion; when compliance (SOC 2, ISO 27001) wants change-management evidence and least-privilege access to production; and when you want fast, safe rollback by reverting a commit. For a team in that shape, GitOps is close to a no-brainer — it is the default operating model for serious EKS platforms in 2026.
GitOps is probably overkill when you are not on Kubernetes at all (if you are on ECS Fargate, Lambda, or App Runner, a good CI/CD pipeline is the right tool — see our CI/CD-on-AWS reference — not a Kubernetes reconciler); when you are a single developer or a tiny team shipping one small service where the control plane is more overhead than the audit trail is worth; or when your workloads are mostly imperative or stateful in ways that fight continuous reconciliation. GitOps adds a control plane to operate; if the benefits above do not apply to you yet, that overhead is not free, and it is fine to wait.
Here is the honest CloudRoute tie-in. GitOps is one of those things that is straightforward in a demo and fiddly in production — the repo topology, the secrets strategy, IRSA/Pod Identity, progressive-delivery thresholds, self-heal and prune policy per environment, multi-cluster promotion. CloudRoute does not implement it directly. We route you to a vetted AWS partner who stands up the whole GitOps control plane on EKS — Argo CD or Flux, the app-of-apps or Kustomization tree, External Secrets or Sealed Secrets, Argo Rollouts or Flagger, drift alerting — and hands it to your team running and documented. For credit-eligible companies the engagement is frequently AWS-funded (the partner is paid through AWS partner programs and your AWS spend is credit-covered), so the customer pays $0 or low cost. For everyone else it is a vetted-partner referral that skips the hiring-and-vetting slog — you get a senior platform engineer's output without a senior platform engineer's headcount.
If you qualify for AWS credits (typically institutionally-funded startups), the GitOps implementation is often substantially AWS-funded — the partner is paid through AWS partner programs and your AWS spend runs on credits, so your out-of-pocket can be $0. If you are not credit-eligible, it is a straight vetted-partner referral: a fixed-scope GitOps build by an engineer CloudRoute has already vetted, with no recruiting cycle. We will tell you which bucket you are in before any work starts. See $100K AWS credits and the startup path.
Before the Argo-vs-Flux question comes a more fundamental one: do you even want GitOps (pull-based reconciliation), or is a classic push-based pipeline the better fit? This is the decision that should come first, because GitOps is a Kubernetes-shaped answer — if you are not on Kubernetes, the honest recommendation is often a push-based pipeline instead.
| Variable | Push-based CD (classic pipeline) | Pull-based GitOps (Argo CD / Flux) |
|---|---|---|
| Who applies changes | CI runner pushes (kubectl/helm apply) into the cluster | In-cluster agent pulls from Git and reconciles |
| Credentials direction | CI holds cluster/cloud creds (outside-in) | Agent runs inside cluster; no external write creds (inside-out) |
| Drift handling | None after deploy — cluster can silently drift | Continuously detected; optionally self-healed back to Git |
| Source of truth | The pipeline run / whatever was last applied | Git, always — the cluster converges to it |
| Rollback | Re-run pipeline / re-deploy previous artifact | Revert the commit; reconciler converges back |
| Best for | ECS / Lambda / App Runner, non-Kubernetes, simple setups | Kubernetes (EKS), multi-env/cluster, compliance, fleets |
| Audit trail | Pipeline logs (CI system) | git log + PR review on every change |
| Main cost | Drift + credential sprawl as you scale | A control plane to run + repo topology to design |
Situation: Growing fast on EKS but deploying with helm upgrade run from a CI job and the occasional manual kubectl during incidents. No real audit trail, frequent config drift (staging and prod had quietly diverged), and a SOC 2 auditor asking for change-management evidence and least-privilege access to production. Their one infra-literate engineer was ~70% on product and could not own a GitOps build on top of that.
What CloudRoute did: Routed within 24 hours to an EU-Central partner with an EKS + GitOps track record. Partner stood up Argo CD with an app-of-apps topology, split the single repo into app-source vs config repos, moved secrets to External Secrets backed by AWS Secrets Manager via IRSA (zero static keys), added Argo Rollouts canaries gated on Amazon Managed Service for Prometheus metrics, and wired OutOfSync/sync-failure alerts into Slack. Dev→staging→prod promotion became a reviewed PR bumping the same image digest; prod got automated sync with self-heal on.
Outcome: Live in ~3 weeks. Drift went to zero (self-heal reverts out-of-band edits); every prod change is now a PR with review and an instant revert-to-rollback path; direct human kubectl write to prod was removed, which closed the auditor's least-privilege and change-management findings. Because the company was credit-eligible, the engagement was AWS-funded and the customer paid $0 — CloudRoute's commission was paid by the partner from AWS engagement funding.
engagement window: ~3 weeks · reconciler: Argo CD + Rollouts · secrets: External Secrets + Secrets Manager (IRSA) · drift: self-healed · cost to customer: $0 (credit-eligible)
CloudRoute routes you to a vetted AWS partner who stands up the control plane, repo topology, secrets, progressive delivery, and drift alerting — then hands it over documented. Often AWS-funded for credit-eligible companies, so the customer pays $0.