gitops on aws · 2026 reference + build

GitOps on AWS — the model, Argo CD vs Flux, and who implements it.

GitOps means Git is the single source of truth for what runs in your cluster, and an in-cluster agent continuously reconciles reality to match. On AWS that almost always means Argo CD or Flux running on Amazon EKS, with app-of-apps, sealed or external secrets, progressive delivery, and drift detection. This page walks the model, the real tool decision, the patterns that matter at scale — and how a vetted AWS partner stands it up for you, often AWS-funded if you qualify for credits.

source of truth: Git
reconcile model: pull-based
typical setup: 2–4 wks
cost if credit-eligible: $0
TL;DR
  • GitOps is a specific operating model: the desired state of your system lives in Git as declarative manifests, and an in-cluster controller (Argo CD or Flux) continuously pulls that state and reconciles the live cluster to match. Nobody runs `kubectl apply` by hand against production; the cluster converges to what is in Git, and drift is detected and corrected automatically.
  • On AWS the substrate is almost always Amazon EKS. The main decision is Argo CD vs Flux, both CNCF graduated projects. Argo CD gives you a strong web UI, a powerful app-of-apps pattern, and Argo Rollouts for canary/blue-green; Flux is leaner, more GitOps-purist, built from composable GitOps Toolkit controllers, and well suited to fully-automated multi-tenant fleets. Most teams that want a console pick Argo CD; platform teams optimizing for minimal moving parts pick Flux.
  • You can implement this yourself, or CloudRoute can route you to a vetted AWS partner who stands up the GitOps control plane, the repo structure, secrets handling, progressive delivery, and multi-cluster promotion for you. For credit-eligible companies the engagement is often AWS-funded, so the customer pays $0; otherwise it is a vetted-partner referral that skips the hiring and vetting slog.
the model

I. What GitOps actually is: Git as source of truth, pull-based reconciliation

GitOps is not "we keep our YAML in a repo." Plenty of teams do that and still deploy by running commands from a laptop or a CI runner. GitOps is a stricter operating model with four properties, and the properties are what deliver the benefits — auditability, recoverability, and a deploy process that converges instead of drifting.

The canonical definition (from the OpenGitOps project, maintained by the CNCF GitOps Working Group) has four principles. Declarative: the entire desired state of the system is expressed declaratively (Kubernetes manifests, Helm values, Kustomize overlays), not as a sequence of imperative steps. Versioned and immutable: that desired state is stored in Git, so every change is a commit with an author, a timestamp, a diff, and a review; you can roll back to any previous state by reverting. Pulled automatically: software agents running inside the cluster pull the desired state from Git, rather than an external system pushing changes in. Continuously reconciled: those agents observe the live system, compare it to the desired state, and act to converge the two, on a loop, not just at deploy time.

The fourth property is the one people miss, and it is the one that matters most. In a push model, a CI pipeline runs kubectl apply (or helm upgrade) once, at the moment of deploy, and then walks away. If something changes the cluster afterwards — a panicked manual edit during an incident, a half-finished kubectl scale, an operator that mutated a resource — nothing pulls it back. The cluster has drifted from what anyone thinks is deployed. In a pull model, the reconciler is always watching: it notices the drift and either reports it or actively reverts it, depending on how you configure sync policy. Git is not just where you store config; it is the thing the cluster is continuously trying to become.
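The difference between reporting drift and reverting it can be sketched in a few lines of Python. This is a toy model, not how Argo CD or Flux are implemented: the `diff` and `reconcile` functions, the dict-based state, and the `web` resource are all illustrative assumptions.

```python
from typing import Dict

State = Dict[str, dict]  # resource name -> spec (a stand-in for real manifests)


def diff(desired: State, live: State) -> Dict[str, str]:
    """Return resource name -> kind of drift between Git and the cluster."""
    drift = {}
    for name, spec in desired.items():
        if name not in live:
            drift[name] = "missing"
        elif live[name] != spec:
            drift[name] = "modified"
    for name in live:
        if name not in desired:
            drift[name] = "extra"
    return drift


def reconcile(desired: State, live: State, self_heal: bool) -> Dict[str, str]:
    """One pass of the loop: always report drift, revert it only if self_heal is on."""
    drift = diff(desired, live)
    if self_heal:
        for name, kind in drift.items():
            if kind == "extra":
                del live[name]
            else:
                live[name] = dict(desired[name])
    return drift


desired = {"web": {"image": "web:1.4.2", "replicas": 3}}
live = {"web": {"image": "web:1.4.2", "replicas": 5}}  # someone ran kubectl scale

print(reconcile(desired, live, self_heal=True))  # drift found and reverted on this pass
print(reconcile(desired, live, self_heal=True))  # nothing left to fix
```

A push pipeline is the equivalent of calling `reconcile` once at deploy time; a GitOps controller calls it on an interval for as long as the cluster exists.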

The practical consequences are concrete. Your audit trail is git log — who changed what, when, and why, with a PR review attached. Disaster recovery becomes "point a fresh cluster at the same repo and let it reconcile" rather than a runbook of manual steps. Access control tightens: humans rarely need direct kubectl write access to production at all, because the path to change production runs through a pull request, not a kubeconfig. And the difference between "what we think is running" and "what is actually running" shrinks toward zero, because the system is engineered to keep them equal.

The honest framing for this page: GitOps is genuinely excellent for declarative, Kubernetes-shaped workloads, and it is the default operating model for serious EKS platforms in 2026. It is not free. It adds a control plane to run and a repo structure to design, and it is a poor fit for things that are not declarative or not Kubernetes (more on that below). The tooling — Argo CD, Flux — is mature and commoditized. What is not commoditized is the design: the repository topology, the secrets strategy, the promotion flow across environments, and the progressive-delivery setup. That design is the work, and it is exactly what a good AWS partner does for you in a couple of weeks.

the core decision

II. Argo CD vs Flux on EKS: the decision that anchors everything

Both Argo CD and Flux are CNCF graduated projects. Both implement real pull-based GitOps on Kubernetes. Both run perfectly well on Amazon EKS. Choosing between them is the first architectural decision, because it shapes your repo structure, your team's daily workflow, and how you do progressive delivery. The good news: there is no wrong answer here, only a better-fit answer for your team.

Argo CD is an application-centric, UI-first reconciler. You model your system as a set of Application resources, each pointing at a path in a Git repo and a destination cluster/namespace. Its web UI is its signature feature — a live, visual diff of desired-vs-live state, a topology view of every resource an app owns, sync status at a glance, and one-click manual sync or rollback. That console is genuinely useful for developers who are not Kubernetes experts, and it is a major reason Argo CD tends to win at product teams. Argo CD also brings the broader Argo ecosystem: Argo Rollouts for canary and blue-green, ApplicationSets for templating many apps across many clusters, and the app-of-apps pattern for managing the platform itself declaratively.

Flux is a leaner, more composable, GitOps-purist toolkit. Rather than one big application object plus a UI, Flux is a set of focused controllers — source-controller (fetches Git/Helm/OCI artifacts), kustomize-controller (applies Kustomize), helm-controller (manages Helm releases), notification-controller (alerts and webhooks), image-automation controllers (auto-bump image tags from a registry). It has no first-party UI by default (you observe it via flux CLI, your own dashboards, or a third-party UI like Weave GitOps / Capacitor). That minimalism is the point: fewer moving parts, a small attack surface, clean multi-tenancy, and image automation built in. Flux tends to win at platform teams running many clusters who want everything driven by controllers and CI, with no human clicking a sync button.

For progressive delivery the pairing differs. Argo CD's natural companion is Argo Rollouts (canary/blue-green via a drop-in Rollout workload). Flux's natural companion is Flagger (canary/blue-green/A-B driven by metrics, working with your ingress or service mesh). Both are excellent; you generally pick the one that matches your reconciler so you stay in one ecosystem.

Argo CD vs Flux on Amazon EKS · 2026 practitioner view
Dimension | Argo CD | Flux
CNCF status | Graduated | Graduated
Built-in web UI | Yes: strong, live diff + topology | No first-party UI (CLI + 3rd-party dashboards)
Mental model | Application objects + UI | Composable controllers (the GitOps Toolkit)
Multi-app / fleet templating | ApplicationSets + app-of-apps | Kustomization tree + per-tenant repos
Image auto-update | Via Argo CD Image Updater (add-on) | Built in (image-automation controllers)
Progressive delivery | Argo Rollouts (canary / blue-green) | Flagger (canary / blue-green / A-B)
Best fit | Product teams wanting a console + visibility | Platform teams wanting minimal, fully-automated GitOps
Multi-tenancy | Projects + RBAC, UI-scoped | Namespaced reconcilers, repo-per-tenant
Neither is "better." Argo CD optimizes for human visibility and developer self-service; Flux optimizes for composability and hands-off automation. A common hybrid is Argo CD for app teams plus Flux for low-level platform add-ons — but for most startups, pick one and standardize. CloudRoute partners most often deploy Argo CD when a team wants a console, Flux when a platform team wants the leanest possible control plane.
repo topology

III. App-of-apps and the repository structure that scales

The single biggest determinant of whether GitOps stays sane as you grow is the repository topology — how you split application code from deployment manifests, how you separate environments, and how you bootstrap the platform itself. Get this wrong and every new service or environment becomes a copy-paste sprawl; get it right and onboarding a new app is a small, reviewable diff.

Start with the cardinal rule: separate application source from deployment config. Your app code (the Dockerfile, the service) lives in its own repo and is built by CI into an immutable image pushed to Amazon ECR. A separate config repo (or a clearly separate area) holds the Kubernetes manifests that the GitOps controller reconciles. CI builds and pushes the image, then opens a small commit/PR to the config repo bumping the image tag; the GitOps controller sees that commit and rolls it out. This split is what keeps "build" and "deploy" cleanly decoupled and keeps the controller watching one source of truth.
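The tag-bump step that CI performs against the config repo can be sketched as a small text edit. This is a minimal illustration, assuming the config repo pins the image through a Kustomize `images:` entry; the `bump_tag` helper, the `web` image name, and the tags are hypothetical, and a real pipeline would more likely use `kustomize edit set image` or a YAML-aware tool than a regex.

```python
import re

# Hypothetical overlay file as it might sit in the config repo.
KUSTOMIZATION = """\
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: web
    newTag: 1.4.1
"""


def bump_tag(text: str, image: str, new_tag: str) -> str:
    """Replace newTag for exactly one image entry, or fail loudly."""
    pattern = re.compile(r"(- name: " + re.escape(image) + r"\n\s+newTag: )(\S+)")
    updated, count = pattern.subn(lambda m: m.group(1) + new_tag, text)
    if count != 1:
        raise ValueError(f"expected exactly one entry for image {image!r}, found {count}")
    return updated


print(bump_tag(KUSTOMIZATION, "web", "1.4.2"))
```

CI would commit the result on a branch and open a pull request; the reconciler only acts once that change lands in the branch it tracks.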

For environment separation, the two durable patterns are a directory-per-environment layout (Kustomize bases + per-env overlays for dev/staging/prod, often in branches or folders) and a cluster-per-environment layout (a separate EKS cluster per environment, each tracked by the reconciler). Kustomize overlays are the workhorse for keeping a single base while patching replica counts, resource limits, and config per environment, so you are not duplicating manifests. Promotion between environments is then just a change to the higher overlay — typically a PR that bumps the staging image tag to the one already validated in dev, then prod to the one validated in staging.
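The base-plus-overlay idea is easier to see with a toy merge. The sketch below is a simplification: Kustomize's real patching (strategic merge and JSON patches) handles lists and keyed items in ways a recursive dict merge does not. The `apply_overlay` function and all values are illustrative assumptions.

```python
import copy


def apply_overlay(base: dict, patch: dict) -> dict:
    """Recursively merge a per-environment patch over a shared base."""
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_overlay(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# One base, three overlays; only the differences live in each overlay.
BASE = {
    "image": "web:1.4.2",
    "replicas": 2,
    "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
}
OVERLAYS = {
    "dev": {"replicas": 1},
    "staging": {},
    "prod": {"replicas": 6, "resources": {"limits": {"memory": "1Gi"}}},
}

rendered = {env: apply_overlay(BASE, patch) for env, patch in OVERLAYS.items()}
for env, spec in rendered.items():
    print(env, spec["replicas"], spec["resources"]["limits"])
```

The prod overlay changes two fields and inherits everything else, which is why a promotion PR stays small enough to review properly.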

The app-of-apps pattern (Argo CD)

App-of-apps is how you manage the GitOps platform with GitOps instead of bootstrapping it by hand. You define one root Argo CD Application — the "app of apps" — that points at a directory of child Application manifests. Each child manages a real workload or platform component (ingress controller, cert-manager, external-secrets, your services). Adding a new app to the platform becomes a single file in that directory and a PR; Argo CD reconciles the root, sees the new child, and starts managing it.

The payoff is that your entire platform — every controller, every add-on, every team's apps — is itself declarative and version-controlled. A fresh cluster bootstraps by installing Argo CD and pointing it at the root app; everything else cascades. At fleet scale, ApplicationSets generalize this: one generator templates many Applications across many clusters or namespaces (e.g. "deploy this stack to every cluster tagged prod"), which is how you avoid hand-writing an Application per cluster.
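To make the fan-out concrete, here is a rough Python sketch of what a label-selecting generator does conceptually: take a cluster inventory, filter by label, and template one Application per match. In a real setup the ApplicationSet controller does this from a declarative spec; the `fan_out` helper, cluster names, URLs, and repo paths below are all hypothetical.

```python
from typing import Dict, List


def application(name: str, repo_url: str, path: str, server: str, namespace: str) -> dict:
    """Build one Argo CD Application manifest as a plain dict."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "path": path, "targetRevision": "HEAD"},
            "destination": {"server": server, "namespace": namespace},
        },
    }


# Hypothetical cluster inventory; names, URLs, and labels are illustrative only.
CLUSTERS = [
    {"name": "dev-eu", "server": "https://dev-eu.example.internal", "labels": {"env": "dev"}},
    {"name": "prod-eu", "server": "https://prod-eu.example.internal", "labels": {"env": "prod"}},
    {"name": "prod-us", "server": "https://prod-us.example.internal", "labels": {"env": "prod"}},
]


def fan_out(stack: str, repo_url: str, clusters: List[dict], selector: Dict[str, str]) -> List[dict]:
    """Template one Application per cluster whose labels match the selector."""
    return [
        application(
            name=f"{stack}-{c['name']}",
            repo_url=repo_url,
            path=f"stacks/{stack}",
            server=c["server"],
            namespace=stack,
        )
        for c in clusters
        if all(c["labels"].get(k) == v for k, v in selector.items())
    ]


apps = fan_out("ingress", "https://git.example.com/platform/config.git", CLUSTERS, {"env": "prod"})
print([a["metadata"]["name"] for a in apps])  # ['ingress-prod-eu', 'ingress-prod-us']
```

Registering a new prod cluster then needs no new Application file: the next evaluation of the generator picks it up by label.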

The Flux equivalent — a Kustomization tree

Flux expresses the same idea through a tree of Kustomization resources. A bootstrap Kustomization points at an infra path that defines further Kustomizations (with dependsOn ordering so cert-manager comes up before things that need certificates, for example), which in turn point at per-app or per-tenant paths. The result is the same as app-of-apps: the platform manages itself, ordering is explicit, and onboarding is a small reviewable change. Multi-tenancy is typically modeled as a repo (or path) per tenant, each reconciled into its own namespace with scoped RBAC, so teams can self-serve without stepping on each other.
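The effect of `dependsOn` can be pictured as a dependency graph resolved in order. The sketch below uses Python's standard `graphlib` to show the resulting order for a hypothetical set of Kustomizations. Flux itself does not compute a global order up front (each Kustomization waits until its dependencies report ready), so treat this as a mental model, not the controller's algorithm.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical Kustomization names mapped to what each one depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "cert-manager": [],
    "external-secrets": [],
    "ingress": ["cert-manager"],
    "apps": ["ingress", "external-secrets"],
}


def apply_order(depends_on: Dict[str, List[str]]) -> List[str]:
    """Return an order in which every item comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = apply_order(DEPENDS_ON)
print(order)
```

A cycle (two Kustomizations that depend on each other) makes `static_order` raise `graphlib.CycleError`, which is the same class of mistake that leaves a real dependency chain stuck waiting.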

the topology trap

The most common GitOps failure mode is not a tooling problem — it is a repo-structure problem. Teams put app code and deploy config in the same repo, trigger a reconcile on every app commit, and create a feedback loop (CI bumps the tag → reconciler deploys → image automation bumps again). Keep app source and deploy config separate, let CI open a tag-bump PR into the config repo, and the loop disappears. Designing this once, correctly, is most of what a GitOps implementation engagement actually buys you.

the hard part

IV. Secrets in GitOps: Sealed Secrets vs External Secrets

GitOps says "everything in Git." Secrets say "never put me in Git in plaintext." Reconciling those two is the single trickiest part of a GitOps setup, and getting it wrong is how credentials end up in a repo's history forever. There are two mainstream answers on AWS, and they solve the problem from opposite directions.

You cannot commit a plaintext Kubernetes Secret to Git — base64 is encoding, not encryption, so anyone with repo access reads it. The two durable patterns either (a) encrypt the secret so the ciphertext can safely live in Git, or (b) keep the secret out of Git entirely and have the cluster fetch it at runtime from a real secrets store.
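The base64 point is worth seeing once. A minimal Python demonstration, using a made-up password, that the value stored in a Kubernetes Secret's `data` field is recoverable by anyone without any key:

```python
import base64

# A made-up value, encoded the same way a Secret's data field stores it.
encoded = base64.b64encode(b"s3cr3t-db-password").decode()

# No key, no passphrase: decoding is a single standard-library call.
decoded = base64.b64decode(encoded).decode()

print(encoded)
print(decoded)
```

Anything that can read the repo, including its full history, can run the second step.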

Sealed Secrets — encrypt into the repo

Bitnami Sealed Secrets runs a controller in-cluster that holds a private key. You encrypt a secret locally with the matching public key (via the kubeseal CLI) into a SealedSecret custom resource, and that ciphertext is what you commit to Git. The controller is the only thing that can decrypt it, and it produces the real Secret in the cluster at reconcile time. It is simple, self-contained, and keeps the GitOps purity intact (the encrypted secret really does live in Git). The trade-off: you must back up and protect the controller's private key (lose it and you cannot decrypt; leak it and everything is exposed), rotation is more manual, and it scales less gracefully when many secrets are managed centrally.

External Secrets Operator — reference a real store

The External Secrets Operator (ESO) is the more common choice on AWS at scale. You commit an ExternalSecret manifest that references a secret living in AWS Secrets Manager or AWS Systems Manager Parameter Store — no ciphertext in Git at all, just a pointer. ESO authenticates to AWS (ideally via IAM Roles for Service Accounts — IRSA — or EKS Pod Identity, so no static AWS keys), pulls the real value, and materializes a Kubernetes Secret in the cluster. The secret of record lives in a purpose-built store with its own audit log, rotation, and access policy; Git only ever holds a reference. This is usually the right default for teams already standardized on AWS Secrets Manager, and it composes cleanly with the rest of your IAM story.
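To show what "just a pointer" looks like, the sketch below builds an ExternalSecret manifest as a Python dict and prints it. The field layout follows the ExternalSecret resource as commonly documented, but the `apiVersion` depends on the ESO release you run, and the store name, secret path, and property names here are hypothetical.

```python
import json
from typing import List


def external_secret(name: str, namespace: str, store: str, remote_key: str, properties: List[str]) -> dict:
    """Build an ExternalSecret that references values held in an external store."""
    return {
        # Assumption: adjust to the API version served by your installed ESO release.
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",
            "secretStoreRef": {"name": store, "kind": "ClusterSecretStore"},
            "target": {"name": name},  # the Kubernetes Secret that ESO creates
            "data": [
                {"secretKey": prop, "remoteRef": {"key": remote_key, "property": prop}}
                for prop in properties
            ],
        },
    }


manifest = external_secret(
    name="orders-db",
    namespace="orders",
    store="aws-secrets-manager",    # hypothetical store name
    remote_key="prod/orders/db",    # hypothetical Secrets Manager secret name
    properties=["username", "password"],
)
print(json.dumps(manifest, indent=2))
```

Nothing in the committed manifest is sensitive: it names where the value lives and which keys to expose, and the value itself never leaves AWS Secrets Manager until ESO fetches it inside the cluster.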

A practical rule of thumb: choose External Secrets + AWS Secrets Manager when you want a single source of truth for secrets with native rotation and IAM-scoped access (most AWS-centric teams), and choose Sealed Secrets when you want a zero-dependency, fully-in-Git approach for a small footprint and are comfortable owning the key lifecycle. Either way, the non-negotiable is the same: no plaintext secret, and no long-lived AWS access key, ever touches the repo.

safe rollouts

V. Progressive delivery: Argo Rollouts and metric-gated canaries

A plain GitOps sync replaces pods using Kubernetes' default rolling update — fine for low-stakes services, risky for anything where a bad version hurts. Progressive delivery layers a controlled, observable, automatically-abortable rollout on top of GitOps, so a new version is exposed to a slice of traffic, watched against real metrics, and rolled back automatically if it misbehaves — all still driven from Git.

With Argo CD the standard tool is Argo Rollouts. You replace a Deployment with a Rollout resource that declares a strategy — canary (shift 10% → 25% → 50% → 100% with pauses) or blue-green (stand up the new version alongside the old, then flip traffic). Between steps, Rollouts runs analysis: it queries a metrics provider (Amazon Managed Prometheus, CloudWatch, Datadog) for your success criteria — error rate, p95 latency, a custom business metric — and only proceeds if the new version is healthy. If the analysis fails, it automatically aborts and rolls back to the previous version. Because the Rollout spec lives in Git, the entire rollout policy is versioned and reviewable like everything else.
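The step-then-analyze logic can be simulated in a few lines. This is a toy model of a metric-gated canary, not the Argo Rollouts API (which expresses this through Rollout and analysis resources); the `run_canary` function, the step weights, and the 1% error-rate threshold are illustrative assumptions.

```python
from typing import Callable, List


def run_canary(steps: List[int], error_rate_at: Callable[[int], float], max_error_rate: float) -> dict:
    """Advance through traffic weights, aborting as soon as analysis fails."""
    history = []
    for weight in steps:
        rate = error_rate_at(weight)  # stands in for a metrics-provider query
        history.append((weight, rate))
        if rate > max_error_rate:
            # Analysis failed: send all traffic back to the stable version.
            return {"status": "aborted", "weight": 0, "history": history}
    return {"status": "promoted", "weight": 100, "history": history}


STEPS = [10, 25, 50, 100]

healthy = run_canary(STEPS, lambda w: 0.002, max_error_rate=0.01)
regressed = run_canary(STEPS, lambda w: 0.002 if w < 25 else 0.04, max_error_rate=0.01)

print(healthy["status"])                           # promoted
print(regressed["status"], regressed["history"])   # aborted [(10, 0.002), (25, 0.04)]
```

In the regressed run the bad version never sees more than 25% of traffic, which is the whole point of gating each step.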

With Flux the equivalent is Flagger, which drives the same canary / blue-green / A-B patterns by progressively shifting traffic (via your ingress controller or a service mesh) and gating each step on metrics. The mechanics differ but the principle is identical: never flip 100% of traffic to an unproven version, and let objective metrics — not a human watching a dashboard at 2am — decide whether to advance or abort.

On AWS the traffic-shifting layer is usually an ingress (the AWS Load Balancer Controller fronting an ALB), or a service mesh (Istio or Linkerd; AWS App Mesh also works, but AWS has announced it will discontinue App Mesh on September 30, 2026, so it is a poor choice for new builds) when you need finer-grained routing. Either way, progressive delivery is what turns "we deploy via GitOps" into "we deploy via GitOps and a bad release self-heals before customers notice." It is also one of the highest-leverage things a partner sets up, because wiring analysis to the right metrics, and choosing thresholds that catch real regressions without flapping, takes experience.

scale + infrastructure

VI. Multi-cluster, multi-environment, and GitOps for infrastructure itself

Two questions come up the moment GitOps works for one app on one cluster: how do I promote across environments and clusters cleanly, and can I manage the AWS infrastructure under the cluster — VPCs, RDS, IAM — the same GitOps way? Both have good answers, with one important honest caveat about where GitOps stops being the right tool.

Multi-cluster / multi-env. A single GitOps control plane can reconcile many clusters: Argo CD registers multiple destination clusters and uses ApplicationSets to fan a stack out across them; Flux runs a reconciler per cluster, each pointed at the right path of a shared repo. Environment promotion is modeled as a Git change — promote by bumping the image tag (or the chart version) in the next environment's overlay, gated by a pull-request approval. The clean version of this is a strict ladder: an image is built once, validated in dev, promoted (same digest) to staging, validated, then promoted to prod. Because promotion is a PR, you get review, audit, and an instant rollback (revert the commit) at every hop — and you are always shipping the same artifact, not rebuilding per environment.
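The strict ladder can be expressed as a small rule: an environment may only receive the exact digest that was validated one rung below it. The sketch below models that rule in Python; the `promote` function, the environment names, and the digests are hypothetical, and in practice the "promotion" is the PR that writes the digest into the next overlay.

```python
from typing import Dict, Set, Tuple

LADDER = ["dev", "staging", "prod"]


def promote(deployed: Dict[str, str], validated: Set[Tuple[str, str]], target: str) -> Dict[str, str]:
    """Return the new env -> digest map after promoting into target."""
    position = LADDER.index(target)
    if position == 0:
        raise ValueError("dev is the bottom of the ladder; CI deploys there directly")
    source = LADDER[position - 1]
    digest = deployed[source]
    if (source, digest) not in validated:
        raise ValueError(f"{digest} has not been validated in {source}")
    updated = dict(deployed)
    updated[target] = digest  # same artifact, never a rebuild
    return updated


deployed = {"dev": "sha256:bbb", "staging": "sha256:aaa", "prod": "sha256:aaa"}
validated = {("dev", "sha256:bbb")}

deployed = promote(deployed, validated, "staging")
print(deployed)
```

Promoting the new digest into prod at this point raises an error, because it has not yet been validated in staging; that refusal is the ladder doing its job.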

GitOps for infrastructure. You can extend the model below the cluster in two ways. The first is to keep using your IaC tool — Terraform (now BSL-licensed) or OpenTofu (the open fork) or AWS CDK — and drive it from a Git-triggered controller so infra changes flow through pull requests like everything else (tools such as the Terraform/Tofu Controller for Flux, or Atlantis, give you plan-on-PR / apply-on-merge). The second is Crossplane, which lets you provision and reconcile AWS resources (an RDS instance, an S3 bucket, an SQS queue, IAM roles) as Kubernetes custom resources — so the same Argo CD or Flux reconciler that manages your apps also manages your cloud infrastructure, continuously correcting drift the same way it does for workloads. Crossplane is powerful for platform teams building a true internal platform; Terraform/OpenTofu remains the pragmatic default for most teams who already have state and modules.

The honest caveat: GitOps reconciliation shines for declarative, idempotent resources. Some infrastructure operations are genuinely stateful and dangerous to "continuously reconcile" — a database engine-version upgrade, a destructive migration, anything with a real-world side effect that should happen exactly once with a human in the loop. For those, the right pattern is plan-and-approve (Terraform/OpenTofu with a gated apply), not an always-on controller that might re-apply something irreversible. A mature setup uses GitOps reconciliation for the 90% that is safely declarative and gated IaC pipelines for the stateful 10%. Knowing which is which is exactly the kind of judgment a vetted partner brings.

know what is true

VII. Observability and drift detection: knowing the cluster equals Git

The whole promise of GitOps is that the live system matches Git. That promise is only worth something if you can see when it does not, and get told fast when something has drifted, failed to sync, or been changed out-of-band. Observability is not an afterthought here; it is how you trust the model.

Sync and drift visibility. Both reconcilers expose health and sync state. Argo CD's UI and API show, per application, whether it is Synced or OutOfSync (drifted) and Healthy or Degraded, with a live diff of exactly which fields differ between Git and the cluster. Flux exposes the same through flux get / Kustomization status conditions and Prometheus metrics. You wire these into alerting (Argo CD notifications, Flux's notification-controller) so an OutOfSync or failed reconcile pings Slack or PagerDuty rather than sitting unnoticed — including the case where someone hand-edited production and the reconciler is reporting (or reverting) the drift.

Sync policy is a real decision. Manual sync (a human approves each apply) gives maximum control and is common for production early on; automated sync converges without intervention. With automated sync you also choose self-heal (the reconciler actively reverts any out-of-band change back to Git) and prune (resources removed from Git are deleted from the cluster). Self-heal is the strongest expression of "Git is the source of truth" — it makes manual cluster edits literally not stick — but you turn it on deliberately, per environment, once you trust the pipeline.
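As a concrete illustration, the sketch below renders the `syncPolicy` block of an Argo CD Application for each environment. The `automated`, `selfHeal`, and `prune` field names are Argo CD's; the `sync_policy` helper and the per-environment choices are hypothetical and show one plausible early-stage setup, not a recommendation.

```python
def sync_policy(automated: bool, self_heal: bool = False, prune: bool = False) -> dict:
    """Build the syncPolicy section of an Argo CD Application spec."""
    if not automated:
        if self_heal or prune:
            raise ValueError("selfHeal and prune are options of automated sync")
        return {}  # no 'automated' key: a human triggers each sync
    return {"automated": {"selfHeal": self_heal, "prune": prune}}


# Hypothetical rollout of trust: loosest in dev, manual in prod to start with.
POLICIES = {
    "dev": sync_policy(automated=True, self_heal=True, prune=True),
    "staging": sync_policy(automated=True, self_heal=True),
    "prod": sync_policy(automated=False),
}

for env, policy in POLICIES.items():
    print(env, policy)
```

Moving prod to automated sync later is then a one-line, reviewable change in Git, like every other change to the platform.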

Workload observability still applies. Underneath GitOps you run the normal AWS-native stack — Amazon Managed Service for Prometheus + Amazon Managed Grafana (or CloudWatch Container Insights, or Datadog) for metrics, dashboards, and the SLO signals that progressive-delivery analysis depends on. GitOps tells you whether the cluster matches Git; your observability stack tells you whether what is running is actually healthy. You need both, and they reinforce each other: the same Prometheus metrics that power your dashboards are what gate your canary rollouts.

  • Drift detected (OutOfSync) — Live cluster no longer matches Git. Either someone changed it out-of-band, or a sync failed. With self-heal on, the reconciler reverts it; either way, alert on it.
  • Sync failed / Degraded — A manifest is invalid, a dependency is missing, or a resource will not become healthy. The reconciler reports it instead of silently leaving production half-applied.
  • Reconcile latency — How long between a Git commit and the cluster converging. Worth tracking — a slow or stuck reconciler is a deploy-pipeline outage even when nothing looks broken.
  • Out-of-band write attempts — In a mature setup, humans rarely have direct kubectl write to prod at all — the path to change production is a pull request, so out-of-band writes are the exception you investigate, not the norm.
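Reconcile latency is simple arithmetic once you have two timestamps. A minimal sketch, assuming you can obtain the commit time and the time the reconciler reported that revision as synced; the function names and the five-minute budget are illustrative, and how you collect the timestamps (controller metrics, notifications, webhooks) is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def reconcile_latency(committed_at: datetime, synced_at: datetime) -> timedelta:
    """Time between a commit landing and the cluster converging on it."""
    if synced_at < committed_at:
        raise ValueError("sync time precedes commit time")
    return synced_at - committed_at


def over_budget(committed_at: datetime, synced_at: Optional[datetime], now: datetime, budget: timedelta) -> bool:
    """True when a commit took, or is still taking, longer than the budget to sync."""
    if synced_at is not None:
        return reconcile_latency(committed_at, synced_at) > budget
    return now - committed_at > budget


commit = datetime(2026, 1, 15, 10, 0, 0, tzinfo=timezone.utc)
synced = datetime(2026, 1, 15, 10, 2, 30, tzinfo=timezone.utc)
now = datetime(2026, 1, 15, 10, 20, 0, tzinfo=timezone.utc)
budget = timedelta(minutes=5)

print(reconcile_latency(commit, synced))         # 0:02:30
print(over_budget(commit, synced, now, budget))  # False
print(over_budget(commit, None, now, budget))    # True: 20 minutes and still not synced
```

The third case is the one worth alerting on: nothing has failed, nothing is Degraded, and yet the commit has not reached the cluster.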
honest answer

VIII. When GitOps is worth it (and when it is overkill), plus the CloudRoute tie-in

GitOps is excellent, not universal. The most useful thing this page can do is tell you honestly when it pays for itself and when it is ceremony you do not need yet — and then how to get it implemented without hiring a platform team.

GitOps is clearly worth it when you are already on Kubernetes (EKS) and running more than a couple of services; when more than one or two people deploy and you need an audit trail and review on every change; when you run multiple environments or clusters and want clean, reviewable promotion; when compliance (SOC 2, ISO 27001) wants change-management evidence and least-privilege access to production; and when you want fast, safe rollback by reverting a commit. For a team in that shape, GitOps is close to a no-brainer — it is the default operating model for serious EKS platforms in 2026.

GitOps is probably overkill when you are not on Kubernetes at all (if you are on ECS Fargate, Lambda, or App Runner, a good CI/CD pipeline is the right tool — see our CI/CD-on-AWS reference — not a Kubernetes reconciler); when you are a single developer or a tiny team shipping one small service where the control plane is more overhead than the audit trail is worth; or when your workloads are mostly imperative or stateful in ways that fight continuous reconciliation. GitOps adds a control plane to operate; if the benefits above do not apply to you yet, that overhead is not free, and it is fine to wait.

Here is the honest CloudRoute tie-in. GitOps is one of those things that is straightforward in a demo and fiddly in production — the repo topology, the secrets strategy, IRSA/Pod Identity, progressive-delivery thresholds, self-heal and prune policy per environment, multi-cluster promotion. CloudRoute does not implement it directly. We route you to a vetted AWS partner who stands up the whole GitOps control plane on EKS — Argo CD or Flux, the app-of-apps or Kustomization tree, External Secrets or Sealed Secrets, Argo Rollouts or Flagger, drift alerting — and hands it to your team running and documented. For credit-eligible companies the engagement is frequently AWS-funded (the partner is paid through AWS partner programs and your AWS spend is credit-covered), so the customer pays $0 or low cost. For everyone else it is a vetted-partner referral that skips the hiring-and-vetting slog — you get a senior platform engineer's output without a senior platform engineer's headcount.

often AWS-funded — honestly scoped

If you qualify for AWS credits (typically institutionally-funded startups), the GitOps implementation is often substantially AWS-funded — the partner is paid through AWS partner programs and your AWS spend runs on credits, so your out-of-pocket can be $0. If you are not credit-eligible, it is a straight vetted-partner referral: a fixed-scope GitOps build by an engineer CloudRoute has already vetted, with no recruiting cycle. We will tell you which bucket you are in before any work starts. See $100K AWS credits and the startup path.

the model decision

Push-based CD vs pull-based GitOps — when each is right

Before the Argo-vs-Flux question comes a more fundamental one: do you even want GitOps (pull-based reconciliation), or is a classic push-based pipeline the better fit? This is the decision that should come first, because GitOps is a Kubernetes-shaped answer — if you are not on Kubernetes, the honest recommendation is often a push-based pipeline instead.

Variable | Push-based CD (classic pipeline) | Pull-based GitOps (Argo CD / Flux)
Who applies changes | CI runner pushes (kubectl/helm apply) into the cluster | In-cluster agent pulls from Git and reconciles
Credentials direction | CI holds cluster/cloud creds (outside-in) | Agent runs inside cluster; no external write creds (inside-out)
Drift handling | None after deploy; cluster can silently drift | Continuously detected; optionally self-healed back to Git
Source of truth | The pipeline run / whatever was last applied | Git, always; the cluster converges to it
Rollback | Re-run pipeline / re-deploy previous artifact | Revert the commit; reconciler converges back
Best for | ECS / Lambda / App Runner, non-Kubernetes, simple setups | Kubernetes (EKS), multi-env/cluster, compliance, fleets
Audit trail | Pipeline logs (CI system) | git log + PR review on every change
Main cost | Drift + credential sprawl as you scale | A control plane to run + repo topology to design
If you are on EKS with multiple services, environments, or compliance needs, pull-based GitOps is almost always the better operating model. If you are on ECS Fargate, Lambda, or App Runner, a well-built push-based pipeline (see our CI/CD-on-AWS reference) is usually the right call — do not adopt a Kubernetes reconciler just to get GitOps branding. CloudRoute partners build either; the recommendation follows your actual platform.
a recent match

A GitOps rollout on EKS — anonymized

inquiry · seed-to-series-a b2b saas, ~20 services on EKS, EU-Central
B2B SaaS, ~20 engineers, ~20 microservices on a single EKS cluster, deploying by hand-run kubectl/helm from CI

Situation: Growing fast on EKS but deploying with helm upgrade run from a CI job and the occasional manual kubectl during incidents. No real audit trail, frequent config drift (staging and prod had quietly diverged), and a SOC 2 auditor asking for change-management evidence and least-privilege access to production. Their one infra-literate engineer was ~70% on product and could not own a GitOps build on top of that.

What CloudRoute did: Routed within 24 hours to an EU-Central partner with EKS + GitOps track record. Partner stood up Argo CD with an app-of-apps topology, split the single repo into app-source vs config repos, moved secrets to External Secrets backed by AWS Secrets Manager via IRSA (zero static keys), added Argo Rollouts canaries gated on Amazon Managed Prometheus metrics, and wired OutOfSync/sync-failure alerts into Slack. Dev→staging→prod promotion became a reviewed PR bumping the same image digest; prod got automated sync with self-heal on.

Outcome: Live in ~3 weeks. Drift went to zero (self-heal reverts out-of-band edits); every prod change is now a PR with review and an instant revert-to-rollback path; direct human kubectl write to prod was removed, which closed the auditor's least-privilege and change-management findings. Because the company was credit-eligible, the engagement was AWS-funded and the customer paid $0 — CloudRoute's commission was paid by the partner from AWS engagement funding.

engagement window: ~3 weeks · reconciler: Argo CD + Rollouts · secrets: External Secrets + Secrets Manager (IRSA) · drift: self-healed · cost to customer: $0 (credit-eligible)

faq

Common questions

What is GitOps on AWS, in one paragraph?
GitOps is an operating model where the desired state of your system lives in Git as declarative manifests, and an agent running inside your cluster continuously pulls that state and reconciles the live cluster to match it. On AWS that almost always means Argo CD or Flux running on Amazon EKS. Nobody applies changes to production by hand; you change Git via a pull request, and the in-cluster controller converges the cluster to the new desired state — detecting and (optionally) reverting any drift along the way. The benefits are a full audit trail (git log + PR review), easy rollback (revert the commit), least-privilege access to production, and a cluster that provably matches what is in Git.
Argo CD vs Flux — which should I use?
Both are CNCF graduated projects and both run great on EKS, so there is no wrong answer. Pick Argo CD if you want a strong built-in web UI (live diff, topology view, one-click sync/rollback), the app-of-apps pattern, and Argo Rollouts for progressive delivery — it tends to win at product teams who want visibility and developer self-service. Pick Flux if you want a leaner, composable, fully-automated control plane with built-in image automation and clean multi-tenancy, observed via CLI and your own dashboards — it tends to win at platform teams running fleets who want no human clicking a sync button. A common hybrid is Argo CD for app teams plus Flux for platform add-ons, but most startups should pick one and standardize.
How do you handle secrets in GitOps without putting them in Git?
Two mainstream patterns on AWS. Sealed Secrets encrypts the secret locally (with kubeseal) into a SealedSecret resource whose ciphertext is safe to commit; an in-cluster controller is the only thing that can decrypt it. External Secrets Operator (the more common choice at scale) commits only a reference — an ExternalSecret pointing at AWS Secrets Manager or SSM Parameter Store — and pulls the real value at runtime, authenticating via IRSA or EKS Pod Identity so no static AWS keys are involved. Rule of thumb: External Secrets + AWS Secrets Manager when you want one source of truth with native rotation and IAM-scoped access; Sealed Secrets when you want a zero-dependency, fully-in-Git approach and are comfortable owning the encryption key lifecycle. Either way, no plaintext secret and no long-lived AWS access key ever touches the repo.
What is the app-of-apps pattern?
App-of-apps is an Argo CD pattern where one root Application points at a directory of child Application manifests, each managing a real workload or platform component. It lets you manage the GitOps platform itself with GitOps: adding a new app becomes a single file plus a PR, and a fresh cluster bootstraps by installing Argo CD and pointing it at the root app — everything else cascades. At fleet scale, ApplicationSets generalize this by templating many Applications across many clusters or namespaces. Flux expresses the same idea as a tree of Kustomization resources with explicit dependsOn ordering.
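The root-plus-children layout can be sketched by generating the Application manifests in Python. The repository URL, directory names (`apps`, `workloads/<name>`), and app names below are hypothetical. In a real repo these would normally be YAML files committed to Git rather than generated at runtime.

```python
import json


def application(name: str, repo_url: str, path: str,
                namespace: str, revision: str = "main") -> dict:
    """Build an Argo CD Application pointing at one directory in a Git repo."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "targetRevision": revision, "path": path},
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


REPO = "https://example.com/org/platform.git"  # hypothetical repository

# The root app watches the directory that holds the child Application manifests.
root = application("root", REPO, path="apps", namespace="argocd")

# Each child manages one real workload; adding an app means adding one entry (one file).
children = [
    application(name, REPO, path=f"workloads/{name}", namespace=name)
    for name in ["api", "worker", "ingress"]
]

print(json.dumps(root, indent=2))
print([child["metadata"]["name"] for child in children])
```

Bootstrapping a fresh cluster then comes down to installing Argo CD and applying the single root manifest; the children are discovered from the `apps` directory on the first sync.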
How does progressive delivery (canary / blue-green) fit with GitOps?
A plain GitOps sync uses Kubernetes' default rolling update, which is fine for low-stakes services but offers no automatic safety check. Progressive delivery layers a controlled rollout on top: with Argo CD you use Argo Rollouts (a Rollout resource declaring a canary or blue-green strategy), and with Flux you use Flagger. Between rollout steps, the controller runs analysis against real metrics (Amazon Managed Service for Prometheus, CloudWatch, Datadog) and only advances if error rate, latency, and your other success criteria hold; otherwise it automatically aborts and rolls back. Because the rollout spec lives in Git, the entire policy is versioned and reviewable like everything else.
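The advance-or-abort logic can be illustrated with a toy model. This is a sketch of the idea only, not the Argo Rollouts or Flagger implementation: the `Metrics` type, the thresholds, the traffic weights, and the `observe` callbacks are all hypothetical stand-ins for a real metrics query.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float   # 99th percentile latency in milliseconds


def passes(m: Metrics, max_error_rate: float = 0.01, max_p99_ms: float = 500.0) -> bool:
    """Success criteria checked between rollout steps (illustrative thresholds)."""
    return m.error_rate <= max_error_rate and m.p99_latency_ms <= max_p99_ms


def run_canary(steps: list, observe) -> tuple:
    """Walk the traffic weights; abort and return to 0% on the first failed analysis."""
    history = []
    for weight in steps:
        history.append(weight)
        if not passes(observe(weight)):
            return "aborted", 0, history
    return "promoted", 100, history


def healthy(weight: int) -> Metrics:
    return Metrics(error_rate=0.002, p99_latency_ms=180.0)


def degrades_under_load(weight: int) -> Metrics:
    # Hypothetical failure mode: errors spike once the canary takes half the traffic.
    return Metrics(error_rate=0.05 if weight >= 50 else 0.002, p99_latency_ms=180.0)


steps = [10, 25, 50, 100]
good = run_canary(steps, healthy)
bad = run_canary(steps, degrades_under_load)
print(good)
print(bad)
```

In the failing run the rollout stops at the 50% step and traffic returns to the stable version, so the remaining 50% of users never see the bad release.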
Can I manage AWS infrastructure (VPC, RDS, IAM) with GitOps too?
Partly, and you should be deliberate about it. You can drive Terraform (BSL-licensed), OpenTofu (the open-source fork), or CDK from Git-triggered automation (e.g. the Tofu/Terraform Controller for Flux, or Atlantis) so infra changes flow through plan-on-PR / apply-on-merge. Or you can use Crossplane to model AWS resources (RDS, S3, SQS, IAM) as Kubernetes custom resources reconciled by the same Argo CD or Flux control plane. The honest caveat: continuous reconciliation is ideal for declarative, idempotent resources, but some operations (a database engine upgrade, a destructive migration) are stateful and should happen exactly once with a human in the loop. Those belong in a gated IaC pipeline, not an always-on reconciler. Mature setups use GitOps for the safely declarative majority of resources and gated Terraform/OpenTofu for the stateful remainder.
What is drift, and how does GitOps handle it?
Drift is when the live cluster no longer matches what is in Git, usually because of a manual edit during an incident, a half-finished change, or an operator mutating a resource. In a push-based pipeline, drift typically goes undetected because nothing watches the cluster after the deploy. In GitOps, the reconciler continuously compares live state to Git and reports the resource as OutOfSync with a live diff of exactly what differs. If you enable self-heal, it actively reverts the out-of-band change back to the Git-defined state, so manual edits do not persist. You wire OutOfSync and sync-failure events into Slack or PagerDuty so drift is surfaced, not silent.
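A field-level drift check can be sketched as a recursive comparison of two nested dictionaries. This is a simplified model: the function names and the "Healed" status label are hypothetical, and real reconcilers also ignore fields that the cluster fills in by default, which this sketch does not attempt.

```python
import copy


def diff(desired: dict, live: dict, prefix: str = "") -> list:
    """Return (path, desired_value, live_value) for every field that differs."""
    changes = []
    for key in sorted(set(desired) | set(live)):
        path = f"{prefix}.{key}" if prefix else key
        want, have = desired.get(key), live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            changes.extend(diff(want, have, path))
        elif want != have:
            changes.append((path, want, have))
    return changes


def check(desired: dict, live: dict, self_heal: bool = False) -> dict:
    """Report sync status; with self_heal, replace the live state with Git's."""
    changes = diff(desired, live)
    if not changes:
        return {"status": "Synced", "diff": [], "live": live}
    if self_heal:
        # "Healed" is a label made up for this sketch, not a real tool status.
        return {"status": "Healed", "diff": changes, "live": copy.deepcopy(desired)}
    return {"status": "OutOfSync", "diff": changes, "live": live}


# Illustrative values: someone scaled the deployment by hand during an incident.
desired = {"metadata": {"name": "api"}, "spec": {"replicas": 3, "image": "api:1.4.2"}}
live = {"metadata": {"name": "api"}, "spec": {"replicas": 5, "image": "api:1.4.2"}}

report = check(desired, live)
healed = check(desired, live, self_heal=True)
print(report["status"], report["diff"])
print(healed["status"], healed["live"] == desired)
```

The diff entry is what you would forward to Slack or PagerDuty: it names the exact field, the value Git declares, and the value found in the cluster.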
When is GitOps overkill — should everyone use it?
No. GitOps is the right operating model when you are on Kubernetes (EKS) with more than a couple of services, multiple people deploying, multiple environments or clusters, or compliance needs around change management and least-privilege access. It is probably overkill if you are not on Kubernetes at all (on ECS Fargate, Lambda, or App Runner a good push-based CI/CD pipeline is the right tool, not a reconciler), if you are a solo dev or tiny team with one small service, or if your workloads are mostly imperative/stateful in ways that fight continuous reconciliation. GitOps adds a control plane to operate; adopt it when the audit-trail, drift-control, and safe-rollback benefits actually apply to you.
Can CloudRoute implement GitOps for us, and is it really AWS-funded?
CloudRoute does not implement it directly — we route you to a vetted AWS partner who stands up the whole GitOps control plane on EKS (Argo CD or Flux, app-of-apps or Kustomization tree, External/Sealed Secrets, Argo Rollouts or Flagger, drift alerting) and hands it to your team running and documented, typically in two to four weeks. For credit-eligible companies (usually institutionally-funded startups) the engagement is often substantially AWS-funded: the partner is paid through AWS partner programs and your AWS spend runs on credits, so your out-of-pocket can be $0. If you are not credit-eligible, it is a straight vetted-partner referral — a fixed-scope build by a pre-vetted engineer, with no recruiting cycle. We tell you which bucket you are in before any work starts.

Want GitOps running on EKS — Argo CD or Flux, done right?

CloudRoute routes you to a vetted AWS partner who stands up the control plane, repo topology, secrets, progressive delivery, and drift alerting — then hands it over documented. Often AWS-funded for credit-eligible companies, so the customer pays $0.

reconciler
Argo CD / Flux
typical setup
2–4 wks
cost if credit-eligible
$0