Kubernetes GitOps Projects: building a multi-tenant control plane
A walkthrough of how I designed and shipped a production-grade GitOps platform on Kubernetes — ArgoCD for continuous delivery, Helm for packaging, Terraform for the substrate, and policy-as-code keeping tenants inside their lane. Useful as a reference architecture, a portfolio piece, or a DevOps project idea you can adapt for a resume.
Click-ops doesn't scale across teams
A single shared Kubernetes cluster is easier to run than ten snowflake clusters, until ten teams start pushing changes through the console. Drift between staging and production, mystery RBAC, and silent Helm upgrades turn the cluster into a haunted house. The goal: one Git repository as the source of truth, automated reconciliation, and per-tenant guardrails.
The GitOps control plane
- Substrate: Terraform provisions EKS / GKE, IAM, VPC, and the bootstrap namespace.
- Reconciler: ArgoCD runs the app-of-apps pattern; each tenant owns an Application that points at their folder.
- Packaging: Helm charts with values overlays per environment (dev / staging / prod).
- Policy: OPA Gatekeeper + Kyverno enforce namespace quotas, image registries, and required labels.
- Progressive delivery: Argo Rollouts for canary and blue-green with Prometheus analysis.
- Secrets: External Secrets Operator pulls from AWS Secrets Manager / GCP Secret Manager.
- Observability: Prometheus + Grafana + Loki; ArgoCD notifications post deploy events to Slack.
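To make the reconciler layer concrete, here is a minimal sketch of how the per-tenant Application manifests could be generated. It is an illustration, not the platform's actual tooling: the repo URL, the tenants/<tenant>/<env> folder layout, the tenant names, and the one-AppProject-per-tenant convention are all assumptions made for the example. The manifest fields themselves follow the ArgoCD Application resource.

```python
import json

# Hypothetical value: a placeholder repo URL, not the real GitOps repository.
GITOPS_REPO = "https://example.com/platform/gitops.git"


def tenant_application(tenant: str, env: str) -> dict:
    """Build an ArgoCD Application that points at one tenant's folder."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"{tenant}-{env}", "namespace": "argocd"},
        "spec": {
            # Assumption: each tenant has an AppProject named after it.
            "project": tenant,
            "source": {
                "repoURL": GITOPS_REPO,
                "targetRevision": "main",
                # Assumption: tenants/<tenant>/<env> folder layout.
                "path": f"tenants/{tenant}/{env}",
            },
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": f"{tenant}-{env}",
            },
            # Automated sync with pruning and self-healing reverts manual drift.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


def app_of_apps(tenants: list[str], envs: list[str]) -> list[dict]:
    """Return one child Application per tenant and environment."""
    return [tenant_application(t, e) for t in tenants for e in envs]


if __name__ == "__main__":
    for app in app_of_apps(["payments", "search"], ["dev", "prod"]):
        print(json.dumps(app, indent=2))
```

Because JSON is valid YAML, the printed manifests could be committed to the parent app's folder as they are. In practice an ApplicationSet or a Helm template would probably do the same job; the point is only that onboarding a tenant reduces to adding one entry to a list.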
From a pull request to a live pod
- Developer opens a PR against the tenant's app repo.
- CI builds the image, signs it with cosign, and pushes to the registry.
- A bot opens a follow-up PR in the GitOps repo bumping the image tag.
- Reviewers approve; merge triggers ArgoCD to sync the new manifest.
- Argo Rollouts shifts traffic 10% → 50% → 100% while Prometheus watches the SLO.
- If the error rate breaches the threshold, the rollout auto-aborts and traffic returns to the stable version.
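The last two steps can be sketched as a small gating loop. This is a simplified model of the decision logic, not how Argo Rollouts is implemented: the 10/50/100 steps come from the flow above, while the 1% error-rate threshold and the error_rate_at callback (a stand-in for a Prometheus query) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

CANARY_STEPS = (10, 50, 100)   # traffic weights from the walkthrough
ERROR_RATE_THRESHOLD = 0.01    # hypothetical SLO: at most 1% failed requests


@dataclass
class RolloutResult:
    status: str                      # "promoted" or "aborted"
    final_weight: int                # percent of traffic on the new version
    failed_at: Optional[int] = None  # weight at which the breach was seen


def run_canary(
    error_rate_at: Callable[[int], float],
    steps: Sequence[int] = CANARY_STEPS,
    threshold: float = ERROR_RATE_THRESHOLD,
) -> RolloutResult:
    """Shift traffic step by step; abort on the first threshold breach."""
    for weight in steps:
        # error_rate_at stands in for a metrics query at this traffic weight.
        observed = error_rate_at(weight)
        if observed > threshold:
            # Abort: all traffic goes back to the stable version.
            return RolloutResult("aborted", 0, failed_at=weight)
    return RolloutResult("promoted", steps[-1])


if __name__ == "__main__":
    # A regression that only shows up once half the traffic hits the canary.
    print(run_canary(lambda w: 0.05 if w >= 50 else 0.001))
```

The useful property is that the abort path needs no human in the loop: the same metric that defines the SLO decides whether the next traffic shift happens.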
What I'd tell someone copying this for their resume
- Start with one app, one tenant, one environment. The app-of-apps pattern is what scales it later.
- Pin every chart by exact version and every image by digest. "latest" is the fastest way to recreate the haunted cluster.
- Treat the GitOps repo like production code — required reviews, CODEOWNERS, branch protection.
- Prometheus-driven rollouts catch regressions automatically; manual canaries only catch what someone happens to be watching.
- Document the tenant onboarding in the repo's README. Future-you is a tenant.
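The digest-pinning advice is easy to check mechanically. Below is a minimal sketch of a CI-style check over a Deployment-shaped manifest. It is not the Gatekeeper or Kyverno policy from the platform, just the same rule in plain Python, and the registry names used with it are placeholders.

```python
import re

# An image counts as pinned only if it ends in @sha256:<64 hex characters>.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image: str) -> bool:
    """True when the image reference carries an immutable sha256 digest."""
    return bool(DIGEST_RE.search(image))


def unpinned_images(manifest: dict) -> list[str]:
    """List container images in a pod template that are not digest-pinned."""
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    images = [c.get("image", "") for c in containers]
    return [image for image in images if not is_pinned(image)]


if __name__ == "__main__":
    digest = "sha256:" + "a" * 64  # placeholder digest for the demo
    deployment = {
        "spec": {"template": {"spec": {"containers": [
            {"name": "api", "image": f"registry.example.com/api@{digest}"},
            {"name": "sidecar", "image": "registry.example.com/sidecar:latest"},
        ]}}}
    }
    print(unpinned_images(deployment))
```

Running a check like this in the GitOps repo's CI gives reviewers a failing build instead of a surprise at sync time, and the admission policy stays as the backstop.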
Want the full source?
The reference implementations live on GitHub — browse the repositories or jump back to the selected works archive for adjacent projects (service mesh, eBPF security, FinOps).