Ayush Tiwari

01 · The story

About Me

I've spent the last ten years in the space between application code and production: the messy, interesting place where deployments either feel invisible or keep you up at night.

I got into DevOps and platform engineering because I liked making that gap smaller. Fewer manual steps. Fewer surprises on release day. More time for teams to focus on what they're actually building. That instinct, to make things reliable and repeatable, still drives most of what I do.

A lot of my work has started with pipelines. Not because CI/CD is glamorous, but because it's where trust gets built. I've designed application and infrastructure pipelines across Jenkins, GitHub Actions, ArgoCD, and Terraform, ranging from environments for Python and Node services to full AWS environments provisioned as code. When a team believes in its path to production, it ships with confidence. That's the bar I hold myself to.

At enterprise scale, Kubernetes stops being a single cluster and becomes a shared foundation: many teams, many tenants, one platform. I've designed and operated multi-tenant EKS environments with proper isolation, RBAC, and guardrails across hundreds of clusters. The hard part isn't the manifests. It's making the platform feel safe and predictable for everyone on it, including whoever is on-call at 2 a.m.

That experience led me to internal developer platforms. I led the architecture of an IDP that moved teams off brittle VM workflows onto Kubernetes, with self-service for developers and consistent standards for the organization. Reusable Terraform modules, GitOps with ArgoCD, sensible paths for Python and Node applications. The goal was straightforward: make the right thing the easy thing.

Cloud cost is where good engineering meets real business impact. I've optimized over $250K in lifetime cloud spend through right-sizing, event-driven automation, and treating cost as a metric you watch, not a surprise you explain. Reliability and efficiency usually turn out to be the same conversation.

None of it holds together without observability and governance. I've built monitoring stacks with Prometheus, Grafana, Loki, and Datadog, and embedded security and compliance checks into CI/CD so problems surface early, not in production. A platform that runs fast but fails quietly isn't a platform I'd want to operate.

If you're reading this, you're probably deciding whether we'd work well together. I care about that. I enjoy mentoring engineers, partnering with product teams, and solving problems that sit at the intersection of people and infrastructure. However you found your way here, whether hiring, collaborating, or just curious, I'm glad you stopped by.

02 · Work

Key Projects

Implementation stories from platforms I have built and operated.

03 · Career

Work Experience

Maturing in-house EKS and OpenShift platforms, expanding into SRE ownership, and bringing Kubernetes-native infrastructure management and AI-assisted automation to enterprise cloud operations.

Maturing in-house EKS and OpenShift platforms to improve reliability and keep uptime stable through maintenance windows, not just during steady state.
Driving adoption of Crossplane for a Kubernetes-native approach to cloud infrastructure, with continuous reconciliation that keeps resource state aligned and drift close to zero.
Taking on additional SRE responsibilities to deepen platform knowledge from an operational lens: incident response, reliability patterns, and how platforms behave under real production load.
Expanding the automation footprint by applying AI tooling to repetitive platform tasks, reducing manual hours on work that should not need a human every time.
Building a framework to correlate signals across monitoring and observability tools, shortening the path from alert to root cause during incident troubleshooting.
Working on GPU workload and cost optimization strategies so accelerated compute is allocated based on actual demand, not oversized standing reservations.
Exploring AWS Bedrock AgentCore for platform engineering use cases, including intelligent runbook assistance and faster operational decision-making at scale.

Focused on cloud cost optimization, Terraform platform improvements, and operational readiness for a GenAI MVP on AWS.

Cut cloud spend by roughly 30% through right-sizing and event-driven automation, improving utilization without trading away uptime.
Refactored Terraform modules for reuse and consistency so infrastructure changes moved faster across teams.
Helped shape a GenAI MVP on AWS with Well-Architected patterns built in for multi-tenancy, scale, and data isolation.

Led platform engineering for enterprise SaaS at scale: internal developer platforms, multi-tenant Kubernetes, and the operational backbone behind hundreds of clusters.

Owned platform operations across 25+ AWS accounts, 100+ EKS clusters, 600+ ECS clusters, and 1,500+ EC2 and Fargate nodes supporting multiple SaaS products.
Built infrastructure CI/CD with Terraform for provisioning, patching, and upgrades, with observability wired in through Prometheus, Grafana, Loki, Promtail, and Datadog.
Architected and rolled out an Internal Developer Platform on EKS and Terraform Cloud, giving Python and Node teams self-service workflows that still met security and compliance requirements.
Introduced GitOps with ArgoCD and Bitbucket so platform and application changes moved from commit to cluster with clear, reviewable promotion paths.
Designed multi-tenant EKS with RBAC tied to Microsoft Entra ID and AWS IAM, so many teams could share clusters without blurring ownership or access boundaries.
Scaled clusters with demand using Karpenter for nodes and KEDA with HPA for workloads, replacing static capacity planning with event-driven growth.
Improved GPU utilization with DCGM, Prometheus Adapter, and HPA so costly hardware was allocated based on real usage, not generous reservations.
Standardized AWS landing zones through reusable Terraform modules, turning one-off environment builds into repeatable, reviewable platform deliveries.
Strengthened the security posture by embedding SAST and DAST in CI/CD (Wiz, SonarQube, Black Duck, Veracode) and runtime protection with Sysdig Falco on container workloads.
Mentored DevOps engineers and partnered with product teams on platform adoption, golden paths, and operational readiness for large-scale on-call coverage.

Built CI/CD, Kubernetes, and cloud automation across polyglot application stacks and multi-cloud migrations for client delivery teams.

Built CI/CD for Django, Node, Angular, and Java applications using AWS CodePipeline, GitHub Actions, and Jenkins.
Ran microservices on Docker Compose and Kubernetes across AWS ECS and EKS, with security baselines applied to cluster and image workflows.
Codified infrastructure in CloudFormation and Terraform so new environments were provisioned the same way every time.
Delivered serverless release pipelines with SAM, API Gateway, and Lambda for teams that needed fast iteration without long-lived servers.
Supported migrations between on-prem, AWS, and Azure while tuning latency and cloud spend for production workloads.

Started in cloud migration, release automation, and observability for enterprise manufacturing systems moving off on-prem infrastructure.

Migrated manufacturing systems from on-prem to cloud while keeping production workflows stable for operations teams.
Introduced blue-green deployments on critical paths so releases no longer meant planned downtime windows.
Built ELK pipelines and Kibana views so web access patterns were visible instead of buried in raw logs.
Set up CloudWatch log streams and handled Linux administration across EC2 fleets during the transition.
Automated VM patching with Ansible when manual maintenance stopped scaling with fleet size.

04 · Stack

Technical Skills

The platforms I operate, the pipelines I automate, and the observability stack I reach for when production gets interesting.

Designing and maturing Kubernetes platforms that stay reliable through upgrades, maintenance windows, and everyday change.

Kubernetes, Amazon EKS, OpenShift, Docker, Helm, Crossplane

Provisioning and governing multi-cloud environments with infrastructure as code and repeatable patterns.

AWS, Azure, Terraform, CloudFormation, Ansible

Shipping software safely with GitOps workflows, automated pipelines, and policy-aware release paths.

ArgoCD, GitHub Actions, GitLab, Jenkins, TeamCity, Azure DevOps, Bitbucket

Correlating signals across the stack so incidents move from alert to root cause faster.

Prometheus, Grafana, Datadog, SignalFx, Sysdig, Splunk, ELK, CloudWatch

Linux-first automation, shell glue, and Python when a problem needs more than bash one-liners.

Linux, Bash, Python

Optimizing GPU workloads and exploring AI-native tooling for platform operations and cost control.

GPU cost optimization, AWS Bedrock, AgentCore, AI-assisted automation

05 · Connect

Contact Me

Let's connect and build something amazing together!

Email: dbcelm@gmail.com
Phone: +91 (962) 740-3300
GitHub: github.com/dbcelm
LinkedIn: linkedin.com/in/dbcelm

A snapshot of infrastructure scale and measurable outcomes across enterprise platforms.

About Me

Key Projects

Bottlerocket Migration on EKS

Karpenter on AWS EKS

Multi-Account EKS Observability

Argo CD at Scale

VPC CNI to Cilium

EKS Cost Optimization

Dynamic GPU Partitioning

KEDA at Scale

Multi-Tenant EKS

SaaS Throttling on EKS

Terraform to Crossplane

Securing CI/CD Pipelines

Work Experience

TIAA

NICE Actimize

ZS Associates

Codal Inc.

Cognizant

Technical Skills

Platform Engineering

Cloud & Infrastructure

Delivery & GitOps

Observability & SRE

Systems & Scripting

AI & Accelerated Compute

Contact Me