DevOps · Cloud · Platform Engineering

Ayush Tiwari

DevOps & Platform Engineer

Building resilient, scalable infrastructure for enterprise SaaS platforms

kubernetes · eks · openshift · crossplane · aws · terraform · argocd · python

Track record

A snapshot of infrastructure scale and measurable outcomes across enterprise platforms.

10+ Years in DevOps & Platform Engineering
1,500+ Production Compute Nodes Managed
100+ EKS & 600+ ECS Clusters Operated
$250K+ Lifetime Cloud Costs Optimized

About Me

I've spent the last ten years in the space between application code and production: the messy, interesting place where deployments either feel invisible or keep you up at night.

I got into DevOps and platform engineering because I liked making that gap smaller. Fewer manual steps. Fewer surprises on release day. More time for teams to focus on what they're actually building. That instinct, to make things reliable and repeatable, still drives most of what I do.

A lot of my work has started with pipelines. Not because CI/CD is glamorous, but because it's where trust gets built. I've designed application and infrastructure pipelines across Jenkins, GitHub Actions, ArgoCD, and Terraform, from delivery paths for Python and Node services to full AWS environments provisioned as code. When a team believes in its path to production, it ships with confidence. That's the bar I hold myself to.

At enterprise scale, Kubernetes stops being a single cluster and becomes a shared foundation: many teams, many tenants, one platform. I've designed and operated multi-tenant EKS environments with proper isolation, RBAC, and guardrails across hundreds of clusters. The hard part isn't the manifests. It's making the platform feel safe and predictable for everyone on it, including whoever is on-call at 2 a.m.

That experience led me to internal developer platforms. I led the architecture of an IDP that moved teams off brittle VM workflows onto Kubernetes, with self-service for developers and consistent standards for the organization. Reusable Terraform modules, GitOps with ArgoCD, sensible paths for Python and Node applications. The goal was straightforward: make the right thing the easy thing.

Cloud cost is where good engineering meets real business impact. I've optimized over $250K in lifetime cloud spend through right-sizing, event-driven automation, and treating cost as a metric you watch, not a surprise you explain. Reliability and efficiency usually turn out to be the same conversation.

None of it holds together without observability and governance. I've built monitoring stacks with Prometheus, Grafana, Loki, and Datadog, and embedded security and compliance checks into CI/CD so problems surface early, not in production. A platform that runs fast but fails quietly isn't a platform I'd want to operate.

If you're reading this, you're probably deciding whether we'd work well together. I care about that. I enjoy mentoring engineers, partnering with product teams, and solving problems that sit at the intersection of people and infrastructure. However you found your way here, whether hiring, collaborating, or just curious, I'm glad you stopped by.

Key Projects

Implementation stories from platforms I have built and operated.


Work Experience

TIAA

Senior Specialist - Infra and Cloud

Pune, India

Maturing in-house EKS and OpenShift platforms, expanding into SRE ownership, and bringing Kubernetes-native infrastructure management and AI-assisted automation to enterprise cloud operations.

  • Maturing in-house EKS and OpenShift platforms to improve reliability and keep uptime stable through maintenance windows, not just during steady state.
  • Driving adoption of Crossplane for a Kubernetes-native approach to cloud infrastructure, with continuous reconciliation that keeps resource state aligned and drift close to zero.
  • Taking on additional SRE responsibilities to deepen platform knowledge from an operational lens: incident response, reliability patterns, and how platforms behave under real production load.
  • Expanding the automation footprint by applying AI tooling to repetitive platform tasks, reducing manual hours on work that should not need a human every time.
  • Building a framework to correlate signals across monitoring and observability tools, shortening the path from alert to root cause during incident troubleshooting.
  • Working on GPU workload and cost optimization strategies so accelerated compute is allocated based on actual demand, not oversized standing reservations.
  • Exploring AWS Bedrock AgentCore for platform engineering use cases, including intelligent runbook assistance and faster operational decision-making at scale.

NICE Actimize

Specialist DevOps Engineer

Pune, India

Focused on cloud cost optimization, Terraform platform improvements, and operational readiness for a GenAI MVP on AWS.

  • Cut cloud spend by roughly 30% through right-sizing and event-driven automation, improving utilization without trading away uptime.
  • Refactored Terraform modules for reuse and consistency so infrastructure changes moved faster across teams.
  • Helped shape a GenAI MVP on AWS with Well-Architected patterns built in for multi-tenancy, scale, and data isolation.

ZS Associates

DevOps Engineering Lead

Ahmedabad, India

Led platform engineering for enterprise SaaS at scale: internal developer platforms, multi-tenant Kubernetes, and the operational backbone behind hundreds of clusters.

  • Owned platform operations across 25+ AWS accounts, 100+ EKS clusters, 600+ ECS clusters, and 1,500+ EC2 and Fargate nodes supporting multiple SaaS products.
  • Built infrastructure CI/CD with Terraform for provisioning, patching, and upgrades, with observability wired in through Prometheus, Grafana, Loki, Promtail, and Datadog.
  • Architected and rolled out an Internal Developer Platform on EKS and Terraform Cloud, giving Python and Node teams self-service workflows that still met security and compliance requirements.
  • Introduced GitOps with ArgoCD and Bitbucket so platform and application changes moved from commit to cluster with clear, reviewable promotion paths.
  • Designed multi-tenant EKS with RBAC tied to Azure Entra ID and AWS IAM, so many teams could share clusters without blurring ownership or access boundaries.
  • Scaled clusters with demand using Karpenter for nodes and KEDA with HPA for workloads, replacing static capacity planning with event-driven growth.
  • Improved GPU utilization with DCGM, Prometheus Adapter, and HPA so costly hardware was allocated based on real usage, not generous reservations.
  • Standardized AWS landing zones through reusable Terraform modules, turning one-off environment builds into repeatable, reviewable platform deliveries.
  • Strengthened the security posture by embedding SAST and DAST in CI/CD (Wiz, SonarQube, Black Duck, Veracode) and runtime protection with Sysdig Falco on container workloads.
  • Mentored DevOps engineers and partnered with product teams on platform adoption, golden paths, and operational readiness for large-scale on-call coverage.

Codal Inc.

Senior DevOps Engineer

Ahmedabad, India

Built CI/CD, Kubernetes, and cloud automation across polyglot application stacks and multi-cloud migrations for client delivery teams.

  • Built CI/CD for Django, Node, Angular, and Java applications using AWS CodePipeline, GitHub Actions, and Jenkins.
  • Ran microservices on Docker Compose and Kubernetes across AWS ECS and EKS, with security baselines applied to cluster and image workflows.
  • Codified infrastructure in CloudFormation and Terraform so new environments were provisioned the same way every time.
  • Delivered serverless release pipelines with SAM, API Gateway, and Lambda for teams that needed fast iteration without long-lived servers.
  • Supported migrations between on-prem, AWS, and Azure while tuning latency and cloud spend for production workloads.

Cognizant

Programmer Analyst

Chennai, India

Started in cloud migration, release automation, and observability for enterprise manufacturing systems moving off on-prem infrastructure.

  • Migrated manufacturing systems from on-prem to cloud while keeping production workflows stable for operations teams.
  • Introduced blue-green deployments on critical paths so releases no longer meant planned downtime windows.
  • Built ELK pipelines and Kibana views so web access patterns were visible instead of buried in raw logs.
  • Set up CloudWatch log streams and handled Linux administration across EC2 fleets during the transition.
  • Automated VM patching with Ansible when manual maintenance stopped scaling with fleet size.

Technical Skills

The platforms I operate, the pipelines I automate, and the observability stack I reach for when production gets interesting.

01

Platform Engineering

Designing and maturing Kubernetes platforms that stay reliable through upgrades, maintenance windows, and everyday change.

Kubernetes · Amazon EKS · OpenShift · Docker · Helm · Crossplane

02

Cloud & Infrastructure

Provisioning and governing multi-cloud environments with infrastructure as code and repeatable patterns.

AWS · Azure · Terraform · CloudFormation · Ansible

03

Delivery & GitOps

Shipping software safely with GitOps workflows, automated pipelines, and policy-aware release paths.

ArgoCD · GitHub Actions · GitLab · Jenkins · TeamCity · Azure DevOps · Bitbucket

04

Observability & SRE

Correlating signals across the stack so incidents move from alert to root cause faster.

Prometheus · Grafana · Datadog · SignalFx · Sysdig · Splunk · ELK · CloudWatch

05

Systems & Scripting

Linux-first automation, shell glue, and Python when a problem needs more than bash one-liners.

Linux · Bash · Python

06

AI & Accelerated Compute

Optimizing GPU workloads and exploring AI-native tooling for platform operations and cost control.

GPU cost optimization · AWS Bedrock · AgentCore · AI-assisted automation

Contact Me
