The five abilities
Most of my philosophy is about building teams for product. Infrastructure sits a little outside that, so let me lean in on how I think about it.
When it comes to infrastructure, I build teams that align to three cornerstones: developer productivity, cost effectiveness, and what I call the five abilities. Together they keep platform engineering pragmatic and execution-focused, rather than a pursuit of technical ambition for its own sake.
Three cornerstones
Developer productivity
Faster iteration, less friction. Invest in tooling and automation that reduce cognitive load, make releases effortless, and remove the roadblocks that slow engineers down.
Cost effectiveness
Infrastructure should scale with business needs, not just technical ambition. Right-size cloud spend, avoid waste without sacrificing reliability, and make ROI-driven decisions.
The five abilities
The foundations that decide whether a platform can be trusted and can grow. Five of them, and they’re the rest of this piece.
The five abilities
- Availability, keeping systems up and meeting your SLAs.
- Reliability, resilience and graceful degradation when things go wrong.
- Maintainability, easy to operate, debug, and extend.
- Observability, deep, honest visibility into system health.
- Scalability, infrastructure that grows with demand.
Define success this way and engineering stops being a cost center. Productivity and cost are directly measurable, so infrastructure becomes a business function. And the principles hold whether you’re a startup or an enterprise.
Beyond Google’s SRE
People assume “SRE” means the Google book, but that book is just one take, an idealised, Google-scale, Google-specific model. My approach overlaps with it and then departs in a few ways that matter in the real world.
It’s built for constraints, not just scale. Google assumes you’re scaling to billions of users with near-infinite resources. Most organizations are juggling budget, staffing, and legacy systems instead, so I optimise for practical trade-offs, not ideals. Error budgets don’t solve everything when you’re short-staffed and the business has priorities.
It’s broader than reliability and availability. I treat maintainability, observability, and scalability as first-class concerns, not afterthoughts. It’s vendor-agnostic, AWS, hybrid clouds, whatever fits, rather than assuming one internal toolchain. And it’s experience-driven: I was running platforms before this discipline was called SRE, often in places that never had a dedicated SRE team and relied on engineers wearing several hats.
Most of all, it’s business-aligned. Google framed SRE as a technical discipline; I treat it as a business function, weighing cost, team bandwidth, and long-term sustainability in every call, managed services versus self-hosted, automating the toil that actually hurts rather than chasing a perfect automation model.
Google wrote the book. I’ve lived the trade-offs they theorised about, and that’s why my approach goes a little beyond their definitions.

