Fail Small, IaC Control Planes, and Automated RCA
Ship It Weekly - DevOps, SRE, Platform and Cloud Engineering... by Teller's Tech - DevOps, SRE and Cloud Podcast
Episode notes
This week on Ship It Weekly, Brian kicks off the new year with one theme: automation is getting faster, and that makes blast radius and oversight matter more than ever.
We start with Cloudflare’s “fail small” mindset. The core idea is simple: big outages usually come from correlated failure, not one box dying. If a bad change lands everywhere at once, you’re toast. “Fail small” is about forcing problems to stay local so you can stop the bleeding before it becomes global.
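To make the "keep problems local" idea concrete, here is a minimal sketch of a staged rollout in Python. It is not Cloudflare's system. The function name `staged_rollout`, the callbacks `apply_change`, `is_healthy`, and `rollback`, and the target names are all hypothetical, chosen only for illustration. The assumption is that a change goes out in small groups, each group is health-checked before the next one starts, and the first bad signal stops the rollout and undoes what was touched.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    stages: Sequence[Sequence[str]],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Apply a change one stage at a time (hypothetical helper).

    Returns True if every stage passed its health check. Returns False
    if any stage failed, after rolling back every target touched so far.
    Later stages are never touched once a stage fails.
    """
    touched: List[str] = []
    for stage in stages:
        # Push the change to this stage only, not to the whole fleet.
        for target in stage:
            apply_change(target)
            touched.append(target)

        # Check the stage before widening the blast radius.
        if not all(is_healthy(target) for target in stage):
            # Undo in reverse order of application.
            for target in reversed(touched):
                rollback(target)
            return False
    return True


if __name__ == "__main__":
    applied: List[str] = []
    # Illustrative stages: one canary, then one region at a time.
    stages = [["canary-1"], ["region-a-1", "region-a-2"], ["region-b-1", "region-b-2"]]
    ok = staged_rollout(
        stages,
        apply_change=applied.append,
        is_healthy=lambda target: target != "region-a-2",  # simulate one bad target
        rollback=lambda target: None,
    )
    print(ok, applied)
```

In this simulated run the bad change reaches the canary and region A, but region B is never touched, which is the point of the approach: the failure stays small enough to stop.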
Next is Pulumi’s push to be the control plane for all your IaC, including Terraform and HCL. The interesting part isn’t syntax wars. It’s the workflow layer: approvals, policy enforcement, audit trails, drift, and how teams standardize without signing up for a multi-year rewrite.
Third is Meta’s DrP, a root cause analysis platform that turns rep ...