The NBA’s Steph Curry missed almost 55% of his three-point shots this season. Last year, baseball’s Dee Gordon didn’t get a hit two-thirds of the times he stepped up to the plate. SpaceX failed to land its Falcon 9 reusable booster rocket in 80% of its attempts.
That’s one way to look at it. Another way: Steph Curry just ended the greatest three-point shooting season in history, Dee Gordon was the National League’s best hitter, and SpaceX recently made history by landing a rocket on a freaking barge.
If you’ve read what I’ve written, you know I’m fascinated by failure. As Jeff Bezos says, “failure and invention are inseparable twins.” You can’t do something extraordinary without being defeated a lot along the way.
That attitude starts small. When an engineer makes a mistake that leads to catastrophe, how does your organization respond? Do you point fingers and name and shame? Does your CEO look for a throat to choke? Or do you welcome open discussion, learn from it, get back up and try again?
I’ve always admired Google’s post mortem culture. The approach is simple: when something goes wrong, the team does an in-depth analysis with an emphasis on process over people. They write down everything they learned and share the document as widely as possible to make sure it never happens again. These post mortems are blameless because we assume everyone makes mistakes from time to time. Post mortems aren’t criminal investigations; they’re an affirmative process designed to make us all a little smarter:
A blamelessly written postmortem assumes that everyone involved in an incident had good intentions and did the right thing with the information they had. If a culture of finger pointing and shaming individuals or teams for doing the “wrong” thing prevails, people will not bring issues to light for fear of punishment.
That’s from the new O’Reilly book, Site Reliability Engineering: How Google Runs Production Systems, written by members of Google’s Site Reliability Engineering (SRE) team. The philosophy is shared by lots of other companies, most notably Etsy, whose CTO John Allspaw writes:
A funny thing happens when engineers make mistakes and feel safe when giving details about it: they are not only willing to be held accountable, they are also enthusiastic in helping the rest of the company avoid the same error in the future. They are, after all, the most expert in their own error. They ought to be heavily involved in coming up with remediation items.
Blameless post mortems are indispensable if you value a culture where people can openly discuss their mistakes and learn from them. Don’t create scapegoats; make “error experts.” Google even shared their post mortem template so you can get started the next time the $#*! hits the fan.
Blameless doesn’t just mean not blaming others; it also means not blaming yourself. Jessica Harllee, an Etsy employee, shares her experience with blameless post mortems: “The guilt prevented me from going back on my own and reviewing what actually happened because I didn’t really want to relive it.”
“If you want your team to be audacious, you have to make being audacious the path of least resistance.” Astro Teller shares some of the ways X—formerly Google[x]—encourages open conversation about failure. I especially like his idea for pre-mortems where teams “speak up about things that might someday kill a project or slow it down — and make it business-as-usual for managers to hear and respond to these things.”
Goodbye, Coach. Silicon Valley lost a giant this week when the legendary Bill Campbell died. I only met Bill once and exchanged email with him a couple of times, but his impact can be felt everywhere. The most poignant remembrance comes from Ben Horowitz, who shows us that above all, it was Bill’s humanity that made him one of a kind. So head to The Old Pro, toss out a few f-bombs, and knock one back for the Coach.
Originally published: April 20, 2016