How to Conduct a Constructive Project Postmortem

PagerDuty discusses the ins and outs of making a retrospective worthwhile.

Written by Erik Fassnacht
Published on Sep. 14, 2021

“Do you have the bandwidth to take this on?”

“I’ll circle back.” 

“Let’s take this offline.” 

These are just a handful of the modern workplace phrases that make employees flinch. For some, however, the project postmortem is even worse than “per our last discussion.”

Why? Well, some think of postmortem meetings as a space for pointing fingers and ruminating on everything that went wrong with a certain project. And then there’s the Latin name itself, which means, somewhat disturbingly, “after death.”

According to Scientific American, there are three types of bias that get in the way of productive postmortems: fundamental attribution error, overconfidence bias and acceptance bias. What do these three have in common? A focus on the individual over the group, and a resistance to critical analysis regarding things that go well.

The key, it seems, is finding a new way to structure the meeting and a new way to focus critical analysis skills so that blame takes a backseat to constructive progress. To learn more, we sat down with PagerDuty Staff Site Reliability Engineer Rich Lafferty, who shared his ideas on the most productive way to conduct a postmortem — and to view the past without “circling back.”

 

Rich Lafferty
Staff Site Reliability Engineer • PagerDuty

 

What are the first steps teams can take to create a blameless postmortem process?

The first step is setting a North Star: Here is where we are, here is the destination we want, and here are the principles we’re going to use to get there. Changing how we respond to incidents is culture change, not just process change, and all of the usual caveats about the challenges of culture change apply.

One challenge with getting past blame is that there are a lot of “common-sense” ideas about how to react to an incident. Even when they aren’t blaming an individual, people default to adding controls, processes, approvals and policies. But it turns out that adding process and approvals isn’t correlated with operational success at all. The thing that’s best at preventing incidents is the creativity, insight and expertise of the operators of the system!

So maybe the first first step is to improve our understanding of how incidents happen. Sidney Dekker, John Allspaw, Richard Cook and others have researched and written on this at length, for software and especially for critical industries, and often our common-sense ideas are completely wrong.

 

Why is it important to focus on incidents and accidents rather than individual team members when going through a blameless postmortem?

The people and the software form a system. Humans make mistakes, and a well-designed system needs to take that into account just as much as it needs to take into account that a hard drive might fail or a network link might go down. Nobody would consider a distributed system that can’t withstand a network outage to be well-designed, nor should they consider a system that can’t account for human factors.

So if your goal is to make a resilient system, one that can adapt not only to the failures you thought about but also to the ones that never occurred to you, then you need to build the resilience of both the technical and the human-factors elements of the system. Part of that is learning about the experiences of the participants in the incident, and part of it is learning about the behavior of the software. So I wouldn’t say you focus on incidents instead of team members; learning from an incident means focusing on both the human and technical parts of the system. Of course, that doesn’t mean blaming: blaming a participant is just as ineffective as blaming the software. But it’s critical to learn from the entire socio-technical system involved in the incident.

 


 

How has the blameless postmortem process enhanced your project development and execution?

It’s really a separate process from project execution. A project is a thing with a beginning, middle, end and an expected outcome. But in software, there’s a step after the project, which we can call operations or the full-service lifecycle — the idea that you now have a thing which you have to operate until it is no longer useful. Blameless postmortems fit into that operations lifecycle or service lifecycle, and less so the project lifecycle. 

I will say that postmortems provide a feedback loop for subsequent projects. When your postmortem process is focused on learning, you learn what you want to do and which practices work and which don’t. That then feeds back into your roadmap and into the way you might build your system in the next project, and so on.

But more than that, it’s one element of building a pervasive learning culture, so that the organization is always improving and refining its systems and processes, and that transcends the execution of any particular project.

 

PagerDuty is an American cloud computing company specializing in a SaaS incident response platform for IT departments.