A productive post-mortem can’t change the past, but it can almost always alter the future.
As the Allies moved to liberate France in 1944, the American forces became bogged down battling through the infamous Norman hedgerows. The thousands of square miles of dense, high foliage created maze-like barriers which hampered vision, mobility and progress. What’s more, the 15-foot hedgerows were so strongly intertwined that they seemed impossible to break through with traditional methods. For the officers involved, conducting a post-mortem on previous operations — and being open to suggestions from all sources — was critical.
In the end, Sergeant Curtis G. Culin listened to a soldier who suggested putting “saw teeth” on an armored vehicle to cut through the enormous hedges. While others laughed, Culin promptly welded scrap metal to the front of a Sherman tank to create a four-pronged plow device, and then easily drove it through one of the previously impenetrable hedgerows. The result was so impressive that an order was placed to build as many “Rhino tanks” as possible. The maze was broken open, and the rest is history.
In the business world, a reflective post-mortem can break through obstacles and redirect team performance toward far more fruitful projects going forward. To ensure meaningful improvements, it is imperative to gather the team, assess what went right or wrong on both a macro and micro level, and adapt the tactics accordingly. What’s more, listening to the suggestions of everyone involved — a key factor in the scrum model — can lead to effective and innovative solutions that resonate far in the future.
We sat down with four San Francisco tech companies to discuss what they do to make their post-mortems more productive, and how they find solutions to break through those maze-like barriers that held up the previous project. We found out that by sharing ownership, accepting suggestions and conducting important preliminary work before the meeting, a post-mortem can unlock major success going forward.
At Nurx, a platform for accessible healthcare, Engineering Manager Spencer Spezzano knows that stakeholder alignment and cross-team planning before and during a post-mortem can help teams hit their delivery dates with more success and less churn. Furthermore, by democratizing the prioritization of action items, the collaborative environment creates more teamwork and better problem-solving than before.
What does a typical post-mortem look like for your team, and how do you structure those meetings to ensure you're making the most of that time?
Nurx conducts hour-long post-mortems after every major project, whether that’s launching a new service line or adding a major new feature. These meetings are attended by all stakeholders to ensure alignment on the learnings and outcomes.
But rather than waiting until the end of a project to start evaluating what worked and didn’t, we provide space to collect feedback in real-time from the start of a project. We share a Trello board with three sections: “working well,” “can be improved” and “kudos.” Each section orients the individual into the right mental space to evaluate things as they go: “Working well” documents things people think are helping our process, “can be improved” collects areas of opportunity, and “kudos” provides a space to call out wins for our teammates. Managers keep an eye on the Trello board to catch issues that should be addressed immediately, rather than at the post-mortem.
During the post-mortem, we go through each section and individuals read out their feedback. Each team member then votes on which items in the “can be improved” section should be top priorities. With these items identified, the team works together to come up with action plans to improve our process.
What’s one of the most valuable revelations or lessons that has come from a project post-mortem, and how has that helped your team grow?
“Weeks of coding can save you hours of planning,” meaning that being intentional and strategic upfront saves time and prevents inefficiency and frustration. One of the biggest determinants of a successful project is hitting the delivery date, but with large, multi-department projects that can be a challenge. With a new service line we launched in the past, we dove in as a company before thoroughly investigating what was needed from each team, what the dependencies were, and how long everything would realistically take. As a result, we had to push back the launch date.
For subsequent service line launches, we’ve worked to align all stakeholder departments around a high-integrity commit date. We make sure everyone is aware of executables tracked on a project plan spreadsheet, which allows us to quickly identify dependencies and holds everyone accountable. We make dedicated time upfront for all teams to do thorough investigations and work through unknowns to de-risk the delivery date.
This method of upfront cross-team planning was born out of a post-mortem, and it has helped transform our process so we can consistently hit delivery dates with less churn.
What’s one thing you’ve done to improve your post-mortems over time? What were the results?
A key goal of post-mortems is iterating on our process, but since there is only so much time, we need to identify the iterations that will have the biggest impact. In the past, we would spend the post-mortems looking at each area of opportunity on the Trello board to figure out an action plan, but this led to overly long meetings, a lack of focus and ultimately limited improvements to our process.
As an alternative, we started a process where we vote on which “can be improved” items to prioritize, and each team member is given a set number of votes. With the introduction of voting, we’ve been able to focus the team and our efforts on high-impact areas. The voting drives buy-in, as everyone agrees that while we might have 20 things to improve, we voted to prioritize these top five. We’ve seen our process improve faster, which has led to decreased churn and more confidence in hitting our dates.
Another benefit of voting is democratizing the prioritization effort, which helps us move to a bottom-up approach versus a top-down one. It is important that we foster a collaborative environment where people’s voices are heard and recognized.
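For teams that want to try the voting step described above, here is a minimal Python sketch of one way to tally it. It is an illustration, not Nurx's actual tooling: the function name `tally_votes`, the limit of three votes per person and the sample feedback items are all assumptions. Only the idea of a fixed vote budget and a top-five cutoff comes from the interview.

```python
from collections import Counter


def tally_votes(ballots, votes_per_person=3, top_n=5):
    """Return the top_n "can be improved" items ranked by vote count.

    ballots maps each team member to the list of items they voted for.
    votes_per_person is a hypothetical budget; the article only says
    each person gets "a set number" of votes.
    """
    counts = Counter()
    for member, picks in ballots.items():
        if len(picks) > votes_per_person:
            raise ValueError(
                f"{member} cast {len(picks)} votes; the limit is {votes_per_person}"
            )
        counts.update(picks)
    # Most votes first; ties are broken alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


# Hypothetical ballots for illustration only.
example_ballots = {
    "ana": ["unclear dependencies", "late QA handoff"],
    "ben": ["unclear dependencies", "too many meetings"],
    "cat": ["late QA handoff", "unclear dependencies"],
}
for item, votes in tally_votes(example_ballots):
    print(f"{votes} votes: {item}")
```

Capping each ballot is what forces the trade-off: nobody can flag all 20 items as urgent, so the ranked list reflects what the group collectively cares about most.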
At Wish, a mobile e-commerce platform, Senior Engineering Manager Andrew Potapov believes that the most important aspects of a successful post-mortem are the completion of preliminary work and finding the right audience to unlock the best and most innovative solutions. When these findings are documented and then shared in a published internal report, the results can be far ranging.
What does a typical post-mortem look like for your team, and how do you structure those meetings to ensure you're making the most of that time?
The most important thing to ensure the post-mortem meeting runs smoothly is to have a post-mortem document prepared and reviewed by the team prior to the meeting. Sometimes, if folks haven’t had a chance to review the document, we will take the first five minutes of the meeting to read the document and make sure everyone is on the same page. The document should describe the timeline of the incident, the impact, the steps taken to resolve the issue as well as the safeguards to be added to prevent similar issues in the future.
The second most important thing is to select the right audience for the meeting and keep the discussion focused on the incident and the resolution. We usually try to keep the attendance limited to key stakeholders from engineering, product and operations. During the meeting, it's extremely important to not get into finger-pointing, but to use this as a learning opportunity for us as an organization. This is especially important with more junior team members, who, if they haven’t been through a production incident previously, can be pretty hard on themselves. We strive for blameless post-mortems and really focus on improving our processes as a team and company.
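The document sections listed above (timeline, impact, resolution steps and safeguards) lend themselves to a quick completeness check before the meeting. The following Python sketch is one hedged way to do that. The field names and the `missing_sections` helper are hypothetical and are not drawn from Wish's actual template.

```python
# Section names are assumptions based on the four items named in the interview.
REQUIRED_SECTIONS = ("timeline", "impact", "resolution_steps", "safeguards")


def missing_sections(document):
    """Return the required sections that are absent or blank.

    document is a plain dict mapping section name to its text.
    """
    return [
        name
        for name in REQUIRED_SECTIONS
        if not (document.get(name) or "").strip()
    ]


# Hypothetical draft with two sections still unwritten.
draft = {
    "timeline": "09:02 alert fired; 09:40 fix deployed",
    "impact": "Checkout errors for a subset of users",
}
print(missing_sections(draft))
```

Running a check like this before sending the invite gives the author a chance to fill gaps, so the meeting time goes to discussion rather than reconstruction.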
What’s one of the most valuable revelations or lessons that has come from a project post-mortem, and how has that helped your team grow?
The product our team works on is still early in its lifecycle. While our main product has solid deployment processes and pipelines, the product our team is working on is still developing a lot of its processes and automations.
We discovered an issue recently where we packaged and deployed our staging and production mobile app differently. This led to unexpected problems that were not detectable during the testing stage. For example, the way translation strings were being packaged was different in the two environments. At one point, this caused the production app to crash for our international users. What’s worse, it crashed in such a way as to not be detectable by our monitoring system. The only way we found out about the issue was through negative app reviews. You never want your users telling you your app is broken.
This issue taught us a number of things. First, we needed to have consistency in how we package and release our applications across our environments. Second, we needed much better monitoring and alerting so we can be proactive about fixing bugs. Third, we decided to start prompting our users for app reviews. This caused our app rating to go from 3.7 to 4.8.
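The first lesson, consistent packaging across environments, can be partly enforced with an automated comparison. The Python sketch below assumes each build exposes its translation strings as a nested mapping of locale to key to text. That layout, the `diff_translation_keys` name and the sample data are illustrative assumptions, since the interview does not describe how Wish packages its strings.

```python
def diff_translation_keys(staging, production):
    """Report translation keys present in one build but not the other.

    Both arguments map locale -> {string key -> translated text}.
    Returns only the locales where the two builds disagree.
    """
    problems = {}
    for locale in sorted(set(staging) | set(production)):
        staging_keys = set(staging.get(locale, {}))
        production_keys = set(production.get(locale, {}))
        missing_in_production = sorted(staging_keys - production_keys)
        missing_in_staging = sorted(production_keys - staging_keys)
        if missing_in_production or missing_in_staging:
            problems[locale] = {
                "missing_in_production": missing_in_production,
                "missing_in_staging": missing_in_staging,
            }
    return problems


# Hypothetical bundles: the production build dropped the French strings.
staging_bundle = {
    "en": {"cart.title": "Cart", "cart.empty": "Your cart is empty"},
    "fr": {"cart.title": "Panier", "cart.empty": "Votre panier est vide"},
}
production_bundle = {
    "en": {"cart.title": "Cart", "cart.empty": "Your cart is empty"},
}
print(diff_translation_keys(staging_bundle, production_bundle))
```

A check like this could run in a release pipeline and fail the build when the result is non-empty, catching the mismatch before international users do.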
What’s one thing you’ve done to improve your post-mortems over time? What were the results?
We’ve done a few things to improve our processes around the post-mortems. First, we created a consistent post-mortem document format to make sure all the aspects of the incident, resolution, learnings and follow-up steps were covered. Second, we started to ensure that we create tickets for each follow-up item and prioritize them appropriately in the upcoming sprints. Third, we started to include some of the more junior members on the team in the discussion for training purposes. Finally, we started publishing our findings to the broader engineering team to hopefully help other teams avoid some of the pitfalls.
Win Raguini is director of engineering at MasterClass, a streaming platform for high-profile instruction and classes. He’s learned that if post-mortems are split into specific categories and the data collection is done ahead of time, both team focus and overall velocity will be increased. Moreover, Raguini found that there is an important holistic need for self-care, particularly in the remote environment, that affects both the success of the project and the long-term potential of the post-mortem.
What does a typical post-mortem look like for your team, and how do you structure those meetings to ensure you're making the most of that time?
We have two types of post-mortems: five-whys meetings after emergency incidents and retrospective meetings that happen after every sprint. Five-whys meetings are targeted meetings to understand the root problem of a particular incident, usually resulting in a process improvement or more research. To make this meeting as effective as possible, it’s important we stick to the five-whys structure as closely as possible, with the understanding that not every situation will fit this exact form and the structure should be adjusted as needed. Retrospectives are made efficient by talking about the highest-priority items first and then recording action items to be sent out to the team immediately to support accountability.
What’s one of the most valuable revelations or lessons that has come from a project post-mortem, and how has that helped your team grow?
The biggest revelation over the past three months has come from realizing that interviews, ad-hoc meetings and task randomizations take up more time than expected, which has affected team velocity. Since then, the team has made more of an effort to record these events as part of the sprint to better understand how often they occur and to plan the sprint accordingly.
Quite honestly, the most valuable thing we identified was the need for self-care during COVID-19. It was good to learn that everyone was struggling with sleep, perma-WFH, and getting sunlight and exercise. We made a point of asking folks to figure out how to do good stuff for their future selves on Fridays, then followed up on Mondays to see how it went.
What’s one thing you’ve done to improve your post-mortems over time? What were the results?
On one team, we’ve had a different person lead the post-mortem, which has led to a greater feeling of ownership and accountability.
James Nimlos is a senior software engineer at Webflow, a platform for building websites without coding. He learned that by using a philosophy of shared ownership over critical actions and inactions, the team became more cohesive, responsible and adept at finding solutions.
What does a typical post-mortem look like for your team, and how do you structure those meetings to ensure you're making the most of that time?
Post-mortems have a very simple structure on my team: one person writes the question we need answered, the motivation for the question and sometimes a more concrete description of what they expect to be done with the answer. We have a weekly meeting where we discuss multiple questions with all ranges of impact on how to improve outcomes in the future and how we work together as a team. We value this time because it explicitly maps out workflows and how we should react to situations. It also provides time to share knowledge as well as align assumptions about our work.
We agree that this meeting is important and helps us move confidently when we are working individually. We ensure value in this time by requiring an outcome on each point, which is usually either documentation or code to write so that we don’t repeat history. Each meeting, someone volunteers to facilitate and is responsible for keeping us on topic and ideally within the allotted time. Finally, as an all-remote team, we give ourselves some time to chat at the beginning of each meeting since we don't always have group conversations like this due to proximity.
What’s one of the most valuable revelations or lessons that has come from a project post-mortem, and how has that helped your team grow?
The best revelations come when we recognize a more systemic root cause than our initial assessment. Recently, we solved an issue for a new team member by writing a checklist for common debugging steps with users. However, we expanded our session to examining the meta of how we support users and what we could do better. We immediately started brainstorming around our support workflow and where problems stem from — this led to a three-part strategy and the long-term effect is a better experience for users and less time chasing down bugs for us! However, the largest lesson was remembering to search for more systemic solutions to problems because they have larger residual benefits compared to over-scoped fixes.
What’s one thing you’ve done to improve your post-mortems over time? What were the results?
I’ve learned to make explicit statements clarifying what it means to have a “blameless” analysis to ensure we all start with the same assumptions. Blameless doesn’t mean we pretend there weren’t individuals behind the actions, but instead we acknowledge we all share fault for not preventing the incident. Instead of “blameless” we all hold blame because doing nothing is a choice as well.
Choosing to not review code that ended up being the cause of a major incident is also an action, but because there isn’t a log of that decision, it’s frequently overlooked. I’m not saying everyone should review everything, that’s not scalable, but having shared ownership of code means you all share the blame when problems arise. Ultimately it’s not about where blame lies, but what you can do about it. Acknowledging that everyone has some responsibility means you can quickly move beyond reactionary arguments and begin analyzing the problem and what different solutions you can implement to prevent it next time.