In January I wrote about the importance of using Root Cause Analysis at Ministry Health Care as a way to learn from our mistakes. This process is so important to us that we have an employee (Fred) who oversees Root Cause Analysis and facilitates the meetings. Those meetings are generally calm and take place after the IT service interruption has been addressed. That is not the case when we are in actual firefighting mode.
We have learned a couple of things about fighting fires, that is, addressing customer-impacting service interruptions. We have learned that the best way to respond to service interruptions is counter-intuitive and kind of complicated. So, we have done what we usually do when we want to improve something. We created written guidance on how to respond to IT Service Interruptions and we are constantly improving that written guidance.
The primary way we address an IT Service interruption is through the use of a Critical Response Team. The Critical Response Team has two primary goals:
- Cure the service interruption as quickly and completely as possible
- Communicate with our impacted customers in a timely manner and give them the information they need
Prior to developing our Critical Response Team methodology, we tended to fall into the trap of thinking that we should not bother the technical staff so that they could fix the problem as quickly as possible. This is a huge mistake. Even if the duration of a critical application outage is extended by a great deal of time, it is critical to communicate the relevant facts about the outage to the customer. Time and time again we see that when we handle the communication well, the customers empathize with our plight and thank us for our efforts. If we go dark, we receive a lot of criticism, even if the efforts to resolve the problem were heroic. In essence, we buy ourselves time when we are good communicators.
When we form a Critical Response Team the meetings have three primary agenda items:
- Define the problem.
- Develop an action plan, with clearly defined assignments, to research the problem or resolve it.
- Develop the communications including the message and the audience.
By nature, people want to get off the call after the second item and assume someone else will handle the communication. But we find that the communication must be written during that call, while the technical experts are still on it. This is the only way we get it right, and it reinforces the importance of communications.
There are some keys to communicating with customers regarding outages:
- Communication coming from a named individual is critical in how the customer perceives the authenticity of the message. Critical Response Team messages should come from a person, not a generic mailbox.
- Tell the customers that addressing the interruption is our top priority and our team is dropping everything.
- Tell the customers that we know that this is impacting their ability to be efficient and effective and that we feel their pain.
- Tell them everything we know about the effects of the problem on them. Avoid the technical details and write the message from their perspective.
- Let them know that we are sharing everything we know, but things may change as we learn more.
- Provide an estimate of the duration of the outage. IT generally doesn’t like to do this because they think they will be held responsible for estimates given with incomplete information. But the customers need this because it determines whether they go to downtime procedures, arrange overtime or plan to bring in additional staff.
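One way to keep these points from being skipped under pressure is to put them in a small message template that is filled in during the call. The Python below is only an illustrative sketch, not our actual tooling. The `OutageNotice` class, its field names and the wording of each line are all hypothetical. It simply refuses to produce a message until a named sender, the customer impact and a duration estimate have been supplied.

```python
from dataclasses import dataclass


@dataclass
class OutageNotice:
    """Hypothetical container for the facts agreed on during the call."""
    sender_name: str         # a named individual, not a generic mailbox
    service: str             # the affected service, in the customer's words
    customer_impact: str     # what customers cannot do right now
    estimated_duration: str  # best current estimate, e.g. "about 2 hours"

    def render(self) -> str:
        # Refuse to build a notice that leaves out any item on the checklist.
        for name in ("sender_name", "service", "customer_impact", "estimated_duration"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing required field: {name}")
        # Assemble the message from the customer's perspective, no technical details.
        return "\n".join([
            f"{self.service} is currently unavailable.",
            "Restoring it is our top priority, and our team has set aside all other work.",
            f"What this means for you: {self.customer_impact}",
            "We know this is affecting your ability to do your work.",
            f"Current estimate for restoration: {self.estimated_duration}.",
            "This is everything we know right now; details may change as we learn more.",
            f"From: {self.sender_name}",
        ])
```

For example, `OutageNotice("Pat Example", "The scheduling system", "You cannot book appointments.", "about 2 hours").render()` returns a seven-line message signed by Pat Example, while leaving the estimate blank raises a `ValueError` instead of sending an incomplete notice.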
Let me know if you would like a copy of our Critical Response Team approach. As with everything, it is a work in progress. Just like our Root Cause Analysis changes the way we operate in IT, we perform Root Cause Analysis on our response to service interruptions and improve our Critical Response Team approach.