Archive for January, 2012
In January I wrote about the importance of using Root Cause Analysis at Ministry Health Care as a way to learn from our mistakes. This process is so important to us that we have an employee (Fred) who oversees Root Cause Analysis and facilitates the meetings. Those meetings are generally calm and take place after the IT service interruption has been addressed. That is not the case when we are in actual firefighting mode.
We have learned a couple of things about fighting fires, that is, addressing customer-impacting service interruptions. We have learned that the best way to respond to service interruptions is counter-intuitive and kind of complicated. So, we have done what we usually do when we want to improve something. We created written guidance on how to respond to IT Service Interruptions, and we are constantly improving that written guidance.
The primary way we address an IT Service interruption is through the use of a Critical Response Team. The Critical Response Team has two primary goals:
- Cure the service interruption as quickly and completely as possible
- Communicate with our impacted customers in a timely manner and give them the information they need
Prior to developing our Critical Response Team methodology, we seemed to fall into the trap of thinking that we should not bother the technical resources so they can fix the problem as quickly as possible. This is a huge mistake. Even if the duration of a critical application outage is extended by a great deal of time, it is critical to communicate the relevant facts about the outage to the customer. Time and time again we see that when we handle the communication well, the customers empathize with our plight and thank us for our efforts. If we go dark, we receive a lot of criticism, even if the efforts to resolve the problem were heroic. In essence, we buy ourselves time when we are good communicators.
When we form a Critical Response Team the meetings have three primary agenda items:
- Define the problem.
- Develop an action plan, with clearly defined assignments, to research the problem or resolve it.
- Develop the communications including the message and the audience.
By nature, people want to get off the call after the second item and assume someone else will handle the communication. But we find that the communication must be written during that call, while the technical experts are still on it. This is the only way we get it right, and it reinforces the importance of communications.
There are some keys to communicating with customers regarding outages:
- Communication coming from a named individual is critical in how the customer perceives the authenticity of the message. Critical Response Team messages should come from a person, not a generic mailbox.
- Tell the customers that addressing the interruption is our top priority and our team is dropping everything.
- Tell the customers that we know that this is impacting their ability to be efficient and effective and that we feel their pain.
- Tell them everything we know about the effects of the problem on them. Avoid the technical details; write the message from their perspective.
- Let them know that we are sharing everything we know, but things may change as we learn more.
- Provide an estimate of the duration of the outage. IT generally doesn’t like to do this because they think they will be held responsible for estimates given with incomplete information. But the customers need this because it determines whether they go to downtime procedures, arrange overtime, or plan to bring in additional staff.
Let me know if you would like a copy of our Critical Response Team approach. As with everything, it is a work in progress. Just as our Root Cause Analysis changes the way we operate in IT, we perform Root Cause Analysis on our response to service interruptions and improve our Critical Response Team approach.
This bit of brilliance comes from Ministry’s Northwoods region (yes, we have a Northwoods region – how cool is that?). The supervisor of our desktop support team has three simple goals for every project his team works on:
- Happy Customers
- A bored Project Manager
- A tech released to work on IT Operations because no hardware is breaking and everything was executed to plan
I wish I had come up with that. Simple, memorable, powerful.
I used to think about the day when I would have fixed everything and we would have no more IT outages. Of course, that is silly. Like other healthcare organizations, we are adding applications to the portfolio every year as new solutions address previously under-automated areas. Most of these are not core parts of the IT architecture; they are supplemental, such as documentation systems for clinical departments (e.g., rehab) and contract modeling systems.
With the increase in the number of applications in the portfolio comes complexity. In addition, our infrastructure is becoming much more complicated, including a more sophisticated network, changing virtualization technologies, and complex storage.
So, our IT Operations philosophy is to perform a Root Cause Analysis on every critical service interruption. Our Root Cause Analysis asks three things:
- How can we prevent this type of outage in the future?
- How can we detect this type of outage in the future?
- How can we respond to this type of outage more quickly?
The last two questions are important. Even if the cause of the service interruption is a simple fix, sooner or later stuff is going to hit the fan. We want our IT folks to see when it does and already be communicating to our customers how we are fixing the problem before they call us.
Dr. Michael Koriwchak writing for the Wired EMR Practice blog:
“And our EMR use, our quality of patient care and our practice efficiency is for the most part no better. In some ways it is worse. As a result of MU”
I can see how that can happen. It is important that we hear the skeptical and the inspiring. The post is worth the read and the author’s candor is important.