Root Cause Analysis of IT Service Interruptions

I used to dream about the day when I had fixed everything and IT outages would stop. Of course, that is silly. Like other healthcare organizations, we add applications to the portfolio every year as new solutions address previously under-automated areas. Most of these are not core parts of the IT architecture; they are supplemental, such as documentation systems for clinical departments (e.g., rehab) and contract modeling systems.

With the increase in the number of applications in the portfolio comes complexity. In addition, our infrastructure is becoming much more complicated, including a more sophisticated network, changing virtualization technologies, and complex storage.

So, our IT Operations philosophy is to perform a Root Cause Analysis on every critical service interruption. Our Root Cause Analysis asks three things:

How can we prevent this type of outage in the future?
How can we detect this type of outage in the future?
How can we respond to this type of outage more quickly?

The last two questions are important. Even if the cause of the service interruption is a simple fix, sooner or later stuff is going to hit the fan. We want our IT folks to see it when it does and already be communicating to our customers how we are fixing the problem before they call us.

One thought on “Root Cause Analysis of IT Service Interruptions”

Another thing to consider in Root Cause Analysis is: what data should I collect if this happens again (time permitting) that would help us develop designs and processes to avoid a recurrence and to detect this kind of outage quickly, with adequate diagnostic information?

The urge in systems with chronic problems is to do the equivalent of a PC reboot. Although that may work, taking another five minutes to grab the right logs and dump files may help you and the vendors involved figure out what is really going on. A restart cures many ills, and that’s part of the problem with expediency.
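
As a rough illustration of "grab the right logs and dump files" before restarting, here is a minimal Python sketch that bundles a list of files into one timestamped archive. The function name `snapshot_diagnostics`, its parameters, and the example paths are all hypothetical; which files actually matter depends on your systems and what your vendors ask for.

```python
import tarfile
import time
from pathlib import Path


def snapshot_diagnostics(paths, dest_dir, label="incident"):
    """Bundle existing log/dump files into one timestamped .tar.gz archive.

    All names here are illustrative assumptions, not a standard tool.
    Returns the archive path and a list of requested files that were missing.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Timestamp the archive so repeated incidents do not overwrite each other.
    stamp = time.strftime("%Y%m%dT%H%M%S")
    archive = dest_dir / f"{label}-{stamp}.tar.gz"

    missing = []
    with tarfile.open(archive, "w:gz") as tar:
        for p in map(Path, paths):
            if p.exists():
                # Store under the bare file name; files with the same name
                # from different directories would need distinct arcnames.
                tar.add(p, arcname=p.name)
            else:
                # Record what could not be collected instead of failing,
                # so the restart is not held up by one absent file.
                missing.append(str(p))
    return archive, missing
```

A call such as `snapshot_diagnostics(["/var/log/app/app.log"], "/tmp/rca")` (hypothetical paths) would run just before the restart, and the resulting archive can be attached to the Root Cause Analysis or sent to the vendor.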

Candid CIO

This is the Blog of Will Weider. This is the place where I share what I have learned through my mistakes and other crazy things in the life of a healthcare CIO.
