Google’s Outage Is A Lesson In Excellent IT Operations

September 13, 2011 at 4:14 pm

Google had an outage this week. Google Docs, which I use at home, was down for about an hour. They wrote a post about the outage on their blog.

I think Google’s post is a great lesson in effective IT operations. Our IT organization is working on improving in all of the areas below, but we have more work to do to get to Google’s level of kung fu:

  • Effective, transparent communications: They are very transparent about the incident. They want their customers to know that such events are unacceptable; they take it seriously; and they are taking measures to improve service.
  • Change Management: They understand that these problems are almost always caused by unsuccessful changes. Because they look for the failed change first, their troubleshooting is very quick. They resolved the problem within 30 minutes by rolling back the change that caused it.
  • Monitoring: Their monitoring tools uncovered the problem within 30 minutes.
  • Downtime Status: They point to the Apps dashboard, a tool that lets customers see the status of their services.
  • Root Cause Analysis: They quickly completed a Root Cause Analysis and are moving just as quickly to implement process-based changes that minimize the likelihood of a repeat occurrence.
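
To make the Change Management point concrete, here is a minimal sketch in Python of "look for the failed change first." It is not Google's tooling, which their post does not describe. The `Change` record, its field names, the `rollback_candidates` helper, the 24-hour window, and all of the sample dates and IDs are assumptions I made up for illustration. The idea is simply to list the changes made to the affected service shortly before the incident, newest first, so the most likely rollback candidate is at the top.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    # Hypothetical change record; the field names are illustrative only.
    change_id: str
    service: str
    deployed_at: datetime


def rollback_candidates(changes, service, incident_start, window=timedelta(hours=24)):
    """Return changes to `service` deployed within `window` before the
    incident started, newest first."""
    earliest = incident_start - window
    recent = [
        c for c in changes
        if c.service == service and earliest <= c.deployed_at <= incident_start
    ]
    return sorted(recent, key=lambda c: c.deployed_at, reverse=True)


# Made-up sample data, not taken from any real incident.
changes = [
    Change("CHG-101", "docs", datetime(2011, 1, 9, 9, 0)),
    Change("CHG-102", "mail", datetime(2011, 1, 10, 13, 30)),
    Change("CHG-103", "docs", datetime(2011, 1, 10, 14, 0)),
]
incident_start = datetime(2011, 1, 10, 14, 20)

candidates = rollback_candidates(changes, "docs", incident_start)
print([c.change_id for c in candidates])
```

A lookup like this only works if every change is recorded with a timestamp and the service it touches, which is one more argument for disciplined change management.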

The fact of the matter is: outages happen. The most successful IT organizations don’t kid themselves about eliminating outages through redundancy or other means. They use the means above to minimize the customer impact.

Entry filed under: IT Operations.

4 Comments

  • 1. Daniel Bobke (@DanielBobke)  |  September 13, 2011 at 4:29 pm

    The excellent point of your piece was that customer impact is the important metric. Percentage uptime or downtime hours are less relevant (unless they are excessive) than the impact of those numbers on your customers.

  • 2. sallyhealthcaretech  |  September 29, 2011 at 4:44 pm

    Google is great at letting their customers know what’s going on when there’s a problem with their software. I think they set the standard for consistent customer-business interaction when it comes to communicating and resolving outages.

  • 3. Frank Zappo  |  September 30, 2011 at 8:31 am

    I agree wholeheartedly with this article. I think a lot of people working in IT (myself included) need to study Google’s ability to not only deal with such occurrences, but also reassure their user base about what happened.
    If we were all to take a page out of Google’s book, then the healthcare business could become far more effective. We also need to put people in roles where they can do such things, though. We can’t just pretend that the head of a department is going to come out and know what he’s talking about or how to fix it.

  • 4. Adapt, Adopt & Invent CIO Blog  |  November 21, 2011 at 6:34 pm

    How do you convince senior IT management that transparent communication and change management help deal with such a problem? I have found that a lot of IT managers in Cambodia do not understand that. They usually hide the outage details and have no idea how important change management is, even when their subordinates point out that it is important for the business.



About Me

This is the blog of Will Weider, CIO of Ministry Health Care. Ministry operates 15 hospitals, 47 clinics, a health plan, and home care and hospice services. We employ more than 12,000 staff members. Our combined medical groups include more than 650 providers.

This is the place where I share what I have learned through my mistakes and other crazy things in the life of a healthcare CIO.
