How NOT to Inform Your Customers of an Outage

Monday, December 8th, 2008

There are a number of different ways to inform your customers of an outage. I’ve previously discussed how 365main and Amazon Web Services did this fairly well. Unfortunately, Limelight Networks customers are hearing about issues with their CDN via GigaOM.

(more…)

The Art of the Post-Mortem

Saturday, July 26th, 2008

I’ve mentioned in the past that the failure of complex systems is an inevitable fact of nature. The corresponding act of human inquisition into the reasons for that failure is equally inevitable. Where I work, and at almost every other large installation I’ve seen or been part of, the learnings from these inquisitions are shared for educational reasons. The name for this differs from company to company: some call it an RFO (reason for outage) or an After-Action Report, but for whatever reason the name for this at AOL is a Post-Mortem.

(more…)