Over the past 18 months there has been an increasing number of high-profile outages among cloud computing providers and Software-as-a-Service (SaaS) vendors. Application vendors Intuit and Workday both suffered extensive service disruptions, which impacted a broad segment of their user communities. Infrastructure providers have suffered short outages over the past year as well. Even heavyweights Amazon.com and Google have each experienced minor service interruptions, demonstrating that no service provider is immune from data center incidents.
Below are a few of the high-profile outages I have learned about over the past year:
- Workday – A SaaS provider of human resources, finance and payroll services experienced corruption in a Network-Attached-Storage (NAS) device, which led to a 15-hour outage back in September 2009.
- Google – Also in September, Gmail suffered a performance overload when several servers were taken offline for planned maintenance. Numerous corporate customers that had switched from Microsoft Exchange or Lotus Notes were impacted.
- Amazon.com – In December 2009, Amazon.com’s EC2 cloud computing service experienced a 5-hour service disruption. The issues were traced to the Amazon Web Services platform running in the company’s Northern Virginia data center.
- Intuit – In June 2010, Intuit suffered a 36-hour outage that impacted users of its QuickBooks Online, Quicken Online and TurboTax sites. One user I spoke to missed payroll because the outage lasted so long.
- Wikipedia – Experienced a power outage in its Tampa, FL data center on the 4th of July, which impacted site availability globally. Back in March, another outage occurred due to overheating in a European data center.
Service interruptions in the cloud have a much higher level of visibility than comparable outages suffered in corporate data centers. There are two reasons why.
First, by design, public-cloud or SaaS applications serve a wide variety of users at a number of different companies. As a result, when an outage occurs the pain is felt by a broad community of users. Unlike traditional enterprise IT users, this multi-company community is not limited in its freedom of speech by any internal guidelines established by the CIO. These users can air their frustrations publicly without any concerns about disclosing sensitive corporate information or having their loyalty questioned by superiors. To make matters worse for service providers, these users are able to build relationships and share information in ways never before possible. In today’s hyper-connected, social media-driven world, bad news can spread via word-of-mouse in minutes rather than days or weeks. The slightest performance degradation or availability concern will be raised in online forums, microblogs and discussion groups.
Second, buyers of cloud computing and SaaS applications have been conditioned to have higher expectations for service availability. With robust, massively scalable infrastructure platforms based upon blade servers, virtualization software and storage area networks, the cloud promises the ultimate levels of redundancy to its users. Consequently, when a service interruption does occur, users are far more critical of providers than they might be with traditional enterprise IT.
What are the implications for product managers responsible for cloud computing and SaaS products? More thoughts in a future post.