Amazon Web Services has been going through a much-publicised outage, which by all appearances has lasted more than 12 hours. A range of services including Hootsuite, Reddit, Heroku, Foursquare, Quora and others have all faced major disruptions.
What is interesting is how they have positioned these outages: many have said EC2 is great, but it is having a bit of a problem at the moment. It appears these providers are taking the view that “whew, glad we outsourced our stuff so that it is clear that it is not OUR fault that something like this has happened, and we can point to other vendors to prove the case that it wasn’t us – just imagine if we had done this on our own servers and this happened, we would have been much more at fault!”
Moving systems to the cloud doesn’t remove the responsibility to guard against mission-critical outages. If a business has a use case that cannot tolerate downtime, then that business needs to architect its solution in a way that prevents downtime. Cost tradeoffs are always an issue, but if the cost of something going wrong is too high, and the cost of preventing it is also too high, then perhaps the service isn’t really feasible.
Imagine an airline that cut costs on safety in order to offer a cheap service… Doesn’t bear thinking about. Imagine that the airline outsourced its safety inspections to a third party and then washed its hands of responsibility in the event of a “downtime”. No one would buy that.
The whole point about the cloud is that it frees your thinking from dependence on any one provider. Even if you stick with an Amazon-only solution, or a Microsoft or Google or Salesforce or Rackspace or whatever solution, you still need to architect things in a way that allows you to absorb the consequences of any flaws, no matter how they are caused.
After all, you are the service provider to your customer base – how you decide to deliver that is up to you.
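As a rough illustration of what architecting around a provider failure can mean, here is a minimal sketch of trying one backend and falling back to another. Everything in it is an assumption made for the example: `call_with_failover`, `primary_region` and `secondary_provider` are hypothetical names, and the two functions merely stand in for real calls to two independent regions or vendors.

```python
from typing import Callable, List, Tuple


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


def call_with_failover(
    providers: List[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each provider in order; return (provider name, result) from the first that works."""
    errors = []
    for name, fetch in providers:
        try:
            return name, fetch()
        except Exception as exc:  # any failure means "try the next one"
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for two independent deployments.
def primary_region() -> str:
    raise ConnectionError("region unavailable")


def secondary_provider() -> str:
    return "order page"


used, result = call_with_failover(
    [("primary", primary_region), ("secondary", secondary_provider)]
)
print(used, result)
```

The point of the sketch is the shape of the decision rather than the code itself: the fallback is chosen and paid for before the outage, by the business that owes the service to its customers.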
A lot of people are learning a very hard lesson at the moment – there are good ways and bad ways of doing things. For some, a 12-hour outage is hardly a problem, but for others it can ruin lives.
In reading the book Made to Stick by Chip and Dan Heath I learned about an experiment done in 1990 by psychology PhD student Elizabeth Newton, who was able to demonstrate that knowledge can be a curse. In the experiment, some people were asked to tap out a well-known song (something iconic like Happy Birthday) and have someone else guess what the song was merely from the rhythm of the taps. The results were very interesting: the tappers predicted a 50% success rate, but the listeners were only successful 2.5% of the time – that’s one in forty times. Of particular interest was the fact that the tappers were convinced the listener must be stupid, or not trying hard, because the song was so obvious to them. Of course, they had a frame of reference – the song was, after all, in their head.
I feel that where we are with the cloud at the moment is a bit like that. Take the cloud service providers first: they know how cool their technology and various systems are, but they have difficulty conveying what is possible in a way that lets the business community really understand what it means to them beyond saving a few dollars. The business community ends up focussing on factors like Capex vs Opex, TCO, security, disaster recovery etc – factors that simply provide a framework for decision making around alternatives for doing exactly the same stuff they have always done. In this light the focus of the cloud is heavily on cost management and other secondary (albeit important) issues.
The business community, having been led to see the cloud as merely an alternative way of hosting its hardware (or in some cases software) solutions, is trapped into a sucker’s choice really: do it in house or do it in the cloud (or in a bureau etc…). But what the business community really wants to know (or should want to know) is how to do it differently. How do they profoundly change the experience their customers have? How do they provision their staff with information that can help them pre-emptively deal with problems? How do they make their business remarkable? This is where the cloud offers exciting new potential, and the vendors and the customers are still not talking the same language.
Part of this is a lack of awareness of the problem, part of it is a failure to truly see “The Cloud” for what I believe it to be – THE CLOUD, not the Amazon cloud, the Google Cloud, the Microsoft Cloud, the Salesforce Cloud, the “My Private Cloud” cloud. This lack of unified thinking leads to limited thinking and stifles opportunities.
I was excited to read this week that the IEEE has announced their intention to develop a cloud interoperability standard. Hopefully this will get people thinking in a more unified way, but the building blocks are already there. Any cloud offering worth its salt provides some form of API that enables interoperability so there is no need to wait.
A very small example by way of illustration: after a recent rollout of a new system to tens of thousands of people, a couple of customers reported an issue they were experiencing. The system touched on a number of different cloud-based systems and by viewing these as part of one system, I was able to pre-emptively find 140 other people who had either experienced this problem or were going to experience it. A personalised message was then sent from the system to each of them and they were astonished that we were able to proactively deal with their problem before they had reported it. Most of them never even knew there had been a problem. Had we looked at this particular instance from any single vantage point, we would only have been able to deal with these people as they inevitably hit the issue. The patterns only became evident by taking a holistic, pattern-based view of their particulars.
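To make the idea of treating several cloud systems as one a little more concrete, here is a minimal sketch of that kind of pattern matching. It is not the system described above: the three data sources (`billing`, `crm`, `email`), their fields and the helper names are all invented for illustration. The sketch merges each customer's records from every system, works out which traits the reporting customers have in common, and then looks for other customers who share all of those traits.

```python
from typing import Dict, Hashable, Set

Record = Dict[str, Hashable]


def unified_view(*systems: Dict[str, Record]) -> Dict[str, Record]:
    """Merge per-customer records from several systems into one record each."""
    merged: Dict[str, Record] = {}
    for system in systems:
        for customer_id, fields in system.items():
            merged.setdefault(customer_id, {}).update(fields)
    return merged


def likely_affected(view: Dict[str, Record], reporters: Set[str]) -> Set[str]:
    """Find non-reporting customers who share every trait common to the reporters."""
    if not reporters:
        return set()
    # Traits are (field, value) pairs; keep only those all reporters share.
    common = set.intersection(*(set(view[c].items()) for c in reporters))
    if not common:
        return set()
    return {
        customer_id
        for customer_id, fields in view.items()
        if customer_id not in reporters and common <= set(fields.items())
    }


# Hypothetical data held in three separate systems.
billing = {"c1": {"plan": "legacy"}, "c2": {"plan": "legacy"},
           "c3": {"plan": "legacy"}, "c4": {"plan": "new"}}
crm = {"c1": {"migrated": False}, "c2": {"migrated": False},
       "c3": {"migrated": False}, "c4": {"migrated": False}}
email = {"c1": {"verified": True}, "c2": {"verified": False},
         "c3": {"verified": True}, "c4": {"verified": True}}

view = unified_view(billing, crm, email)
affected = likely_affected(view, reporters={"c1", "c2"})
print(affected)
```

In this toy data neither the billing system nor the CRM alone separates the customer at risk from the one who is not; the pattern only shows up in the merged view.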
The business community and the technology community need to see across the chasm that divides them – their own expertise makes them assume things they shouldn’t assume. When business people can learn what is really possible by synergising in the cloud, and when cloud providers learn just how much a 10% improvement in debtor days or stock turns, or a 5% increase in customer referrals, means for a business, then and only then will we start seeing what the cloud can do to help us effect real change.