CIOs Moving to the Cloud: The buck still stops with you
Amazon Web Services has been going through a much publicised outage, which has lasted by all appearances more than 12 hours. A range of services including Hootsuite, Reddit, Heroku, Foursquare, Quora and others have all faced major disruptions.
What is interesting is how they have positioned these outages: many have said EC2 is great, but they are having a bit of a problem at the moment. It appears these providers are taking the view that “whew, glad we outsourced our stuff so that it is clear that is not OUR fault that something like this has happened, and we can point to other vendors to prove the case that it wasn’t us – just imagine if we had have done this on our own servers and this happened, we would have been much more at fault!”
Just because systems are moved to the cloud doesn’t mitigate the responsibility to ensure mission critical outages are mitigated. If a business has a use-case that cannot tolerate down time then that business needs to architect their solution in a way that prevents downtime. Cost tradeoffs are always an issue, but if something goes wrong, and the cost of that problem is too high, then perhaps the service isn’t really feasible.
Imagine an airline providing a service where they cut costs on safety in order to offer a cheap service… Doesn’t bear thinking about. Imagine that the airline outsourced their safety inspections to a third party and then wiped their hands of responsibility in the event of a “downtime”. No-one would buy that.
The whole point about the cloud is that it enables you to free your thinking about one provider. Even if you stick with an Amazon only solution, or a Microsoft or Google or Salesforce or Rackspace or whatever solution, you still need to architect things in a way that allows you to accept the consequences of any flaws, no matter how they are caused.
After all, you are the service provider to your customer base – how you decide to deliver that is up to you.
A lot of people are learning a very hard lesson at the moment – there are good ways and bad ways of doing things. For some, a 12 hour outage is hardly a problem, but for others it can ruin lives.