
Cloud Collapse? Don’t throw the baby out with the bath water…

David DeWolf

Today’s EC2 outage took the IT sector and news media by storm. Many people have taken the opportunity to bash the concept of the public cloud, and others to bash outsourcing in general. Interestingly, though, some of Amazon’s most visible EC2 customers didn’t jump on the bandwagon, instead echoing Quora’s sentiment:

“We’d point fingers, but we wouldn’t be where we are today without EC2.”

Read further, and it appears to me that many of the negative comments came from infrastructure-oriented companies that are being affected by the emergence of the elastic cloud.

Let’s get real, people. No matter how hard we try, 100% reliability and perfect scalability will continue to be approached but never achieved. Face it, despite today’s struggles, Amazon is probably better positioned to manage your IT infrastructure than you are.

No, I’m not claiming that the elastic cloud is right for everyone. There are no golden hammers. But please, let’s not throw the baby out with the bath water. No matter what your infrastructure choice, make sure to plan for failure. If you choose not to, be prepared to live with reality. Planning for failure is good business, not a strategy specific to cloud infrastructures.

What is your cloud failure strategy? What would you not be able to do without services like Amazon’s EC2? Where does it make sense to rely on internal infrastructure or a private cloud instead of the public cloud?

Update, 04/22/2011:

Shortly after this post, I received an email from one of Three Pillar’s thought leaders, Mike, who has significant experience with the cloud. He pointed out another interesting aspect of this story:

What was not reported here was all the critical sites that stayed up because they properly engineered their application and operational deployment to take advantage of availability zones. High availability applications should distribute their load across multiple availability zones just in case there is a failure in one zone.

The cloud does not magically remove the need for good operations engineering.

Great point, Mike.
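
To make Mike's point concrete, here is a minimal Python sketch of the idea, not a real deployment script. It spreads instances round-robin across availability zones and then checks how much capacity survives when a single zone fails. The zone names and the helper functions `place_instances` and `surviving_capacity` are hypothetical, invented for illustration; nothing here calls an actual Amazon API.

```python
from collections import Counter
from itertools import cycle

# Hypothetical zone names, chosen only for illustration.
ZONES = ["zone-a", "zone-b", "zone-c"]


def place_instances(count, zones):
    """Assign `count` instances to zones round-robin so no single zone holds them all."""
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(count)]


def surviving_capacity(placement, failed_zone):
    """Return the fraction of instances still running if `failed_zone` goes down."""
    if not placement:
        return 0.0
    alive = sum(1 for zone in placement if zone != failed_zone)
    return alive / len(placement)


if __name__ == "__main__":
    spread = place_instances(6, ZONES)
    single = place_instances(6, ["zone-a"])

    # Show how many instances landed in each zone.
    print(Counter(spread))

    # Compare the two layouts when zone-a fails.
    print("spread across zones:", surviving_capacity(spread, "zone-a"))
    print("single zone only:   ", surviving_capacity(single, "zone-a"))
```

With six instances spread over three zones, losing one zone leaves four of the six running. With all six in the failed zone, nothing is left. The remaining capacity still has to be large enough to carry the full load, which is exactly the operations engineering the cloud does not do for you.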