Rick Blaisdell, CTO, reflects on the Amazon Cloud outage and how it relates to ConnectEDU’s Cloud preparation.
April 29, 2011 - As the CTO, it seems like I get more attention when things go wrong than when they are going well. Simply performing is always the baseline requirement. Most of the people I interact with know I drink the Cloud Kool-Aid, and when the Amazon outages were reported, people came running from all over to ask whether I was nervous, since ConnectEDU is also on a Cloud.
There are actually many Cloud providers in the market. We use NaviSite as our provider and run on the Cisco Unified Computing System (UCS). NaviSite is an enterprise provider, so they not only helped build our Cloud but also manage and monitor it. This means we have security experts, maintenance experts and a 24/7 monitoring facility to keep the system secure, scalable and reliable.
As a standard feature, virtualization platforms (the technology underneath Clouds) have built-in failover mechanisms: when a blade (a physical server) fails, the virtual machines that were running on it are automatically restarted on another blade. These failures happen more often than most people would expect, and when they do, the virtualization platform handles the issue automatically. Depending on how the system is set up, users may never experience an outage. This is how it's supposed to work. However, in extreme circumstances, if a company hasn't put enough backup capacity in place, a major failover can cause the system to go into a panic (yes, that's the technical term). This is when things get really ugly.
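To make the "enough backup capacity" point concrete, here is a minimal Python sketch of what a failover has to accomplish. It is not how NaviSite, Cisco UCS or any particular hypervisor implements failover; the `Blade` class, the `fail_over` function and the abstract "capacity units" are all hypothetical. When a blade dies, its virtual machines are re-placed on surviving blades that have free room, and anything that doesn't fit stays down.

```python
from dataclasses import dataclass, field


@dataclass
class Blade:
    """A hypothetical physical server with a fixed amount of compute capacity."""
    name: str
    capacity: int  # abstract capacity units (illustrative only)
    vms: dict[str, int] = field(default_factory=dict)  # VM name -> size in units
    healthy: bool = True

    def free(self) -> int:
        return self.capacity - sum(self.vms.values())


def fail_over(blades: list[Blade], failed_name: str) -> list[str]:
    """Mark one blade as failed and restart its VMs on surviving blades.

    Uses a simple first-fit-decreasing placement (largest VMs first).
    Returns the names of VMs that could not be placed, i.e. stay down.
    """
    failed = next(b for b in blades if b.name == failed_name)
    failed.healthy = False
    displaced, failed.vms = failed.vms, {}

    stranded = []
    for vm, size in sorted(displaced.items(), key=lambda kv: kv[1], reverse=True):
        # First healthy blade with enough spare capacity for this VM.
        target = next((b for b in blades if b.healthy and b.free() >= size), None)
        if target is None:
            stranded.append(vm)  # not enough headroom anywhere: an outage
        else:
            target.vms[vm] = size
    return stranded


# With headroom, losing blade "a" is invisible to users.
roomy = [Blade("a", 8, {"web1": 3, "db1": 2}), Blade("b", 8, {"web2": 3}), Blade("c", 8, {"app1": 2})]
print(fail_over(roomy, "a"))  # []

# Packed too tightly, the same failure leaves a VM with nowhere to go.
tight = [Blade("a", 8, {"web1": 6}), Blade("b", 8, {"web2": 6})]
print(fail_over(tight, "a"))  # ['web1']
```

Real platforms use more careful admission control and placement rules, but the lesson is the same: failover only works if spare capacity was reserved before anything broke.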
So, what can a technology department do to prepare for such a disaster? If the companies affected by the Amazon outage had maintained an active, live failover site at another location, they would not have lost service. That kind of redundancy is not cheap, so every organization should weigh the risks and costs of how much, and what type of, redundancy it needs to deliver the uptime its users expect. The Amazon incident is a reminder of what can go wrong in any physical or virtualized environment, and fortunately ConnectEDU has taken the appropriate precautions to avoid extended downtime.
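One rough way to frame the cost-versus-uptime trade-off is a back-of-the-envelope availability calculation. The sketch below is illustrative only. The 99.5% per-site figure is hypothetical, and the formula assumes the sites fail independently and that failover is instant. The independence assumption is the optimistic part, because sites that share infrastructure do not fail independently. Under those assumptions, an active second site leaves the service down only when both sites are down at once.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(site_availability: float, sites: int) -> float:
    """Availability of N active sites, assuming independent failures and
    instant failover: the service is down only when every site is down."""
    return 1 - (1 - site_availability) ** sites


def downtime_hours(availability: float) -> float:
    """Expected hours of downtime per (365-day) year."""
    return (1 - availability) * HOURS_PER_YEAR


# Hypothetical per-site availability, for illustration only.
for n in (1, 2):
    a = combined_availability(0.995, n)
    print(f"{n} site(s): availability {a:.6f}, ~{downtime_hours(a):.1f} h/year down")
```

With these made-up numbers, a single site works out to roughly 43.8 hours of downtime a year, and two independent sites to well under an hour. Whether that gap is worth paying for a second live site is exactly the judgment call each organization has to make.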
- Rick Blaisdell
Chief Technology Officer