August 12th was 512k day, the day the number of entries in the IPv4 BGP routing table reached 512k. This matters because some routers have a default limit of 512k IPv4 routes, and routers that haven’t been modified to raise this limit could crash or fail to load new routes. Fixes for this problem exist and not all routers are affected, but even with advance notice there were still outages and slowdowns when the number of routes passed 512k.
The thing to note about this outage isn’t just that known problems were not addressed. It’s that even if your network was configured perfectly, and a remote resource vendor was delivering its promised 99.999% uptime, you could still experience a service outage if there was a problem in the network between you and your provider. This is especially important when considering moving your resources to the cloud. While a cloud-based EMR system may provide tremendous benefits at a cost-effective rate, a doctor can’t do their job if a problem with the hospital’s internet connection keeps them from getting test results or medical history.
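To put rough numbers on this, here is a minimal Python sketch of how a provider's uptime combines with the availability of the network path in front of it. It assumes the two fail independently, so their availabilities multiply. The 99.9% figure for the network path is purely hypothetical, and the function names are illustrative.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year for an availability between 0 and 1."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def end_to_end_availability(*availabilities: float) -> float:
    """Availability of components in series, assuming independent failures."""
    return prod(availabilities)


provider = 0.99999    # the promised 99.999% uptime
network_path = 0.999  # hypothetical figure for the path between you and the provider

combined = end_to_end_availability(provider, network_path)
print(f"provider alone:         {downtime_minutes_per_year(provider):.1f} min/year")
print(f"as seen from your site: {downtime_minutes_per_year(combined):.1f} min/year")
```

Under these assumptions the provider alone accounts for a little over five minutes of downtime a year, while the combined figure is close to nine hours. The weaker link dominates what you actually experience.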
The 512k limit wasn’t the first network disruption and it will not be the last. As soon as a clever way is developed to manage the internet, an even more clever way will be found to hack it. Basic infrastructure equipment that has been working for years without any issues can become a victim of specifications that are now obsolete. In short, there is no way to guarantee uptime for anything accessed over the internet. Accepting the possibility of downtime is the price you pay for the economies of Cloud computing.
If you can accept the possibility of downtime and Cloud computing is a fit for your company, one main criterion in selecting a Cloud provider should be its availability as seen from your site. During your trial period, test to verify that you can consistently reach the provider with minimal latency, and archive the data to create your own availability SLA reports for each prospective vendor. Also collect data on network bandwidth for each provider, and extrapolate from your test environment to the estimated usage of a full deployment.
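As one way to gather that data, here is a minimal Python sketch that times a TCP connection to a provider and rolls the results into an availability figure. It is an illustration, not a full monitoring tool. The hostname `vendor.example.com`, the port, and the timeout are placeholders, and a TCP connect time is only a rough proxy for how the application itself will perform.

```python
import socket
import statistics
import time
from typing import List, Optional


def probe(host: str, port: int = 443, timeout: float = 3.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if the attempt failed."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:  # covers timeouts, refused connections, and DNS failures
        return None


def summarize(samples: List[Optional[float]]) -> dict:
    """Summarize probe results as an availability percentage and a median latency."""
    successes = [s for s in samples if s is not None]
    return {
        "probes": len(samples),
        "availability_pct": 100.0 * len(successes) / len(samples) if samples else 0.0,
        "median_ms": statistics.median(successes) if successes else None,
    }


# In practice, collect samples on a schedule throughout the trial period, e.g.:
#     samples.append(probe("vendor.example.com"))  # hypothetical hostname
# Synthetic samples are used here so the example runs without network access.
samples = [12.0, 15.0, None, 14.0]
print(summarize(samples))
```

Running the probe from the site where the service will actually be used, and storing a timestamp with each sample, would let you build the per-vendor reports described above.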
To look at a provider’s performance over a longer term, services such as downdetector.com aggregate user reports of downtime and can provide an archive of previous issues. Downtime as experienced by previous clients can also provide insight into how a Cloud vendor handles support issues after you’ve implemented their software. Detailed post-mortems of issues are welcome but not the norm. However, support through Twitter and updates via Facebook are common and provide a history of previous issues.
Should network outages mean you have to rule out Cloud computing? The possibility of outages is definitely a factor to consider, but how significant those outages are to you will depend on your network, the Cloud provider, and how sensitive you are to losing services.