Up, In The Clouds
True operations mavens know that downtime is inevitable. It’s going to happen, despite your best efforts. A blip, a stumble, some cable will get cut. Increasing the “nines” carries quite the price tag, and may not be the best way to maximize ROI. Plans for disaster recovery need to be balanced, so that the focus isn’t solely on preventing catastrophes. Equally important is rapid recovery for business continuity. Because that is the true goal of uptime: to serve pages, apps and data, to provide for the customers, and to continue the revenue stream. This is no longer an insurmountable task, given the resources and knowledge at hand.
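To put the “nines” in perspective, here is a minimal sketch that converts an availability target into the downtime it permits per year. The function name and the 365-day year are assumptions made for illustration; nothing here comes from a particular SLA.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the hours of downtime per year that an availability target permits."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_hours_per_year(target) * 60
        print(f"{target}% uptime permits roughly {minutes:.1f} minutes of downtime per year")
```

Three nines leaves about 8.76 hours a year; four nines leaves under an hour. Each extra nine cuts the allowance by a factor of ten, which is why a plan that recovers fast can be a better investment than chasing the next nine.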
Utility computing is out of the proving grounds. This is the era of clouds. An instance is like a physical server, except it’s not. Elastic or other cloud-based storage is like a disk, except it’s not. A virtual server is like hardware, except it’s not. Instances are ephemeral. Storage is now a network. Virtualization implies multi-tenancy. Understand what the technology represents and assemble the building blocks accordingly. Amazon is not the only game in town. And if it’s too late and you’ve already gone all-in with EC2, have a firm grasp of what Availability Zones really mean, and integrate that into the design. No one would rely on a single pipe for bandwidth, or a single power/UPS grid for electricity, so don’t place all the eggs in one US-East-Virginia basket. It’s time to look past yesteryear’s data centers. These are the new tools for creating fantastic, resilient and modern infrastructure.
Combine that foundation with automated deployments, and it is now rather trivial to stand up another stack at a moment’s notice, anywhere. Grab the latest stable code, push out the approved content for delivery, restore from the current database backups, apply the verified configuration files, and start turning up some [fresh] servers. Execute the smoke tests, re-point DNS and, just like snapping your fingers, the sites are all back up and running. Suddenly, the unplanned downtime is about as long as a scheduled maintenance window. There is no need to blame the tripped cords. This is not a dream. Make it the reality already.
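The recovery sequence above can be sketched as an ordered runbook that halts at the first failure, so DNS is never re-pointed at a stack that failed its smoke tests. Everything here is hypothetical: the step names, the `run_recovery` helper, and the stub actions stand in for whatever deployment tooling you actually use.

```python
from typing import Callable, List, Tuple

# A step is a label plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_recovery(steps: List[Step]) -> List[str]:
    """Run steps in order; stop at the first failure and report where it halted."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"recovery halted at step: {name}")
        completed.append(name)
    return completed


def stub() -> bool:
    """Placeholder action (assumption): replace with a call into real tooling."""
    return True


# The order mirrors the paragraph above; DNS is last on purpose.
RUNBOOK: List[Step] = [
    ("fetch latest stable code", stub),
    ("push approved content", stub),
    ("restore database from current backup", stub),
    ("apply verified configuration", stub),
    ("start fresh servers", stub),
    ("run smoke tests", stub),
    ("re-point DNS", stub),
]

if __name__ == "__main__":
    for finished in run_recovery(RUNBOOK):
        print(f"done: {finished}")
```

Keeping the runbook as data rather than as a wall of shell commands makes it easy to rehearse: run it against a scratch environment on a schedule, and the real event becomes a routine you have already practiced.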