Big Step 2: Going Multi-Site
Successful web products don’t just grow; they grow explosively. If people love something about a product, they’ll tell everyone they know, those people will tell their friends, and before you know it, a product that was recently in stealth mode is being used everywhere from Akron, Ohio, to Harare, Zimbabwe. It’s around this time that just being on a couple of servers in a rack somewhere isn’t enough. It’s time for the next Big Step in the evolution of a web site’s scale. Today, I’m covering the “why” of Big Step 2, going multi-site.
Big Step 2 is the moment when a web application outgrows the bounds of being hosted in a single physical location. Warning signs that this moment is approaching include realizing how much money you lost because your colocation provider had an outage, or glancing at the Google Analytics report for your site and noticing that half your users are half a world away from your servers.
But wait, some are saying, lots of great internet companies got really big without ever expanding to multiple locations. Even AOL was in one dinky little data center for half its life. Can’t we just grow bigger in one place, and maybe write a check to some data center insurance scheme like SunGard?
The answer is a hard maybe.
Multi-site may not be necessary for everyone. It can be expensive, it can force wrenching decisions, and it can require architectural changes that may not be worth it just to avoid a little performance degradation or a small risk of catastrophe.
Think about how hard it would be to go multi-site, weigh the risk against the benefit, and decide for yourself whether you want to go down that route. I think it makes sense for most major sites with a broad geographic focus to consider it for performance reasons, and if your users’ data is important at all to your business, you’d better have a backup plan that goes beyond that voodoo doll you keep on your bedside table.
Here are the main reasons I prefer a bunch of smaller sites to one big site with a disaster recovery site somewhere:
1) lower hosting costs
Okay, this one is pretty counterintuitive. Everyone talks about economies of scale, so why should smaller be cheaper? It comes down to demand. The industry is booming again, and large spaces in attractive facilities aren’t easy to come by anymore. If you’re looking for 100,000+ square feet of data center space, you’ll probably end up building something new, perhaps in the middle of nowhere.
On the other hand, getting a cage large enough for a few dozen racks is still pretty practical. Spreading out your purchase reduces your buying power, but by using the same provider in several locations, you can probably get most of it back, especially if you’re also buying your transit from them.
2) less redundant gear
When you build one big site, you then need someplace to go when the next hurricane/earthquake/flood/fire/power outage/alien invasion/Godzilla attack hits. You’ll probably want to put this other place far enough away that whatever knocked out your primary site doesn’t also knock out your emergency site.
Next, you’ll probably want to put some gear in there. You could economize by buying replacements only when disaster hits, but then you’ll be down for a few days. That may be okay, but if you’re an online business, chances are it isn’t. So now you’re stuck buying a clone of all your normal gear. Plus, every time you buy a widget for site A, you need to make sure you get one for site B. And, to make things even more fun, you’ll need to make sure all that user data from site A ends up at site B. That could be pretty tricky or expensive.
You’ve just doubled your operating costs, with very little to show for it aside from a warm feeling in case Godzilla attacks your data center. On the other hand, let’s say you have six locations, with just enough extra gear that if one of the six is attacked by aliens, the other five can absorb its traffic. We’re talking 2N versus N+1 redundancy: instead of provisioning twice your peak capacity, you provision roughly 20% extra (assuming load is spread evenly, each of the six sites only needs to carry a fifth of peak). You’ve probably cut your costs significantly by doing this.
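To put rough numbers on the 2N versus N+1 comparison, here is a small Python sketch. It assumes, purely for illustration, that load is split evenly across identical sites and that when one site fails, the survivors share its traffic evenly; real deployments need additional headroom for uneven traffic and growth. The function name `total_capacity` is hypothetical.

```python
from fractions import Fraction


def total_capacity(sites: int, failures_tolerated: int = 1) -> Fraction:
    """Total capacity to provision, as a multiple of peak load.

    Illustrative assumption (not a sizing rule): load is split evenly across
    identical sites, and the surviving sites must carry the full peak when
    `failures_tolerated` sites are down.
    """
    if sites <= failures_tolerated:
        raise ValueError("need more sites than tolerated failures")
    survivors = sites - failures_tolerated
    per_site = Fraction(1, survivors)  # each site must be able to carry 1/survivors of peak
    return sites * per_site


# Two sites tolerating one failure is the classic 2N primary + disaster site.
for n in (2, 3, 4, 6, 10):
    cap = total_capacity(n)
    spare_pct = float((cap - 1) * 100)
    print(f"{n:>2} sites: {float(cap):.2f}x peak, {spare_pct:.0f}% spare")
```

Under these assumptions, two sites need 2.0x peak capacity while six sites need only 1.2x, so the spare gear shrinks from 100% of peak to 20%. The gains flatten as you add sites, and each new location brings its own overhead.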
3) better user performance
The more locations you have, the more likely it is that one of them will be close to your end user from a network perspective. You’ll need some sort of global server load balancing (GSLB) that does localization, sending each user to a nearby site, to make this work correctly, but it can be a big plus, especially if your application is chatty and needs many round trips per page. You might also be able to take advantage of lower network costs on a local basis (e.g., domestic vs. international). In addition to putting fewer miles between you and the end user, you’re probably also putting fewer moving parts between you. This probably helps with reliability as well, but it certainly means fewer choke points between you and the web, which will also likely have some performance benefits.
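As a rough sketch of what GSLB localization buys a chatty application, the Python below picks the healthy site with the lowest measured round-trip time for one client region, then compares page network time at two latencies. The site names, RTT figures, round-trip count, and the lowest-RTT selection rule are all hypothetical; real GSLB products may steer instead by geo-IP data, DNS resolver location, or site load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str          # hypothetical site label
    rtt_ms: float      # hypothetical RTT measured from one client region
    healthy: bool = True


def pick_site(sites: list[Site]) -> Site:
    """Assumed steering rule: the healthy site with the lowest RTT wins."""
    candidates = [s for s in sites if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy sites")
    return min(candidates, key=lambda s: s.rtt_ms)


def page_time_ms(rtt_ms: float, round_trips: int) -> float:
    """Network wait for a page that needs `round_trips` sequential exchanges."""
    return rtt_ms * round_trips


sites = [
    Site("us-east", 180.0),
    Site("eu-west", 40.0),
    Site("af-south", 15.0, healthy=False),  # closest, but down, so skipped
]
best = pick_site(sites)
print(best.name)  # eu-west
print(page_time_ms(180.0, 20), page_time_ms(best.rtt_ms, 20))  # 3600.0 800.0
```

With 20 sequential round trips, the hypothetical 180 ms site costs 3.6 seconds of network wait versus 0.8 seconds at 40 ms. Because the cost scales with the number of round trips, chatty applications gain the most from a nearby site.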
So, with all of this, you might be convinced that it makes sense to go with a distributed operating environment of smaller sites, glued together across geographies using cool networking technologies. But wait: I’ve only talked about the good parts. Some things get a lot harder in this environment as well, and I’ll cover those next time, in part 2 of Big Step 2.