Reliable DNS Server Queues on the Cheap

The traditional solution to providing a high-availability Internet service is to throw a bunch of (preferably geographically diverse) servers together and set up a round-robin DNS record that will roughly balance the load between them. However, if your Internet service supports real-time collaboration between users, as in the case of a cluster of worldwide Jabber servers, round-robin DNS doesn’t work very well: users end up logged into different servers, which dramatically increases your server-to-server traffic.

Assuming that scalability isn’t too much of a problem and that your main goal is high availability, what you really want is a failover queue of servers for a single DNS record: first, try server1; if that fails, try server2; if server2 fails, try server3, and so on…

The usual way to get this is either (a) to run your own DNS server and use BIND 9’s obscure rrset-order option to define the order in which the records are returned, or (b), if you’re uncomfortable with doing that, to outsource your DNS to a managed DNS hosting service that supports failover monitoring, such as ZoneEdit or PowerDNS. The managed route can be expensive.

However, Horms suggested a brilliant little solution: use one of the many freely available dynamic DNS services. Write a simple script to poll your servers in order, and update the dynamic DNS entry’s IP address to point to the first server it detects as online. You can still point users to the normal hostname in your domain by adding a CNAME record that points to the hostname provided by the dynamic DNS service.
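
To make the polling idea concrete, here is a minimal sketch of what such a script might look like. It is not the monitor script we actually run. The TCP connect check, the example addresses, the Jabber client port 5222, and the update_dns callback are all assumptions made for illustration; the real update call depends entirely on which dynamic DNS service you pick.

```python
import socket
import time
from typing import Callable, Iterable, Optional, Tuple

Server = Tuple[str, int]  # (IP address, TCP port to probe)


def is_online(server: Server, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the server succeeds within timeout."""
    try:
        with socket.create_connection(server, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def first_online(servers: Iterable[Server],
                 probe: Callable[[Server], bool] = is_online) -> Optional[Server]:
    """Walk the queue in priority order; return the first server that responds."""
    for server in servers:
        if probe(server):
            return server
    return None


def monitor(servers, update_dns, probe=is_online, interval=60.0, rounds=None):
    """Poll the queue and call update_dns(ip) whenever the active server changes.

    update_dns is a hypothetical callback that pushes the new IP address to
    your dynamic DNS provider. rounds limits the number of polls (None means
    run forever). Returns the last IP address that was published.
    """
    current_ip = None
    polls = 0
    while rounds is None or polls < rounds:
        server = first_online(servers, probe)
        # Only touch the DNS entry when the winner actually changes.
        if server is not None and server[0] != current_ip:
            update_dns(server[0])
            current_ip = server[0]
        polls += 1
        if rounds is None or polls < rounds:
            time.sleep(interval)
    return current_ip


if __name__ == "__main__":
    # Hypothetical failover queue, highest priority first.
    queue = [("192.0.2.1", 5222), ("192.0.2.2", 5222), ("192.0.2.3", 5222)]
    # Stand-in for a real provider update; here it just prints the new IP.
    monitor(queue, update_dns=lambda ip: print("would update DNS to", ip))
```

If every server in the queue is down, this sketch leaves the DNS entry alone rather than pointing it somewhere arbitrary; whether that is the right behaviour for your service is a judgment call.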

We’ve used this technique in practice for several weeks and it’s worked extremely well: so well, in fact, that we cancelled the outsourced DNS failover service that we had before, because it didn’t update as quickly or work as reliably as this setup does. Writing the monitoring script is the hardest part of this solution, and that’s pretty easy: if you’re really curious, email me and I’ll be happy to share the monitor script that we use.
