2014/05/31

Go-Daddy: Low TTL DNS Resolution Failures

Some of you may have noticed recently that  www.nonfunctionalarchitect.com was not resolving correctly much of the time. At first I thought this was down to DNS replication taking a while though that shouldn't really explain inconsistent results from the same DNS servers (once picked up they should stick assuming I don't change the target (which I hadn't)).

So eventually I called Go-Daddy support who weren't much help and kept stating that "it works for us" suggesting it was my problem. This despite confirmation from friends and colleagues that they see the same issue from a number of different ISPs. They also didn't want to take the logs I'd captured demonstrating the problem or give me a reference number - a far cry from the recorded message in the queue promising to "exceed my expectations"! But hey, they're cheap...

Anyway... I'd set the TTL (Time To Live) on my DNS records to 600 seconds. This is something I've done since working on migration projects where you'd want DNS TTL to be short to minimise the time clients point at the old server (note: you need to make the change at least x1 the legacy TTL value before you start the migration... and not every DNS server obeys your TTL... but it's still worth doing).  This isn't an insane value normally but really depends on whether your nameservers can handle the increased load. I asked the support guy if this was ok and he stated that it was and all looked fine with my DNS records... Cool, except that my problem still existed!

I had to try something so setup a simple shell script on a couple of servers to perform a lookup (nslookup) on Google.com, www.nonfunctionalarchitect.com and pop.nonfunctionalarchitect.com, and set the TTL to 1 day on www and 600secs on pop. This should hopefully prove that (a) DNS resolution is working (Google.com resolves), (b) that I am suffering a problem on www and pop and; with a bit of luck, (c) demonstrate if increasing the TTL makes any difference.

The result shows no DNS resolution fails for either Google.com or www.nonfunctionalarchitect.com. On the other hand pop fails around 10% of the time (12 failures from 129 requests). Here's a few of the results:
pop.nonfunctionalarchitect.com	canonical name = pop.secureserver.net.
pop.nonfunctionalarchitect.com canonical name = pop.secureserver.net.
** server can't find pop.nonfunctionalarchitect.com: NXDOMAIN
pop.nonfunctionalarchitect.com canonical name = pop.secureserver.net.
pop.nonfunctionalarchitect.com canonical name = pop.secureserver.net.
pop.nonfunctionalarchitect.com canonical name = pop.secureserver.net.
** server can't find pop.nonfunctionalarchitect.com: NXDOMAIN
** server can't find pop.nonfunctionalarchitect.com: NXDOMAIN
pop.nonfunctionalarchitect.com canonical name = pop.secureserver.net.
pop.nonfunctionalarchitect.com canonical name = pop.secureserver.net.

I can think of a number of reasons why this may be happening; including load on the Go-Daddy nameservers, overly aggressive DoS counter-measures, or misaligned configuration in nameserver/CNAME configuration etc. Configuration seems ok but I do wonder about the nameservers 1hr TTL versus the CNAMEs 600s TTL. For now it seems more stable at least and I'll do some experimentation with TTL values later to see if I can pin this down.

In the meantime, if you're getting DNS resolution failures with Go-Daddy and have relatively low TTL values set (<1hr) then consider increasing these to see if that helps.

No comments:

Post a Comment

Voyaging dwarves riding phantom eagles

It's been said before... the only two difficult things in computing are naming things and cache invalidation... or naming things and som...