Letter from Jeremiah Farmer, Land F/X CEO
re: Service Outage
As you may know, we had a brief outage of our cloud services on Tuesday, January 29.
The issue was due to our DNS registry, which meant that our website, cloud data and services, and even email system were affected. (DNS is the system by which a network of Internet servers translates the names of websites such as “landfx.com” and “aws.landfx.com” to their numeric IP addresses.)
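For the curious, the name-to-address translation described above can be sketched in a few lines of Python. This is only an illustration using the standard library, not part of our software; the helper name `resolve` is hypothetical, and the addresses printed will depend on whatever DNS reports at the moment you run it.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses DNS reports for hostname."""
    # getaddrinfo asks the operating system's resolver, which in turn asks DNS.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    for name in ("landfx.com", "aws.landfx.com"):
        try:
            print(name, "->", resolve(name))
        except socket.gaierror as err:
            # This is roughly what clients experienced during the outage.
            print(name, "-> lookup failed:", err)
```

When the lookup fails or returns the wrong address, everything that depends on the name (website, cloud data, email) is affected, even though the servers themselves are running.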
We received the first reports a little after 2 a.m., and we began alerting our team members a little after 6 a.m. By 8 a.m. Pacific Time, the issue had been resolved, but because DNS changes take time to propagate, the fix did not reach everyone immediately. This is where events took an unusual turn. A nefarious website operator, ztomy.com, attempted to exploit this DNS change with something called a DNS injection attack, which unfortunately added to the time required for the correct DNS changes to propagate.
I share these technical details with you in case you might be curious. These are learning events for us all. But as with a Denial of Service attack, protection against these types of incidents is extremely challenging.
Still, we have several lessons learned:
Our automated server health messages did not trigger, as they check each server by IP address. We are in the process of modifying these messages, which will save us critical hours in the event of a future DNS issue.
We are pursuing whatever remedies are possible against ztomy.com. Although it will perhaps be a drop in the bucket, I will also be writing my elected representatives to request action regarding Internet security, and I encourage you to do so as well.
As our cloud servers were up and functional, still replicating across the five worldwide sites, we alerted users that they could reference the server by IP address, bypassing DNS resolution entirely. We will also be building this capability into a future release.
And of course, we have implemented several procedures in-house regarding the maintenance of our DNS registry.
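For readers who run their own monitoring, here is a minimal sketch of the idea behind the first and third lessons above: a health check that also verifies the hostname resolves to an expected address, and a fallback that uses a known IP address when it does not. It is an illustration under stated assumptions, not our actual monitoring code. The function names, the hostname `example.test`, and the addresses (taken from the ranges reserved for documentation) are all hypothetical.

```python
import socket


def system_resolver(hostname):
    """Resolve hostname through the operating system's resolver (uses DNS)."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def check_dns(hostname, expected_ips, resolver=system_resolver):
    """Return (ok, detail). ok is True only if every resolved address is expected."""
    try:
        resolved = set(resolver(hostname))
    except OSError as err:  # socket.gaierror is a subclass of OSError
        return False, f"{hostname}: lookup failed ({err})"
    if not resolved:
        return False, f"{hostname}: no addresses returned"
    unexpected = resolved - set(expected_ips)
    if unexpected:
        # A wrong answer is as serious as no answer: traffic is going elsewhere.
        return False, f"{hostname}: unexpected addresses {sorted(unexpected)}"
    return True, f"{hostname}: ok"


def pick_endpoint(hostname, known_ip, resolver=system_resolver):
    """Use the hostname when DNS looks healthy, otherwise fall back to known_ip."""
    ok, _ = check_dns(hostname, {known_ip}, resolver)
    return hostname if ok else known_ip


if __name__ == "__main__":
    # Hypothetical values; substitute a real hostname and its known address.
    print(check_dns("example.test", {"203.0.113.10"}))
    print(pick_endpoint("example.test", "203.0.113.10"))
```

A check that only pings an IP address reports a healthy server even when nobody can find that server by name; comparing the resolved address against a known-good list is one way to catch both failed lookups and hijacked ones.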
Why you don't need to worry:
While any outage, no matter how brief, is still an obvious loss of time and money, it is worth stating how much more resilient our cloud data servers are than local database servers. A hardware or software issue with your local server might result in a downtime of days, and a potential loss of any and all data. Meanwhile, cloud outages like this one typically last at most a few hours, with no data loss at all.
We've got your back.
I’d also like to state how proud I am of our team here. The panic whistles that went out in the early morning darkness, those who brought donuts in for everybody, the fluid division of labor between researching the issue and devising a fix: it was a real sight to see. So I certainly hope that your takeaway, besides an obvious frustration that these things can happen, is that you really can rely on this team to do everything possible to keep you up and running.
As always, my line is open if you would like to talk with me further about this.