Any computing system can experience outages – even the mighty Amazon Web Services (AWS) cloud, which powers our Cloud installation option. Though unfortunate, outages to our cloud system are rare and exceptionally brief, and the lessons we've learned from each one continue to make clear that the system is far more robust and resilient than the Local Data option. Put simply, a cloud outage pales in comparison with an outage in Local Data. More information
What to do in the event of an outage
If an outage occurs, refresh this page. We will post the status of any ongoing outage here, along with any instructions you need to work around it.
In general, if you have a Cloud installation, you can easily switch to Local Data during an outage and continue working. Although you won't be able to access new data or content, you'll be able to work with existing Land F/X projects and use the plants, Reference Notes, irrigation equipment, and other data and content that have already been added to those projects. For instructions, see How to switch temporarily to Local Data.
After the outage has been resolved, you can then easily switch back to Cloud Data.
Synopsis of past outages
Each time an outage occurs, our team works tirelessly in conjunction with Amazon (when applicable) to resolve the issue. Further, each outage serves as a litmus test that informs our continual efforts to improve the cloud system and help prevent or minimize future outages. Here's what we've learned and changed as a result of the few outages that have occurred since we implemented Cloud Data in early 2016:
Time to Fix
Lessons Learned/Steps Taken
One of our servers reached capacity, which triggered our failover protection system. The next two servers were consequently overloaded as the result of an overly pessimistic configuration of our failover system.
We have increased the load a server can carry before it is taken out of rotation. In addition, we are adding reserve servers that will spin up automatically when any server hits capacity.
The landfx.com domain registration was locked because we inadvertently failed to respond to a contact confirmation request from our registrar, Network Solutions. At the time, Network Solutions contracted with ztomy.com to capitalize on these incidents, and moved landfx.com to a parking page owned by ztomy.com. The outage lasted about 8 hours (we first started hearing of the issue around 2 a.m.), and full recovery took even longer because ztomy.com's DNS servers were propagating the change extremely slowly.
We moved our domain registration away from Network Solutions. We opened a complaint with ICANN regarding ztomy.com, and also threatened to go to the media with a complaint against Endurance International Group, the owner of Public Domain Registry, which shields ztomy.com from direct litigation. The result is that Network Solutions has stopped using ztomy.com for much of its domain parking services. Further, ztomy.com is now completely unreachable as a website. In addition, we found a workaround to reconnect manually to a working server by 9 a.m. on the day of the outage, and had a new step-by-step video of the solution made and sent out to users by 9:30 a.m.
While attempting to configure a test server to investigate methods of making the Cloud Data replication faster, we mistakenly included some configuration information for the primary Cloud Data pool, which caused a replication issue that brought down the entire system.
We implemented safeguards into our test server configuration process designed to prevent this type of outage. Soon after the incident, we configured the test server correctly, which has allowed us to find some tweaks to make Cloud Data even more responsive.
Cloud Data moved to AWS on this day, coinciding with the first outage. Only Local Data users were affected, as the switchover mistakenly also rerouted Local Data license validation calls. As those only occur upon first launching the software for the day, the outage lasted several hours, until we were staffed in the morning and were able to correct it.
As the outage resulted from our switchover to AWS, the lesson learned was to verify that any future switchovers will not affect Local Data users – an unlikely scenario, as we have no plans to move away from AWS.
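For readers curious about the failover outage described above, the cascade can be illustrated with a small simulation. This is only a rough sketch under simplifying assumptions, not a description of our actual failover system: the function name `simulate_failover`, the load figures, and the thresholds are all hypothetical, and the model assumes a removed server's load is split evenly among the servers still in rotation.

```python
def simulate_failover(loads, threshold, reserves=0):
    """Return the loads of the servers still in rotation after failover settles.

    loads     -- per-server load as a fraction of capacity (hypothetical values)
    threshold -- load at which a server is taken out of rotation
    reserves  -- number of idle reserve servers that can be brought online
    """
    loads = list(loads)
    while loads:
        over = [i for i, load in enumerate(loads) if load >= threshold]
        if not over:
            break  # every remaining server is under the threshold
        shed = loads.pop(over[0])  # take one overloaded server out of rotation
        if reserves > 0:
            loads.append(0.0)  # bring an idle reserve server online
            reserves -= 1
        if not loads:
            break  # nothing left to absorb the load
        share = shed / len(loads)
        loads = [load + share for load in loads]  # redistribute evenly
    return loads


# A low threshold lets one busy server drag the others down with it.
print(len(simulate_failover([0.80, 0.60, 0.60], threshold=0.75)))  # 0 servers left
# Raising the threshold keeps the same pool intact.
print(len(simulate_failover([0.80, 0.60, 0.60], threshold=0.95)))  # 3 servers left
# A reserve server absorbs the load when one server does hit the threshold.
print(len(simulate_failover([0.97, 0.60, 0.60], threshold=0.95, reserves=1)))  # 3 servers left
```

In this toy model, both changes matter: the higher threshold avoids removing servers that are merely busy, and the reserve server keeps a genuine overload from spreading to the rest of the pool.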
Why you don't need to worry
While any outage, no matter how brief, can result in the loss of time and money, it's worth stating how much more resilient our Cloud Data servers are than local database servers. A hardware or software issue with your local server might result in downtime lasting several days, as well as data loss. Meanwhile, cloud outages typically last at most a few hours, with no data loss at all.
Further, our team is exceptional in our responses to cloud outages. From the initial panic whistles and calls to action to the fluid division of researching the issue, informing our users, and devising a fix, we're a well-oiled machine. So although we completely understand the frustration that comes with outages, you can count on this team to do everything in our power to keep you up and running.