At 5:00 pm on Wed., February 7th, 2001, COVAD (a major high speed network provider to high speed network providers and ISPs) disconnected all circuits to "DSL Networks" (our high speed network provider) for not paying their bills. I5NET was current on all of our service bills, and neither DSL Networks nor COVAD warned us about the interruption in service. In fact, when we first inquired about the problem Wed. evening, we were told that the phone company had a circuit problem that would be repaired by 10:00 am Thursday morning. Thursday afternoon we were finally told by DSL Networks that they were discontinuing their services and we would need to find an alternate high speed provider. An article in a Friday edition of a Bay Area newspaper said that the sudden disconnection of DSL Networks left thousands of ISPs and hundreds of thousands of end users without service (paraphrased).
It was worse than that. The sudden disconnection of services left our servers without the ability to obtain or deliver Domain Name Services (DNS). Without DNS, services like e-mail and web page hosting don't work. It was the same for thousands of ISPs.
I-5 Network Solutions Response:
Wed. Evening about 6:30 we started to receive calls that people couldn't get connected to the internet on some of the dial-up networks. Those that could establish a connection could not receive e-mail. We traced the problem to a line outage and were told by our high speed service provider that the outage was the result of a major telephone company switching failure. We were also told that the failure would be corrected by 10:00 am the following day. Our Dial-up network provider initially had authentication services on a server which was also served by DSL Networks (and therefore was disconnected at the same time). They moved Wed. evening to relocate the authentication services to a server connected through another provider, and dial-up access was restored. Web hosting and e-mail services remained down. We considered our options at this point and realized the quickest solution was to wait for services to be re-established.
- Explanation of why a line outage causes this problem: User e-mail accounts and web hosting services are located on our servers. If the lines connecting our servers to the internet are disconnected, you can't connect to your e-mail accounts or even view your web pages. A major outage like this leaves hundreds of servers on the wrong side of the break. If you were on the same side of the break as our servers, you would have been able to access only our servers and not the rest of the internet. In this failure, however, all end users were on the other side of the break: they could connect to the rest of the internet but not to our servers.
By noon Thursday, our services had not been restored and we became more concerned. We spent much of the afternoon trying to obtain more information about the failure and finally were notified that DSL Networks was "going out of business" and that we would have to obtain replacement services on our own.
We immediately started making calls to find out how to obtain replacement services. We needed to move the servers to a different high speed line connection. The major problem of this crisis then became apparent. DNS services for our servers are maintained at the IP addresses of our servers and at a redundant location at the co-location facility in Rohnert Park, CA. In the past this level of redundancy had been enough. We've had short outages of line services before, but we had been able to maintain services by simply moving our systems to a temporary line connection and re-identifying our servers with the redundant DNS server at the co-location facility. This time, however, the DNS server at the co-location facility was also disconnected. This meant that if we moved our servers to a new port, we would have to re-identify our new IP address with Network Solutions to obtain DNS referrals again.
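The weakness described above, two DNS locations that both went dark when one upstream provider was cut off, can be checked for mechanically. What follows is only a minimal Python sketch and not a tool used during the outage. The server labels, the provider names "Provider A" and "Provider B", and the assumption that both DNS locations depended solely on DSL Networks are illustrative.

```python
def providers_that_break_dns(name_servers):
    """Return the upstream providers whose loss would take down EVERY name server.

    name_servers maps a name server label to the set of upstream providers
    that carry its connection (hypothetical data, supplied by the caller).
    """
    all_providers = set().union(*name_servers.values())
    fatal = []
    for provider in sorted(all_providers):
        # A server survives if it still has at least one other upstream left.
        survivors = [
            server
            for server, upstreams in name_servers.items()
            if upstreams - {provider}
        ]
        if not survivors:
            fatal.append(provider)
    return fatal


# Assumed layout before the outage: both DNS locations behind one provider.
before = {
    "dns at our servers": {"DSL Networks"},
    "dns at co-location": {"DSL Networks"},
}

# Hypothetical layout with each DNS location on a different provider.
after = {
    "dns at our servers": {"Provider A"},
    "dns at co-location": {"Provider B"},
}

print(providers_that_break_dns(before))
print(providers_that_break_dns(after))
```

An empty result means no single provider failure can take every name server offline at once, which is the property the redundant DNS location was meant to give.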
What? In English this time!! O.K., but it's not easy to describe. When you type in an address like "www.i5net.net", or retrieve mail using a mail server at an address like "mail.i5net.net", or send mail to an address like "email@example.com", your browser or e-mail program initially makes what is called a "name service request". You see, those addresses we are all familiar with, like the ones I listed above, are really just aliases for what is called an IP address. For example: You might say, "I left it at Joe's house". But in order for anyone to really find what you're looking for, they would have to know the address of Joe's house and what neighborhood to find it in. That's what a name service request is. A company like "Network Solutions" returns to your browser something like "220.127.116.11". That probably looks familiar; you've seen numbers like that floating around before, but never knew what they were. Domain Name Service (DNS) is like Joe's parents. Once you find your way to Joe's house, you've got to ask his parents where Joe is, or where the object is that you're looking for. Now, you were told the object is at Joe's house, and Network Solutions told you the address of Joe's house. When you try to get to Joe's house, you find the road to Joe's house has been torn up and is inaccessible. If Joe wants to be accessible to the rest of the world, he will either need to re-build the road or move to a new address. If he moves, he will need to tell Network Solutions his new address if you're ever going to find him. That's the best explanation of DNS and IPs I can think of.
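For readers who prefer code to analogies, the same idea can be sketched as a toy lookup in Python. This is a simplified model, not how real DNS software is written: the domain "example.net", the name server names, and the addresses (taken from ranges reserved for documentation) are all made up, and the registry is reduced to a single dictionary.

```python
# Toy model of a name lookup. Everything here is hypothetical example data.

# The registry (the "Network Solutions" role): which name servers speak for a domain.
REGISTRY = {
    "example.net": ["ns1.example.net", "ns.colo.example"],
}

# Each name server (the "Joe's parents" role): which address each host name has.
HOST_TABLE = {
    "www.example.net": "192.0.2.10",
    "mail.example.net": "192.0.2.11",
}
NAME_SERVERS = {
    "ns1.example.net": HOST_TABLE,
    "ns.colo.example": HOST_TABLE,
}


def resolve(hostname, reachable_name_servers):
    """Return the IP address for hostname, or None if no name server can be asked."""
    domain = ".".join(hostname.split(".")[-2:])
    for name_server in REGISTRY.get(domain, []):
        if name_server not in reachable_name_servers:
            continue  # the road to this name server is torn up
        address = NAME_SERVERS[name_server].get(hostname)
        if address is not None:
            return address
    return None


# Normal day: at least one name server can be reached.
print(resolve("www.example.net", {"ns1.example.net", "ns.colo.example"}))

# Outage: both name servers are cut off, so the name cannot be turned into an address.
print(resolve("www.example.net", set()))
```

The second call returns None even though the web server's address has not changed, which is why e-mail and web pages stop working when every name server for a domain is unreachable.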
Before this crisis, I5NET servers were located at "18.104.22.168"; now we've moved them to "22.214.171.124". Soon we will be making another change as our new high speed line is installed.
By late Thursday night it became apparent that the crisis could not be solved from our Sacramento facility, and we would have to go down to Rohnert Park and try to make the changes from there. I arrived at the co-location facility at about 2:30 am Friday morning. However, I was not able to gain access to the facility until 9:30 am that morning. I spent the rest of the night making whatever phone calls I could and preparing for the next day. I had very little information on what the co-location facility had accomplished, as they had also had a very busy day trying to obtain replacement services.
By this time, it was apparent that the quickest solution would be to try to find replacement services that allowed me to keep the IP address we had on our servers before the incident. This would mean we wouldn't have to make any DNS changes to our servers or with Network Solutions. Our fallback plan would be to move the servers to a new high speed line and make changes with Network Solutions.
Upon gaining access to the co-location facility, the news seemed to get worse. The morning paper identified the crisis and how many people were affected. This meant the competition for obtaining replacement services would be extreme.
We contacted COVAD (the high speed provider at the top of the food chain), and they were aware of the process and indicated that they had developed a plan to obtain replacement services for the disconnected customers. They also indicated that we would be able to keep our IP, and the process of switching should take less than a day. This was very good news, and so we pursued this solution. Dealing with COVAD was extremely difficult, and while their tech. support staff were knowledgeable and helpful, their migration support team (the supposed process they had developed to migrate the services for the disconnected users) was useless and ignorant. We spent the better part of a full day dealing with them, and ultimately they just referred us to another high speed provider similar to DSL Networks. None of these types of providers were prepared for the emergency needs of this many high speed connection requests. They were all quoting 5 days minimum to obtain new services, and we wouldn't be allowed to keep our IP address.
The City had no intention of ever repairing Joe's street. Joe's family would be forced to move their house and notify Network Solutions of their new address if they ever wanted to be accessible to the rest of the world.
We ordered new high speed lines, which we are expecting to be up by Wednesday. That didn't solve the immediate problem of getting service for our customers. We decided to temporarily move our servers to our Sacramento address, where we could monitor them and access them over the weekend while we tried to get access restored. This changed their IP address, which meant we needed to change our registration with Network Solutions in order for the internet to find us.
We do all our Name Services through Network Solutions. They have two ways to make IP changes: a quick way and a slow (fax based, hand entry) way. The fast, convenient, and easy to use forms based interface for making changes is automated and would only require about 12 hours to process. The long method was estimated to require 4 days, even with emergency priority status, because of the number of recent requests. The problem with the fast process is that it requires you to confirm your request by e-mail, and it only permits the confirmation to come from e-mail addresses they have already confirmed.
This was a huge problem for us, as all of the e-mail addresses that we have listed with Network Solutions were hosted on our servers, which weren't accessible.
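This kind of circular dependency (needing e-mail to fix the very outage that took e-mail down) is easy to test for ahead of time. Here is a minimal Python sketch, assuming you keep a list of the contact addresses on file with your registrar; the function name and the example addresses are hypothetical.

```python
def unreachable_contacts(contact_emails, domains_down):
    """Return the contact addresses whose mail domain is among the domains that are down."""
    down = {domain.lower() for domain in domains_down}
    return [
        email
        for email in contact_emails
        if email.rsplit("@", 1)[-1].lower() in down
    ]


# Hypothetical registrar contacts: two hosted on our own domain, one elsewhere.
contacts = ["admin@example.net", "billing@example.net", "owner@example.org"]

# If example.net is offline, these contacts cannot confirm a change by e-mail.
print(unreachable_contacts(contacts, {"example.net"}))
```

If every listed contact comes back as unreachable, the e-mail based confirmation route is unavailable exactly when it is needed, so at least one contact address should live with an unrelated mail provider.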
Our co-location facility had gotten one of their requests with Network Solutions into the (slow) fax process early on Friday. Network Solutions had told them their address referrals would be changed by Friday afternoon. We monitored several of Network Solutions' name server flushes, but services did not get restored.
We got the servers installed at our Sacramento facility by 9:00 pm. We knew that we could assign a new IP from our Sacramento facility and connect the servers through our DSL line there. It wouldn't be a permanent solution, as the line speeds won't support long term use. However, once they were connected, a limited amount of access was restored. We could access the computers through the internet, but name service had not been restored. We posted an update web page at the new IP address of the servers and started informing our customers of how to access it. We spent much of Friday night and Saturday morning re-programming our servers with the new IP information resulting from the move.
Joe moved his house to a new location, and has four wheel drive ONLY access to it. But since the mail trucks won't deliver there, he still has no fast way to tell Network Solutions his new address so that people can find him.
We tried everything we could think of to get Network Solutions to change our address quickly, but couldn't get anything accomplished with them.
Finally, at about 4:00 pm Saturday, our co-location facility got their changes through Network Solutions. We had them make name service changes for us on one of their servers which we had listed with Network Solutions as a possible DNS location.
Joe usually kept tabs with some friends of the family. Network Solutions knew to check with them if they couldn't find Joe, because they usually knew where he was. When Joe's family friends came back home from their vacation, the internet had a way to find Joe. Once Joe told the friends of the family where he was, services started being restored to Joe's house. However, you could only get to Joe's neighborhood through the back road he had built to the friends of his family, and you couldn't get directly to Joe's house.
By 6:00 pm Saturday, we had restored all services except for those associated with the Domain Name "i5net.net". So we moved e-mail to "i5net.com" and provided information on how to access it to our users. Dial-up authentication services went down for a short time as our Dial-up network moved those services back to their servers.
Unfortunately, for addresses which are themselves listed as a Name Server (NS1.i5net.net), Network Solutions trusts their own information rather than the information they receive from a third party. That is why "i5net.net" itself remained down while other addresses like i5net.com were available.
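The behavior described here resembles what DNS operators usually call a "glue record": when a name server's own name sits inside the domain it serves, the registry has to hold that server's address itself. That reading is an assumption on the editor's part, since the term does not appear in this account. Under that assumption, a minimal Python sketch for spotting which domains are affected might look like this (the function name is hypothetical):

```python
def needs_registry_address(domain, name_server):
    """Return True if name_server's own name lies inside domain.

    In that case a third-party name server cannot tell the world where
    name_server is, because nobody could find it to ask in the first place.
    """
    domain = domain.lower().rstrip(".")
    name_server = name_server.lower().rstrip(".")
    return name_server == domain or name_server.endswith("." + domain)


# NS1.i5net.net is inside i5net.net, so the registry's own record is what counts.
print(needs_registry_address("i5net.net", "NS1.i5net.net"))

# For i5net.com the same name server is an outside name, so no such record is needed.
print(needs_registry_address("i5net.com", "NS1.i5net.net"))
```

This matches the pattern in the account above: i5net.com could be repointed through the co-location facility's server, while i5net.net had to wait for a change at Network Solutions.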
Through some trickery on our part, we finally got our Network Solutions Name Service requests for the host i5net.net submitted late Saturday night. When their servers flushed at 2:00 pm Sunday afternoon, full services were restored to our servers. Whahoo!! We were finally VERY relieved and after some minor adjustments to our servers and phone calls, we all got a nap!
The City finished paving the road to Joe's new house Sunday afternoon, and Network Solutions started telling everyone how to find it.
We never thought we would be vulnerable to this kind of outage. Our previous experiences with other network providers had made us want to provide a better level of service that lived up to a higher standard. We have made huge efforts in the past to prevent these types of disruptions and pledge to continue to make the extra effort.
The climate for Internet service companies and high speed network providers is very volatile right now. Three major providers have gone out of business in the past month. We can reduce the length of time our servers are down in the event of a disconnection like this by obtaining redundant services through multiple providers. It's expensive, but we see no other solution. We are in the process of adding the servers and permanent connections to make this a permanent feature of our services.
I hope that after reading this you have a better understanding of the efforts we put forward in resolving this situation. Between Wed. evening and Sunday afternoon, we worked around the clock exploring tricks, potential fixes, and changes to restore services to our customers. Very little sleep was had. If you have any comments about how we handled this outage, and how we could have better kept you informed, please contact me at: firstname.lastname@example.org
We may have another short outage of e-mail and web hosting as we move some of our servers back to the permanent facility in Rohnert Park. We will try to do this work late at night, and will post a message on our web pages at www.i5net.net warning of when this will occur.
Thank you for your understanding!
Thomas S. Plummer