Wednesday, September 29, 2010
82% of Enterprise Outages Caused by Power, Hardware or Telecom Service Failure
While 82 percent of the 200 businesses completing the survey felt confident that their IT resources could sustain disruptions and support operations effectively, 97 percent admitted network disruptions had detrimental effects on their businesses in the last year.
Also, about 1800 smaller businesses reported network disruption of four hours or more within the last year. CDW estimates that such network outages cost U.S. businesses $1.7 billion in lost profits last year.
"The survey confirms that while many businesses believe they are prepared for an unplanned network disruption, many are not – and yet the three most common causes of IT outages are addressable," said Norm Lillis, CDW vice president, system solutions. Power loss ranked as the top cause of business disruptions over the past year, with one third of businesses reporting it prompted their most recent disruption. Hardware failures caused 29 percent of network outages, followed by a loss of telecom services to facilities (21 percent).
The survey also revealed that businesses need to take advance preparation more seriously and support employees more effectively with network accessibility.
While 53 percent of respondents said employees are instructed or given the option to work from home when a foreseeable network disruption approaches (a weather event, for example), only a third of businesses activate standby communications and network systems to support increased remote access when warned of such an event.
In fact, while respondents reported that, on average, 44 percent of the workforce normally has telework options, they said that only 39 percent of employees could telework during their most recent network outage.
link to full study
Sunday, November 1, 2009
The Way to Deal with an Outage
Junction Networks, for example, had an unexpected outage on Oct. 26, 2009, lasting about an hour and a half, and the company's apology and explanation are a good example of what to do when the inevitable outage does occur. First, apologize.
"We do sincerely apologize for this service interruption. We know that you have many choices for your phone service, and we deeply appreciate your patience and understanding during yesterday's interruption of service. Below are the full details of the service issue."
Then remind users where they can get information if an outage ever occurs again.
"One of the first things we do when a service issue occurs is update our Network Alert Blog and Twitter page with as much information as we have at that time. We then post comments to that original post as we learn more. Our Network Alert blog is here: http://www.junctionnetworks.com/blog/category/network-alerts"
"Our Twitter account is: http://www.twitter.com/onsip."
Junction Networks then provides a detailed description of its normal maintenance activities, which can cause "planned outages" with an intentional shift to backup systems.
"As a rule, Junction Networks maintains three different types of maintenance windows:
1.) Weekend - early morning: The maintenance performed will produce a service disruption and could affect multiple systems.
2.) Weekday - early morning: The maintenance performed may produce a service disruption, but is isolated to a single system.
3.) Intra-day: The work performed should not affect our customers.
All maintenance, even that which is known to cause a service disruption, is not expected to cause a disruption for more than a few fractions of a second. For anything that would cause a more serious disruption (one second or more), backup services are swapped in to take the place of the maintenance system."
The company then explains why the specific Oct. 26 outage happened, in some detail, and then the remedies it applied.
Nobody likes outages, but they are a fact of life. If you think about it, there is a very simple reason. Consider today's electronic devices, designed to work with only minutes to hours to several days worth of "outages" each year. If you've ever had to reboot a device, that's an outage. If you've ever had software "hang," requiring a reboot, that's an outage.
Now imagine the number of normally reliable devices that have to be connected in series to complete any point-to-point communications link. That includes the applications running on the servers, switches, routers and gateways, plus the active opto-electronics in every network that must be connected for any single point-to-point session to occur.
Don't forget the power supplies, power grid, air conditioners and potential accidents that can take a session out. If a backhoe cuts an optical line, you get an outage. If a car knocks down a telephone pole, you can get an outage.
Now remember your mathematics. Any number less than "one," when multiplied by any other number less than "one," necessarily results in a number that is smaller than the original quantity. In other words, as one concatenates many devices, each individually quite reliable, the reliability or availability of the whole system gets worse.
A single device with 99-percent reliability is expected to be out of service about 3 days, 15 hours and 40 minutes every year. But that's just one device. If any session has 50 possible devices in series, each with that same 99-percent reliability, the system as a whole is only as reliable as the product of the availabilities of each discrete device.
In other words, you have to multiply a number less than "one" by 49 other numbers, each less than "one," to determine overall system reliability. For 50 devices at 99 percent each, that works out to roughly 60 percent.
As an example, consider a system of just 12 devices, each 99.99 percent reliable, and expected to fail about 52 minutes, 36 seconds each year. The whole network would then be expected to fail about 10.5 hours each year.
Networks with less reliability than 99.99 percent or with more discrete elements will fail for longer periods of time.
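For anybody who wants to check the arithmetic, here is a minimal sketch in Python. The function names are made up for illustration, and it assumes devices fail independently and that a year averages 365.25 days, which is where the 52 minutes, 36 seconds figure comes from.

```python
from math import prod

# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def series_availability(availabilities):
    """Availability of devices connected in series.

    Assumes independent failures, so the availabilities simply multiply.
    """
    return prod(availabilities)


def annual_downtime_minutes(availability):
    """Expected minutes of downtime per year at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# One device at 99.99 percent: a bit under 53 minutes a year.
single_device = annual_downtime_minutes(0.9999)

# Twelve such devices in series: about 10.5 hours a year.
twelve_devices = series_availability([0.9999] * 12)
twelve_hours = annual_downtime_minutes(twelve_devices) / 60

# Fifty devices at 99 percent each: availability of the whole chain.
fifty_devices = series_availability([0.99] * 50)

print(f"single device: {single_device:.1f} minutes per year")
print(f"12 devices:    {twelve_hours:.1f} hours per year")
print(f"50 devices:    {fifty_devices:.1%} available")
```

Real networks have redundant paths and failures that are not independent, so treat this as a rough bound rather than a prediction.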
The point is that outages can be minimized, but not prevented entirely. Knowing that, one might as well have a process in place for the times when service is disrupted.
Thursday, October 15, 2009
T-Mobile USA Sidekick Data Nearly Fully Recovered
"We plan to begin restoring users’ personal data as soon as possible, starting with personal contacts, after we have validated the data and our restoration plan," Ho says. "We will then continue to work around the clock to restore data to all affected users, including calendar, notes, tasks, photographs and high scores, as quickly as possible."
"We now believe that data loss affected a minority of Sidekick users," Ho added. Despite that good news, two class action lawsuits have been filed against T-Mobile USA, alleging that the company misled consumers into believing that their data was more secure than was the case.
Tuesday, February 12, 2008
Slow Email? BlackBerry Outage
RIM says no messages were lost during the incident, which caused intermittent delivery delays. No explanation for the outage has been given.
Outages of this sort are the reason many of us are giving more thought to backup and redundancy strategies. On a recent business trip, for the first time in my life, I accidentally left my laptop at home, and was going to be gone for 14 days. True, I had the BlackBerry and another mobile as well.
But in my line of work access to the Web is arguably more important than either of those two sorts of devices, as important as they are. Because of Google Documents & Spreadsheets and Google Browser Sync, I was able to keep working using public terminals and loaned machines, with access to Microsoft Office.
I also learned to live without access to Outlook for a bit. The BlackBerry helped, of course. The lasting change so far is that I have kept using Google Documents more than I have in the past. That's why sampling is so important. Behavior can change.
Monday, February 4, 2008
Another Cable Cut in Persian Gulf
What are the odds four undersea cables are cut in a single week? Whatever those odds, it has happened. First two cables snap off Egypt. Then a separate cable in the Persian Gulf, and now yet another Middle East cable.
In the latest incident, an undersea telecoms cable linking Qatar to the United Arab Emirates was damaged, disrupting services, telecommunications provider Qtel has reported.
The cable was damaged between the Qatari island of Haloul and the UAE island of Das. The cause of the damage is not yet known.
Qtel's loss of capacity seems to be disrupting voice capacity more than Internet services. Qtel says it was operating at 40 percent over the weekend because alternative cables exist. Nevertheless, disruption to Internet and telephone services in the Gulf state is likely to continue for another 10 days or so.
Not since the December 2006 earthquake off Taiwan have so many cables been taken out of service almost at once.
Saturday, February 2, 2008
Cable Cuts Highlight Opportunity
Several recent undersea cable cuts that interfered with Internet connections in India and the Middle East might ultimately focus attention on other ways to get call center and business process outsourcing handled. By some reports 20 to 25 percent of outsourced call centers initially were unable to do any work at all while many had only 50 percent of capacity once restoration work began and traffic was rerouted.
At some point, at least some providers and some customers will conclude that if the price is equivalent, it makes more sense to base call centers and other business process outsourcing operations on shore. The issue is how to operate them more efficiently.
Perhaps there is a role here for IP voice interconnections. Though other costs are less malleable, it ought to be possible to create highly-distributed call overflow mechanisms using "voice over private network" IP connections in ways that allow economical call center operations in lots of rural areas that are more protected from cable cuts.
Friday, February 1, 2008
FLAG Telecom Loses Undersea Cable
As a reminder of how important undersea cable redundancy is, FLAG Telecom has lost a cable of its own in the Persian Gulf. FLAG, a wholly-owned subsidiary of India's number two mobile operator Reliance Communications, says its Falcon cable was reported cut at 0559 GMT, 56 km (35 miles) from Dubai on a segment between the United Arab Emirates and Oman.
Thursday, January 31, 2008
at&t Wireless Outage
In case you are having trouble sending and receiving email on your at&t Wireless smart phone, or are unable to get connected using your data card, there is a wireless network outage affecting at&t Wireless users in the Midwest and Southeast.
Taiwan Earthquake Just a Year Ago
Those cable cuts took out much voice and Internet communications in many parts of Asia, as well as 60 percent of capacity between Asia and the United States.
The 2006 Hengchun earthquake occurred on December 26, 2006 at 12:25 UTC (20:25 local time), with an epicenter off the southwest coast of Taiwan, approximately 22.8 km west southwest of Hengchun, Pingtung County, Taiwan, with an exact hypocenter 21.9 km deep in the Luzon Strait (21.89° N 120.56° E), which connects the South China Sea with the Philippine Sea.
Cable Cuts Not That Rare
In the winter of 2000, Telstra, Australia's biggest Internet service provider, had a cable cut of its own on Nov. 19, when its Internet backbone cable, sitting in less than 100 feet of seawater about 40 miles off Singapore, was damaged by unknown causes.
Telstra at that time relied on the cable, known as SEA-ME-WE 3 (for Southeast Asia, Middle East and Western Europe) for more than 60 percent of its Internet transmission capacity.
About 23,600 miles long, the cable connected 33 countries, touching places as diverse as Singapore, Malaysia, Thailand, India, Saudi Arabia, Egypt, Djibouti, Turkey, Greece, Italy, Portugal, France and the U.K.
Cable Cut Disrupts India Call Centers
It could take a week or two to fix the cables, in part because of bad weather, some executives say.
Users in India, Egypt, Qatar, Saudi Arabia, the United Arab Emirates, Kuwait and Bahrain are affected by the outages.
Observers think an anchor might have snagged the cables. At least that's what Flag Telecom Group Ltd. now believes. The incident took place 8.3 kilometers (5.2 miles) from Alexandria beach in northern Egypt.
Emirates Integrated Telecommunications Co., the United Arab Emirates' second-biggest mobile-phone company, is working with the cable operators, Flag Telecom and SEA-ME-WE 4, to find out why the cables were cut and to determine when service can be restored.
The outage is a reminder that physical infrastructure, however mundane, underlies all of modern computing and communications. It's also a reminder that if your business or life depends on Internet-based communications, commerce and content, you need a diversity strategy. It costs more money. But so does inability to do your work.
Monday, December 3, 2007
at&t Internet Outage in former BellSouth Areas
Outage reports are posted from Georgia, Florida, Louisiana, South Carolina and Mississippi.
Tuesday, October 9, 2007
T-Mobile Goes Down
It wasn't your imagination: if you use T-Mobile data services, you had no connectivity for as much as four hours on Tuesday. Personally, I thought it was the coverage inside the convention center I am working inside of. Nope. There was an outage. I thought it was the BlackBerry server at one point. But no.
The latest outage just illustrates an important element of digital life: you really can't trust any service or application to remain "always available." Everything is going to crash, or be unusable, for some amount of time. So either you get used to the idea of periodic outages or, if that isn't satisfactory, you are going to have to back up all your mission-critical services, devices, data or applications. Personally, I don't worry too much about application diversity, though most of us have some of that. I do make sure broadband and mobile access, as well as computing devices, are redundant.
Sunday, September 9, 2007
Another Outage for BlackBerry
U.S. Internet-based users of the Research in Motion BlackBerry service might have noticed, and might still be noticing, odd behavior from their handhelds: no mail for parts of Friday, and then huge dumps of what you thought was archived mail thereafter. If so, it might be because RIM had another outage of some significance last Friday, Sept. 7. That's two significant outages this year.
All of us may someday lament the fact that no service we now enjoy and rely upon has the ruggedness and uptime of the old public switched network.
Wednesday, August 22, 2007
GrandCentral Number Porting Affects 434
GrandCentral has had a few number porting issues of its own. CEO Craig Walker says GrandCentral had issues with 434 customers whose numbers could not be seamlessly transitioned from one underlying supplier to another.
What happened is that a supplier of numbers and connections "sent us a notice that they’d be exiting certain markets and disconnecting some phone numbers in 30 days," says Walker. GrandCentral immediately began porting the numbers to a larger carrier partner. But 434 couldn't transparently be moved.
Those users had to be assigned new telephone numbers in the same area codes they already were using. Going forward, GrandCentral is emphasizing working with large, reliable providers committed to providing these services long term.
"Although this affected only 15 of the local areas where we offer services, out of nearly 8,000, we take this matter seriously and have done everything to make the disruptions as limited as possible," says Walker.
That is the way to handle an unplanned outage.
Wells Fargo Outage Yesterday
U.K. VoIP Provider Also Has Outage
U.K. VoIP provider VoIP.co.uk had an outage of its own last Monday. Users could call other VoIP.co.uk users, but were unable to place or receive calls from users on the public telephone network. Service was out for the better part of a day.
Monday, August 20, 2007
Skype: The Ultimate Windows Externality
"On Thursday, 16th August 2007, the Skype peer-to-peer network became unstable and suffered a critical disruption triggered by a massive restart of our users’ Windows-based computers across the globe within a very short time frame as they re-booted after receiving a routine set of patches through Windows Update," Skype says.
Not everybody buys that explanation. But, if true, it has to rank as the most massive, unexpected software interaction Windows ever has inadvertently caused.
The high number of restarts apparently caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction, Skype says. Some have argued that the outage proves peer-to-peer networks are inherently unstable.
It's hard to test that assertion since Skype uses a modified P2P architecture with a sign-in process that is more "client-server" and centralized than most other P2P networks.
Some think there was some sort of hacker attack, but Skype denies it. "We can confirm categorically that no malicious activities were attributed."
If the Microsoft routine updates were, in fact, contributory or causal, it would rank as the most significant network-wide interaction anybody ever has seen. Just another example of the way applications are reshaping the way global networks perform.
As some of you know, I have recently been dealing with interactions caused by a Vista upgrade, mostly of the "we don't talk to Vista" sort. I will say one thing, however. Vista seems to be much more robust than XP was about handling "hibernation" operations. XP used to become unstable after several hibernation operations, at least on my machines. I have not found that to be the case with Vista.
Friday, August 17, 2007
Skype Outage Not Over
The service had been sporadic but gradually improving during the business day in Asia on Friday, some report.
"There are about 2.5 million people logged in right now, where normally there would be over 8 million, and it's been going on and off every 10 minutes," says Mark Main, senior analyst at Ovum in London.
You may draw your own conclusions about which other application or service providers might benefit, but urges to gloat should generally be suppressed. Nobody whose service uses IP and the public networks is safe from outages or service disruptions.
That's why businesses and networks have redundancy. People who scream and yell about losing their service have only themselves to blame if they didn't build some level of diversity and redundancy even into their personal communications. Use Skype, other IM applications, mobiles, POTS-replacement VoIP, and POTS, email and anything else you can get your hands on. Some of us use multiple mobiles from different providers and multiple broadband providers. But never hang everything on any one service or provider, especially if your business depends on it. Personally, I wouldn't even hang my personal communications on a "single provider" strategy.
Thursday, August 16, 2007
And Cisco Goes Down, Also...