Friday, August 17, 2007

DHT Behind Skype Crash?


Not that anybody really claims to know, but there's some thinking that the Skype outage was caused by some failure of the Distributed Hash Tables that Skype Supernodes apparently maintain. Some say "this is normally very slow and done over UDP," so restoration, even once the problem is identified, will take some time. So even as the ability to send instant messages and set up voice sessions is restored, other niceties, such as correct "presence" information, might take a little longer. The immediate problem, some say, is that if a Skype client cannot find a Supernode (and I am not a techie, but I understand a DHT corruption would have something very serious to do with that sort of failure mode), then even if a client is authenticated by a central server, the user would not be able to get onto the Skype network.

All I know is that this failure mode would explain why I can communicate using text, and send audio, but my presence shows as "offline," when I am "online." I will test a live conversation tomorrow morning and see what happens.

This is a crisis management professional's dream: when your client is getting lots of bad press, some other bigger event occurs to overshadow it. So Skype now is sucking all the oxygen out of the "I'm mad my VoIP doesn't work" room.

Skype Sorta, Kinda Up

Though my status shows "offline" to Rich Tehrani, my Skype client seems to be up, though sending incorrect status information. Not many contacts seem to be visible at the moment.

Skype Outage Not Over

Skype initially said its outage was over, but that clearly is not the case everywhere, and we are nearing 24 hours since the log-in problem began. Now Skype warns that the outage is likely to continue through Friday. My U.S. log-in still hangs.

The service had been sporadic but gradually improving during the business day in Asia on Friday, some report.

"There are about 2.5 million people logged in right now, where normally there would be over 8 million, and it's been going on and off every 10 minutes," says Mark Main, senior analyst at Ovum in London.

You may draw your own conclusions about which other application or service providers might benefit, but urges to gloat should generally be suppressed. Nobody whose service uses IP and the public networks is safe from outages or service disruptions.

That's why businesses and networks have redundancy. People who scream and yell about losing their service have only themselves to blame if they didn't build some level of diversity and redundancy even into their personal communications. Use Skype, other IM applications, mobiles, POTS-replacement VoIP, and POTS, email and anything else you can get your hands on. Some of us use multiple mobiles from different providers and multiple broadband providers. But never hang everything on any one service or provider, especially if your business depends on it. Personally, I wouldn't even hang my personal communications on a "single provider" strategy.
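The redundancy argument can be put in rough numbers. The sketch below assumes each channel fails independently of the others, which is optimistic when they share the same broadband line or power, and the availability figures are made up for illustration rather than measured for any real provider.

```python
from math import prod


def combined_availability(availabilities):
    """Chance that at least one channel is up, assuming independent failures."""
    # All channels must be down at once for communication to be lost.
    all_down = prod(1.0 - a for a in availabilities)
    return 1.0 - all_down


# Hypothetical availabilities, not real provider data.
channels = {"Skype": 0.99, "mobile": 0.99, "POTS": 0.999}

single = combined_availability([channels["Skype"]])
everything = combined_availability(channels.values())
print(f"One service alone: {single:.4%}")
print(f"All three together: {everything:.6%}")
```

Even two so-so services together beat one good one, so long as they do not share a single point of failure.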

Thursday, August 16, 2007

Dark Skype


Skype Ltd. early today blamed an unspecified "software problem" for an outage that might make the service unavailable for as long as 24 hours. At 9 a.m. EDT Skype said the outage might last 12 to 24 hours.

Most people are finding it impossible to dial out or open an instant message session with any of their contacts. A "Connecting" message just hangs.

Skype rarely goes offline. The last reported outage resulted in the service going dark for several hours in October 2005.

Fred Pitts Back in Service with TeleBlend


It took 10 days, but TeleBlend customer Fred Pitts FINALLY is back in service.
"My first try to call home this morning continued with the 'fast busy' signal; by midmorning, however, it was working," Pitts says. "So, while disappointed to have been without incoming service for such a length of time, I am thankful today that I am back up. I hope everyone else will be back in service soon as well."

A gracious comment, I'd say. At least some disgruntled SunRocket customers who picked TeleBlend as a replacement say they have churned to other providers such as Packet8 and Vonage.

A harrowing experience, to be sure. Perhaps it is only fair to note, though, that of the 60,000 transitioned customers, nearly all made the flash cut without much apparent disruption. Call it 99 percent. But one percent of 60,000 is still 600 customers, and it will be scant comfort to them to know that (hypothetically) 59,400 customers had no real issues.

That's the devil with mass market services, though, isn't it? Getting 99 percent of things right still generates thousands of trouble tickets. (I'm not suggesting TeleBlend had issues with as many as one percent of its accounts, by the way. I'm just making the point that a very small failure rate in a mass market application or service can result in huge trouble ticket queues.)
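For anyone who wants to play with the numbers, here is a minimal sketch of that arithmetic. The 60,000 figure is the transitioned customer count cited above; the success rates are hypothetical, and the function name is just one I made up for illustration.

```python
def affected_customers(total_customers: int, success_rate: float) -> int:
    """Return how many customers hit a problem at a given success rate."""
    return round(total_customers * (1.0 - success_rate))


transitioned = 60_000  # customer count cited in the post

# Hypothetical success rates, not TeleBlend's actual performance.
for rate in (0.99, 0.999):
    hit = affected_customers(transitioned, rate)
    print(f"{rate:.1%} success -> {hit} customers with trouble")
```

Scale the customer base into the millions and the same one percent becomes tens of thousands of unhappy people.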

Skype apparently still is having a major outage itself today, and as older posts today note, at&t and Cisco have had issues this month as well. S*** happens even to companies as large and sophisticated as Cisco and at&t.

And Cisco Goes Down, Also...

Cisco's main www.cisco.com page was offline at 11 a.m. Pacific Time on Aug. 8 and stayed offline for more than two and a half hours. It returned at about 1:45 p.m. The outage was an unintended byproduct of routine maintenance.

at&t EDGE Network Outage

See what we mean? AT&T Inc. acknowledged a brief outage of its EDGE network Tuesday, Aug. 14, which was blamed on routine router maintenance. The EDGE network was also down on July 2 for about six hours.

EDGE (Enhanced Data Rates for GSM Evolution) is the wide area wireless data technology that serves iPhones and many other devices, running on GSM networks that also carry voice traffic.
