
Why website crashes are unavoidable -- at least for now

John D. Sutter
Experts say websites will continue to crash, even though consumers have increasingly high demands about uptime.
  • This week has seen several big website outages
  • WordPress went down Thursday night, crashing more than 10 million blogs
  • Experts say website crashes are somewhat unavoidable, however

(CNN) -- This has been a week of crashing websites.

First it was Twitter, which had "site availability issues" on and off this week. Then, on Thursday night, it was WordPress, the popular blogging platform that supports more than 10 million blogs, all of which went down for several hours because of a coding problem.

Big-name tech blogs such as TechCrunch and GigaOm were yanked offline because of the WordPress glitch. And, for purposes of disclosure, CNN's blogs also were among those that were unavailable for a matter of hours.

Jason Kincaid, a writer at TechCrunch, wrote that he was far from pleased about the service outage. "If you tried to access TechCrunch any time in the last hour or so, you probably noticed that it wasn't working at all," he wrote Thursday. "Instead, you were greeted by the overly cheery notice ' will be back in a minute!' Had we written that message ourselves, there would have been significantly more profanity."

Obviously, no one likes it when websites go down. And, for website owners, even a few hours offline can mean a big hit to revenue from ads. But if you understand how the Internet works, there's one thing you realize quickly: Websites will continue to crash from time to time, and, without a big rethink of the system, there's no way to prevent that completely.

"I think that all services will have downtime," WordPress founder Matt Mullenweg wrote in an e-mail to CNN on Friday. "No matter how much you prepare, have redundant systems, or audit, there will periodically be a black swan event that is completely unlike whatever you've experienced before. It even happens to Google! In these moments of crisis, the key is how the service and the people behind it respond."

Over the past five years, WordPress has been up -- meaning the whole site is functioning normally -- 99.9 percent of the time, Mullenweg said. That translates to about nine hours of downtime per year.
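The arithmetic behind that figure can be checked with a short sketch. The function name and the 365-day year are assumptions made for illustration; they are not from WordPress.

```python
# Hours in a non-leap year (an assumption; leap years add 24 hours).
HOURS_PER_YEAR = 365 * 24  # 8,760


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Return the hours per year a service is down at a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


print(round(downtime_hours_per_year(99.9), 2))  # prints 8.76
```

At 99.9 percent uptime the result is 8.76 hours, which rounds to the "about nine hours" cited above.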

The site does not make guarantees that its service will always work. Instead, "our guarantee to our users is that we'll do our very best to have their blog be completely reliable so they don't have to worry about it," he said.

Rich Miller, editor of the blog Data Center Knowledge, said websites will continue to crash because they are run by mechanical computers, which always fail eventually.

"This is an industry that is built around trying to take every measure possible to ensure websites don't go down and data centers don't lose power," he said. "But there's always Murphy's Law, which is anything that can go wrong will -- and sometimes will in multiple places at once."

The main way websites are trying to prevent downtime is by building more data centers, Miller said. Data centers are essentially the brains of the Web -- they're huge warehouses full of computer servers that store information.

Companies tend to store information at multiple sites, so that if a computer server crashes, or if a certain town loses power because of a natural disaster, then the website doesn't go down and information isn't lost.
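A rough sketch of why multiple sites help: if each site is up some fraction of the time and the sites fail independently, the website is down only when every site is down at once. The independence assumption is a simplification introduced here, and Miller's point about things going wrong "in multiple places at once" is exactly the case it ignores. The function name and the sample figures are hypothetical.

```python
def combined_availability(site_availability: float, sites: int) -> float:
    """Availability of a service mirrored across `sites` locations.

    Assumes each site fails independently of the others, which real
    outages (shared code bugs, network problems) often violate.
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # The service is down only if all sites are down at the same time.
    return 1 - (1 - site_availability) ** sites


for n in (1, 2, 3):
    print(n, f"{combined_availability(0.999, n):.9f}")
```

Under these assumptions, each added site multiplies the chance of a total outage by the single-site failure rate. A shared code error, like the one behind the WordPress outage, would affect every site and is not captured by this model.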

"At the end of the day you're still dealing with mechanical, electronic systems," Miller said. "These things break and get old and die. And so what you do is you design and build these facilities to try to account for every possible scenario. The math gets challenging when you try to imagine every possible scenario."

He said Twitter and WordPress generally do a good job at preventing crashes.

Christofer Hoff, director of cloud and virtualization solutions at Cisco, said there needs to be a "reasonable resetting of expectations" among consumers about how often websites should go down. Hoff said he has been personally irritated by Twitter's frequent crashes, but he understands that things such as increased traffic because of the World Cup or a tech event such as Monday's iPhone 4 unveiling can cause unforeseen problems that lead to service issues.

"It seems like a really simple set of problems, but the scalability of website operations is a very, very tricky business," he said, "especially when you look at all of the moving parts."

Mullenweg, of WordPress, said Thursday's crash resulted from "a highly unfortunate code error which had some cascading effects." Twitter posted on its Status Blog that "networking problems" caused its trouble, and that the site issues continued on Friday.

Just because websites will continue to crash from time to time doesn't mean that these shortcomings aren't significant. People increasingly rely on Internet services such as WordPress and Gmail to host their work communications and contacts lists. Websites such as TechCrunch lose money and possibly readers if their product isn't available online. And Twitter has become an increasingly vital means for people to communicate in the age of real-time information access.

"We live in a 24-by-7-by-365 world now," Miller said. "That's the expectation folks have now" -- that information will always be available.

Some websites are better than others at meeting these expectations.

A report from the company Pingdom broke down downtime for social networks in 2008. Twitter was at the bottom of the list, meaning it was down that year more often than most. Facebook and others fared better in that report.

Without any solution that's 100 percent effective, Miller said the data-center industry measures website uptime by "how many nines" a website has. Three "9s" -- or 99.9 percent uptime -- is good. The best in the industry manage to create services with five "9s," he said, meaning they're up 99.999 percent of the time.
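To put the "nines" in concrete terms, here is a minimal sketch that converts a count of nines into an uptime percentage and yearly downtime. The function names and the 365-day year are assumptions for illustration, not an industry-standard formula.

```python
# Minutes in a non-leap year (an assumption for this sketch).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def uptime_percent(nines: int) -> float:
    """Three nines -> 99.9, five nines -> 99.999."""
    return 100 * (1 - 10 ** -nines)


def downtime_minutes_per_year(nines: int) -> float:
    """Minutes per year a service may be down at the given number of nines."""
    return MINUTES_PER_YEAR * 10 ** -nines


for nines in (3, 5):
    print(
        f"{nines} nines: {uptime_percent(nines):.3f}% up, "
        f"about {downtime_minutes_per_year(nines):.1f} minutes down per year"
    )
```

By this arithmetic, three nines allows roughly 525.6 minutes (just under nine hours) of downtime a year, while five nines allows only about 5.3 minutes.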

Usually, these uptimes aren't guaranteed to website users unless they pay to access the site's services. Some paid services operate under signed agreements that guarantee a certain amount of uptime, with a penalty paid to the customer if that level isn't met.

Perhaps just as important as uptime, Miller said, is the way companies respond to these website crashes when they do occur.

The old model was to ignore them publicly, he said. But with today's real-time Web, he said, that's not an option anymore.

