Google's latest search indexing system, "Caffeine," promises search results that are 50 percent fresher than those Google could provide under its old indexing system. In a world where applications such as Twitter have made real-time and near-real-time content commonplace, that's important.
"Whether it's a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was possible ever before," the Google Blog notes. The company said faster indexing is needed in part because, with the advent of video, images, news and real-time updates, the average webpage is richer and more complex, and user expectations are simply higher.
"Searchers want to find the latest relevant content and publishers expect to be found the instant they publish," Google wrote.
Google's post explains how Caffeine differs from the previous system:
"The old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available.
"With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before, no matter when or where it was published."
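To make the contrast between periodic full rebuilds and continuous updates concrete, here is a toy sketch in Python, assuming a simple in-memory inverted index that maps words to page URLs. It is not how Caffeine is actually implemented, and the names build_index and IncrementalIndex are hypothetical. The batch function mirrors the old approach of reprocessing every known page, while the incremental class folds each new or changed page into the index as soon as it is found.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Toy tokenizer: lowercase and split on whitespace (real indexers do far more).
    return text.lower().split()


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Hypothetical batch approach: rebuild the whole index from every page."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


class IncrementalIndex:
    """Hypothetical incremental approach: update the index one page at a time."""

    def __init__(self) -> None:
        self.index: dict[str, set[str]] = defaultdict(set)
        self.terms_by_url: dict[str, set[str]] = {}

    def upsert(self, url: str, text: str) -> None:
        # Drop the page's old postings so outdated terms stop matching it.
        for term in self.terms_by_url.get(url, set()):
            self.index[term].discard(url)
            if not self.index[term]:
                del self.index[term]
        # Add postings for the page's current content.
        terms = set(tokenize(text))
        for term in terms:
            self.index[term].add(url)
        self.terms_by_url[url] = terms

    def search(self, term: str) -> set[str]:
        # Return a copy so callers cannot mutate the index.
        return set(self.index.get(term.lower(), set()))


if __name__ == "__main__":
    pages = {"p1": "Caffeine search index", "p2": "old search layers"}
    batch = build_index(pages)

    live = IncrementalIndex()
    for url, text in pages.items():
        live.upsert(url, text)

    # A page published after the batch build is searchable right away here.
    live.upsert("p3", "fresh news story")
    print(sorted(live.search("fresh")))
    print("fresh" in batch)
```

In this sketch the batch index cannot see a page published after it was built until the next full rebuild, whereas the incremental index can answer queries for it immediately, and an edited page's old terms are removed when it is re-added. That is a scaled-down version of the freshness gap Google describes.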
Caffeine lets Google index web pages on an enormous scale, processing hundreds of thousands of pages in parallel. It takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day, Google says.