Understanding The Google Caffeine Search Index & Algo Update

Google NEVER rests and now, I know why – it’s all that Caffeine! In their continuing quest for world dominance, they have made a pretty significant recent update for their core product – search. Before we even get into the updates, can I just say that with close to ten years in the industry, I’ve never seen a company so good at maintaining and even re-inventing relevancy. I must confess, I have a love/hate relationship with Google. I’m in AWE of how good they are at the technical product side of the business, but am conversely DISGUSTED by how horrible they are at the human side of the business.

Okay, enough of that! Let’s talk enhancements, let’s talk Google Caffeine. A little over a week ago, Google announced that they had developed a new, enhanced web indexing system called Caffeine. As the name may suggest, it’s a more amped-up, hyper indexing system than Google previously used to generate search results.

Before we delve too deeply into the new index, let’s make sure we understand how web searches work. If you remember my earlier post about real-time search, although most searches are in real-time, the content delivered against them rarely is. So, indexing Twitter and including real-time results was the first step in making the search more relevant to the period in time when the search occurred.

When you perform a search on any engine, be it Google, Yahoo! or Bing (I know there are more, but really, are you searching there?), you aren’t actually getting a RIGHT now result. What you are getting is a result based on the sites on the Web that these engines have previously crawled and identified as relevant for the search phrase that you entered. Those results may have been compiled days or even months prior to the time your search was performed. As a matter of fact, the typical guideline that SEO experts and companies used to give clients regarding search optimization was that it may take 3 – 4 months for the engines to crawl the web again for new content and update their indexes. So, the expectation was that you would make the changes today, but wouldn’t see the impact until months down the road when the engines decided to crawl again. That blows, huh?
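If it helps to see that in miniature, here is a toy sketch of the idea, not how any real engine is built. Everything in it is made up for illustration: the `ToyIndex` class, the `example.com` URLs and the page text are all hypothetical. The only point is that a search is answered from a snapshot taken at crawl time, so a page published after the last crawl stays invisible until the next one.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical index: maps each word to the URLs that contained it at crawl time."""
    postings: dict = field(default_factory=dict)

    def crawl(self, web):
        """Rebuild the whole snapshot from the current state of the toy web."""
        self.postings = {}
        for url, text in web.items():
            for word in text.lower().split():
                self.postings.setdefault(word, set()).add(url)

    def search(self, word):
        """Answer from the stored snapshot, never from the live web."""
        return sorted(self.postings.get(word.lower(), set()))


# A one-page toy web (assumed data), crawled once.
web = {"example.com/a": "halo review"}
index = ToyIndex()
index.crawl(web)

# A new page is published after the crawl...
web["example.com/b"] = "halo launch news"
stale = index.search("halo")  # ...but the snapshot does not know about it yet

# Only the next crawl makes it findable.
index.crawl(web)
fresh = index.search("halo")
```

Here `stale` holds only the first page, while `fresh` holds both. The gap between those two calls is the "months down the road" wait described above.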

Over the years, there have been gradual changes to this approach that have moved towards making the search results more relevant to the moment the search was performed. For example, three or four years ago, while at, we made significant changes to the site that incorporated search optimization best practices and were able to see immediate, as in same-day, indexing for content. At that time, the search engines, with Google being the best at it, were actually using some sort of vertical category segmentation of the index to determine how often they crawled sites. Newspaper sites that regularly added tons of new content, much of it time-sensitive or with a short shelf life, e.g. breaking news, were crawled constantly to provide good search results. Or, say there are sites that have newly launched or in-demand consumer goods, e.g. video game systems or video games (Halo anyone?); around the time that the product launched or demand was high, Google began segmenting that vertical and indexing it more often.

As I mentioned, I noticed this occurring around 2006 or so. Fast-forward to 2010 and the web is a totally different place. Videos, photos, blogs, Twitter, Facebook – it’s lousy with user-generated content! Often, the content is time-sensitive and a reaction to topical events occurring around the globe. There are so many recent news stories, e.g. Michael Jackson’s death, the European Volcano Ash Crisis, Iranian Elections, etc., that have broken or been updated via user-generated content and not traditional channels. In order to provide a more relevant experience, Google has been cooking up many updates to its search algorithm.

Which brings me back to Caffeine, which according to Google, will provide search results that are 50% fresher – remember this is hyper, amped-up indexing baby! They say the change was driven not only by the fact that lots of today’s content is user-generated, but also because when you combine all that social content and the publisher-developed content, there’s just so much more content, period. Here’s Google’s illustration of the old layered way of indexing by vertical and the new Caffeine methodology.

See the old implementation on the left? Multiple layers, with different updating schedules dependent upon the importance of the layer (not sure which color is most important). See the new layout? Look at all that content – photos and videos, real-time updates, books (I guess this is the illustration of that content) and the user is RIGHT in the middle of it all, not on the outside looking in, like they were before. They are a key part of the experience, at least that’s what I get out of the artwork. Good thing for Google that they are engaged primarily in the search business and not design work! :)
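For the numbers people, here is a rough back-of-the-envelope sketch of why layers with fixed schedules feel slower than an index that updates continuously. To be clear, this is my own simplification and not Google's implementation: the layer names, the refresh intervals and both helper functions are assumptions I picked purely for illustration.

```python
import math


def layered_visible_day(changed_day, refresh_every_days):
    """Day a change shows up if its layer is only rebuilt on days 0, N, 2N, ..."""
    return math.ceil(changed_day / refresh_every_days) * refresh_every_days


def continuous_visible_day(changed_day, processing_days=0):
    """Day a change shows up if the index is updated as content is found."""
    return changed_day + processing_days


# Hypothetical layers and refresh intervals, in days (assumed values).
layers = {"news": 1, "shopping": 7, "everything_else": 90}

changed_day = 10  # a page in each layer changes on day 10

# How many days each layer makes you wait before the change is searchable.
delays = {
    name: layered_visible_day(changed_day, every) - changed_day
    for name, every in layers.items()
}

continuous_delay = continuous_visible_day(changed_day) - changed_day
```

With these made-up intervals, a page in the slowest layer waits until that layer's next rebuild, while the continuous version picks the change up right away. That difference is the "fresher" part of the pitch, at least as I read it.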

You can check out the rest of the Caffeine post on the Google Blog.

What does this mean for you? I think it means that you will need to develop a more systematic approach to covering all of these key content areas to ensure that you maintain relevancy in the new algorithm. Although you SHOULD already have most of these, consider creating photo galleries for your business on Flickr – syndicate those pictures to not only your website, but also to your social media profiles, e.g. Tweet them, post them on Facebook, post them on your blog – make the experience circular! That’s the huge benefit of combining search and social. Socially shared information drives people to search engines, where inevitably, they will find more socially generated content that will drive them BACK to social, particularly with an approach like the one I just described. Do the same thing with your videos, your blog posts, your Facebook pages. Share the information in a manner that makes sense, but ONLY when it’s relevant. Otherwise, you’ll just seem silly!

I see this change as not only being about making the searches relevant, but also as upping the power of social media as it relates to other channels, most particularly search.

This is a side note and then we’ll move on, but I’ve also noticed that Google is now utilizing not only more social data, but more of MY social data to power the results I’ve received over the last few weeks. Look at this search I performed before I started writing this post.

Here’s the search:

When I scrolled down to the bottom of the page, look at what I found!

Let’s see what Google has to say about this change after it’s out of BETA. Expect a follow-up from me! Until then, toodles darlings! :)

