Thursday, June 30, 2005

The Natural Evolution of Internet Search

Increasingly, I feel that the internet search industry will fragment into specialized niche players. We are already starting to see this with services like Grokker (by Groxis), which takes search results from Yahoo and groups them in a visual presentation that lets users quickly sift through large result sets. It is also important to note that there are only two major suppliers of query information: Google and Yahoo. Search Engine Watch has a chart that illustrates how Google and Yahoo power almost all the other major search players.

Why do these second-tier search services exist? The answer is simply that they find new ways to present the query information supplied by Yahoo and Google. In fact, internet search consists of two different but dependent technologies. It all starts with spidering and indexing webpages. As the spider reads a page’s content, it breaks the content down into usable parts and stores them away for future use. The index acts like a raw material that is then processed into the final product. The processing step in internet search takes the indexed information and scores it: each page is given a score for a particular search query, so it all amounts to a number. Spidering and indexing are rather standard processes, whereas the algorithm that assigns a number to each page for a given query is unique to each search service. Yahoo’s ranking algorithm is similar to Google’s but not the same. MSN takes Yahoo’s index and processes it with its own algorithm.
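To make the two layers concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how Google, Yahoo, or MSN actually work: the three pages, the `build_index` function, and both ranking functions are made up for this example. The point is simply that one shared index can feed two different scoring rules, and the two rules can disagree about which page comes first.

```python
from collections import Counter, defaultdict

# Hypothetical three-page "web"; a real spider would fetch pages over HTTP.
PAGES = {
    "a.example": "beatles beatles beatles fan page",
    "b.example": "the beatles were a band from liverpool",
    "c.example": "beetles are insects",
}

def build_index(pages):
    """Indexing layer: map each term to {url: number of occurrences}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, count in Counter(text.split()).items():
            index[term][url] = count
    return index

def rank_by_count(index, query):
    """Ranking rule A (assumed): score = total occurrences of the query terms."""
    scores = Counter()
    for term in query.split():
        for url, count in index.get(term, {}).items():
            scores[url] += count
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

def rank_by_coverage(index, query):
    """Ranking rule B (assumed): score = number of distinct query terms on the page."""
    scores = Counter()
    for term in set(query.split()):
        for url in index.get(term, {}):
            scores[url] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

INDEX = build_index(PAGES)  # built once, shared by both ranking rules
```

For the query "beatles band", `rank_by_count(INDEX, "beatles band")` puts a.example first because it repeats "beatles" three times, while `rank_by_coverage(INDEX, "beatles band")` puts b.example first because it contains both query terms. Same raw material, different final product.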

On top of this, a new layer is starting to form. It is not yet altogether separate from the ranking algorithm layer, although we are starting to see greater division with services like Grokker, Teoma, and Yahoo’s Mindset. What distinguishes these services from ones like Google, Yahoo, and MSN is their focus on presenting and sorting the results of a given query. It is commonly cited that about 80% of internet search users never make it past the first ten results, and that the number drops off exponentially as we go deeper into the results. This essentially means that only the top twenty results matter with a first-generation search service like Google or Yahoo. However, services like Teoma and Grokker have found that although people are not willing to sort through large numbers of search results, they will sort through them if the results are presented in a few digestible pieces that let them drill down to the right answer. The popularity of these services suggests that many people are unsatisfied with standard search results and are willing to take some time to find the answer they are really looking for, as long as it does not cost them a lot of time or energy.

Another indication that internet search is starting to fragment into specialized niches is the explosion of vertical-specific search services. The most noticeable manifestation of this is blog-related search services like Technorati, Bloglines, and Feedster. It was very apparent that this was going to happen, given that Google and Yahoo were nowhere to be found when it came to searching for blogs.

Another huge trend in second-generation search engine technology is clustering, or grouping, pioneered by Ask Jeeves and adopted by services like Teoma and Clusty. When we type ‘beatles’ into a first-generation search engine like Google, we get all sorts of results related to ‘beatles’, but they are just raw, unrefined results. Google is too dumb to know which ‘beatles’ you are searching for. That is where clustering comes in. It allows you to drill down to the results that you really want by clicking on the clusters that interest you. It is essentially a way to have a back-and-forth interaction between the search service and the user. By displaying clusters related to ‘beatles’, the search engine is asking you, ‘which one do you mean?’
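The idea can be sketched in a few lines of Python. This is a toy heuristic of my own for illustration, not the method used by Teoma, Clusty, or any other service: the `cluster_results` function, the stopword list, and the sample snippets are all assumptions. Each result is filed under the word (other than the query itself) that it shares with the most other results, and that word becomes the cluster label the user can click.

```python
from collections import Counter, defaultdict

# Tiny assumed stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "of", "and", "in", "by", "for", "on"}

def cluster_results(query, snippets):
    """Group result snippets under the term they share with the most other snippets."""
    query_terms = set(query.lower().split())
    tokenized = [
        {w for w in s.lower().split() if w not in STOPWORDS and w not in query_terms}
        for s in snippets
    ]
    # Document frequency: in how many snippets does each term appear?
    doc_freq = Counter(term for terms in tokenized for term in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, tokenized):
        # Pick the most widely shared term; break ties alphabetically.
        label = min(terms, key=lambda t: (-doc_freq[t], t)) if terms else "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical raw results for the query 'beatles'.
RESULTS = [
    "beatles albums discography",
    "beatles albums remastered",
    "beatles lyrics archive",
    "beatles lyrics yesterday",
]

CLUSTERS = cluster_results("beatles", RESULTS)
```

Here `CLUSTERS` ends up with two groups, labelled "albums" and "lyrics", each holding two results. Showing those two labels to the user is the search engine's way of asking ‘which one do you mean?’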

Grokker takes clustering further by making the interaction both more visually appealing and more powerful: it shows you which sub-groups fit within which larger groups. This gives you a stronger understanding of what each group really contains, allowing you to drill down with more certainty.

Yahoo’s Mindset addresses the most important distinction among search results by grouping them as commercial or informative. Separating the two is often the first thing that users want to do. For example, if you search for ‘panther’ in a first-generation search engine, you get many commercial results within the top 20; dumb search engines don’t know whether you want the commercial results for Apple’s Panther OS or want to learn about the animal. Yahoo Mindset allows you to resolve this.
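As a rough illustration of this kind of split, here is a small Python sketch. I do not know how Yahoo actually decides which results are commercial, so everything here is an assumption: the `COMMERCIAL_CUES` word list, the `split_by_intent` function, and the sample ‘panther’ results are invented for the example. It simply files a result as commercial if it contains a shopping-related word.

```python
# Assumed cue words that hint at a commercial page; purely illustrative.
COMMERCIAL_CUES = {"buy", "price", "upgrade", "store", "sale"}

def split_by_intent(snippets):
    """Split result snippets into 'commercial' and 'informative' groups."""
    groups = {"commercial": [], "informative": []}
    for snippet in snippets:
        words = set(snippet.lower().split())
        # Any overlap with the cue list marks the result as commercial.
        key = "commercial" if words & COMMERCIAL_CUES else "informative"
        groups[key].append(snippet)
    return groups

# Hypothetical results for the query 'panther'.
PANTHER_RESULTS = [
    "buy mac os x panther upgrade",
    "panther habitat and diet",
    "panther os price comparison",
    "the florida panther is an endangered cat",
]

GROUPS = split_by_intent(PANTHER_RESULTS)
```

With these sample results, the two OS-related shopping pages land in `GROUPS["commercial"]` and the two pages about the animal land in `GROUPS["informative"]`. A real service would need something far more robust than a word list, but the user-facing effect is the same.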

Finally, the newest trend is making personalized search engines for everyone. In essence, the user still uses the same old first-generation search services, but they are supercharged with the user’s personal information. I am not at all convinced that these will actually help you find what you need, because my interests change from moment to moment. For example, I studied physics and economics as an undergraduate. How will this help a dumb search engine determine whether I am interested in the animal panther or the OS? It seems to me that this is rather a ploy to help Yahoo and Google sell us products. Yahoo has already revealed that its advertising service will match its ads to user profiles rather than simple keywords.

How does this all tie into the main point of this article? There are countless ways to analyze and present information that has been found, processed and stored on servers. That is why Grokker uses Yahoo’s information and Teoma uses Google’s. Companies will continue to come up with new ways to search, sort and present data about webpages on the internet, and larger companies like Yahoo and Google won’t always be on the winning end. The only thing that lies between smaller services like Grokker and a complete search product is the underlying crawling, processing and storage infrastructure, which costs a great deal of money to own and operate. That is why it makes perfect sense to have some companies specialize in the infrastructure sphere and others in the production sphere. If companies arise that specialize in the infrastructure and open their services to any client, this will mean low barriers to entry for the many companies that want to focus on the more architecture-driven side of the equation; that is where the most interesting engineering remains unexplored.


At 4:50 PM, Blogger Dave S said...

I found several blog entries using the search term “technorati search”. This comment is a general reply to all those bloggers whose entry implied an interest in finding quality. [More...]

By the way, your article spells "technorati" with an extra C. I often make the same mistake.

At 5:35 PM, Anonymous Anonymous said...

I wonder how they landed on the spelling Technorati. Perhaps Technocrati was taken.

Your analysis leaves out Wikipedia as an information source, as unreliable as it may be.

What I find interesting is that the format of search results people receive is little different from what Webcrawler spit back ten years ago (Grokster (where do they come up with these names?!?) and its relatives aside).


