Friday, August 12, 2005

Yahoo's Index is over 50 Times Bigger than Google's

Yahoo!'s recent announcement that its index contains 20 billion pages stirred a lot of controversy. The claim would put Yahoo!'s index at over twice the size of competitor Google's, which Google advertises as comprising just over 8 billion pages. However, a quick bare-bones study that I conducted using Alexa's top 100 anglophone websites suggests that the gap is far wider than that.

The analysis was simple and seems to substantiate other claims for specific verticals such as the blogging vertical. I compared the number of referring links for all top 100 sites in Google, Yahoo, and MSN. The premise is trivial--the total number of actual indexed pages is proportional to the number of referring links in each index. (This would not hold true if, say, Google didn't report all of their referring links while Yahoo! did.)

The results substantiated the suspicion that I and others have had, that Google is misrepresenting the size of its index. Indeed, this study supports that assertion with compelling numbers. Granted, a complete and unbiased study should involve a random sample of sites that is large enough to draw statistically significant conclusions. Even so, the numbers in this rudimentary analysis are so convincing that not only does it seem that Yahoo!'s index is larger, but it is very likely larger by at least one if not two orders of magnitude.

For the top 100 anglophone websites as ranked by Alexa, the ratios are:

  • Yahoo! index to Google index = 51.0 to 1
  • MSN index to Google index = 6.5 to 1
  • Yahoo! index to MSN index = 7.8 to 1
This means that if these 100 sites are indicative of the entire statistical make-up of each respective index, Yahoo!'s index is about 50 times bigger than Google's and almost 8 times bigger than MSN's. Additionally, this would also imply that MSN's index is at least 6 times bigger than Google's. How embarrassing for Google.
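As a minimal sketch of the arithmetic behind these ratios, the snippet below divides the referring-link totals from the spreadsheet. The totals are the ones in the "Total" row of the referring links table; the function name `index_ratio` is only an illustrative choice, and the ratio is a proxy for index size only under the proportionality premise stated earlier.

```python
# Totals of referring links across the 100 sites, copied from the
# "Total" row of the referring links spreadsheet.
TOTALS = {"Google": 10_476_302, "Yahoo": 534_027_885, "MSN": 68_424_428}


def index_ratio(numerator: str, denominator: str) -> float:
    """Ratio of total referring links, used here as a proxy for index size."""
    return TOTALS[numerator] / TOTALS[denominator]


for a, b in [("Yahoo", "Google"), ("MSN", "Google"), ("Yahoo", "MSN")]:
    print(f"{a} index to {b} index = {index_ratio(a, b):.1f} to 1")
```

Note that this is a ratio of totals, so the sites with the most links (Google, Blogger, Yahoo!) dominate the result.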

I have started to analyze a random sample of sites for a more definitive conclusion, but have recently been limited by time. I hope to have such a study published on this site soon with all accompanying regression analysis.

Click here for the spreadsheet of referring links analysis.



Regression Analysis Showing Yahoo is 446 Times Bigger than Google

Below are the results of the regression analysis. Because the ratios of referring links varied so much from URL to URL, the standard errors are high. Notice, however, that the coefficients are much higher than the simple ratios of totals: the regression puts Yahoo's index at about 446 times the size of Google's, assuming that referring links are proportional to the size of each index.

(Sorry that it's a little jumbled, but I am leaving for an important trip tomorrow and don't have time to format it.)

Yahoo to Google


Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 90.0% Upper 90.0%
Intercept 446.2035 336.6444 1.325445 0.188105 -221.856 1114.263 -112.812 1005.219

MSN to Google

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 90.0% Upper 90.0%
Intercept 111.8625 61.87671 1.807829 0.073701 -10.9298 234.6548 9.11304 214.6119
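For readers who want to check the output, here is a small sketch of how the columns relate. The t statistic is the coefficient divided by its standard error, and a 95% interval that contains zero means the estimate cannot be told apart from zero at that level. The numbers are copied from the two tables above; the dictionary layout and function names are illustrative assumptions, not part of the original spreadsheet.

```python
# (coefficient, standard error, lower 95%, upper 95%) as reported above.
RESULTS = {
    "Yahoo to Google": (446.2035, 336.6444, -221.856, 1114.263),
    "MSN to Google": (111.8625, 61.87671, -10.9298, 234.6548),
}


def t_stat(coefficient: float, standard_error: float) -> float:
    """t statistic: how many standard errors the estimate is from zero."""
    return coefficient / standard_error


def interval_excludes_zero(lower: float, upper: float) -> bool:
    """True when the whole confidence interval lies on one side of zero."""
    return lower > 0 or upper < 0


for name, (coef, se, lower95, upper95) in RESULTS.items():
    print(name, round(t_stat(coef, se), 4), interval_excludes_zero(lower95, upper95))
```

Both 95% intervals include zero, so the coefficients are best read as rough estimates with wide error bars.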


If you are interested in the actual spreadsheet with this analysis, email me at iriesergio@gmail.com.

Thursday, August 11, 2005

Referring Links Analysis

Site Name URL Google Yahoo MSN
Yahoo! www.yahoo.com 539,000 73,500,000 3,923,128
MSN www.msn.com 945,000 3,110,000 1,363,692
Google www.google.com 3,460,000 149,000,000 24,447,845
Passport.net www.passport.net 671 49,800 6,549
Ebay www.ebay.com 211,000 7,110,000 742,150
Microsoft Corp www.microsoft.com 129,000 11,000,000 3,165,326
Amazon www.amazon.com 340,000 2,790,000 820,371
Fastclick www.fastclick.com 10,900 54,400 6,395
Google UK www.google.co.uk 35,800 1,460,000 2,292,703
AOL www.aol.com 43,000 3,630,000 563,621
Go www.go.com 12,500 368,000 134,098
BBC Online www.bbc.co.uk 318,000 16,700,000 807,795
CNN www.cnn.com 158,000 9,170,000 1,195,644
Xanga www.xanga.com 3,830 3,270,000 70,380
My Space www.myspace.com 12,700 2,250,000 135,376
Blogger www.blogger.com 512,000 93,300,000 4,609,802
Ebay UK www.ebay.co.uk 46,600 1,090,000 77,675
Googlesyndication.com www.googlesyndication.com 0 205 3,070
Double Click www.doubleclick.com 1,610 28,500 16,500
Alibaba.com www.alibaba.com 177,000 2,530,000 589,293
Google CA www.google.ca 5,680 641,000 161,530
Casale Media www.casalemedia.com 2,080 192 2,758
Comcast.net www.comcast.net 3,870 122,000 47,106
Craigslist.org www.craigslist.org 20,300 645,000 57,181
The Internet Movie Database www.imdb.com 90,300 10,800,000 876
Gator.com www.gator.com 246 6,490 5,249
Adult Friendfinder www.adultfriendfinder.com 703 8,010 12,737
Yahoo! Search Marketing Solutions www.overture.com 20,800 1,360,000 225,953
MOJO Works www.mediaplex.com 85 11,600 6,933
MapQuest www.mapquest.com 46,700 2,240,000 1,069,090
The Weather Channel www.weather.com 36,300 4,080,000 640,845
Webshots www.webshots.com 495,000 5,510,000 310,437
Dell Computers Online www.dell.com 16,400 607,000 198,438
Tribal Fusion Ad Banner Marketplace www.tribalfusion.com 151 1,720 2,786
Apple Computer, Inc. www.apple.com 84,700 2,930,000 1,170,254
About www.about.com 10,600 645,000 267,343
AdServer www.adserver.com 25 908 1,805
My Way www.myway.com 1,580 120,000 13,521
Wikipedia www.wikipedia.org 68,200 2,360,000 254,649
Match.com www.match.com 62,500 2,810,000 317,659
Searchscout.com www.searchscout.com 29 1,740 2,314
Neopets www.neopets.com 2,940 629,000 112,863
Revenue.net www.revenue.net 15 99 309
Netscape.com www.netscape.com 52,400 2,380,000 2,095,629
888.com www.888.com 671 28,100 10,871
Friendster www.friendster.com 10,100 1,660,000 81,309
Google Australia www.google.com.au 5,430 221,000 74,733
BeInk.com www.belnk.com 10 18 1,533
CNET Download.com www.download.com 387,000 4,640,000 468,441
Mywebsearch.com www.mywebsearch.com 0 33,300 5,289
Atlas DMT www.atdmt.com 60 179 2,005
Monster.com www.monster.com 15,900 485,000 339,448
Ask Jeeves www.ask.com 23,700 6,760,000 356,028
Trafficmp.com www.trafficmp.com 8 310 1,096
Lycos www.lycos.com 171,000 3,730,000 1,305,230
Expedia.com www.expedia.com 82,800 2,940,000 681,178
Net-offers.net www.net-offers.net 0 14 947
Bank of America www.bankofamerica.com 3,360 367,000 95,594
The New York Times www.nytimes.com 138,000 23,700,000 828,559
NBA.com www.nba.com 51,100 1,820,000 369,033
Rediff.com India Limited www.rediff.com 37,000 680,000 65,180
Tripod.com www.tripod.com 16,300 174,000 33,235
Nastydollars.com www.nastydollars.com 282 396,000 5,810
Hewlett-Packard Industrial Ethernet www.hp.com 52,200 1,350,000 660,366
Offeroptimizer.com www.offeroptimizer.com 120 1,220 1,201
Official Site of Major League Baseball www.mlb.com 12,400 1,440,000 298,086
Internet-optimizer.com www.internet-optimizer.com 32 400 231
Altavista.com www.altavista.com 86,500 4,930,000 1,247,949
MSN UK www.msn.co.uk 11,000 114,000 33,285
Reference.com www.reference.com 5,250 45,500 18,818
Macromedia www.macromedia.com 26,700 1,670,000 681,142
Hi5.com www.hi5.com 667 136,000 5,013
SourceForge www.sourceforge.net 50,400 255,000 858,444
Netflix www.netflix.com 8,700 185,000 44,516
Amazon.co.uk www.amazon.co.uk 43,800 331,000 137,164
ZEDO.com www.zedo.com 210 430 7,527
No-IP www.no-ip.com 571 45,600 14,840
ICQ www.icq.com 29,100 1,760,000 447,923
Theplanet.com www.theplanet.com 7,060 1,430,000 103,443
Adobe Systems Inc www.adobe.com 84,800 11,900,000 2,245,273
UPS www.ups.com 13,700 2,160,000 313,928
WindowsMedia.com Media Guide www.windowsmedia.com 1,590 112,000 42,623
Orbitz www.orbitz.com 6,120 215,000 74,057
LiveJournal.com www.livejournal.com 167,000 4,610,000 137,537
RealPlayer www.real.com 90,800 13,900,000 1,173,597
N/A www.wretch.cc 4,840 674,000 10,281
Narrowad.com www.narrowad.com 717 7,380 7,435
Google India www.google.co.in 549 1,440,000 20,229
MiniClip.com www.miniclip.com 6,890 694,000 64,444
Orkut.com www.orkut.com 7,150 490,000 55,209
Excite.com www.excite.com 34,800 1,530,000 830,134
Travelocity www.travelocity.com 9,890 435,000 193,283
IGN www.ign.com 173,000 3,730,000 284,855
CNET.com www.cnet.com 476,000 5,350,000 568,317
CareerBuilder.com www.careerbuilder.com 97,100 1,970,000 243,040
Internet Archive www.archive.org 20,600 847,000 145,608
Symantec www.symantec.com 12,300 1,890,000 544,323
Pogo.com www.pogo.com 2,120 125,000 23,895
United States Postal Service www.usps.com 8,690 2,290,000 207,210
SiteSell.com www.sitesell.com 2,000 8,770 4,005
Total 10,476,302 534,027,885 68,424,428






Yahoo:Google 50.9748464
Yahoo:MSN 7.80463791
MSN:Google 6.53135314

Thursday, June 30, 2005

The Natural Evolution of Internet Search

Increasingly, I feel that the internet search industry will fragment into specialized niche players. We are already starting to see this with services like Grokker (by Groxis), which takes search information from Yahoo and groups it in a visual presentation, allowing users to quickly sift through the vast amounts of search results. Additionally, it is important to note that there are only two major suppliers of query information: Google and Yahoo. Search Engine Watch has a chart ( http://searchenginewatch.com/reports/article.php/2156401 ) that illustrates how Google and Yahoo power almost all the other major search players.

Why do these second-tier searches exist? The answer is simply that they find new ways to present the query information that is supplied by Yahoo and Google. In fact, within internet search we have two different but dependent technologies. It all starts with spidering and indexing webpages. While the spider reads a page’s content, it breaks it down into usable parts and neatly stores them away for future use. The index acts like a raw material that is then processed into the final product. The processing step in internet search is ranking: each page is given a score for a particular search query; it all amounts to a number. The spidering and indexing is a rather standard process, whereas the algorithm that assigns a number to each page for a given query is unique to each search service. Yahoo’s ranking algorithm is similar to, but different from, Google’s. MSN takes Yahoo’s index and processes it with its own algorithm.
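The two layers described above can be sketched in a few lines. This is a toy illustration only: the pages, the URLs, and the scoring rule (summing how often each query term appears) are assumptions made for the example, and real ranking algorithms are far more elaborate and not public.

```python
from collections import Counter, defaultdict


def build_index(pages: dict[str, str]) -> dict[str, Counter]:
    """Indexing layer: map each term to a Counter of {url: occurrences}."""
    index: dict[str, Counter] = defaultdict(Counter)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term][url] += 1
    return index


def rank(index: dict[str, Counter], query: str) -> list[tuple[str, int]]:
    """Ranking layer: give each page a number for this query, highest first."""
    scores: Counter = Counter()
    for term in query.lower().split():
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Sort by score (descending), then by URL so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical pages standing in for a crawled corpus.
pages = {
    "a.example/panther": "panther big cat panther habitat",
    "b.example/os": "apple panther os release",
    "c.example/cars": "used cars for sale",
}
index = build_index(pages)
print(rank(index, "panther"))
```

The index is built once and can be reused by any number of ranking functions, which is the separation between supplier and processor that the paragraph describes.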

On top of this, a new layer is starting to form that is not yet altogether separate from the ranking algorithm layer although we are starting to see greater division with services like Grokker, Teoma, and Yahoo’s Mindset. What distinguishes these services from ones like Google, Yahoo, and MSN is their focus on presenting and sorting given query results. It is a well-known fact that about 80% of internet search users never make it past the first ten results and the number drops off exponentially as we go deeper into the search results. This essentially means that only the top twenty results matter with a first-generation search service like Google and Yahoo. However, others like Teoma and Grokker have found that although people are not willing to sort through large numbers of search results they are likely to sort through them if they are presented in a few digestible pieces that allow them to drill down to the right answer. The popularity of these searches shows that many people are unsatisfied with standard search results and are willing to take the time to find the answer for which they are really looking; however, they are unwilling to spend a lot of time or energy to do so.

Another indication that internet search is starting to fragment into specialized niches is the explosion of vertical-specific search services. The most noticeable manifestation of this is blog-related searches like Technorati, Bloglines, and Feedster. It was very apparent that this was going to happen given that Google and Yahoo were nowhere to be found when it came to searching for blogs.

Another huge trend in second-generation search engine technology is clustering or grouping, pioneered by Ask Jeeves and adopted by services like Teoma and Clusty. When we type ‘beatles’ into a first-generation search engine like Google, we get all sorts of results related to ‘beatles’ but they are just raw, unrefined results. Google is too dumb to know which ‘beatles’ you are searching for. That is where clustering comes in. It allows you to drill down to the results that you really want by clicking on the clusters that are of interest to you. It is essentially a way for back-and-forth interaction between the search service and the user. By displaying clusters related to ‘beatles’ the search engine is asking you, ‘which one do you mean?’
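To make the idea concrete, here is a deliberately simple sketch of grouping results into clickable clusters. It assumes hand-picked cue words per cluster, which is a stand-in: real clustering engines derive their groups from the results themselves, and neither the labels nor the result titles below come from any actual service.

```python
from collections import defaultdict

# Hypothetical cluster labels and the cue words that signal them.
CLUSTERS = {
    "albums": {"album", "discography"},
    "lyrics": {"lyrics"},
}


def cluster_results(titles: list[str]) -> dict[str, list[str]]:
    """Assign each result title to the first cluster whose cue words it contains."""
    groups: dict[str, list[str]] = defaultdict(list)
    for title in titles:
        words = set(title.lower().split())
        label = next(
            (name for name, cues in CLUSTERS.items() if words & cues),
            "other",  # fallback bucket when no cue word matches
        )
        groups[label].append(title)
    return dict(groups)


titles = [
    "Beatles album discography",
    "Beatles song lyrics archive",
    "Abbey Road album review",
    "Beatles fan club",
]
for label, members in cluster_results(titles).items():
    print(label, members)
```

Each cluster label plays the role of the engine's question back to the user: clicking "lyrics" answers "which one do you mean?".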

Grokker takes clustering or grouping further by making the interaction visually appealing and more powerful at the same time by showing you what sub-groups fit within which larger groups. This gives you a stronger understanding of their true nature and meaning allowing you to drill down with more certainty.

Yahoo’s Mindset addresses the most important distinction among search results by grouping them as commercial and informative. That is often the first thing that users want to resolve by separating the two. For example, if you search for ‘panther’ in the first-generation searches you get many commercial results within the top 20; dumb search engines don’t know if you want the commercial results for Apple’s Panther OS or want to learn about the animal. Yahoo Mindset allows you to resolve this.

Finally, the newest trend is making personalized search engines for everyone. In essence, the user still uses the same old first-generation search services, but they are supercharged with your personal information. I am not at all convinced that these will actually help you find what you need because my interests change on a moment-by-moment basis. For example, I studied physics and economics as an undergraduate. How will this help dumb search determine if I am interested in the animal panther or the OS? It seems to me that this is rather a ploy to help Yahoo and Google sell us products. Yahoo has already revealed that its advertising service will match its ads to user profiles rather than simple keywords.

How does this all tie into the main point of this article? There are countless ways to analyze and present information found, processed and stored on servers. That is why Grokker uses Yahoo’s information and Teoma uses Google’s information. Companies will continue to come up with pioneering ways to search, sort and present data about webpages on the internet, and larger companies like Yahoo and Google won’t always be on the winning end. The only thing that lies between smaller services like Grokker and having a complete search product is the underlying crawling, processing and storage infrastructure that costs a great deal of money to own and operate. That is why it makes perfect sense to have companies that specialize in the infrastructure sphere and others in the production sphere. If companies arise that specialize in the infrastructure and open their services to any client, this will mean low barriers to entry for many companies that want to focus on the more architecture-driven side of the equation; that is where the most interesting engineering remains unexplored.

Thursday, June 23, 2005

Sample Referring Link Query on Top 3

I started doing an analysis on referring links in the top three internet search services: Google, Yahoo, and MSN. For my alma mater (www.dartmouth.edu), the scores were:

Google - 5,990
Yahoo - 372,000
MSN - 57,790

I was spinning my wheels a little bit because I got the search syntax wrong for all three initially. After some quick investigation I found the right syntax. For everyone's reference they are:

Google - "link:www.name.com"
Yahoo - "link:http://www.name.com"
MSN - "link:www.name.com"
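For anyone repeating the exercise across many sites, the three syntaxes can be captured in a small helper. This is only a sketch that builds the query strings listed above; the function and dictionary names are illustrative, and it does not contact any search engine.

```python
# Query templates taken from the syntax listed above.
LINK_QUERY_FORMATS = {
    "google": "link:{host}",
    "yahoo": "link:http://{host}",
    "msn": "link:{host}",
}


def link_query(engine: str, host: str) -> str:
    """Return the referring-links query string for the given engine and host."""
    return LINK_QUERY_FORMATS[engine.lower()].format(host=host)


for engine in ("Google", "Yahoo", "MSN"):
    print(engine, "-", link_query(engine, "www.dartmouth.edu"))
```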

I am well on my way to analysing the top 100 as ranked by Alexa.com. I will post the results as they come up.

Grokker Extends Offerings with Grokker Research

Groxis (parent company of Grokker) announced on Monday, June 20 that it has started a pilot program called Grokker Research that will be available to corporations and academic institutions. Grokker Research is a web-based platform that will allow for a visual presentation of "deep Web" search results. It will allow corporations like Sun Microsystems (and I bet universities like Stanford) to search a vast collection of sources like library databases, internet search engines (Yahoo) and subscription services. This service will be free for a trial period to qualifying institutions, and will be a premium service thereafter.

Now this is great news because what we really need is an ability to sort out commercial webpages from informational webpages and to further make sense of them. Grokker in its present form and as Grokker Research will be a quantum leap forward in this pursuit. I am very optimistic about these evolving services along with ones like Teoma, Ask Jeeves and Yahoo’s Mindset.

I am beginning to see how things are shaping up in the next generation of search. Google, Yahoo, MSN Search and the like will basically become suppliers of raw materials to companies that will fabricate different things from them. I think that this is a natural evolution of things. Each search service does not need its own website repository. They can buy that from Yahoo or Google. With these resources they can find various ways to use them, whether it is a vertical search, a visual search, a search that allows you to choose between commercial and research interest, or any other interesting way to utilize these resources. This makes perfect economic sense and follows every other industry throughout history.

Perhaps I’m sticking my neck out with this daydream, but it really falls in line with how homo sapiens utilize resources through specialization. Specialization is an underlying principle that has shaped every industry and it makes sense that the internet search industry would follow suit. A car maker does not mine the steel, etc. IBM and Apple no longer make all their components. In sum, I’m looking forward to playing around with Grokker Research. I really enjoyed the original Grokker and now turn to it for more complicated searches.

Google Indexes Less

John Battelle again writes about Tristan's discovery that Google does a vastly poorer job indexing blogs than Yahoo and Technorati.

This is not news but rather a general fact. Google almost always indexes fewer referring pages for any given URL vis-a-vis Yahoo or MSN. In fact, it seems that MSN does the best job of indexing, closely followed by Yahoo. I was astounded to discover that this is the case, but it really does hold true. Tristan just uncovered the blogging part of a greater truth--Google does an inferior job indexing pages as compared to Yahoo and MSN.

Why does Google do it this way? Google is not dumb. They realize that 80% of people only look at the first page of results, thus they weight it to the top to reduce the computational and storage loads for their server clusters. The real question is why Google publicly claims otherwise. Either Yahoo and MSN are understating their reach or Google is overstating its. I am not sure where the truth lies, but I'd be interested to find out.

Bookmark and check back next week for an analysis of this topic. I'll work on it this weekend.

Wednesday, June 22, 2005

Expanding Scope

As it turns out, there is not much buzz around search. Of course, this is a relative statement. The important thing is that there is not enough about search to warrant daily posts. On the other hand, there are a lot of topics that are closely related to search engines and how we find information on the net, whether it is for commercial interest or informational interest. To that end, I am slightly expanding the scope of this blog to include some of these issues. The main emphasis will, of course, continue to be search engine technology.

Liar, Liar and Pipe Dreamer

John Battelle wrote the following regarding Google's plans to launch a pay service, an ad listing service, and a media player:

I recently sent a note to the folks at Google PR. It went something like this:

So, in the last week, it's been

1. Google is starting a Paypal killer.
2. Google is starting a craigslist killer.
3. Google is starting an iTunes killer.

So, any thoughts about all of this?

To which Eric Schmidt replied:

"We believe that ecommerce can be improved and we are working on ways to improve the user experience. We are working on things in ecommerce."


It's the typical political rhetoric from Google. I am sitting at the edge of my seat waiting to see just how Google will improve internet commerce taking into account the stellar job they did with Froogle. I mean, Froogle was a real quantum leap in internet commerce.

I have heard rumors on the net that Google was trying to acquire Craigslist. They either failed and will try to go at it alone, or are still trying to buy it. My two cents is that if they go at it alone, they will fail because they won't be able to get the locally-focused following that Craigslist got. Why would anyone switch? They will probably bury it deep behind the main page in a beta receiving whatever drip-through traffic they usually get to their 'other' products.

Regarding iTunes, Money please! There are a ton of music players out there. This is probably meant to lead to GMusic Store or something like that. There are already many players in that game too. Of course, they will make marginal money from these ventures because they have a captive audience from their internet search, but I don't expect that they will make a killing like they did with AdSense.

On the other hand, the internet is not a new market free from traditional economics. In fact, it is probably one of the purest forms of economics in real life due to a low barrier to entry, seemingly endless resources, etc. As competition increases (it will quickly) their margins will be squeezed. As we saw with American car makers, to be 3rd best in an industry now means bleeding red. And that is the car industry!

The internet is still the wild west: unexplored and sparsely populated. Look at any other industry a decade or two after it began. Look at television, or radio; things will really become interesting in the upcoming months and years.

Take into account, also, that Google does not hold to its word. If you look at how many linking pages it reports for any site, the number is vastly lower than at either Yahoo or MSN. Their claim that they search X number of pages is simply absurd. (Try it: 'link: battellemedia.com' on Google vs. 'link: battellemedia.com' on Yahoo.) This pattern holds true for nearly all entries that I have tried. If every site has fewer links, that logically means that Google's index is smaller. Simply put, it's a sham.

Thursday, June 02, 2005

Google's Secret Search Lab

There has been a lot of buzz in search related forums about a secret effort at Google to control the quality of search results. The kicker is that they are not doing this via significant changes in the ranking algorithm or spidering technology, but rather production-line-style human quality control! Are you kidding me? This reminds me of a spaghetti horror flick where the monster is running amuck. Has Google lost its marbles? Why is the world's premier technology company resorting to turn-of-last-century production line efforts?

It seems that we are moving backwards in time to more primitive technologies. Google is now doing what Yahoo! was doing while a small start-up at the dawn of the internet age, when it hired people to actually add new sites to its directory by hand. Luckily, Yahoo soon realized that this was an insurmountable task. I really can't believe that Eric Schmidt is spending money on this rather than investing in new, paradigm-shifting search technologies.

Not only that, but is Google becoming a self-proclaimed internet censor? Well, why not? They are already doing this with AdSense by denying sites with certain content access to the program. ( Check out the AdSense Program Policies .) When will they realize that it's not the web, it's the sucky search algorithms that are so easily fooled by spammers. Stop buying every digital company under the sun and invest in your bread and butter business; you are not yet Microsoft.

Now onto Henk van Ess's blog about this "secret Google lab." He doesn't seem like the most trustworthy guy, especially since at the same time as he's breaking a big search-related story about Google's secret search quality lab, he is showing Google ads on his webpage. However, even though I approached his site with a great deal of skepticism, I could not deny that the flash animation of the internal Google tool for this supposed search quality team seems quite authentic. The screen shot that he has of the "Rater Hub" also seems incredibly Google-esque. There is also that cache on Google's own search of http://eval.google.com . Check out also this query for more remnants of eval.google.com .

(Now I know that Google has a tendency to cover its tracks, so if it removes this entry from the index, email me; I saved a screen shot.)

Then there is ample evidence that Google is and has been looking for contractors to do QA work:
QUALITY RATER - (SPANISH, DUTCH, ITALIAN, FRENCH) This is a temporary role offered through Kelly Services. Google Inc. is recruiting part-time, temporary, home-based workers to help with work on a search quality evaluation on a project basis. You would work at your own pace, and the time and length of any particular work session would be up to you. Candidates will evaluate search results and rate their relevance. Thus, all candidates must be web-savvy and analytical, have excellent web research skills and a broad range of interests. Specific areas of expertise are highly desirable. We are looking for smart people who read voraciously and have a wide variety of interests. Raters should have all the following qualifications: Native-level fluency in Dutch, Italian, Spanish, or French In-depth, up-to-date familiarity with the web culture of at least one predominantly Dutch, Italian, Spanish, or French-speaking country. Excellent web research skills and analytical abilities. A high-speed internet connection. Legal eligibility to work in the Netherlands, Italy, Spain or France. Moderate ability to read and write in English. Perfect English is not necessary; however, you must be able to read and write English well enough to use software with an English interface, understand fairly complicated instructions written in English, and make yourself understood in informal written communication. The job involves frequent written communication with fellow Quality Raters. For immediate consideration, please send an ENGLISH text (ASCII) or HTML version of your resume to monsterjobs@google.com Important: The subject field of your email must include Quality Rater - TEMPORARY.

I'm not going to recount all of the contents of Henk's blog, but it does seem like this is fact despite my most heart-felt hope that this is a farce. I am praying for innovation and instead we're getting an old-fashioned assembly line. The search services still suck, are getting more tainted with ads, and more spammed. I hope that we start thinking about taking matters into our own hands like we did with Firefox. Grokker, Teoma, Mindset are steps in the right direction, but we really ought to be at least jogging to make up for lost time.


Wow. I was checking my stats on Statcounter.com and noticed that I've been getting a lot of traffic from Technorati for the search term Grokker. I am not sure how I got on there so quickly, but it gives me even more impetus to hold a regular writing schedule. So, please enjoy Dumb Search. To celebrate, I have a great story.

Wednesday, June 01, 2005

Yahoo Mindset

Check out Yahoo Mindset ( http://mindset.research.yahoo.com/ ). I think that this is a great start. The number one vector to divide is the commercial vs. informational query results since 80% of queries are strictly informational in nature. Mindset allows the user to give more weight to either commercial results or informational results using a slider. I liked it a lot, and although they have to perfect their algorithm, it goes a long way toward ushering in a new generation of internet search services.
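A slider like this can be sketched as a simple re-ranking step. Everything in the snippet is an assumption made for illustration: Yahoo has not published how Mindset works, and the `relevance` and `commercial` scores, the example URLs, and the scoring formula are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    relevance: float   # hypothetical base relevance score, 0..1
    commercial: float  # hypothetical "how commercial is this page" score, 0..1


def rerank(results: list[Result], slider: float) -> list[Result]:
    """Re-rank results; slider runs from -1 (informational) to +1 (commercial)."""
    def score(result: Result) -> float:
        # Pages above 0.5 on the commercial scale gain when slider > 0
        # and lose when slider < 0; the opposite holds for pages below 0.5.
        return result.relevance + slider * (result.commercial - 0.5)

    return sorted(results, key=score, reverse=True)


results = [
    Result("store.example/panther", relevance=0.8, commercial=0.9),
    Result("zoo.example/panther", relevance=0.7, commercial=0.1),
]
print([r.url for r in rerank(results, slider=-1.0)])
print([r.url for r in rerank(results, slider=1.0)])
```

With the slider at the informational end the zoo page comes first; at the commercial end the store page does.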

My feedback to Yahoo re: Mindset submitted on their site:

First, congratulations on beating Google to the punch one more time. I have never been satisfied with Yahoo query results, but after using Mindset I can say that I will probably switch to the Mindset beta in lieu of Google (at least 50/50) because for the first time a search puts the power in the users' hands. I rarely search for products using SE's and have often been frustrated by the numerous irrelevant results for commercial sites that my queries produced. I have tried services like Teoma and Grokker to name just a few. I haven't been fully satisfied with any of them. Grokker, a Yahoo partner, goes a long way to providing results in a digestible manner since only 20% of users go beyond the first page of a classical SERP. However, Grokker is not robust enough. At the most granular level it has one result instead of 20. Teoma's groupings are pretty good, but I think that the algorithm behind it isn't very strong and it has a way to go. All that said, the most important vector to split is the commercial vs. informational results vector. Since 80% of queries are for information-only purposes, it makes sense that we separate the commercial results for the majority of queries. As far as I know, you guys are the only ones to do that effectively to date. I love it. It goes a long way to refining my results.

However, I think that the more power you put in the users' hands, the more you will kick Google's behind and the more you will profit. Since Google is wasting its time with fruitless conquests, do more with search, Yahoos! For one, put all the advanced functionality on the main UI instead of behind a link. The reason why there is so little click-through to the advanced features is not only that people wouldn't use them but also that it is tedious to click that damn link. Also, make a better GUI for the functionality; most users aren't programmers and shy away from Boolean logic. I think that another great improvement that any search engine can make is adding the ability to add weights to your search terms. For example, if I am looking for treatments for poison oak, I don't want to give "treatments" as much weight as "poison oak" because I will be getting results for non-relevant treatments for things like a hangover.

I think that the slider is a great tool. Give users a few fields for words and phrases and give those fields sliders to allow users to determine the weight on each of those words. Think about similar implementations in photo editing software where you can adjust (give weight to) contrast, darkness, color, shades, etc.
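A rough sketch of what per-term weights could look like in practice follows. The scoring rule (weight times number of occurrences), the weights, and the two sample documents are assumptions chosen for illustration; no search engine is known to expose exactly this.

```python
def weighted_score(text: str, weights: dict[str, float]) -> float:
    """Sum of weight * occurrences for each weighted word or phrase.

    Uses plain substring counting, which is crude but enough for a sketch.
    """
    lowered = text.lower()
    return sum(
        weight * lowered.count(phrase.lower())
        for phrase, weight in weights.items()
    )


# The user turns "poison oak" all the way up and "treatments" most of the way down.
weights = {"poison oak": 1.0, "treatments": 0.3}

# Hypothetical documents standing in for search results.
docs = {
    "oak.example": "Poison oak rash care and poison oak treatments",
    "hangover.example": "Hangover treatments and more treatments",
}

ranked = sorted(docs, key=lambda url: weighted_score(docs[url], weights), reverse=True)
print(ranked)
```

Because "treatments" carries little weight, the hangover page cannot outrank the poison oak page just by repeating that word.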

Additionally, it would be great to see a Grokker-like implementation of Yahoo search. I think that it is brilliant how Grokker groups the results. It is simple to use, intuitive, and powerful. The only drawbacks are speed and robustness as mentioned above. However, with increasing connection speeds the quickness becomes less of an issue. The robustness can be very easily solved; just list the top 20 results for the most granular level. Also, it would be nice if clicking on the larger categories gave a compiled list. Buy Grokker and expand it; they are really onto something.

From a user's perspective, I don't care that a search takes a millisecond if I then have to spend a couple of minutes sifting through the jumbled results. Give users improved methods to filter and digest the search queries (like Mindset) and you will win a huge market. Everyone is praying for Search 2.0 . Please take us out of the dark ages!

As a former Google employee, it takes a lot to break my loyalties and say "Good job Yahoo!" but you guys deserve it. Competition and innovation are great for customers and good for business.