Jean Véronis

Friday, August 12, 2005

Yahoo: 19 billion pages?

Here’s the latest episode in the search engine war: Yahoo! has discreetly announced that its search engine now indexes 19.2 billion pages... This is a new step in the firm’s strategy, since it never used to publicize the size of its index. Google, meanwhile, is still announcing around 8 billion pages on its home page.

Should we believe these figures? Regular readers of this blog will have noticed that, over the past few months, I have mostly stopped mentioning the index sizes claimed by the different search engines: I have more than amply demonstrated that search engines can tell us whatever they like and fudge the numbers as and when it suits them (see my comments on Google, Yahoo and MSN).

Some, such as Google, really do take us for fools, and don’t even go to the trouble of ensuring the internal consistency of their figures. Although the figure announced on the Google home page remains virtually unchanged, for instance, the number of results returned by each request has been increasing quite substantially. With my usual lists of standard search requests, I have been able to see how the total number of results given by Google for these searches has risen by 75% in English and by 8% in French since March (which may confirm the impression held by some that Google is concentrating on the English-speaking world, something I’ve mentioned before). Over the same period, the number on the Google home page has only gone up from 8,058,044,651 to 8,168,684,336... Spot the difference!
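To put a rough number on that gap, here is a minimal Python sketch that uses only the two home page figures quoted above. The function name `percent_growth` is purely illustrative, and the 75% and 8% values are simply the measured increases reported in the text, not something the code derives.

```python
# Google home page figures quoted in the paragraph above.
home_page_march = 8_058_044_651
home_page_august = 8_168_684_336


def percent_growth(before, after):
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100


home_page_growth = percent_growth(home_page_march, home_page_august)

# Increases in result counts reported in the text (not computed here).
reported_growth = {"English": 75.0, "French": 8.0}

print(f"Home page figure: +{home_page_growth:.2f}%")
for language, growth in reported_growth.items():
    print(f"Result counts, {language}: +{growth:.0f}%")
```

The home page figure grows by a little under 1.4%, far below either of the reported increases in result counts.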

Yahoo tries harder to be consistent. For English, the number of results for individual searches tripled between March and August; for French, it rose by a factor of 2.7.

These figures tally with the announcement of 19.2 billion pages indexed. In March I estimated the real size of Google’s index to be 5.5 billion pages, and Yahoo’s index to be at least the same size and almost certainly a little larger. Let’s say 6 billion. Multiplying this hypothetical base by 3 gives us 18 billion pages for Yahoo in August, which is in line with the figure they announced.
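The back-of-the-envelope check above can be written out as a short Python sketch. Both inputs are the assumptions stated in the paragraph (a hypothetical 6 billion page base in March and a threefold growth), not measured values, and the variable names are illustrative.

```python
# Assumptions taken from the paragraph above, not measurements.
yahoo_index_march = 6e9   # "Let's say 6 billion"
growth_factor = 3.0       # threefold rise in English result counts
announced_august = 19.2e9  # figure announced by Yahoo

# Scale the hypothetical March base by the observed growth factor.
estimated_august = yahoo_index_march * growth_factor

# How far the estimate falls short of the announcement, in percent.
shortfall = (announced_august - estimated_august) / announced_august * 100

print(f"Estimated index: {estimated_august / 1e9:.1f} billion pages")
print(f"Announced index: {announced_august / 1e9:.1f} billion pages")
print(f"Shortfall: {shortfall:.2f}%")
```

The estimate lands roughly 6% below the announced figure, which is close enough to count as "in line" given how rough the starting base is.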

It’s interesting to compare the number of results returned by Yahoo and Google. In March, I showed that they were comparable (in fact, slightly higher for French with Yahoo). At the time of writing, the gap has grown considerably. The number of results returned by Yahoo is almost three times the number returned by Google for English, and more than four times for French (which seems to confirm the differences in global strategy between these two search engines). A great majority of French web surfers use Google (far more than in the United States), but they may well be wrong to do so...

All of this should of course be taken with a large pinch of salt. So far, I haven’t quite caught Yahoo red-handed when it comes to fiddling the books, but this could simply be because they are smarter with their figures than their competitors ;-)

Follow up

5 Comments:

Anonymous wrote...

Are you counting actual results or the claimed number of results at the top of the page? reports that Yahoo's claimed numbers are up to 5x off of reality, and overall comes to the opposite conclusion that you do...

August 13, 2005 06:20
Anonymous fuligineuse wrote...

An unrelated comment. Am I mistaken, or has this blog's header changed? In any case, the current one (and perhaps new one) is very elegant.

August 14, 2005 11:19
Blogger Jean Véronis wrote...

Anonymous> "the opposite conclusion that you do": note my question mark in the title! I am very suspicious about these self-reported figures, and I have noticed the same problem with pages disappearing. I am trying to assess the situation and I hope I'll be able to post something in the next few days.

August 14, 2005 12:13
Blogger Jean Véronis wrote...

Fuligineuse> Yes, I had some fun doing a bit of graphic design. The old one wasn't great (a Blogger default), but I had never found the time to get around to it...

August 14, 2005 12:14
Anonymous wrote...

Similar to what the other commenter said, I suspect that Yahoo has inflated their estimations.

This graph shows a nearly vertical increase on about Aug 2.

August 16, 2005 11:40

