Jean Véronis


Wednesday, March 09, 2005

Web: Yahoo doubles its counts!

Read follow up

13 mar - Google adjusts its counts

In my Monday post, in which I showed that Yahoo indexes more pages than Google, I used the data that I had gathered on February 6th on my series of probe words. The idea behind this decision was to make the results comparable across my series of studies, of which this last one was the conclusion. My assumption was that the engines had not changed in a major way since then (which seemed confirmed, at least for Google, by this test).

However, I was wrong. Just to make sure, I ran the same queries yesterday, March 8th, and to my great surprise, I saw that Yahoo's results had doubled in a month. I was so puzzled that I ran the computations twice, but there is no doubt about it: something major happened.

The scatterplots below give the evolution between February and March. Complete results are here for English, and here for French (The data were obtained at as in February -- there are slight differences at

Yahoo (English) - March ~ February

Yahoo (French) - March ~ February

The fact that the results are multiplied almost exactly by two, and that they line up so perfectly on the regression line (with a coefficient of determination R² > 0.99), is extremely troubling. It is very unlikely that a natural increase of the index, i.e. one obtained by crawling additional pages on the web, would produce such a pattern. Something needs to be explained.
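For readers who want to run this kind of check themselves, here is a minimal sketch of a least-squares fit of March counts against February counts, returning the slope and R². The function name `fit_line` and the numbers are my own illustration: the counts are made up and are not the measured data, and this is not necessarily how the plots above were computed.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2


# Hypothetical result counts for five probe words (illustration only).
february = [1200, 45000, 310000, 2500000, 18000000]
march = [2410, 89800, 622000, 4990000, 36100000]

slope, intercept, r2 = fit_line(february, march)
print(f"slope = {slope:.3f}, R^2 = {r2:.4f}")
```

A slope close to 2 together with an R² close to 1 is the pattern described above. Because counts span several orders of magnitude, the largest words dominate such a fit, so it may be worth repeating it on the logarithms of the counts.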

I can see four hypotheses to explain this strange correlation.
  1. Yahoo has doubled its index size since early February. However, in this case the too-perfect correlation needs to be explained.
  2. It is a bug. Some programmer mistyped a line of code somewhere. This kind of thing happens, but it is strange that it would go unnoticed.
  3. Yahoo has decided to inflate its counts by 100%. But such an enormous increase, so mathematically perfect, seems a bit silly since it guarantees that they would be caught.
  4. Yahoo did have a larger index for quite a while, but they were dividing their results by two previously, for strategic reasons, for example waiting for the right marketing moment to make a world-wide announcement.
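One rough way to tell a constant multiplier (hypotheses 2 to 4) from uneven growth is to look at the per-word ratio of new count to old count. The sketch below is only an illustration under my own assumptions: `ratio_spread` is a hypothetical helper and all three series are invented numbers, not measured data. Note that a near-constant ratio does not by itself rule out hypothesis 1; it only shows why that hypothesis needs an extra explanation.

```python
import statistics


def ratio_spread(before, after):
    """Return the mean per-word ratio after/before and its coefficient of variation."""
    ratios = [a / b for b, a in zip(before, after)]
    mean_ratio = statistics.mean(ratios)
    # Coefficient of variation: spread of the ratios relative to their mean.
    cv = statistics.pstdev(ratios) / mean_ratio
    return mean_ratio, cv


# Hypothetical counts (illustration only).
february = [1200, 45000, 310000, 2500000, 18000000]
# What a fixed multiplier of about 2 would look like.
uniform = [2410, 89800, 622000, 4990000, 36100000]
# What growth that differs from word to word might look like.
uneven = [1500, 99000, 400000, 6000000, 20000000]

mean_u, cv_u = ratio_spread(february, uniform)
mean_n, cv_n = ratio_spread(february, uneven)
print(f"uniform: mean ratio {mean_u:.2f}, CV {cv_u:.3f}")
print(f"uneven:  mean ratio {mean_n:.2f}, CV {cv_n:.3f}")
```

With a fixed multiplier the coefficient of variation stays close to zero, whereas growth that depends on the word produces ratios scattered around their mean.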
If, after Google and MSN, Yahoo! were also manipulating their counts, it would be extremely disappointing. Until now, it was the only engine that returned coherent counts, as I mentioned in several of my posts, and I had hoped that they were sincere. Yahoo has clearly caught up with Google in terms of size and quality (relevance, freshness, etc.), and is beginning to gain more and more respect among professional users, experts and academics (a good step was the release of a very nice API a few days ago). It would be sad if they ruined this emerging movement by such a stupid move.

Therefore, I do hope that Yahoo's index has really doubled, and that there is a technical explanation for the too-perfect correlation. In any case, Yahoo should probably communicate about this. An explanation is needed to dispel the doubts. I know they read my blog. If they say anything, I'll be pleased to relay it.

