Jean Véronis
Aix-en-Provence
(France)



Sunday, March 13, 2005

Web: Google adjusts its counts




Read follow-up

23 mar - Google: 5 billion "the" have disappeared overnight



In a previous study, I showed that Google's counts are probably substantially inflated.
The Googlers must have been slightly embarrassed: since the study was published (Feb. 8), they have been making major adjustments to the counts to correct the situation. I checked the same word lists in English and French one month later, on March 8, and the figures have changed radically.

The count of results for English words on the entire Web has slightly decreased (by a factor of 0.8), whereas the count for French words is stable.


[Figures: English words (entire Web); French words (entire Web)]

At the same time, however, the counts for searches restricted to English pages and to French pages have increased, by a factor of 1.2 for English and 1.4 for French.


[Figures: English words (English pages); French words (French pages)]

This means that the ratio of language-restricted counts to whole-Web counts has changed substantially for both languages: it now reaches 84% for English and 78% for French. If we assume that the proportions given by Yahoo are correct, this gives estimates of 90% for the size of the main index in English, and 80% in French. This is a major change from the 60% I reported in early February, and it brings Google closer to credible figures, such as Yahoo's. The figure below summarizes the situation.
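The arithmetic behind the shift in ratios can be checked with a short sketch. It uses only the factors and percentages quoted in this post; the function and variable names are illustrative, "stable" is taken to mean a factor of 1.0, and the February ratios it prints are derived here rather than quoted from the earlier study.

```python
# Change factors quoted in the post (Feb. 8 -> Mar. 8, 2005).
WEB_FACTOR = {"English": 0.8, "French": 1.0}         # whole-Web counts ("stable" assumed = 1.0)
RESTRICTED_FACTOR = {"English": 1.2, "French": 1.4}  # language-restricted counts
NEW_RATIO = {"English": 0.84, "French": 0.78}        # restricted / whole-Web ratio in March


def ratio_multiplier(lang):
    """Factor by which the restricted/whole-Web ratio changes."""
    return RESTRICTED_FACTOR[lang] / WEB_FACTOR[lang]


def implied_old_ratio(lang):
    """February ratio implied by the March ratio and the two change factors."""
    return NEW_RATIO[lang] / ratio_multiplier(lang)


for lang in ("English", "French"):
    print(f"{lang}: ratio x{ratio_multiplier(lang):.2f}, "
          f"implied February ratio {implied_old_ratio(lang):.0%}")
```

Both languages come out with an implied February ratio of about 56%, so the quoted factors and the new percentages are at least mutually consistent.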


[Figure panels: English; French]

It is worth noting that nothing much has changed at MSN, either in absolute counts or in proportions, and their results still seem inflated in the same way as before [see study on MSN]. Yahoo's proportions are identical, although their absolute counts have recently doubled [see study on Yahoo]; they are therefore still consistent, as before.

Two hypotheses could explain the changes:
  • Pages that were simply listed as URLs in the supplemental index have now been fully indexed, and as a result the proportion of the main index has considerably increased.
  • The index proportion is still the same, but the extrapolation formulas are being tuned to become progressively more realistic, and to eventually hide the main/supplemental index organisation.
I hope, of course, that the first hypothesis is the right one, but it is impossible to tell without additional tests.

If we believe Google's and Yahoo's new counts, Yahoo still indexes more pages than Google, by a factor of:
  • 1.6 for English
  • 1.8 for French.


[Figure panels: English; French]


