Wednesday, February 23, 2005

Google: Stabilisation of index size

I showed on January 22nd that Google's index size had increased, although the main page still said "Searching 8,058,044,651 pages". Using a number of queries as a probe, I estimated the increase at a factor of ca. 1.13.

The same technique, applied a month later on February 22nd, shows almost no change since January: the slope of the regression line is 1.01 relative to January, or 1.14 relative to November.
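For readers who want to try the probe technique themselves, here is a minimal sketch of how such a slope could be computed. It assumes a regression line forced through the origin, which may not be exactly the fit used for the figures above; the function name and the hit counts are made up for illustration and are not the measured data.

```python
def regression_slope_through_origin(earlier_counts, later_counts):
    """Least-squares slope of later_counts on earlier_counts, with no intercept.

    Each pair holds the hit counts reported for the same probe query
    on two different dates. A slope near 1.0 means the index size is stable.
    """
    if len(earlier_counts) != len(later_counts) or not earlier_counts:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(x * y for x, y in zip(earlier_counts, later_counts))
    denominator = sum(x * x for x in earlier_counts)
    if denominator == 0:
        raise ValueError("earlier counts are all zero")
    return numerator / denominator


# Hypothetical hit counts for four probe queries (illustrative only).
january = [12_400, 310_000, 1_250_000, 48_000]
february = [12_600, 312_000, 1_262_000, 48_300]

slope = regression_slope_through_origin(january, february)
print(f"growth factor since January: {slope:.3f}")
```

Because the fit is dominated by the queries with the largest counts, a wide spread of probe queries (rare and common terms) gives a more trustworthy estimate than a handful of popular ones.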

The diagram below shows the stabilisation:

If we could trust the original self-reported figure ("Searching 8,058,044,651 pages"), this would mean that Google's index now holds ca. 9.2 billion pages (8.06 billion × 1.14). However, it seems that this figure includes both the main index (all the words on the page indexed, up to whatever cache limit they are using these days) and the supplemental index of pages that Google knows about, but for which only very few elements (URL, title...) are indexed. The main index is apparently only 60% of the whole, and numbers are probably artificially inflated by 66% to match the size of the whole database (see study here). Given the progression since November, the main index is therefore probably somewhere around 5.5 billion pages (60% of 9.2 billion).
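The arithmetic behind these estimates can be spelled out in a few lines. This is only a back-of-the-envelope sketch: the variable names are mine, and the 60% share is the approximate figure quoted from the study, not something measured here.

```python
# Figures taken from the post.
reported_pages = 8_058_044_651      # Google's self-reported index size
growth_since_november = 1.14        # regression slope, November to February
main_index_share = 0.60             # main index as a fraction of the whole (approximate)

# Estimated size of the whole index (main + supplemental).
total_pages = reported_pages * growth_since_november

# Estimated size of the main, fully indexed part.
main_pages = total_pages * main_index_share

# If the main index is 60% of the whole, counts must be scaled up by
# 1 / 0.60 to match the whole database, i.e. inflated by about two thirds.
inflation = 1 / main_index_share - 1

print(f"whole index: {total_pages / 1e9:.1f} billion pages")
print(f"main index:  {main_pages / 1e9:.1f} billion pages")
print(f"inflation:   {inflation:.1%}")
```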

