Jean Véronis

Monday, October 03, 2005

Blogs: The last will never be the first

The site Technorati ranks blogs based on their popularity. The metric used is simple: Technorati counts the number of distinct sites that point to the blog in question (and not the number of links) over the last six months. I see that Technologies du Langage is ranked 4724th, with 655 links from 210 different sites (it’s this last figure that’s taken into account for the ranking).

Not bad for the old ego! 4724th out of the 18.7 million blogs currently tracked by Technorati is no mean feat! All the more so since the disproportion between languages puts blogs in French at something of a disadvantage (although, I admit, this one is a bit bilingual). The highest-ranked blog written in French appears to be Standblog (197th) [please correct me if I’m wrong]. No French blogs appear in the Top 100.

I took a closer look at how this relationship between ranking and number of referring sites works in practice, by carrying out a survey of around one hundred blogs that go from one end of the ranking to the other. Technorati seems a little buggy: sometimes the ranking is mentioned, sometimes it isn’t, but by using the Web interface and the API, I was able to obtain an indication of the ranking for most of the sites I was looking at. As you might expect, the relationship roughly follows a “power law”, i.e. if we put the ranking on one axis, and the number of sites on another, and we put the whole thing in logarithmic coordinates, we get a more or less straight line:

Such an organisation can be found in a large number of fields that have very little in common, such as the vocabulary of a text (the famous Zipf’s law, which I alluded to briefly while on the subject of spam, and to which I must certainly return one day), social relationships, the physical structure of the Internet, or the hypertextual organisation of the Web. It’s so surprising that books have even been written about it...
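To make the "straight line in logarithmic coordinates" idea concrete, here is a minimal sketch of how such a fit could be done. It is not the procedure used for the survey above: the data are synthetic (generated from an exact power law with an arbitrarily chosen exponent and constant), and the function name `fit_power_law` is purely illustrative.

```python
import math

def fit_power_law(ranks, sites):
    """Least-squares fit of log(sites) = a + b * log(rank); returns (a, b)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(s) for s in sites]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope of the line in log-log coordinates
    a = mean_y - b * mean_x  # intercept (log of the multiplicative constant)
    return a, b

# Synthetic data, NOT Technorati figures: sites = 30000 * rank ** -0.6
ranks = [1, 10, 100, 1000, 10000]
sites = [30000 * r ** -0.6 for r in ranks]

intercept, slope = fit_power_law(ranks, sites)
print(f"slope = {slope:.3f}, constant = {math.exp(intercept):.0f}")
```

On data that really follow a power law, the fitted slope recovers the exponent. On real rankings, points that fall away from the fitted line are what reveal a departure from the law.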

There’s therefore nothing unusual about the fact that blogs follow this sort of law, but it’s interesting to note that the curve is only linear in the upper part (pink line). Somewhere between ranks 5,000 and 10,000, it bends into a parabolic shape (blue line) and moves progressively away from the power law. In a way, there are “too many” blogs that have few incoming links. Unless there’s some horrible Technorati bug, this looks like it’s due to the invasion of spam blogs or “splogs”, which as we now know make up an ever-increasing part of the blogosphere (Philipp Lenssen recently counted as many as 60% on Blogger, see here). It’s all but impossible (or certainly too costly) for spammers to have hundreds of sites that reference them, but splogs with no references or only a few are legion (just type keywords like “Viagra” or “Babe” in Technorati and see for yourself). This is certainly what’s dragging the curve down.

Splogs or not, the "power law" in question can lead bloggers to despair: it means that a tiny minority of blogs get nearly all the references, while the immense majority of blogs are not quoted (or perhaps even read) by anyone, or certainly by very few people ... In fact, from the 777,745th spot onwards, each blog is only referenced once. Obviously, there has to be a cut-off point, and Technorati doesn’t rank those blogs that are not referenced at all. Nonetheless, we can make an estimate based on the last ten known ranks:

If we extrapolate the curve, we can estimate the number of blogs referenced by a single site to be around 460,000. If we add this number to the previous 777,744, we have an estimate of around 1,235,000 blogs that are referenced by at least one site. That means that 17.5 million blogs are not referenced by anyone, which is more than 93% of all blogs. Does anyone read them? Many of them are undoubtedly spam, as I said before. Others are blogs that have just been created. And others still are blogs of no interest whatsoever that will, in all probability, not last very long at all.
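The arithmetic behind this estimate can be checked in a few lines. The sketch below uses only the figures quoted in this post; the variable names are illustrative.

```python
# Figures quoted in the post.
total_blogs = 18_700_000             # blogs tracked by Technorati
ranked_above_single_link = 777_744   # blogs ranked before the "one reference" zone
single_link_estimate = 460_000       # extrapolated blogs referenced by a single site

referenced = ranked_above_single_link + single_link_estimate
unreferenced = total_blogs - referenced
share_unreferenced = unreferenced / total_blogs

print(f"referenced by at least one site: {referenced:,}")
print(f"not referenced by anyone:        {unreferenced:,}")
print(f"share not referenced:            {share_unreferenced:.1%}")
```

The exact sum is slightly above the rounded figure given in the text, but the conclusion is the same: more than 93% of tracked blogs have no referring site at all.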

A few (but how many?) will manage to climb up the ranking … They may even make it into the Top 100 one day, but don’t count on it! Technorati recently adopted a limit of six months when calculating referring sites; without a time limit, a new site has virtually no chance of appearing in the upper echelons of the ranking, simply because of the inertia of the “big guys” already in place. Indeed, even with this limit, chances are still virtually nil. Cases like that of Michael Barnett’s blog (interdictor), which came in 90th after just a few months, are complete exceptions, and it takes events on a global scale to push a blog up the slope so quickly (in this case, his coverage of Hurricane Katrina). Note also that interdictor has already slipped to 100th place (the Top 100 list given by Technorati is out-of-date).

To be in the Top 100, at the time of writing you need to be referenced by 1973 sites. That may not seem much, but it’s not easy (since only 100 blogs have managed it ;-). The table below gives the number of sites you need to be referenced by to make it into the Top 100, 1000, etc:

Top        Number of referring sites

Oh dear! Even if the number of sites talking about Technologies du Langage were to double (go ahead, friends, link to me!), this blog would never get beyond the 1600th place in the world ranking. And what’s more, since that won’t happen overnight, the “big guys” will also be referenced more and more, and the borderline will have moved even further out of reach. It would take a hurricane in the ICT world for me to make it into the Top 100 or even the Top 1000… Giving Google a dressing down will never be enough.
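For readers who want to play with this kind of "what if" themselves, here is a rough sketch that interpolates a rank from a number of referring sites. It is not how the figure above was obtained: it simply assumes a pure power law running through two points quoted in this post (the Top 100 threshold of 1973 sites, and this blog at rank 4724 with 210 sites), which, as noted earlier, is only an approximation since the real curve bends. The function name `rank_for_sites` is hypothetical.

```python
import math

# Two (rank, referring sites) points quoted in the post.
TOP_100 = (100, 1973)
THIS_BLOG = (4724, 210)

def rank_for_sites(sites, p1=TOP_100, p2=THIS_BLOG):
    """Interpolate a rank on the straight line through p1 and p2 in log-log space."""
    (r1, s1), (r2, s2) = p1, p2
    # Slope of log(sites) against log(rank); negative for a ranking.
    slope = (math.log(s2) - math.log(s1)) / (math.log(r2) - math.log(r1))
    log_rank = math.log(r2) + (math.log(sites) - math.log(s2)) / slope
    return math.exp(log_rank)

doubled = rank_for_sites(2 * 210)
print(f"estimated rank with 420 referring sites: {doubled:.0f}")
```

Such a crude two-point line gives a rank of the same order of magnitude as the one mentioned above, though not the same value, which is to be expected given that the actual curve is not perfectly straight.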

Sniff. I think I’m going to write about celebs instead.
