Monday, September 29, 2008

Blogs: Turbulence ahead

A big clean is more commonly carried out in spring... but there's no reason not to have one at the end of the summer! That was my recent advice to Wikio regarding their famous blog rankings. I told you recently that the rankings would be one of the projects receiving my attention, in collaboration with you all. In fact, I have completely reworked them and, as promised [fr], I will give you the details of the algorithm in the days to come. My first observation was that a significant amount of dust had gathered in certain nooks and crannies, and it needed attention before we could move on and try to improve the rankings as a whole. This is not a criticism: a ranking like this is an extremely technical undertaking, and even the very big names have trouble with it (Technorati, for example [fr]).

So, the various Wikio teams have spent September with broom in hand, and the results are likely to ruffle a few feathers... There will surely be some grinding of teeth (there always is: not everyone can be on top), but the engine is now much cleaner. Several of you had noted that inactive blogs were sticking around in the rankings even though they had not published for weeks. Well, no more: they're out. I got our developers to create several indicators, one of which tracks publication volume, that let us follow the behaviour of the tens of thousands of sources in our database more closely. Every blog that had not published for four months has been jettisoned. Other indicators were a little more difficult to implement, but now that they are in place they let us assess the similarity between sources and so deal with spammers, aggregators and multiple posting. Multiple posting is sometimes legitimate, but it can seriously distort the analysis of backlinks, and therefore the rankings, since they are based solely on that criterion. So out also go the aggregators and other duplicates. Much of the recent work was precisely this: dealing with the enormous amount of source duplication, which is a delicate and time-consuming process.
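To make the four-month rule concrete, here is a minimal sketch of how such an inactivity filter might look. It is not Wikio's actual code: the names `months_between`, `active_blogs` and `INACTIVITY_MONTHS` are hypothetical, and counting "four months" in whole calendar months is an assumption made for illustration.

```python
from datetime import date

# Assumption: "four months" is counted in whole calendar months.
INACTIVITY_MONTHS = 4


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # The last month only counts once its day-of-month has been reached.
    if later.day < earlier.day:
        months -= 1
    return months


def active_blogs(last_post_dates: dict[str, date], today: date) -> list[str]:
    """Keep only the blogs whose latest post is under four months old."""
    return [
        name
        for name, last_post in last_post_dates.items()
        if months_between(last_post, today) < INACTIVITY_MONTHS
    ]


if __name__ == "__main__":
    # Hypothetical blogs and dates, purely for illustration.
    sample = {
        "busy-blog": date(2008, 9, 1),
        "sleepy-blog": date(2008, 1, 15),
    }
    print(active_blogs(sample, today=date(2008, 9, 29)))
```

A real indicator would presumably look at publication volume over time rather than just the date of the last post, but the cut-off logic would be similar.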

I also implemented a small change which has no bearing on the overall principle but smooths the transition from one month to the next. Many of you had seen that there was sometimes a yo-yo effect, whereby blogs suddenly lost a large number of positions or, on the contrary, shot up the rankings like a rocket. This was largely due to the time period used when analysing backlinks. That period, as you will know, was four months. Say a blog gets a lot of buzz in April: it will appear high up in the rankings from May to August and then, if nobody talks about it further in the meantime, suddenly plummet in September. Clearly not ideal. I have therefore replaced the flat four-month calculation with a progressive attenuation over nine months. So September's links have a value of 1, August's a value of 1 – 1/9, July's 1 – 2/9, and so on. The variations are now much more moderate.
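The attenuation described above can be sketched in a few lines. This is only an illustration of the weighting as stated in this post (1, 1 – 1/9, 1 – 2/9, ...), not the production algorithm: the function names are hypothetical, and treating links nine or more months old as worth nothing is my reading of "over nine months".

```python
from fractions import Fraction

# Length of the attenuation window, as stated in the post.
WINDOW_MONTHS = 9


def link_weight(age_in_months: int) -> Fraction:
    """Weight of a backlink that is `age_in_months` old (0 = current month)."""
    if age_in_months < 0:
        raise ValueError("age_in_months must be non-negative")
    if age_in_months >= WINDOW_MONTHS:
        # Assumption: links older than the window no longer count.
        return Fraction(0)
    return 1 - Fraction(age_in_months, WINDOW_MONTHS)


def attenuated_score(links_by_age: dict[int, int]) -> Fraction:
    """Sum of backlink counts, each weighted by its age in months."""
    return sum(
        (link_weight(age) * count for age, count in links_by_age.items()),
        Fraction(0),
    )


def flat_window_score(links_by_age: dict[int, int], window: int = 4) -> int:
    """The previous scheme: every link inside the window counts fully."""
    return sum(count for age, count in links_by_age.items() if 0 <= age < window)


if __name__ == "__main__":
    # Hypothetical blog with 90 backlinks, all received in a single month.
    for age in range(0, 10):
        links = {age: 90}
        print(age, flat_window_score(links), attenuated_score(links))
```

With the flat window, the 90 links in this made-up example count fully for four months and then vanish all at once; with the attenuation, the same links lose a ninth of their value each month, which is what damps the yo-yo effect.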



Obviously there will still be a lot of movement in the rankings this month, as many things have been adjusted. The good news is that clearing out moribund and spammer blogs has freed up a number of places, so there are more blogs on their way up than on their way down. I don't wish to reveal the rankings yet, as checks are still being carried out, but there are some noteworthy, and indeed well-deserved, leaps. A few falls as well, but that is to be expected. The summer brought a drop in activity for many blogs, but that is true everywhere (you will probably have seen the report on Technorati). The result is of course open to analysis, but we hope at least to have provided an improved and cleaner ranking.

