Jean Véronis
Aix-en-Provence
(France)



Thursday, September 15, 2005

Splogs: Antisplog.net system



Hatem from Antisplog.net left a comment on my post "Google, Blogger and splogs", asking for my opinion about his site. Antisplog.net is an online service, launched a few days ago, that lets you check whether a given URL is likely to be a splog.



As explained here, to use it, you simply send the query:
  • http://www.antisplog.net/check/the_url_to_check
where the_url_to_check is the blog that you're trying to check.

Antisplog.net will return:
  • 1 : if the blog is detected as a SPLOG
  • 0 : if not
  • 3 : if the URL doesn't open due to a DNS error, a 404 error, etc.
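The query pattern and the three return codes above can be sketched as a small client. This is only an illustration under assumptions: the post does not say whether the checked URL must be percent-encoded (here it is appended verbatim), nor what exactly the response body looks like (here it is assumed to be the bare code as text). The names `build_check_url`, `interpret` and `check_blog` are hypothetical.

```python
from urllib.request import urlopen

CHECK_ENDPOINT = "http://www.antisplog.net/check/"  # pattern quoted in the post

# Meaning of the three documented return codes.
VERDICTS = {"1": "splog", "0": "not splog", "3": "unreachable"}


def build_check_url(blog_url: str) -> str:
    """Append the blog URL to the endpoint, as in the post's query pattern.

    Assumption: the URL is appended verbatim; the post does not say
    whether it has to be percent-encoded.
    """
    return CHECK_ENDPOINT + blog_url


def interpret(body: str) -> str:
    """Map a raw response body to a verdict; anything else is 'unknown'.

    Assumption: the service answers with the bare code as plain text.
    """
    return VERDICTS.get(body.strip(), "unknown")


def check_blog(blog_url: str, timeout: float = 10.0) -> str:
    """Query the service and return a verdict (needs network access)."""
    with urlopen(build_check_url(blog_url), timeout=timeout) as response:
        return interpret(response.read().decode("ascii", errors="replace"))
```

Keeping `interpret` separate from the network call makes it possible to test the code mapping without contacting the service at all.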
I sent the set of URLs that I borrowed from Philip Lenssen, which I used in my previous post (only 42 responded this morning). The results are quite impressive:

Correct
  Normal                      17
  Spam                        22
  Total correct               39 (92%)

Wrong
  Normal (false positives)     2
  Spam (false negatives)       1
  Total wrong                  3 (8%)


A success rate above 90% is quite impressive for a system that young, especially since, as I noted before, some of these splogs are quite difficult to tell apart from normal ones, even for the human eye. Congratulations then. I'll be following how the system develops with great interest.
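For readers who want to redo the arithmetic, here is a minimal sketch that recomputes the success and error rates from the four counts reported in the post. The variable names are mine; the unrounded values come out at about 92.9% and 7.1%, which the post gives as 92% and 8%.

```python
# Counts taken from the results reported in the post.
correct_normal, correct_spam = 17, 22
false_positives, false_negatives = 2, 1

total = correct_normal + correct_spam + false_positives + false_negatives
correct = correct_normal + correct_spam

accuracy = correct / total
error_rate = 1 - accuracy

print(f"{correct}/{total} correct = {accuracy:.1%}")
print(f"{total - correct}/{total} wrong = {error_rate:.1%}")
```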

If I can give one piece of advice for the future, I would try to decrease the false positive rate (i.e. normal blogs reported as spam). At the moment, 2 of the 19 normal blogs are reported as spam, i.e. ca. 10% (although of course a precise assessment is difficult on such a small number of URLs). It seems to me quite dangerous to report legitimate blogs as spam, and I would be happier if this rate fell well below 1%, even if the price to pay is letting more splogs through the net.
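To give an idea of how imprecise an estimate on 19 blogs is, the sketch below computes the false positive rate and a rough 95% interval around it. The Wilson score interval is my choice for illustration, not something used in the original test, and `wilson_interval` is a hypothetical helper.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


false_positives = 2   # normal blogs reported as spam
true_negatives = 17   # normal blogs correctly let through
normal_blogs = false_positives + true_negatives  # 19 normal blogs in all

fpr = false_positives / normal_blogs
low, high = wilson_interval(false_positives, normal_blogs)
print(f"false positive rate: {fpr:.1%} (95% interval roughly {low:.0%} to {high:.0%})")
```

With so few URLs the interval spans from a few percent to about 30%, so a much larger test set would be needed to verify a rate below 1%.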

Of course, spammers monitor all this (see here for instance), and I am pretty sure that they will soon come up with splog-generating software that produces human-looking texts which will be extremely difficult to tell apart from real human texts by automatic means.

Anyway, congratulations again, Hatem, and good luck with your system!

1 Comment:

Blogger JoeC wrote...

Some spammers are already creating splogs with human-created text. They just steal text from other sites (Wikipedia being an obvious choice).

But even with actual human-created text, there are still characteristics that splogs do not share with normal blogs. Such splogs are much harder for a human to detect unless you recognize that the text is stolen, but hopefully AntiSplog.net can identify most of them based on their other spammy characteristics.

September 16, 2005 00:13
