Friday, February 10, 2006

Web: A short study in pornometry (1)

In the current climate, where the trend seems to be towards making it easier to fire people (see here), public administrations should make use of modern tools for the cybersurveillance of their staff. Corporations could also use them (along with graphology and astrology) to decide whether to fire their “hot” employees … Needless to say, Google, which can be used for everything, can also be used in this domain. How it works is quite simple. Search for a person’s name twice in Google: once with the SafeSearch filter activated and once with it deactivated (don’t forget to put the name between quotation marks) … One subtraction and one division later, voilà: you have the proportion of pages mentioning the individual in question that Google considers to be pornographic! To make things even easier, a team of Jack-the-Lads have even developed a tool based on Google that does exactly that: the Slut-O-Meter.
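The recipe boils down to one formula: porn share = (unfiltered hits − safe hits) / unfiltered hits × 100. Here is a minimal Python sketch of that arithmetic. It assumes you have already read the two hit counts off Google by hand; it does not query Google, and the function names are hypothetical, not the Slut-O-Meter’s actual code.

```python
def exact_phrase(name: str) -> str:
    """Wrap a name in double quotes so Google searches for the exact phrase."""
    return f'"{name}"'


def porn_share(unfiltered_hits: int, safe_hits: int) -> float:
    """Percentage of pages that disappear when SafeSearch is switched on.

    unfiltered_hits: hit count with SafeSearch off
    safe_hits:       hit count with SafeSearch on
    """
    if unfiltered_hits <= 0:
        raise ValueError("the unfiltered search must return at least one page")
    if not 0 <= safe_hits <= unfiltered_hits:
        raise ValueError("safe hits must lie between 0 and the unfiltered count")
    filtered_out = unfiltered_hits - safe_hits   # the subtraction
    return 100 * filtered_out / unfiltered_hits  # the division


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    print(exact_phrase("Brad Pitt"))  # "Brad Pitt"
    print(porn_share(1_000, 250))     # 75.0
```

Dividing by the unfiltered count (rather than by the safe count) keeps the score between 0 and 100.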

When you encounter this kind of tool, the first thing you usually do is type in your own name: the temptation is too strong to resist. Sadly, I’m no exception, and the results were quite clear: I’m a total perv.

Unfiltered, Google recognises a total of 607,000 pages that contain the word veronis (Google seems to have slimmed down a bit, as I was close to 2 million in September – it seems that the search engine was particularly hungry back then and would gobble up anything in sight) and only 376,000 of those pages are “safe”! In other words, nearly 62% of the pages containing my own sweet name are considered pornographic by Big Google! There aren’t too many of us Véronis out there, either, and certainly none of us are porn stars. I am in trouble, in deep trouble.

But hang on a second before you give up on me completely. These are pages that talk about me, but not necessarily pages that I wrote. Let’s look, for instance, at some other celebrities:

"Rocco Siffredi": 91.23%
"Linda Lovelace": 79.65%
"Jennifer Lopez": 64.24%
"Britney Spears": 52.89%
"George Clooney": 28.97%
"Brad Pitt": 28.80%
"George W. Bush": 5.93%
"Jacques Chirac": 3.28%

Quite clearly, these people haven’t written all those pages themselves. Some of them are fantasised over by half the planet – others, a little less. I knew I had a fan club [fr], but still, to find myself up there with Britney Spears and Jennifer Lopez is all a bit much …

My first thought was that I’d been spammed. The creators of fake sites, porn and otherwise, like to stuff their pseudo-pages with all kinds of words and texts in order to (try to) trick the search engines … One very common technique consists of automatically harvesting Google’s results for certain relevant searches and simply copying them into the dummy sites being built. And since I’ve written about sex a couple of times [fr: 1 and 2], and I’m even at the top of the list for certain risqué searches [fr], I wouldn’t be surprised if my text had been leeched in this way.

Clearly, this is what has happened … And how do I know? Just search for the word veronis along with any other dirty word (I’m sure you don’t need any suggestions from me, and offering them might get me in trouble with my boss, the Minister). You’ll find page after page of porno-spam like this:

Delireecom amateur Pages similaires GRATUITewwwesexeinsexeecom films En 2005 ultra perso, decouvrez du partouzes, Pages similaires achat liensedruunaenet googles Vidéo sexe xxx, Pages similaires : avertis liens videos.
hardcoreehtml (+4) vous Pages similaires gratuit, sexe.
shop, Allopasswwwetirez.
moiecomFilles .
pour 21 annuaire sexe .
veronis harde .

I haven’t chosen the “hottest” extracts either (I sometimes feel like there’s someone reading over my shoulder), but as you can see, all of this is just rehashed Google results. The spammers haven’t even gone to the trouble of removing the famous “Similar Pages” links (“Pages similaires” in the French interface).

So, does that explain why I’m a cyber-tart? In fact, no, it doesn’t. After carrying out some systematic searches, I discovered that Google only returns a few hundred pages of this sort containing the word veronis. The explanation lies elsewhere: despite my best intentions, I really am the author of these hundreds of thousands of disgusting pages. And here’s the proof. As you know, with Google you can limit your search to one particular site by using the keyword site:



Caught red-handed! Subtracting the safe pages from the unfiltered total, I’ve put no fewer than 387,000 − 93,700 = 293,300 pornographic pages on the University’s server. My days are numbered …
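The same subtraction works for a site-restricted search. Here is a minimal Python sketch; the domain univ.example.fr is a placeholder (the server’s address isn’t given here), the helper names are hypothetical, and the two counts are the ones reported above.

```python
def site_query(term: str, domain: str) -> str:
    """Build a Google query restricted to one site with the site: operator."""
    return f'"{term}" site:{domain}'


def porn_pages(unfiltered_hits: int, safe_hits: int) -> int:
    """Pages that vanish when SafeSearch is on: unfiltered minus safe."""
    return unfiltered_hits - safe_hits


if __name__ == "__main__":
    query = site_query("veronis", "univ.example.fr")  # hypothetical domain
    unfiltered, safe = 387_000, 93_700                 # counts from the post
    flagged = porn_pages(unfiltered, safe)
    print(query)                                       # "veronis" site:univ.example.fr
    print(flagged)                                     # 293300
    print(f"{100 * flagged / unfiltered:.1f}%")        # 75.8%
```

Run the same query string twice, once with SafeSearch on and once with it off, and feed the two counts to porn_pages.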

But what are these pages, you must be wondering? Long-time readers of this blog may remember that back in April I made available a little search engine (a “concordance program”) for the European Constitution. It’s still online [here and here]. In its desperate quest for new pages with which to pump up the size of its index, Google fell into the “spider trap” I had unintentionally set, and during the summer indexed hundreds of thousands of virtual fragments of the Constitution (see here) …
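Why would a concordance program turn into a spider trap? One plausible mechanism is sketched below in Python, under purely hypothetical assumptions: the URL scheme /concord?w=WORD&sort=ORDER and the “every displayed word is a link” design are illustrations, not a description of the actual program. A crawler that follows every link reaches one page per reachable word, multiplied by every display option.

```python
from collections import deque


def discoverable_urls(text: str, start: str,
                      sort_orders=("left", "right")) -> set[str]:
    """Breadth-first walk of a HYPOTHETICAL concordance site.

    Assumed design (illustrative only): the page for WORD lists every line
    containing WORD, and every word shown in those lines links to its own
    concordance page, once per sort order.
    """
    lines = [line.lower().split() for line in text.splitlines() if line.strip()]
    seen: set[str] = set()
    queue = deque([start.lower()])
    while queue:
        word = queue.popleft()
        if word in seen:
            continue
        seen.add(word)
        # Follow every word link displayed on this word's page.
        for tokens in lines:
            if word in tokens:
                queue.extend(w for w in tokens if w not in seen)
    # Each reachable word yields one distinct URL per sort order.
    return {f"/concord?w={w}&sort={order}" for w in seen for order in sort_orders}


if __name__ == "__main__":
    toy_text = "a b\nb c\nd e"
    urls = discoverable_urls(toy_text, "a")
    print(len(urls))  # 6: words a, b, c are reachable, times 2 sort orders
```

On a full constitutional text, the number of URLs grows with the vocabulary size times the number of display options, so a single document can surface as a very large number of distinct “pages”.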

What I didn’t realise was that Google obviously considered this draft Constitution a work of hardcore pornography. Certainly not something to be shown to children: some in France [fr] have even called the document obscene!

Read follow-up

1 comment:

simple citoyen said...

By the way, the Slut-o'meter has ceased to exist...
Interesting subject for what it reveals on the intricacies of search indexes.

May 26, 2010, 14:15

