Wednesday, April 27, 2005

Web: Yes or "Non" to the Constitution ?

As many of you must have heard, there will be a referendum in France on May 29th on the proposed European Constitution, and the debate is quite lively at the moment: opinion polls show a majority in favor of "Non", and the French president Chirac and others are making desperate efforts to reverse that trend before D-Day.

I wondered whether the Web says more "Oui" or "Non", when it comes to the European Constitution. It is very easy to check with a search engine like Yahoo (search restricted to pages in French) :

Query                                   Pages
"Constitution européenne" oui -non      135 000
"Constitution européenne" -oui non      521 000
"Constitution européenne" oui non       643 000
"Constitution européenne" -oui -non     528 000
"Constitution européenne"               1 890 000

The first query returns the pages that contain the phrase "constitution européenne" (quotes are important!) and the word oui, but not the word non (the minus sign is an exclusion operator). The second returns pages that contain "constitution européenne" and non but not oui, and so on. This is the good old Venn diagram that we (used to?) learn in high school:

[Figure: Venn diagram]

The four regions do not add up exactly to the total (as set theory says they should), but search engines only approximate the counts for Boolean queries. Yahoo actually does quite a reasonable job: the error is only about 3%. Google's counts are completely bogus, as I have shown before on this blog, so Google can't be used for this type of study.
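The 3% figure can be checked directly from the counts above. The following is only a minimal sketch: the variable and function names are mine, and the numbers are the Yahoo counts quoted in this post.

```python
# Yahoo page counts quoted above (French pages, "Constitution européenne").
oui_only = 135_000      # oui -non
non_only = 521_000      # -oui non
both = 643_000          # oui non
neither = 528_000       # -oui -non
reported_total = 1_890_000


def partition_error(parts, total):
    """Relative gap between the sum of the disjoint regions and the reported total."""
    return (total - sum(parts)) / total


regions = [oui_only, non_only, both, neither]
error = partition_error(regions, reported_total)

print(f"sum of the four regions: {sum(regions):,}")
print(f"reported total:          {reported_total:,}")
print(f"relative error:          {error:.1%}")
```

The four regions of the Venn diagram are disjoint and cover every page, so an exact engine would give an error of zero.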

If we look at the pages that contain oui or non, without containing the other word, we see that the non pages are 4 times more numerous than the oui pages.

This is a bit surprising, but we have to be careful since non is always more frequent than oui on the Web, all topics together. It happens in many languages (much more so in English as we will see below). Deep negativity of the human being, or hidden linguistic factors? This would be the topic of another post. The results for the Web as a whole are as follows :

Query        Pages
oui -non     13 500 000
-oui non     40 900 000
oui non      12 800 000

Let's not jump to conclusions too quickly. The non pages always outnumber the oui pages (from now on, I will speak only about the pages that do not contain the opposite word), but on the Web as a whole they are only three times as numerous, as opposed to four times on pages with "Constitution européenne". Statisticians use a measure called the "odds ratio", which is simply one of these ratios divided by the other. Here the odds ratio (in favor of non) is about 4/3, more exactly 3.86 / 3.03 = 1.27. In other words, relative to the Web in general, one has 27% more chances of finding a non than a oui when the Web speaks of "Constitution européenne".
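For readers who want to redo the arithmetic, here is a minimal sketch of the calculation. The function name and argument names are mine, and the counts are the Yahoo figures quoted above.

```python
def odds_ratio(non_topic, oui_topic, non_web, oui_web):
    """Ratio of the non:oui ratio on topic pages to the non:oui ratio on the whole Web."""
    topic_ratio = non_topic / oui_topic
    web_ratio = non_web / oui_web
    return topic_ratio / web_ratio


# Pages containing one word but not the other (Yahoo, French pages).
topic_ratio = 521_000 / 135_000          # pages with "Constitution européenne"
web_ratio = 40_900_000 / 13_500_000      # all French pages
ratio = odds_ratio(521_000, 135_000, 40_900_000, 13_500_000)

print(f"non:oui on topic pages: {topic_ratio:.2f}")
print(f"non:oui on the Web:     {web_ratio:.2f}")
print(f"odds ratio:             {ratio:.2f}")
```

A value above 1 means that non is over-represented on the pages about the Constitution compared with its usual frequency.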

One commenter on the French version of this post remarked that the numbers could be biased by the pages about the current opinion polls, whose findings are in favor of non. Very interestingly, if we exclude the pages that contain the word sondage (French for "poll"), the odds ratio in favor of non is even greater, since it jumps to 2.5! Definitely non!

The comparison with the English-speaking Web is striking. Here are the results, still with Yahoo (English pages only this time) :

Query                                Pages
"European constitution" yes -no      5 830
"European constitution" -yes no      132 000
"European constitution" -yes -no     128 000
"European constitution" yes no       99 500
"European constitution"              371 000

The total number of pages about "European Constitution" is surprisingly low! We have seen above that it was close to 1.9 million for French, and we know that the French pages are far less numerous on the Web. I made a quick estimate by querying Yahoo with 50 language-independent "words" (http, www, numbers, etc.), according to the technique I described here. The number of French Yahoo pages is about 5.7% of the number of English pages as of today (April 27th), as can be seen on the following diagram (I don't want to be too technical, but the slope of the regression line in pink gives the proportion).

[Figure: correlation between Yahoo page counts, French vs. English]

I would therefore expect 371 000 * 0.057 = 21 147 pages containing "Constitution Européenne" in French. Instead we found 1.9 million, i.e. 90 times more. The conclusion is clear : the debate is quite lively at the moment in France!
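The expected count and the factor of 90 follow from two lines of arithmetic. This is a sketch with names of my own choosing; the 5.7% proportion is the regression slope mentioned above.

```python
english_pages = 371_000       # "European constitution", English pages
french_pages = 1_890_000      # "Constitution européenne", French pages
french_share = 0.057          # French pages as a proportion of English pages

# Count we would expect in French if the topic were equally popular in both languages.
expected_french = english_pages * french_share
excess_factor = french_pages / expected_french

print(f"expected French pages: {expected_french:,.0f}")
print(f"observed / expected:   {excess_factor:.1f}")
```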

As far as yes and no are concerned, we can see that no is much more frequent than yes:

Query       Pages
yes -no     43 200 000
-yes no     1 190 000 000
yes no      163 000 000

This imbalance is much more pronounced than in French, since there are 28 times more no's than yes's (this is probably due to the different linguistic role of no in English: for example, a French determiner such as aucun in "aucune loi" translates as no in English, "no bill"). In any case, no is only 23 times more frequent in the pages containing "European Constitution". The odds ratio this time is 1.22 (i.e. 22% more), but in favor of yes.
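The English figures can be checked in the same way. The sketch below uses variable names of my own and the Yahoo counts quoted above; since no is less over-represented on the topic pages than on the Web as a whole, the ratio is taken in the direction that favors yes.

```python
# Pages containing one word but not the other (Yahoo, English pages).
no_web, yes_web = 1_190_000_000, 43_200_000     # whole English Web
no_topic, yes_topic = 132_000, 5_830            # pages with "European constitution"

web_no_per_yes = no_web / yes_web
topic_no_per_yes = no_topic / yes_topic

# Odds ratio in favor of yes: how much less dominant no is on topic pages.
ratio_for_yes = web_no_per_yes / topic_no_per_yes

print(f"no:yes on the Web:      {web_no_per_yes:.1f}")
print(f"no:yes on topic pages:  {topic_no_per_yes:.1f}")
print(f"odds ratio (for yes):   {ratio_for_yes:.2f}")
```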

These amusing statistics thus reveal that France is already the "black sheep" of Europe, at least on the Web, to use President Chirac's own terms ;-)

Anonymous said...

The constitution itself contains a lot of 'non' but no 'oui'...

May 2, 2005 17:07
Jean Véronis said...

Exactly. "Oui" practically never occurs in legal or administrative texts. There are 103 occurrences of "non", of which:

* 17 are in expressions of the type "rémunérée ou non" ("paid or not")
* 86 are modifiers of an adjective or noun ("non contractuelle", "non-discrimination")

The positions in which a "oui" could appear, such as Verb + oui ("voter oui", etc.), never occur.

May 2, 2005 17:18
Anonymous said...

Now that Spain has voted for the constitution it would be interesting to know what was the correlation between the frequency of "constitucion europea" and "si"/"no" and the outcome of the election.

May 24, 2005 01:50

