Web: Yes or "Non" to the Constitution?
As many of you have probably heard, France will hold a referendum on May 29th on the proposed European Constitution. The debate is quite lively at the moment: opinion polls show a majority in favor of "Non", and French President Chirac and others are making desperate efforts to reverse that trend before D-Day.
I wondered whether the Web says "Oui" or "Non" more often when it comes to the European Constitution. This is very easy to check with a search engine such as Yahoo (search restricted to pages in French):
| Query | Pages |
|---|---|
| "Constitution européenne" oui -non | 135 000 |
| "Constitution européenne" -oui non | 521 000 |
| "Constitution européenne" oui non | 643 000 |
| "Constitution européenne" -oui -non | 528 000 |
| "Constitution européenne" | 1 890 000 |
The first query returns the pages that contain the phrase "constitution européenne" (quotes are important!) and the word oui, but not the word non (the minus sign is an exclusion operator). The second returns pages that contain "constitution européenne" and non but not oui, and so on. This is the good old Venn diagram that we learn (or used to learn?) in high school:
The total is not exact (according to set theory, the four counts should add up to it), but search engines only approximate the counts for Boolean queries. Yahoo actually does quite a reasonable job: the error is only about 3%. Google's counts are completely bogus, as I have shown before on this blog, and therefore cannot be used for this type of study.
If we look at the pages that contain oui or non but not the other word, we see that the non pages are about four times as numerous as the oui pages.
This is a bit surprising, but we have to be careful, since non is always more frequent than oui on the Web across all topics. The same happens in many languages (much more so in English, as we will see below). Deep negativity of the human being, or hidden linguistic factors? That would be the topic of another post. The results for the Web as a whole are as follows:
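For readers who want to redo the consistency check, here is a minimal Python sketch. It only uses the five counts from the table above; the variable names are mine.

```python
# Page counts reported by Yahoo for French pages (copied from the table above).
regions = {
    "oui only": 135_000,
    "non only": 521_000,
    "oui and non": 643_000,
    "neither": 528_000,
}
reported_total = 1_890_000  # count for the bare phrase "Constitution européenne"

# In exact set arithmetic, the four disjoint regions add up to the total.
region_sum = sum(regions.values())
relative_gap = (reported_total - region_sum) / reported_total

print(f"sum of the four regions: {region_sum}")
print(f"relative gap to the reported total: {relative_gap:.1%}")
```

The gap comes out at a little over 3%, which is the error mentioned above.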
| Query | Pages |
|---|---|
| oui -non | 13 500 000 |
| -oui non | 40 900 000 |
| oui non | 12 800 000 |
Let's not jump to conclusions too quickly. The non pages are always more numerous than the oui pages (from now on, I will speak only about the pages that do not contain the opposite word), but on the Web as a whole they are only about three times as numerous, as opposed to about four times on pages with "Constitution européenne". Statisticians use a measure called the "odds ratio", which is simply one of these ratios divided by the other. Here the odds ratio (in favor of non) is about 4/3, or more exactly 3.86 / 3.03 = 1.27. In other words, the odds of finding a non rather than a oui are 27% higher when the Web speaks of "Constitution européenne".
One commenter on the French version of this post remarked that the numbers could be biased by the pages about the current opinion polls, whose findings are in favor of non. Very interestingly, if we exclude the pages that contain the word sondage (French for "poll"), the odds ratio in favor of non is even greater, since it jumps to 2.5! Vraiment non! (Really, no!)
The comparison with the English-speaking Web is striking. Here are the results, still with Yahoo (English pages only this time):
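The arithmetic behind that odds ratio can be sketched in a few lines of Python. The counts are the ones from the two French tables above; the variable names are my own.

```python
# Pages containing only one of the two words (French pages, Yahoo counts).
topic_non, topic_oui = 521_000, 135_000        # with "Constitution européenne"
web_non, web_oui = 40_900_000, 13_500_000      # French Web as a whole

# non-to-oui ratio on the topic, and on the Web in general (the baseline).
topic_ratio = topic_non / topic_oui
baseline_ratio = web_non / web_oui

# Odds ratio in favor of non: how much stronger non is on the topic
# than it already is everywhere else.
odds_ratio = topic_ratio / baseline_ratio

print(f"topic ratio:    {topic_ratio:.2f}")
print(f"baseline ratio: {baseline_ratio:.2f}")
print(f"odds ratio:     {odds_ratio:.2f}")
```

Dividing by the baseline is the whole point: a raw four-to-one ratio would overstate the case, because non already outnumbers oui three to one on French pages in general.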
| Query | Pages |
|---|---|
| "European constitution" yes -no | 5 830 |
| "European constitution" -yes no | 132 000 |
| "European constitution" -yes -no | 128 000 |
| "European constitution" yes no | 99 500 |
| "European constitution" | 371 000 |
The total number of pages about "European Constitution" is surprisingly low! We have seen above that it was close to 1.9 million for French, and we know that the French pages are far less numerous on the Web. I made a quick estimate by querying Yahoo with 50 language-independent "words" (http, www, numbers, etc.), according to the technique I described here. The number of French Yahoo pages is about 5.7% of the number of English pages as of today (April 27th), as can be seen on the following diagram (I don't want to be too technical, but the slope of the regression line in pink gives the proportion).
I would therefore expect 371 000 * 0.057 = 21 147 pages containing "Constitution européenne" in French. Instead we found 1.9 million, i.e. about 90 times as many. The conclusion is clear: the debate is quite lively at the moment in France!
As far as yes and no are concerned, we can see that no is much more frequent than yes:
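Here is the same back-of-the-envelope estimate as a short Python sketch. The 5.7% share is my estimate from the regression mentioned above, and the two page counts come from the tables; the variable names are just illustrative.

```python
english_pages = 371_000       # English pages with "European constitution"
french_share = 0.057          # French pages as a fraction of English pages (my estimate)
observed_french = 1_890_000   # French pages with "Constitution européenne"

# If the topic were discussed equally in both languages, the French count
# would simply be the English count scaled by the size of the French Web.
expected_french = english_pages * french_share
excess_factor = observed_french / expected_french

print(f"expected French pages: {expected_french:.0f}")
print(f"observed / expected:   {excess_factor:.0f}")
```

With the unrounded count of 1 890 000 the factor comes out just under 90, which is close enough for this kind of estimate.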
| Query | Pages |
|---|---|
| yes -no | 43 200 000 |
| -yes no | 1 190 000 000 |
| yes no | 163 000 000 |
This imbalance is much more pronounced than in French, since there are 28 times as many no's as yes's. This is probably due to the different linguistic role of no in English: for example, a determiner such as aucune in "aucune loi" translates as no in English ("no bill"). In any case, no is only 23 times as frequent as yes in the pages containing "European Constitution". This time the odds ratio is 1.22 (i.e. 22% higher), but in favor of yes.
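The English figures can be checked the same way. A minimal Python sketch, using only the counts from the two English tables above (variable names are mine); note that the division is flipped so that the result reads as odds in favor of yes.

```python
# Pages containing only one of the two words (English pages, Yahoo counts).
topic_no, topic_yes = 132_000, 5_830            # with "European constitution"
web_no, web_yes = 1_190_000_000, 43_200_000     # English Web as a whole

baseline_ratio = web_no / web_yes    # no-to-yes ratio in general
topic_ratio = topic_no / topic_yes   # no-to-yes ratio on the topic

# no is weaker on the topic than in general, so dividing the baseline
# by the topic ratio gives the odds ratio in favor of yes.
odds_ratio_yes = baseline_ratio / topic_ratio

print(f"baseline ratio: {baseline_ratio:.0f}")
print(f"topic ratio:    {topic_ratio:.0f}")
print(f"odds ratio in favor of yes: {odds_ratio_yes:.2f}")
```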
These amusing statistics thus reveal that France is already the "black sheep" of Europe, at least on the Web, to use President Chirac's own terms ;-)