Tuesday, November 27, 2007

Search: Google-Yahoo Comparison

I have just conducted an assessment of the relevance of the Google and Yahoo search engines (French versions). In it, 70 users (students) each made 20 queries, chosen freely within 10 themes suggested at random (2 queries per theme, 1,400 queries in total):
  • News
  • Animals
  • Geography-travel
  • Literature
  • Music
  • Nature
  • Celebrities
  • Politics
  • Health
  • Sport
Each student was asked to rate, blindly, the quality of the first link proposed by each of the two engines (i.e. no information enabled the engines to be identified). The links were presented in random order to avoid any bias. 2,800 pages were thus viewed and assessed. After examining the pages, the user was invited to enter a mark on a scale from 0 to 5 for each:
  • 0 = Completely dissatisfied with the result
  • 1 = Dissatisfied with the result
  • 2 = Rather dissatisfied with the result
  • 3 = Generally satisfied with the result
  • 4 = Satisfied with the result
  • 5 = Completely satisfied with the result
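The blind, randomized presentation described above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the software used in the study: the function names `blind_pair` and `record_marks`, the neutral labels "A" and "B", and the example URLs are all hypothetical.

```python
import random

def blind_pair(link_google, link_yahoo, rng):
    """Return the two first links in random order under neutral labels.

    `shown` is what the user sees (labels 'A' and 'B' only);
    `key` is kept hidden and maps each label back to its engine.
    """
    items = [("google", link_google), ("yahoo", link_yahoo)]
    rng.shuffle(items)  # random order, so position gives no hint
    shown = {label: link for label, (_, link) in zip("AB", items)}
    key = {label: engine for label, (engine, _) in zip("AB", items)}
    return shown, key

def record_marks(key, marks):
    """Map the marks given to 'A' and 'B' back to the engines.

    Marks must be integers on the 0-5 scale listed above.
    """
    for mark in marks.values():
        if mark not in range(6):
            raise ValueError("marks must be between 0 and 5")
    return {key[label]: mark for label, mark in marks.items()}

# Hypothetical usage with placeholder URLs.
rng = random.Random(0)
shown, key = blind_pair("http://example.org/g", "http://example.org/y", rng)
result = record_marks(key, {"A": 4, "B": 2})
```

Keeping the label-to-engine key separate from what is displayed is the point of the design: the rater only ever sees "A" and "B".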
Google comes slightly ahead of Yahoo, by 0.2 points (3.6 compared with 3.4). The difference is not huge, but it is statistically significant (t test: p = 3.5 × 10⁻⁵). Needless to say, this test is only partial, as, for reasons of human cost, it only concerns the first link proposed by each engine. However, it does provide an interesting indication of the relative relevance of each engine. I also note a very slight progression by Google with respect to a comparison I made in April [fr] using the Voilà engine (and not, unfortunately, Yahoo) with exactly the same protocol. However, this difference is not statistically significant.
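For readers who want to see what such a test involves, here is a minimal sketch. It assumes a paired test (each query is rated for both engines), which the post does not specify, and it uses the normal approximation for the p-value, which is only reasonable for large samples such as the 1,400 queries here. The marks below are invented for illustration and are not data from the study.

```python
import math
import statistics

def paired_t_statistic(marks_a, marks_b):
    """Return (mean difference, t statistic) for paired marks."""
    if len(marks_a) != len(marks_b):
        raise ValueError("both engines must be rated on the same queries")
    diffs = [a - b for a, b in zip(marks_a, marks_b)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation).
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / std_err

def two_sided_p_normal(t):
    """Two-sided p-value under the normal approximation (large n only)."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(t)))

# Hypothetical marks for 8 queries, NOT data from the study.
engine_a = [4, 3, 5, 2, 4, 3, 5, 4]
engine_b = [3, 3, 4, 2, 3, 4, 4, 3]

mean_diff, t = paired_t_statistic(engine_a, engine_b)
p = two_sided_p_normal(t)
print(f"mean difference = {mean_diff:.2f}, t = {t:.3f}, p = {p:.3f}")
```

With only 8 toy queries the normal approximation is too optimistic; a proper Student t distribution with n - 1 degrees of freedom should be used for small samples.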

The detailed examination of links returned is equally instructive. The first link offered by Google and Yahoo is identical in 27% of cases. In a previous study (using a slightly different protocol), conducted in December 2005, the proportion was 24%. The order of magnitude is thus similar.
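The share of identical first links can be computed along these lines. This is a sketch under my own assumptions: the helper names are hypothetical, and the normalization rule (ignore the scheme, the case of the host name, and a trailing slash) is just one plausible definition of "identical", not necessarily the one used in the study.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to a comparable form (assumed rule, see above)."""
    parts = urlsplit(url.strip())
    return (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)

def first_link_overlap(first_links_a, first_links_b):
    """Fraction of queries for which both engines return the same first link."""
    if len(first_links_a) != len(first_links_b):
        raise ValueError("one first link per query is expected for each engine")
    if not first_links_a:
        raise ValueError("at least one query is required")
    same = sum(
        normalize(a) == normalize(b)
        for a, b in zip(first_links_a, first_links_b)
    )
    return same / len(first_links_a)

# Hypothetical placeholder URLs, not results from the study.
links_a = ["http://Example.org/a/", "http://example.org/b"]
links_b = ["http://example.org/a", "http://example.org/c"]
overlap = first_link_overlap(links_a, links_b)
```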

The most surprising result came from the use of Wikipedia. This use was marginal in December 2005 (see study). At the time, for all 10 results on the first page, 2% of the links proposed by Google and 4% of those proposed by Yahoo came from Wikipedia. On the first link alone, Google offered no Wikipedia results (at least not in our sample) and Yahoo offered 7%.

The strategies have changed completely. Today 27% of Google’s results on the first link alone come from Wikipedia, as do 31% of Yahoo’s.

How can this sudden interest in Wikipedia by both engines be explained? It is undoubtedly connected with the increasing difficulty engines have in calculating a satisfactory ranking. The good old days of PageRank-style algorithms are over. They were quite well suited to a network that was fairly stable over time and quite highly interconnected. The explosion of blogs and news sites has changed the situation considerably. The majority of the Web is now of a volatile and ephemeral nature. In all but exceptional cases, posts and news bulletins are hardly interlinked.

Faced with these difficulties, Wikipedia and a few other reference sites such as Doctissimo, Allociné, and major daily newspaper sites (La Tribune, Le Monde, Le Figaro, etc.) are gilt-edged securities. They have a good image, and on questioning users, I have established that even when Wikipedia and other reference sites don’t reply exactly to the question asked (for example in the news sector), the engine is well appreciated nevertheless. The user generally thinks “it’s not what I’m looking for, but it’s relevant all the same”.

The average mark allocated by users when the result is in Wikipedia is nearly one point higher, in the case of Google and Yahoo, than the mark allocated to other results.

Hence, pushing Wikipedia to the maximum is a paying strategy with relatively low cost. It is, however, dangerous. When users come to realize that, for example using the Firefox search bar, they can search directly in Wikipedia if they want encyclopedic information, in Wikio for news and blogs, in Allociné for movies and so on, the concept (outdated, in my opinion) of the general search engine will have had its day. Its limits are already being felt.

7 Comments:

Anonymous wrote...

Congratulations on your appearance on Techmeme. It was getting depressing to see only US sites there, and always the same ones. So I really wonder how you managed to get onto Techmeme's radar...

-Stephane Rodriguez

29 November, 2007 18:19
Anonymous Doug Stewart wrote...

Google is widely perceived by the public as having better quality search results than Yahoo. It was extremely interesting to read your report that indicates otherwise. Are you aware of any studies into the source of this perception?

As to Wikipedia, I've noticed the same thing the last few months. However, I think that the user ratings assigned to search engines that reply with Wikipedia will depend largely on the nature of the query. Wikipedia is essentially an on-line encyclopedia. As such, it is useful for queries of a historical, geographical or scientific nature. On the other hand, it is quite poor with dynamic information (e.g. consumer items) or non-encyclopedic information. As a good illustration of the latter, if you search on "French Food" because you want to know the history of French cuisine, it is fine. But if you want some recipes, forget Wikipedia.

29 November, 2007 22:54
Blogger Jean Véronis wrote...

Stephane Rodriguez> I really wonder how you managed to get onto Techmeme's radar -- I haven't the slightest idea!

30 November, 2007 18:05
Blogger Jean Véronis wrote...

Doug> The perception that Google is immensely superior to Yahoo is probably due to clever buzz and marketing. In practice, we see that the difference is not so great (in a previous study I even showed that Yahoo was superior if we took the average over the first 10 results). I am not aware of any studies on the sources of the public's perception of quality.

Wikipedia is not a good source for many queries, that's true. But as I said, even if you do not find exactly the answer to your question (which is often the case with news, for example), it still gives a good image of the engine, since the answer makes sense and has a "reference" flavor. If you type "Sarkozy" trying to find information about his new alleged liaison with a journalist, you won't find it on Wikipedia, but you'll find his biography, and you'll say: "Ah ok, makes sense". Same with aubergines or mushrooms: you won't find recipes, but you'll find something that at least looks right, not spam.

04 December, 2007 08:16
Blogger Luk wrote...

Just to point out that you were cited indirectly in the Wikipedia Signpost, the weekly newspaper of the English-language Wikipedia:

Otherwise, it's a very interesting article. I wonder how long it will take Google to come down from its pedestal.

04 December, 2007 11:10
Blogger Jean Véronis wrote...

Thanks for the link, Luk, I hadn't seen it.

04 December, 2007 11:21
Anonymous Janos wrote...

Yes, that reminds me of this post:
Otherwise, a new controversy over Google's excessive power?:

05 December, 2007 15:19

