Jean Véronis

Thursday, March 04, 2010

Ontologies: Perl is a planet in the solar system

Lately, I've been working on Wikipedia, which is both an unprecedented human adventure (I wouldn't have bet two cents on its survival a few years ago) and a reservoir of fantastic resources for natural language processing. In particular, it is a huge ontology, i.e. a structured knowledge tree of the kind people have dreamed of building for centuries. I alluded to this in my last slide here [fr]: from the Sumerians through Ramon Llull, Leibniz and the Encyclopaedists, we have been searching for one, and the semantic Web is the latest invention that aims to organize Everything.

Wikipedia's knowledge tree is navigable online:
These are interesting URLs to know (I'm thinking of secondary school teachers: what a great source for practical work!).

Maybe you know the Perl programming language. I'm a big fan, but let's leave that for another post. I used the corresponding Wikipedia page as a test, to see whether my little homemade programmes could correctly find its place in the Wikipedian knowledge tree.

Let's follow the category links going up through the tree. The links are at the bottom of the page, and the Perl page belongs to all these categories:

Ah... so apparently it's not a tree. Or maybe it is one of those Indian banyans I frequently refer to, whose branches connect and merge... Anyway, as long as there is no loop (I don't wish to be pedantic, but that means as long as it is a Directed Acyclic Graph), it is possible to build an ontology. It's common enough:

But it nevertheless requires some care in building the links, and you quickly get lost.
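The "no loop" condition is easy to test mechanically. Here is a minimal sketch, not the programme used for this post: it assumes the category links have already been extracted into a dictionary mapping each page or category to the categories it belongs to. The function name `has_cycle` and the toy category names are invented for illustration and are not taken from Wikipedia.

```python
from typing import Dict, List

def has_cycle(parents: Dict[str, List[str]]) -> bool:
    """Return True if the 'belongs to' links contain a loop."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour: Dict[str, int] = {}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for parent in parents.get(node, []):
            state = colour.get(parent, WHITE)
            if state == GREY:  # we came back to a node on the current path
                return True
            if state == WHITE and visit(parent):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(parents))

# Toy data (hypothetical): several parents per node, but no loop.
banyan = {
    "Perl": ["Programming languages", "Scripting languages"],
    "Scripting languages": ["Programming languages"],
    "Programming languages": ["Computing"],
}
# Same graph with one careless link added, which closes a loop.
looped = dict(banyan, Computing=["Perl"])

print(has_cycle(banyan), has_cycle(looped))
```

A banyan with merging branches passes the test; a single misplaced link is enough to fail it. The recursion is fine for a toy graph, but a graph the size of Wikipedia's would probably call for an iterative version.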

So let's follow the links on our Perl page. It's an American invention. OK. Let's go further up. To be brief, here is the path I followed at random among all the possibilities:
Perl is therefore a planet in the solar system. QED.
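The walk above can be automated with a breadth-first search over the "belongs to" links. The sketch below is only an illustration under stated assumptions: `path_up` is a hypothetical helper, and the toy graph is made up to mimic the kind of drift described here. It is not the actual chain of Wikipedia categories.

```python
from collections import deque
from typing import Dict, List, Optional

def path_up(parents: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search along 'belongs to' links.

    Returns a shortest chain from start to goal, or None if goal is unreachable.
    """
    came_from: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            while node is not None:  # walk back to the start
                chain.append(node)
                node = came_from[node]
            return chain[::-1]
        for parent in parents.get(node, []):
            if parent not in came_from:  # visit each category only once
                came_from[parent] = node
                queue.append(parent)
    return None

# Toy data (hypothetical): each link looks plausible on its own.
toy = {
    "Perl": ["Programming languages", "American inventions"],
    "Programming languages": ["Computing"],
    "American inventions": ["United States"],
    "United States": ["Countries"],
    "Countries": ["Earth"],
    "Earth": ["Planets of the Solar System"],
}

chain = path_up(toy, "Perl", "Planets of the Solar System")
print(" is in ".join(chain))
```

Each hop is defensible locally ("is related to"), yet reading the whole chain as "is a" gives nonsense. That is the heart of the problem: category membership is not a strict subclass relation.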

Don't think that this is an isolated example. It is very much the rule, given the immense complexity of the graph. What a shame... It means there is a huge amount of work to be done before Wikipedia can be exploited, at least by automatic means. The whole effort (unprecedented in the history of Humanity, I repeat) should be praised, but to be able to properly exploit the knowledge in it, a little structure will be required...

2 Comments:

Anonymous Karadimas Harry wrote...


I really enjoy your articles (as always); here is another example (there are plenty), which starts from "Organisation des premiers secours" (first-aid organisation) and ends up at "Président du Mouvement des jeunes socialistes" (President of the Young Socialists Movement) ...

I put this down to youthful errors. No doubt this tool will be refined; it is already a formidable goldmine, both for everyday use and as a source of research topics!

Harry Karadimas

March 12, 2010 15:19
Blogger Jean Véronis wrote...

Oh yes, nice example! Thank you, I'll use it again ;-)

March 12, 2010 15:26

