
Re: CG: More meat for the ontological stew



Quoting Jean-Luc Delatre <jld@club-internet.fr>:

> Another recent paper, ontologies for "the masses" don't work either:
> 
> http://shirky.com/writings/ontology_overrated.html
> 
> Cheers,
> 
> JLD
> 


An excellent article, but categories and text-based search can 
benefit from each other; they need not be mutually exclusive. 
Category trees are very efficient for annotation, and well-formed 
ontology-based annotations can be used in various ways. 
Common-sense categories do not change as rapidly as the web 
does.

A snippet from the article:
-----
What I think is coming instead are much more organic ways of organizing
information than our current categorization schemes allow, based on two
units -- the link, which can point to anything, and the tag, which is a way
of attaching labels to links. The strategy of tagging -- free-form labeling,
without regard to categorical constraints -- seems like a recipe for
disaster, but as the Web has shown us, you can extract a surprising amount
of value from big messy data sets.
-----


It is true that when trying to find an arbitrary site, Google-type 
search is much more effective than any existing category search such 
as ODP (the Open Directory Project). However, manual ontology-based 
annotation is very effective. Because the ODP annotation system works 
poorly, most people cannot annotate their sites even if they want to. 
If ODP's annotation process worked better, it would be a very 
efficient annotation system: all web users could annotate their 
sites, and the exact, intended domain of each site would be known. 
Search engines could then use these annotations.

Nowadays, a user types a keyword and gets a list of sites ranked by 
an algorithm. If all users annotated their sites, Google could 
present a set of categories after the keyword search, so that the 
user could select the keyword's domain to constrain the search. And 
instead of a flat list of categories, why not categories with 
subcategories, i.e., a category tree? ODP search does present a set 
of categories after a keyword search, but for the reasons mentioned, 
only a small subset of all web pages is annotated in ODP.
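To make the category-tree idea a bit more concrete, here is a minimal Python sketch, assuming a tiny hypothetical index (`SITES`, with made-up `.example` URLs) in which each site carries some text and one or more category paths. The keyword match is only a stand-in for a real ranked search engine, not how Google or ODP actually work; `category_tree` counts the hits under every category prefix, and `constrain` narrows the hits to the branch the user picks.

```python
from collections import Counter

# Hypothetical annotated index: site -> (free text, list of category paths).
# A path is a tuple from the root category down to the most specific one.
SITES = {
    "jaguar-cars.example": ("jaguar sports car dealer",
                            [("Recreation", "Autos", "Makes")]),
    "big-cats.example": ("jaguar habitat and diet",
                         [("Science", "Biology", "Animals")]),
    "zoo.example": ("jaguar and lion exhibits",
                    [("Science", "Biology", "Animals"), ("Recreation", "Travel")]),
}


def keyword_hits(keyword):
    """Naive whole-word match; a stand-in for a real ranked search engine."""
    return [url for url, (text, _) in SITES.items() if keyword in text.split()]


def category_tree(hits):
    """Count hits under every category prefix, so the counts form a tree."""
    counts = Counter()
    for url in hits:
        prefixes = set()
        for path in SITES[url][1]:
            for depth in range(1, len(path) + 1):
                prefixes.add(path[:depth])
        counts.update(prefixes)  # each site counts once per category
    return counts


def constrain(hits, prefix):
    """Keep only hits annotated somewhere under the chosen category."""
    return [url for url in hits
            if any(path[:len(prefix)] == prefix for path in SITES[url][1])]


if __name__ == "__main__":
    hits = keyword_hits("jaguar")
    tree = category_tree(hits)
    # Sorting the prefix tuples yields depth-first (pre-order) tree order.
    for prefix in sorted(tree):
        print("  " * (len(prefix) - 1) + prefix[-1], tree[prefix])
    # The user picks Science > Biology, which drops the car dealer.
    print(constrain(hits, ("Science", "Biology")))
```

Run as a script, it prints `Recreation 2` with `Autos 1`, `Makes 1` and `Travel 1` beneath it, then `Science 2` with `Biology 2` and `Animals 2`, and finally `['big-cats.example', 'zoo.example']`. Note that `zoo.example` shows up under both top-level branches, which only works because a site may carry several categories.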

The ODP annotation process/system has several disadvantages: 

-ODP categories have their own editors, and accepting a site depends 
on those editors. If an editor does not like a site for some 
reason, the site is not accepted.

-The annotator is given no information about the progress of a 
site's acceptance.

-A site can be placed under only one category. This is a completely 
outdated principle.

-Multiple inheritance of categories, and/or the way that inheritance 
is represented, is implemented very unclearly.

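The last two complaints are about data structure rather than policy. A minimal sketch of one alternative, assuming the categories form a directed acyclic graph in which every category explicitly lists its parents (`PARENTS`) and a site may carry several categories (`SITE_CATEGORIES`); all category and site names are made up for illustration and are not ODP's actual tree:

```python
# Hypothetical category graph (must be acyclic). Each category lists its
# parents explicitly, so multiple inheritance is visible in the data.
PARENTS = {
    "Top": [],
    "Arts": ["Top"],
    "Science": ["Top"],
    "Health": ["Top"],
    "Biology": ["Science"],
    "Medicine": ["Health", "Science"],    # two parents
    "Genetics": ["Biology", "Medicine"],  # two parents
}

# A site may be annotated with several categories at once.
SITE_CATEGORIES = {
    "genome-clinic.example": {"Genetics", "Health"},
}


def ancestors(category):
    """The category itself plus everything reachable via parent links."""
    seen, stack = set(), [category]
    while stack:
        cat = stack.pop()
        if cat not in seen:
            seen.add(cat)
            stack.extend(PARENTS[cat])
    return seen


def browse_paths(category):
    """Every root-to-category path, i.e. every route a browsing user could take."""
    if not PARENTS[category]:
        return [[category]]
    return [path + [category]
            for parent in PARENTS[category]
            for path in browse_paths(parent)]


def site_is_under(site, category):
    """True if any of the site's categories lies at or below `category`."""
    return any(category in ancestors(c) for c in SITE_CATEGORIES[site])


if __name__ == "__main__":
    for path in browse_paths("Genetics"):
        print(" > ".join(path))
    print(site_is_under("genome-clinic.example", "Science"))
    print(site_is_under("genome-clinic.example", "Arts"))
```

Here the hypothetical genome clinic is reachable from Science, Biology, Medicine and Health without anyone filing it several times by hand, and `browse_paths("Genetics")` lists all three routes to that category. Keeping the parent links explicit is one way to make the representation of inheritance clear.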
These problems are all solvable with today's techniques, unlike 
automatic integration of ontologies and/or automatic creation of 
ontologies from web content. Every user who wants a site to be 
found would gladly annotate it, and in general, people are willing 
to participate in ontologization efforts. This willingness is a 
great potential resource. ODP's present category tree is nowhere 
near perfect, but even that poor tree (which does aim to categorize 
the whole web) could be used quite well if the annotation process 
were robust.

The editor problem is very interesting: it is a whole branch of 
ontology research in itself. How could we ensure the objectivity and 
robustness of the annotation process? We should have many editors for 
each category, the acceptance of a new site or category should be 
decided by some form of voting among the editors, and the system 
should manage the editors' collaboration. To achieve this, we have 
to use ontologies of different purposes and levels of abstraction, 
and not just a swamp of links and tags all around the web.
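The voting idea could be sketched roughly as follows. This is only one possible rule, assuming a hypothetical `Submission` record, a fixed set of editors per category, a quorum (3 here, an arbitrary choice) and a simple majority; nothing in it reflects how ODP actually works. It also gives the submitter a `progress()` line, which would address the missing-feedback complaint.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Submission:
    site: str
    category: str
    editors: frozenset   # editors responsible for this category
    quorum: int = 3      # hypothetical: votes needed before any decision
    votes: dict = field(default_factory=dict)  # editor -> True (approve) / False (reject)

    def __post_init__(self):
        if self.quorum > len(self.editors):
            raise ValueError("quorum cannot exceed the number of editors")

    def vote(self, editor, approve):
        if editor not in self.editors:
            raise ValueError(f"{editor} is not an editor of {self.category}")
        self.votes[editor] = approve  # a repeated vote replaces the earlier one

    def status(self):
        """Simple majority once a quorum has voted; otherwise still pending."""
        if len(self.votes) < self.quorum:
            return Status.PENDING
        approvals = sum(self.votes.values())  # True counts as 1
        rejections = len(self.votes) - approvals
        if approvals > rejections:
            return Status.ACCEPTED
        if rejections > approvals:
            return Status.REJECTED
        return Status.PENDING  # tie: a real system would need a tie-break rule

    def progress(self):
        """What the submitter could be shown at any time."""
        return (f"{self.site} -> {self.category}: "
                f"{len(self.votes)} votes (quorum {self.quorum}), {self.status().value}")


if __name__ == "__main__":
    s = Submission("zoo.example", "Science/Biology",
                   frozenset({"ann", "bob", "cai", "dee"}))
    s.vote("ann", True)
    s.vote("bob", False)  # one editor's dislike no longer blocks the site
    print(s.progress())
    s.vote("cai", True)
    print(s.progress())
```

With four editors and a quorum of three, the example first reports the site as pending after two votes, and the third vote (an approval) makes it accepted, so a single editor who dislikes a site can no longer reject it alone.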

-A.Styrman