Re: CG: More meat for the ontological stew
Quoting Jean-Luc Delatre <jld@club-internet.fr>:
> Another recent paper, ontologies for "the masses" don't work either:
>
> http://shirky.com/writings/ontology_overrated.html
>
> Cheers,
>
> JLD
>
Excellent publication, but categories and text-based search can
benefit from each other rather than being mutually exclusive.
Category trees are very efficient for annotation, and well-formed
ontology-based annotations can be used in various ways.
Common-sense categories do not change as rapidly as the web
changes.
A snip from the publication:
-----
What I think is coming instead are much more organic ways of organizing
information than our current categorization schemes allow, based on two
units -- the link, which can point to anything, and the tag, which is a way
of attaching labels to links. The strategy of tagging -- free-form labeling,
without regard to categorical constraints -- seems like a recipe for
disaster, but as the Web has shown us, you can extract a surprising amount
of value from big messy data sets.
-----
It is true that when trying to find an arbitrary site, Google-type
search is much more effective than any existing category search such
as ODP (the Open Directory Project). However, manual ontology-based
annotation is very effective. Because the ODP annotation system works
poorly, most people cannot annotate their sites even if they want to.
If ODP's annotation process worked better, it would be a very
efficient annotation system: all web users could annotate their
sites, and the exact intended domain of each site would be known. The
annotations could then be used by search engines.
Nowadays, a user types a keyword and gets a list of sites ranked by
an algorithm. If all users annotated their sites, Google could
present a set of categories after the keyword search so that the user
could select the domain of the keyword to constrain the search. And
instead of just a flat list of categories, why not a list of
categories with their subcategories, i.e., a category tree? ODP
search does present a set of categories after the keyword search,
but for the reasons mentioned only a small subset of all web pages
is annotated in ODP.
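
To make the idea concrete, here is a minimal Python sketch of a keyword
search whose hits are grouped into a category tree that the user can then
use to constrain the result. The sites, URLs, category paths and function
names are all made up for illustration; this is not how ODP or Google
work internally.

```python
from collections import defaultdict

# Hypothetical annotated sites: (url, text, category path).
SITES = [
    ("http://example.org/jaguar-cars", "jaguar dealers and models",
     ("Recreation", "Autos")),
    ("http://example.org/jaguar-cats", "jaguar habitat and diet",
     ("Science", "Biology", "Mammals")),
    ("http://example.org/big-cats", "lion tiger jaguar",
     ("Science", "Biology", "Mammals")),
]

def keyword_search(keyword, sites=SITES):
    """Return the sites whose text contains the keyword as a word."""
    return [s for s in sites if keyword in s[1].split()]

def category_tree(hits):
    """Build a nested tree of categories; each node counts the hits below it."""
    def node():
        return {"count": 0, "children": defaultdict(node)}
    root = node()
    for _url, _text, path in hits:
        root["count"] += 1
        cur = root
        for name in path:
            cur = cur["children"][name]
            cur["count"] += 1
    return root

def constrain(hits, prefix):
    """Keep only the hits annotated at or below the chosen category path."""
    prefix = tuple(prefix)
    return [s for s in hits if s[2][:len(prefix)] == prefix]
```

A search for "jaguar" would return all three sites, the tree would offer
"Recreation" and "Science" as top-level choices, and
constrain(hits, ("Science",)) would leave only the two animal pages.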
The ODP annotation process/system has several disadvantages:
-ODP categories have personal editors, and whether a site is accepted
depends on the editor. If an editor does not like a site for some
reason, the site is not accepted.
-No information is given to the annotator about the progress of the
acceptance of a site.
-A site can be placed under only one category. This is a completely
outdated principle.
-Multiple inheritance of categories and/or the representation of
the inheritance is implemented very unclearly.
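
As a rough illustration of the last two points, a category graph in which
a category may have several parents and a site may be filed under several
categories can be sketched in a few lines of Python. The category names,
URLs and helper functions below are hypothetical and are not taken from
ODP.

```python
# Hypothetical category graph: each category lists its parent categories.
PARENTS = {
    "Top": [],
    "Science": ["Top"],
    "Computers": ["Top"],
    "Bioinformatics": ["Science", "Computers"],  # multiple inheritance
}

# A site may be annotated with any number of categories.
SITE_CATEGORIES = {
    "http://example.org/genome-tools": {"Bioinformatics"},
    "http://example.org/linux-news": {"Computers"},
}

def ancestors(category, parents=PARENTS):
    """Return the category itself plus every category it inherits from."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def sites_under(category):
    """Return the sites annotated with the category or any descendant of it."""
    return sorted(
        url for url, cats in SITE_CATEGORIES.items()
        if any(category in ancestors(c) for c in cats)
    )
```

With this representation the genome-tools site is found under both
"Science" and "Computers", although it was annotated only once.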
These problems are all solvable with today's techniques, unlike
automatic integration of ontologies and/or automatic creation of
ontologies based on web content. Every user who wants a site to
be found would gladly annotate it, and in general, people are
willing to participate in ontologization efforts. This willingness
holds great potential. ODP's present category tree is
nowhere near perfect, but even that poor tree (which does aim
to categorize the whole web) could be used quite well if the
annotation process were robust.
The editor problem is very interesting; it is a whole branch of
ontology research. How could we ensure the objectivity and robustness
of the annotation process? We should have many editors for each
category, the acceptance of a new site or category should be decided
by some form of voting among the editors, and the collaboration of
editors should be managed by the system. To achieve this, we have
to use ontologies for different purposes and at different levels of
abstraction, and not just a swamp of links and tags all around the
web.
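
One possible form of such voting, sketched in Python under my own
assumptions (a simple majority among the editors of a category, with a
status the submitter can query at any time); the class and parameter
names are invented for this example:

```python
from dataclasses import dataclass, field

@dataclass
class Submission:
    url: str
    category: str
    votes: dict = field(default_factory=dict)  # editor name -> True/False

    def vote(self, editor, accept):
        # A repeated vote by the same editor replaces the earlier one.
        self.votes[editor] = accept

    def status(self, editors, threshold=0.5):
        """Return 'accepted', 'rejected' or 'pending' for the given editors."""
        yes = sum(1 for e in editors if self.votes.get(e) is True)
        no = sum(1 for e in editors if self.votes.get(e) is False)
        needed = len(editors) * threshold
        if yes > needed:
            return "accepted"
        if no >= len(editors) - needed:
            # Too many rejections: 'yes' can no longer exceed the threshold.
            return "rejected"
        return "pending"
```

No single editor can block a site here, and the submitter always sees
whether the decision is still pending.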
-A.Styrman