
SUO: Building the hierarchy



Jon,

Some strategy for distinguishing the identifiers
used in different modules and relating them to
one another is essential.  I'm renaming this
thread "Building the hierarchy" because it
addresses some important general principles.
 
The first point to note is that every module in
the hierarchy should have a unique identifier.
Therefore, all names could be distinguished by
a concatenated string such as "moduleID.localId".
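
A minimal sketch of such qualified names, in Python.
The class, the parser, and the sample name "Process"
are illustrative assumptions, not part of SUMO,
OpenCyc, or any SUO tool:

```python
from typing import NamedTuple


class QualifiedName(NamedTuple):
    """A module-scoped identifier (hypothetical helper)."""
    module_id: str
    local_id: str

    def __str__(self) -> str:
        # Render in the "moduleID.localId" form.
        return f"{self.module_id}.{self.local_id}"


def parse_qualified(text: str) -> QualifiedName:
    # Split on the first dot only; this assumes that
    # module IDs themselves contain no dots.
    module_id, sep, local_id = text.partition(".")
    if not sep or not module_id or not local_id:
        raise ValueError(f"not a qualified name: {text!r}")
    return QualifiedName(module_id, local_id)


# Same local spelling, different modules: the names stay distinct.
sumo_process = QualifiedName("SUMO", "Process")
cyc_process = QualifiedName("OpenCyc", "Process")
```

With this representation the two names never collide,
but nothing yet says whether they denote the same thing.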
 
Keeping the names distinct is not a problem.
The real problem is to have a systematic way
of specifying the conditions for determining
how and whether identifiers in different
modules could be assumed to be "the same".
 
As you noted in your email, one might suppose
that the names within SUMO or within OpenCyc had
been coordinated with one another, so that
we could assume:
 
 1. Two names with the same spelling in two
    modules that had been extracted from
    some larger module would have the same
    intended referents.
 
 2. Two names, even with the same spelling, in
    independently developed modules could not be
    assumed to have the same intended referents.
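
The two default assumptions could be sketched as a
single test.  The table of module origins and all the
module names below are hypothetical, and the sketch
handles only one level of extraction.  A False result
means "not assumed the same", not "known to differ":

```python
# Hypothetical record of the larger theory from which
# each smaller module was extracted.
EXTRACTED_FROM = {
    "SUMO-time": "SUMO",
    "SUMO-space": "SUMO",
    "Cyc-time": "OpenCyc",
}


def source_of(module_id: str) -> str:
    # A module with no recorded origin counts as its own source.
    return EXTRACTED_FROM.get(module_id, module_id)


def assumed_same_referent(module_a: str, name_a: str,
                          module_b: str, name_b: str) -> bool:
    """Apply the two default assumptions about shared names."""
    if name_a != name_b:
        # Different spellings: no default assumption is made.
        return False
    # Assumption 1: same spelling, common source -> same referent.
    # Assumption 2: independently developed -> nothing assumed.
    return source_of(module_a) == source_of(module_b)
```

For example, "Interval" in SUMO-time and SUMO-space
would be assumed to match, while "Interval" in
SUMO-time and Cyc-time would not.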
 
The purpose of the lattice of theories (or at least
a generalization hierarchy that represents some
finite excerpt from or some finite step toward
such a lattice) is to relate the names in
different modules.  
 
At the start of the joint project, we begin with
a one-node hierarchy:  just the universal theory
T at the top (the one with no axioms at all).
Then we would put two big nodes on two separate
branches under T:  for example, OpenCyc on the
right branch and SUMO on the left branch.
 
The next step would be to extract smaller modules
(or microtheories) from the two big theories.
Each of the smaller modules would be more general
than the larger module from which it was extracted.
Therefore, all the SUMO modules would lie on some
branch between T and SUMO, and all the OpenCyc
modules would lie on some branch between T and
OpenCyc.  The hierarchy at this step is shown in
the attached file, which I also put on my web site:

   http://www.jfsowa.com/figs/suohier.gif
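
The construction steps just described might be sketched
as follows.  This is only an illustration:  the class
and the extracted module names (SUMO-time, SUMO-space,
Cyc-time) are assumptions and are not taken from the
figure.

```python
class Hierarchy:
    """Stores each node with its immediate generalizations."""

    def __init__(self, top: str = "T") -> None:
        # Step 0: a one-node hierarchy, just the universal theory.
        self.top = top
        self.parents: dict[str, set[str]] = {top: set()}

    def add_under(self, node: str, parent: str) -> None:
        """Add node as a specialization directly below parent."""
        if parent not in self.parents:
            raise KeyError(parent)
        self.parents.setdefault(node, set()).add(parent)

    def insert_between(self, node: str, upper: str, lower: str) -> None:
        """Place an extracted module below upper and above lower."""
        self.add_under(node, upper)
        self.parents[lower].discard(upper)  # direct link now redundant
        self.parents[lower].add(node)

    def generalizations(self, node: str) -> set[str]:
        """All nodes on any upward path from node to the top."""
        seen: set[str] = set()
        stack = list(self.parents[node])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return seen


h = Hierarchy()
# Step 1: the two big theories on separate branches under T.
h.add_under("SUMO", "T")
h.add_under("OpenCyc", "T")
# Step 2: extracted modules are more general than their source,
# so each one goes between T and the theory it came from.
h.insert_between("SUMO-time", "T", "SUMO")
h.insert_between("SUMO-space", "T", "SUMO")
h.insert_between("Cyc-time", "T", "OpenCyc")
```

At this stage nothing but T lies above both branches,
which is the situation the figure depicts.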

Following are some general principles, which apply
to this hierarchy and any others that are organized
as a generalization-specialization hierarchy:

 1. Every module is consistent with all its
    generalizations (i.e., those that lie on
    any upward path between it and the top).

 2. Every module is consistent with all its
    non-absurd specializations (i.e., those that
    lie on any downward path between it and the
    absurd theory at the bottom).

 3. Any two modules whose only common specialization
    is the absurd theory at the bottom are assumed
    to be inconsistent (unless proven otherwise,
    in which case their merger could be added to
    the hierarchy on a branch below each of them).

 4. Any two modules whose only common generalization
    is the universal (or empty) theory at the top
    are assumed to have nothing in common (unless
    proven otherwise, in which case their common
    generalization could be added to the hierarchy
    on a branch above each of them).
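
Principles 3 and 4 are default assumptions that can be
read directly off the hierarchy.  A rough sketch, with
a hypothetical six-node hierarchy and hypothetical
function names; each module is counted among its own
generalizations and specializations so that modules on
the same path are handled as in principles 1 and 2:

```python
TOP, BOTTOM = "T", "Absurd"

# Node -> immediate generalizations (module names are hypothetical).
PARENTS = {
    "T": set(),
    "SUMO-time": {"T"},
    "SUMO": {"SUMO-time"},
    "Cyc-time": {"T"},
    "OpenCyc": {"Cyc-time"},
    "Absurd": {"SUMO", "OpenCyc"},
}


def generalizations(node: str) -> set[str]:
    """All nodes on any upward path from node to the top."""
    seen: set[str] = set()
    stack = list(PARENTS[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def specializations(node: str) -> set[str]:
    """All nodes on any downward path from node to the bottom."""
    return {n for n in PARENTS if node in generalizations(n)}


def assumed_inconsistent(a: str, b: str) -> bool:
    # Principle 3: the only common specialization is the absurd theory.
    common = (specializations(a) | {a}) & (specializations(b) | {b})
    return common == {BOTTOM}


def assumed_unrelated(a: str, b: str) -> bool:
    # Principle 4: the only common generalization is the empty theory.
    common = (generalizations(a) | {a}) & (generalizations(b) | {b})
    return common == {TOP}
```

Here SUMO and OpenCyc come out as assumed inconsistent
and assumed unrelated, while SUMO-time and SUMO do not.
If a common generalization were later proven and added
above SUMO-time and Cyc-time, assumed_unrelated would
stop holding for the two branches.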

Constructing a hierarchy similar to suohier.gif
would be fairly straightforward, and even at the
beginning stage it would be a useful thing to have.
 
Even more useful, however, would be a refinement
of suohier.gif to find commonalities between SUMO
and OpenCyc and to find features of each that could
be used to enrich the other (i.e., to fill in the
"no man's land" between the two branches).

The purpose of the lattice operators is to provide
guidelines that show where to look for the
commonalities and missing information and where
to put the results when they have been derived.

Bottom line:  Building the hierarchy can be done in
a step-by-step fashion, and even the early stages
can be useful to the SUMO and OpenCyc developers
and anyone else who needs ontology building blocks.

John Sowa
