SUO: Building the hierarchy
Jon,
Some strategy for distinguishing the identifiers
used in different modules and relating them to
one another is essential. I'm renaming this
thread "Building the hierarchy" because it
addresses some important general principles.
The first point to note is that every module in
the hierarchy should have a unique identifier.
Therefore, all names could be distinguished by
a concatenated string such as "moduleID.localId".
Keeping the names distinct is not a problem.
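To make the "moduleID.localId" convention concrete, here is a minimal sketch in Python. The class name QualifiedName, the function parse_qualified, and the assumption that a module identifier contains no period are mine, for illustration only; nothing in SUMO or OpenCyc prescribes them.

```python
from typing import NamedTuple


class QualifiedName(NamedTuple):
    """A local identifier tagged with the (unique) ID of its module."""
    module_id: str
    local_id: str

    def __str__(self) -> str:
        return f"{self.module_id}.{self.local_id}"


def parse_qualified(text: str) -> QualifiedName:
    """Split "moduleID.localId" at the first period.

    Assumption: module IDs never contain a period, so the first
    period always separates the module from the local name.
    """
    module_id, sep, local_id = text.partition(".")
    if not sep or not module_id or not local_id:
        raise ValueError(f"not of the form moduleID.localId: {text!r}")
    return QualifiedName(module_id, local_id)
```

Two names with the same local spelling, such as "SUMO.Process" and "OpenCyc.Process", remain distinct values under this scheme, which is all that the concatenation is meant to guarantee.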
The real problem is to have a systematic way
of specifying the conditions for determining
how and whether identifiers in different
modules could be assumed to be "the same".
As you noted in your email, one might assume
that the names within SUMO, or the names within
OpenCyc, had been coordinated with one another,
so that we could assume:
1. Two names with the same spelling in two
modules that had been extracted from
some larger module would have the same
intended referents.
2. Two names, even with the same spelling, in
independently developed modules could not be
assumed to have the same intended referents.
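The two default assumptions above could be sketched as a small test. This is only an illustration: the table extracted_from, the module names "SUMO-Time", "SUMO-Space", and "Cyc-Time", and both function names are hypothetical, and I am assuming that each extracted module records the module it was extracted from.

```python
def root_source(module_id, extracted_from):
    """Follow the 'extracted from' links up to the original large module."""
    seen = set()
    while module_id in extracted_from and module_id not in seen:
        seen.add(module_id)  # guard against an accidental cycle
        module_id = extracted_from[module_id]
    return module_id


def may_assume_same_referent(name_a, name_b, extracted_from):
    """Default rule only: same spelling AND a common source module.

    Each name is a (module_id, local_id) pair. A False result means
    "cannot be assumed", not "proven to be different".
    """
    (module_a, local_a), (module_b, local_b) = name_a, name_b
    if local_a != local_b:
        return False
    return (root_source(module_a, extracted_from)
            == root_source(module_b, extracted_from))


# Hypothetical extraction record: small module -> module it came from.
extracted_from = {
    "SUMO-Time": "SUMO",
    "SUMO-Space": "SUMO",
    "Cyc-Time": "OpenCyc",
}
```

With this record, ("SUMO-Time", "Interval") and ("SUMO-Space", "Interval") may be assumed to co-refer, but ("SUMO-Time", "Interval") and ("Cyc-Time", "Interval") may not, until someone relates them explicitly in the hierarchy.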
The purpose of the lattice of theories (or at least
a generalization hierarchy that represents some
finite excerpt from or some finite step toward
such a lattice) is to relate the names in
different modules.
At the start of the joint project, we begin with
a one-node hierarchy: just the universal theory
T at the top (the one with no axioms at all).
Then we would put two big nodes on two separate
branches under T: for example, OpenCyc on the
right branch and SUMO on the left branch.
The next step would be to extract smaller modules
(or microtheories) from the two big theories.
Each of the smaller modules would be more general
than the larger module from which it was extracted.
Therefore, all the SUMO modules would lie on some
branch between T and SUMO, and all the OpenCyc
modules would lie on some branch between T and
OpenCyc. The hierarchy at this step is shown in
the attached file, which I also put on my web site:
http://www.jfsowa.com/figs/suohier.gif
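The construction steps just described can be sketched as a small data structure. This is a rough illustration under my own assumptions: the class Hierarchy, its method names, and the module names "SUMO-Time", "SUMO-Space", and "Cyc-Time" are hypothetical and are not taken from the figure.

```python
class Hierarchy:
    """Generalization hierarchy stored as module -> set of direct parents."""

    def __init__(self, top="T"):
        self.top = top
        self.parents = {top: set()}  # the universal theory has no parents

    def add_under_top(self, module):
        """Place a large theory on its own branch directly under the top."""
        self.parents[module] = {self.top}

    def extract(self, module, source):
        """Place a module extracted from `source` between the top and `source`.

        The extracted module has fewer axioms, so it is more general
        than its source and must lie above it.
        """
        self.parents[module] = {self.top}
        self.parents[source].discard(self.top)
        self.parents[source].add(module)

    def generalizations(self, module):
        """All modules on any upward path from `module` to the top."""
        seen, stack = set(), list(self.parents[module])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return seen


h = Hierarchy()
h.add_under_top("SUMO")
h.add_under_top("OpenCyc")
h.extract("SUMO-Time", "SUMO")
h.extract("SUMO-Space", "SUMO")
h.extract("Cyc-Time", "OpenCyc")
```

After these steps, every module extracted from SUMO lies between T and SUMO, every module extracted from OpenCyc lies between T and OpenCyc, and the two branches share nothing except T.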
Following are some general principles, which apply
to this hierarchy and any others that are organized
as a generalization-specialization hierarchy:
1. Every module is consistent with all its
generalizations (i.e., those that lie on
any upward path between it and the top).
2. Every module is consistent with all its
non-absurd specializations (i.e., those that
lie on any downward path between it and the
absurd theory at the bottom).
3. Any two modules whose only common specialization
is the absurd theory at the bottom are assumed
to be inconsistent (unless proven otherwise,
in which case their merger could be added to
the hierarchy on a branch below each of them).
4. Any two modules whose only common generalization
is the universal (or empty) theory at the top
are assumed to have nothing in common (unless
proven otherwise, in which case their common
generalization could be added to the hierarchy
on a branch above each of them).
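Principles 3 and 4 amount to two default tests on the hierarchy, which could be sketched as follows. The table PARENTS, the node names, and the function names are hypothetical. I am also assuming that a module counts as its own generalization and specialization, so that a module and one of its own generalizations are never presumed inconsistent.

```python
# Hypothetical hierarchy: module -> set of direct generalizations.
PARENTS = {
    "T": set(),
    "SUMO-Time": {"T"},
    "SUMO": {"SUMO-Time"},
    "Cyc-Time": {"T"},
    "OpenCyc": {"Cyc-Time"},
    "Absurd": {"SUMO", "OpenCyc"},
}


def closure(start, step):
    """Return `start` plus everything reachable by repeatedly applying `step`."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in step(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def at_or_above(module, parents):
    """The module and all its generalizations."""
    return closure(module, lambda m: parents[m])


def at_or_below(module, parents):
    """The module and all its specializations."""
    return closure(module, lambda m: [c for c, ps in parents.items() if m in ps])


def presumed_inconsistent(a, b, parents, bottom="Absurd"):
    """Principle 3: the only common specialization is the absurd theory."""
    return at_or_below(a, parents) & at_or_below(b, parents) == {bottom}


def presumed_unrelated(a, b, parents, top="T"):
    """Principle 4: the only common generalization is the universal theory."""
    return at_or_above(a, parents) & at_or_above(b, parents) == {top}
```

In this hierarchy SUMO and OpenCyc are presumed both inconsistent and unrelated. Both presumptions are only defaults: adding a proven common generalization above "SUMO-Time" and "Cyc-Time" would make presumed_unrelated("SUMO", "OpenCyc", ...) false, and adding a proven merger below SUMO and OpenCyc would do the same for presumed_inconsistent.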
Constructing a hierarchy similar to suohier.gif
would be fairly straightforward, and even at the
beginning stage it would be a useful thing to have.
Even more useful, however, would be a refinement
of suohier.gif to find commonalities between SUMO
and OpenCyc and to find features of each that could
be used to enrich the other (i.e., to fill in the
"no man's land" between the two branches).
The purpose of the lattice operators is to provide
guidelines that show where to look for the
commonalities and missing information and where
to put the results when they have been derived.
Bottom line: Building the hierarchy can be done in
a step-by-step fashion, and even the early stages
can be useful to the SUMO and OpenCyc developers
and anyone else who needs ontology building blocks.
John Sowa
