
SUO: Why any finite ontology is doomed -- was *Date 16 Feb 20




I received the following offline message in response to my earlier
note saying that any attempt to develop a single standard ontology is
doomed to failure:

 > Does this represent a change of heart for you?  I was under the
 > (mistaken?) impression that you were one of the parties present
 > in the discussions, conferences, etc., that led up to the present
 > SUO effort under the IEEE auspices.

I was involved with several things that led to the present situation:

   - The SRKB (Shared Reusable Knowledge Bases) workshop in 1991,
     which led to a number of efforts, of which KIF is the primary
     survivor.  For the email discussions of the SRKB group (which
     discuss many of the same topics as the SUO group and which
     show how little progress has been made in the past 11 years),
     see the SRKB email archives:

     http://www-ksl.stanford.edu/email-archives/srkb.index.html

     At the bottom of this note, I repeat my SRKB position paper,
     which is msg 17 on that list.  I still agree with the points
     made in that paper, which is dated March 1991.

   - An ANSI and ISO effort to develop a standard for conceptual
     schemas.  The idea of a conceptual schema was proposed in
     the 1970s; my first published paper on conceptual graphs
     in 1976 addressed the issue; and I joined the ANSI X3H4
     committee in 1991, which was involved in developing a
     standard for it.

   - The X3H4 WG6 subcommittee was later merged with X3T2, which
     was also working on conceptual schema standards.  A working group
     of X3T2 on ontology was inaugurated in 1996 and continued for
     two more years until X3T2 was merged with X3L8, which became
     NCITS L8 around the same time that the ISO conceptual schema
     effort finally died.

I agree that many other people working on these standards have sought
a single official ontology, but that was never my position.  For a more
accurate statement of my position, see the following position paper
from 1991, which I believe is just as true today as it ever was.

The only difference is that I believe we are now closer to implementing
something along the lines suggested in that 1991 paper.  I don't believe
that any single ontology along the lines of SUMO or CYC would be worth
the effort that it would require.  But I do believe that it is much
easier to define a framework that can support and relate an infinite
number of different ontologies.  For a statement of what I believe
should be done, see Section 7 of the following paper:

    http://www.jfsowa.com/pubs/signproc.htm

As I said in a previous note, I am also preparing the slides for
another lecture, which will go into more detail about how to achieve
the goals that I recommend.  I'll send a pointer to those slides
in another email note within the next few days.

John Sowa
________________________________________________________________

              Multi-Domain Knowledge Representation:
                      SRKB Position Statement

                           John F. Sowa

                            March 1991

AI has been most successful on small domains:  the microworlds
of early AI demos; the highly specialized expert systems for
commercial applications; and the machine translation systems like
METEO, which require no human editing, but are restricted to the
very narrow topic of weather reports.  Such knowledge bases can be
shared and reused, but only for other projects that are similarly
restricted.  The position taken in this paper is that such
compartmentalization is inevitable:  all deep knowledge is domain
dependent.  Only superficial, syntactic knowledge carries over from
one domain to another.  A serious question to consider is whether
such superficial knowledge can provide a framework in which the deeper
domain-dependent knowledge can be shared.  The answer given in this
paper is maybe:  some things can be shared, but the research needed
to support a significant amount of sharing of knowledge representations
across multiple domains is still in a primitive stage of development.

Examples of Domain-Independent Knowledge

Many different projects have surface similarities that seem to suggest
that shared knowledge representations are possible.  Expert systems
designed to assist automobile drivers, airplane pilots, ship captains,
and locomotive engineers, for example, would seem to have a lot in
common.  All of them must deal with time, speed, and distance as well
as fuel consumption, equipment condition, and passenger safety.
Programming languages also have a great deal in common, as the
following assignment statements seem to indicate:

    APL:         X <- A + B
    FORTRAN:     X = A + B
    PL/I:        X = A + B;
    Pascal:      X := A + B;

Yet these surface commonalities mask serious differences in detail.
A deeper analysis indicates that the similarities are more syntactic
than semantic:  the concepts required for each domain are so tightly
bound to that domain that they cannot be mapped from one to the other.
Generalizations that cover multiple domains have so little detail that
it is not clear whether they can contribute anything significant to the
development of a new knowledge base in any of the more detailed domains.

First consider the possibility of common knowledge bases for
automobiles, airplanes, ships, and trains.  A major difference between
these domains is the number of degrees of freedom in the motion.  A
train's motion is purely one dimensional because of the rigid tracks.
At a gross level, a car's motion is also one dimensional, but at a
detailed level, the driver must maneuver in two dimensions to keep the
car in lane and avoid other cars and obstacles.  A ship's motion is
also two dimensional, but its greater inertia causes a change in course
to take minutes instead of the split-second changes that are possible
with a car.  An airplane's motion is three dimensional, but changes in
attitude introduce three more degrees of freedom.  Besides differences
in motion, there are different kinds of signals to consider and
different ways of planning a course and following it.  As a result, a
driver, a pilot, a captain, and an engineer have totally different ways
of thinking and reacting.  A person who is both a driver and a pilot
would have two independent modes of thought with little or nothing in
common.  Expert systems designed for each of these domains would also
have few common concepts and practically no common rules.

For the programming languages, the similarities in syntax mask major
differences in semantics.  If A, B, and X were all integers or all
floating-point numbers, the results would be the same for each of the
languages.  But differences arise when the data types are different.
FORTRAN and PL/I allow type conversions to or from integer and
floating-point, but Pascal only does automatic conversion from integer
to floating and would print an error message if A+B happened to be
floating-point and X were integer.  APL also does automatic conversions
in evaluating A+B; but in doing the assignment, it could change the type
of X instead of converting the result of A+B to X's previous type.  PL/I
does many other kinds of automatic conversions and would even convert
character strings to and from numbers.  APL and PL/I both allow A, B,
and X to be arrays as well as simple scalars; but PL/I places more
restrictions on the dimensions of the arrays, while APL has fewer
restrictions and APL2 has even fewer.  Because of these differences,
terms like addition or assignment statement can be given a precise
definition only for a single programming language.  In some cases, the
language standards are so loose that the definition may change with
every compiler or even every modification of a compiler.  An ontology
might include ADDITION as a concept type, but it would also require
subtypes APL-ADDITION, FORTRAN-ADDITION, and so on for every programming
language and dialect.
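
The conversion behavior described above can be sketched as a small
lookup.  This is only an illustration of the paragraph's claims, written
in Python, and not an implementation of any of the four languages.  The
function name assign and the outcome labels are hypothetical, and the
rules encode only what the text states for an integer or floating-point
X receiving the value of A+B.

```python
# Hypothetical sketch: outcome of "X gets A + B" in each language, following
# only the behavior described in the text above (not the language standards).

def assign(language: str, x_type: str, sum_type: str) -> str:
    """Return a label for what happens when a value of sum_type is stored in X.

    x_type and sum_type are each "integer" or "float".
    """
    if x_type == sum_type:
        return "stored unchanged"          # identical in all four languages
    if language in ("FORTRAN", "PL/I"):
        return f"converted to {x_type}"    # conversion in either direction
    if language == "Pascal":
        if x_type == "float" and sum_type == "integer":
            return "converted to float"    # the only automatic conversion
        return "error"                     # floating-point sum, integer X
    if language == "APL":
        return f"X becomes {sum_type}"     # X could change type instead
    raise ValueError(f"unknown language: {language}")


# Same surface syntax, four outcomes for an integer X and a floating-point sum.
for lang in ("APL", "FORTRAN", "PL/I", "Pascal"):
    print(lang, "->", assign(lang, "integer", "float"))
```

A single function named after the general concept needs a branch for
every language, which is the same pressure that forces an ontology to
split ADDITION into per-language subtypes.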

Even the same physical object may be represented in totally different
ways for different purposes.  A highway, for example, is one-dimensional
on a map.  For an automobile driver, it is two-dimensional.  For the
workers building the roadbed, it is three-dimensional, but highly
regular.  And for the surveyors who are planning a level road through
hilly terrain, it is three-dimensional with highly irregular amounts of
cut and fill.  Any physical object or system can be represented at an
unlimited number of levels of detail.  There is no stopping point that
is natural to the object itself; the stopping point depends entirely on
the purpose for which that object is being used.

Is Natural Language Domain Independent?

Natural languages can express knowledge about any topic in any domain.
But that does not make them domain independent.  The syntax of language
and the constraints at the level of case frames are largely domain
independent, but the meaning of each word is highly dependent on the
domain.  As an example, consider the following four sentences:

    Tom supported the tomato plant with a stick.
    Tom supported his daughter with $8,000 per year.
    Tom supported his father with a decisive argument.
    Tom supported his partner with a bid of 3 spades.

These sentences all use the verb support in the same syntactic pattern:

    A person supported NP1 with NP2.

Yet each use of the verb can only be understood with respect to a
particular subject matter or domain of discourse:  physical structures,
financial arrangements, intellectual debate, or the game of bridge.  For
each of these domains, the concept type SUPPORT would require different
subtypes, such as PHYSICAL-SUPPORT, FINANCIAL-SUPPORT, or INTELLECTUAL-
SUPPORT.  Each of those subtypes could be subdivided further:  physical
support by being tied to a stick could be distinguished from support by
being propped up from below or being suspended from above; financial
support by an allowance could be distinguished from support by a trust
fund or support by payments at irregular intervals.  Each difference in
concept type makes a difference in reasoning and behavior:  a child with
a regular allowance enjoys some measure of stability, while a child who
gets irregular payments must be on good behavior, always hoping for
another grant at any moment.
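
The domain-dependent subtypes of SUPPORT can be sketched as a lookup
keyed by domain of discourse.  This is a minimal illustration in Python,
not a proposed representation:  PHYSICAL-SUPPORT, FINANCIAL-SUPPORT, and
INTELLECTUAL-SUPPORT come from the text, while BRIDGE-SUPPORT, the
domain keys, and the function name resolve are assumed for the example.

```python
# Hypothetical sketch of domain-dependent concept types for the verb "support".
# Three subtype names are taken from the text; BRIDGE-SUPPORT is an assumed
# name for the bridge sense, which the text does not label.

SUPPORT_BY_DOMAIN = {
    "physical structures": "PHYSICAL-SUPPORT",
    "financial arrangements": "FINANCIAL-SUPPORT",
    "intellectual debate": "INTELLECTUAL-SUPPORT",
    "game of bridge": "BRIDGE-SUPPORT",
}


def resolve(concept_type: str, domain: str) -> str:
    """Pick the domain-specific subtype, falling back to the general type."""
    if concept_type != "SUPPORT":
        return concept_type
    return SUPPORT_BY_DOMAIN.get(domain, "SUPPORT")


# The same syntactic pattern maps to a different concept type in each domain.
examples = [
    ("Tom supported the tomato plant with a stick.", "physical structures"),
    ("Tom supported his daughter with $8,000 per year.", "financial arrangements"),
    ("Tom supported his father with a decisive argument.", "intellectual debate"),
    ("Tom supported his partner with a bid of 3 spades.", "game of bridge"),
]
for sentence, domain in examples:
    print(resolve("SUPPORT", domain), "<-", sentence)
```

The table would need a further level for each finer distinction, such
as support by an allowance versus support by irregular payments, so it
has no natural stopping point.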

The point of these examples is that vagueness and ambiguity do not
result from the nature of language.  Instead, they result from the use
and reuse of the same words in many different domains and applications.
The same kinds of ambiguities that arise with a technical term like
assignment statement also arise with a common verb like support.  The
number of different concept types associated with a word is unlimited,
and the totality of meanings may be inconsistent.  An interior
decorator, for example, may think of walls as parts of a room, while
a construction contractor may think of them as separators between rooms.
Each view is correct for a certain purpose and point of view, but
they are incompatible with one another.  The word senses listed in
dictionaries represent the most common applications, and larger
dictionaries list more of them.  But even the largest dictionaries fail
to distinguish such nuances as addition in APL vs. addition in FORTRAN
or support by an allowance vs. support by irregular payments.  Although
the different meanings of addition, support, and wall are incompatible,
they still have something in common.  It is easier for a person to learn
and use a single word for them than to learn different words that change
with every application.  But that implies that the only thing that is
easily shared or reusable is the syntax, not the deeper semantics of the
knowledge base.

Language Games

The traditional AI approach to knowledge representation resembles the
early philosophy of Ludwig Wittgenstein, as presented in the Tractatus
Logico-Philosophicus.  In his later philosophy, Wittgenstein presented
scathing criticisms of his earlier work -- all of which apply equally
well to the current attempts to build shared, reusable knowledge bases.
Yet his later work is not totally negative; it contains the basis for a
solution.  His theory of language games suggests that the way to build
large, flexible intelligent systems is to provide a framework that can
use and reuse the same syntactic tokens in different language games for
different domains.  Some of the implications of these ideas for AI were
discussed in the last chapter of a book (Sowa 1984), two recent papers
(Sowa 1990, 1991), and a workshop on large knowledge bases (Silverman
and Murray 1991).

References

Silverman, Barry G., and Arthur J. Murray (1991) "Full-sized
knowledge-based systems research workshop," AI Magazine, vol. 11, no. 5,
January 1991, pp. 88-94.

Sowa, J. F. (1984) Conceptual Structures:  Information Processing in 
Mind and Machine, Addison-Wesley, Reading, MA.

Sowa, J. F. (1990) "Crystallizing theories out of knowledge soup," in
Intelligent Systems:  State of the Art and Future Directions, edited
by Zbigniew W. Ras and Maria Zemankova, Ellis Horwood, New York,
pp. 456-487.

Sowa, J. F. (1991) "Lexical structures and conceptual structures," in
Semantics in the Lexicon, edited by James Pustejovsky, to be published 
by Kluwer Academic Press.

Wittgenstein, Ludwig (1921) Tractatus Logico-Philosophicus, Routledge
and Kegan Paul, London, 1961.

Wittgenstein, Ludwig (1953) Philosophical Investigations, Basil
Blackwell, Oxford.