Re: SUO: should a standard specify...




Jim,

You have a number of problems when you attempt to determine whether 2
ontologies are comparable (for mapping or merging or translation or
integration):

1) What are the respective KR languages they are formulated in? (Those
KR languages will have various formal properties, some of which you can
abstract away from.)
2) What is the modeling paradigm used to model the respective
ontologies? This is related to (1). For example, even within a single
language (if it supports both paradigms), ont1 may be modeled in a frame
sublanguage of the KR language while ont2 is modeled in an axiom format
(like a description logic). Note that this is still not yet about the
meaning of the ontologies.
3) Typically there is no way to "carve out" comparable subsets of two
ontologies without great difficulty and long analysis, especially if
those ontologies are not "modular" to begin with. And what you are after
(related to points Pat Cassady made to the SUO back a ways) is some sort
of semantic "index" into ontologies, i.e., a kind of metadata/keyword
associated with an ontology module characterizing it as about
"Space-time" or "Part-whole" or "Human-organization", etc. You will not
find that typically, though you may find some natural language
description/documentation associated with a set of classes/relations
(which of course you will have to read, interpret semantically yourself,
and then decide on). I am beginning to think that Pat's notion of "index
terminology" may actually be useful for SUO to consider. These would be
your set of "primitives". Any ontology, if it abided by the "index term"
standard, would annotate its modules with an index term. Of course,
since an ontology could be a set of axioms (rather than modeled say as
more OO-like frames), the module, like the axioms, would be distributed
across the ontology (though topologically connected). Maybe SUO should
create such a standard too? It's not a standard upper ontology, but
would be a standard ontology indexing system. I know this is a potential
can of worms, but what do you think?
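
To make the indexing idea in (3) concrete, here is a minimal sketch in
Python. Everything in it is an assumption for illustration: the index
vocabulary is just the three example terms above, the ontology and
module names are made up, and an annotation is taken to be a simple
(ontology, module, index term) triple. No such standard exists yet.

```python
from collections import defaultdict

# Hypothetical index vocabulary: only the three example terms from the text.
INDEX_TERMS = {"Space-time", "Part-whole", "Human-organization"}

def build_index(annotations):
    """Map each index term to the set of (ontology, module) pairs tagged with it.

    `annotations` is an iterable of (ontology, module, index_term) triples.
    """
    index = defaultdict(set)
    for ontology, module, term in annotations:
        if term not in INDEX_TERMS:
            raise ValueError(f"unknown index term: {term!r}")
        index[term].add((ontology, module))
    return index

def comparable_modules(index, term, ont_a, ont_b):
    """Return candidate module pairs from two ontologies that share an index term."""
    mods_a = sorted(m for o, m in index.get(term, ()) if o == ont_a)
    mods_b = sorted(m for o, m in index.get(term, ()) if o == ont_b)
    return [(a, b) for a in mods_a for b in mods_b]

# Made-up annotations for two made-up ontologies.
annotations = [
    ("ont1", "regions", "Space-time"),
    ("ont1", "components", "Part-whole"),
    ("ont2", "intervals", "Space-time"),
    ("ont2", "locations", "Space-time"),
]
index = build_index(annotations)
print(comparable_modules(index, "Space-time", "ont1", "ont2"))
```

A shared index term only nominates module pairs for closer inspection;
it says nothing yet about whether their contents actually agree.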

With respect to determining whether two ontologies are comparable
semantically, you will find that similar methods are used to establish
merges (or mappings or translations) in three different research
communities: the database, the thesaurus, and the ontology communities.
The state of the art in all three is semi-automated merging/mapping (and
it will be that for quite some time, since the process requires
something/someone to understand the semantics on both sides and ok the
semantic equivalences). The techniques fall into four groups.

First, substring/string matching of concept/relation/attribute/value
names, sometimes generalized to include other text/linguistic
processing. This is notoriously weak, since the expectation is that
labels somehow carry their semantics on their sleeves, so to speak.

Second, graph homomorphisms, i.e., basing semantic matching on
structural matching. This again is typically inadequate, since two
things with the same/similar structure are not necessarily semantically
equivalent, plus of course one expects many-to-many relationships and
there may be no structural "preservation".

Third, statistical correlation/clustering, which is in effect comparable
to the generalized linguistic methods: a statistical model is created
and used based on strings and co-occurrences.

Finally, there is the method of "inducing" semantic correlations based
on an analysis of the instances/tuples in the knowledge/databases,
especially in cases where two schemas/taxonomies/ontologies have
intersecting instances/tuples. This last method is sometimes called
using extensional equivalence (instances/tuples being the extensions,
schemas/ontologies the intensions).
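
As a rough illustration of the extensional approach, the sketch below
scores pairs of classes by the Jaccard overlap of their instance sets.
The choice of Jaccard overlap, the 0.5 threshold, and all class and
instance names are assumptions made for the example, not a method taken
from any particular system; it also assumes both sides use the same
identifiers for shared instances.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def extensional_candidates(ext1, ext2, threshold=0.5):
    """Propose class correspondences from shared instances.

    ext1 and ext2 map class names to sets of instance identifiers.
    Returns (class1, class2, score) triples, best score first.
    """
    pairs = []
    for c1, inst1 in ext1.items():
        for c2, inst2 in ext2.items():
            score = jaccard(inst1, inst2)
            if score >= threshold:
                pairs.append((c1, c2, score))
    return sorted(pairs, key=lambda p: -p[2])

# Made-up extensions for two made-up ontologies.
ont1 = {"Person": {"ann", "bob", "cy"}, "Company": {"acme", "initech"}}
ont2 = {"Human": {"ann", "bob", "cy", "dee"}, "Firm": {"acme"}, "Place": {"paris"}}

for c1, c2, score in extensional_candidates(ont1, ont2):
    print(f"{c1} ~ {c2}: {score:.2f}")
```

The output is a list of candidates only; someone still has to ok each
proposed equivalence, which is why the process stays semi-automated.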

Some researchers will also use additional knowledge resources to assist
in the process or bootstrap, e.g., using WordNet or some other thesaurus
in the background to approximate synonymy between the labels for
classes/relations/attributes in two different ontologies, and define
some "semantic distance measure" or threshold above which the labels can
be considered candidate synonyms. 
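
A minimal sketch of such a thresholded label comparison follows. The
tiny synonym table is a hand-made stand-in for WordNet or another
thesaurus, and the 0.9 synonym score and 0.8 threshold are arbitrary
assumptions; the string comparison uses Python's standard
difflib.SequenceMatcher.

```python
from difflib import SequenceMatcher

# Toy synonym sets standing in for a real thesaurus such as WordNet.
SYNSETS = [
    {"person", "human", "individual"},
    {"company", "firm", "organization"},
]

def label_similarity(a, b):
    """Score two labels in [0, 1]: exact match, thesaurus synonymy, else string ratio."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    if any(a in s and b in s for s in SYNSETS):
        return 0.9  # assumed score for labels the thesaurus lists as synonyms
    return SequenceMatcher(None, a, b).ratio()

def candidate_synonyms(labels1, labels2, threshold=0.8):
    """Return label pairs whose similarity meets the threshold."""
    return [
        (x, y)
        for x in labels1
        for y in labels2
        if label_similarity(x, y) >= threshold
    ]

# Made-up class labels from two made-up ontologies.
labels1 = ["Person", "Company", "Location"]
labels2 = ["Human", "Firm", "Locations"]
print(candidate_synonyms(labels1, labels2))
```

Here "Location"/"Locations" pass on string similarity alone, while the
other two pairs pass only because of the background synonym table.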

Leo




Jim Farrugia wrote:
> 
> All,
> 
> I face some practical problems that I hope you can help me with.
> 
> The broad topic has to do with how to access the guts of ontologies
> based on the ontologies themselves and perhaps also their specifications.
> 
> Those in a hurry can skip to the SUMMARY at the end.
> 
> GOAL:
> 
> I am trying to extract suitable sections of OpenCyc, SUMO, and perhaps ISO15926
> to serve as plausible example ontologies for demonstrating the IFF.
> 
> PROBLEM:
> 
> Faced with hundreds of megs of terms, definitions, comments, relations,
> axioms, and so forth, how can I comb through these ontologies to
> do the following?
> 
> (1) find out what are the ontological primitives (e.g., terms, relations,
> axioms) used by each ontology and whether these primitives are compatible
> (e.g., do the ontologies mean the same thing by "individual", "constant",
> "collection", "relation", ... ?);
> 
> (2) carve out from each of the ontologies subontologies that deal
> with some particular domain, say, naive spatial reasoning;
> 
> (3) get an idea of possible correspondences among the different
> subontologies, to determine whether the ontologies stand a chance of
> being able to interoperate with each other.
> 
> FUNDAMENTAL ISSUE:
> 
> How to get the relevant ontological primitives and assertions from the
> ontology itself?
> 
> (It is interesting to note that even if I were working with much
> smaller ontologies, the fundamental issue would still remain.)
> 
> Now you may say, "Well, that's your problem because of what you've
> chosen to do, and it's not really a problem that members of this group need
> to worry about."  But I think it is, and I will try to explain why.
> 
> Many members of the group would seem to agree that the SUO should have some
> kind of framework within which different ontologies can "interoperate."
> (SUMO itself is modular; and the IFF group feels that the IFF and SUMO
> represent complementary approaches to achieving interoperability.)
> 
> For the sake of argument, let's suppose that there is indeed widespread
> agreement among members on that score.
> 
> It seems that interoperability always comes at some price, such as
> agreement on certain conventions. From recent postings to the list, one
> price people seem ready to pay is that of agreeing on a particular
> logical language.
> 
> That agreement might be paraphrased roughly as follows: "once different
> ontologies can represent entities, relations, axioms, and semantics in
> the same language (something like KIF/CL ??), then we can begin to
> provide a framework that allows different ontologies to interoperate."
> (I'm leaving "interoperate" as an undefined term, because I think that for
> my argument its definition is not necessary).
> 
> What I want to suggest is that the assumption of a common language for
> expressing different ontologies may not be realistic, at least not at the
> present time.  (See, for example, the discussion below of "thing" in SUMO,
> Cyc, and ISO15926-2.)
> 
> What I want further to suggest is that maybe what is needed is not so much
> an agreed-upon common language, but an agreed-upon method to identify in
> a given ontology what are the primitive entities, relations, axioms, and
> definitions, and how these primitives can be used to extract
> domain-specific subontologies. Perhaps such an agreed-upon method should be
> part of the ontology's specification. Such a method would certainly help in
> the examples below.
> 
> That is, I am wondering how the "relevant primitive guts" of different
> ontologies can be identified, and I am looking to find some method
> that will allow humans and machines to use the ontology itself
> to identify these guts and then use them to extract useful subontologies.
> 
> You may suggest that a common ontology language would help establish such
> a method. I agree that it might, but I have two points: (1) it won't
> do so automatically, and therefore (2) some attention should be given
> in the standard for how certain fundamental primitives of ontologies can
> be designated in the ontologies themselves, as well as how they can be made
> available to programs that want to use them to extract particular
> subontologies.
> 
> To put it metaphorically, how can two ontologies, who happen to find
> themselves on a street corner checking each other out, assess each other's
> goods and whether or not they might be able to have a meaningful
> encounter together? The mere fact that they speak the same language may
> help, but something more is needed I think. It is this something more
> that I am suggesting bears consideration.
> 
> Now, for a concrete example of the problem mentioned above.
> 
> I want to find those components of the different ontologies that bear
> on, say, naive spatial relations. But even before I try to extract those
> components, I'll probably need to understand something very general: how do
> Cyc, SUMO, and ISO15926-2 treat the notion of "thing" or "entity"?
> 
> So, I looked in each of these ontologies for "thing" and "entity".
> What follows are some results, which don't pretend to be exhaustive.
> 
> Cyc has ( at http://www.cyc.com/cyc-2-1/vocab/fundamental-vocab.html#Thing ):
> ------------------------------------------------------------------------------
> #$Thing is the universal set: the collection of everything! Every Cyc constant
> in the Knowledge Base is a member of this collection; in the prefix notation
> of the language CycL, we express that fact as (#$isa CONST #$Thing). Thus,
> too, every collection in the Knowledge Base is a subset of the collection
> #$Thing; in CycL, we express that fact as (#$genls COL #$Thing).
> ...
> isa: #$Collection
> some subsets: #$Path-Generic #$Intangible #$Individual #$SimpleSegmentOfPath
> #$Path-Simple #$MathematicalOrComputationalThing #$IntangibleIndividual
> #$Product #$TemporalThing #$SpatialThing #$Situation #$EdgeOnObject
> #$FlowPath #$ComputationalObject #$Microtheory (plus 1488 more public
> subsets, 13568 unpublished subsets)
> ------------------------------------------------------------------------------
> 
> SUMO has:
> ------------------------------------------------------------------------------
> (documentation Entity "The universal class of individuals.  This is the root
> node of the ontology.")
> 
> ;; Everything is an entity (due to Robert E. Kent).
> 
> (forall (?THING) (instance ?THING Entity))
> ------------------------------------------------------------------------------
> 
> The ISO15926-2 draft, in clause 5.2.1.1 (with the underscore fore
> and aft added by me to indicate bolding in the document), has:
> ------------------------------------------------------------------------------
> A _thing_ is anything that is or can be thought about or perceived,
> including material and non-material objects, ideas, and actions. Every
> _thing_ is either an _individual_, a _class_, or a _relation_.
> 
> [a NOTE is omitted here]
> 
> EXPRESS specification:
> ---------------------
> 
> *)
> ENTITY thing
> ABSTRACT   SUPERTYPE OF (ONEOF (class, individual, relation));
>   id : STRING;
> UNIQUE
>   UR1 : id;
> END_ENTITY;
> (*
> ------------------------------------------------------------------------------
> 
> How are people or programs to deal with such variability?
> 
> My particular problem, and I think it is representative of a problem to be
> faced by many people and machines in the near future, is:
> 
> How can I get a grip on what these different ontologies have to say
> about something I am interested in - in this case, "thing" or "entity"?
> 
> I'd like somehow to have a way to investigate possible ontologies for their
> usefulness or for their potential to match my needs, so that I could determine
> whether it is feasible to try to work with several ontologies together.
> 
> In trying to extract suitable subontologies of SUMO, Cyc, and ISO15926,
> I began by looking for relevant "possible correspondences" to some
> domain of interest -- that is, what were the basic ontological terms,
> definitions, and assertions that each ontology used to talk about "thing"
> and what are the correspondences between the ways each ontology does this?
> 
> (In a sense, perhaps, what I'm looking for is a sort of generic ontology API
> that would allow me to gather the relevant ontological primitives for a
> particular domain.  Ideally, such an API would allow programs to automatically
> assess different ontologies for compatibility, based on only a small set of
> seed primitives.)
> 
> As far as the SUO effort is concerned, I suggest that we consider whether
> the identification of and access to certain ontological primitives, as well
> as the means to use those primitives for extracting domain-specific
> subontologies, should be part of the standard.
> 
> Just a few specific questions follow.
> 
> Should an ontology specification specify
> 
> 1. whether/how an ontology makes available to users methods to map input terms
> (given by a human or a machine) to terms used by the ontology?
> 
> In the above examples, I just intuited my way to look for "thing" or "entity",
> and I figured I would find one or the other of them. But in general, we don't
> want to rely on human intuition or luck.
> 
> 2. whether/how an ontology can present in one package all the definitions,
> relations, and axioms established for a particular term? I don't mean
> _all_ the possible inferences that include a particular term, just the
> definitions, relations, and axioms that are explicitly listed in the ontology.
> I think that being able to access just what is explicitly stated is valuable
> for determining whether two different ontologies could possibly interoperate.
> Feel free to enlighten me here, though.
> 
> 3. whether/how the semantics of the ontology can/should be made available
> from, say, lists of constant symbols, function symbols, relation symbols,
> along with denotational mappings and satisfaction relations?
> 
> For instance, (following the second edition of Mathematical Logic by
> Ebbinghaus, Flum, and Thomas) suppose R is a binary relation symbol.
> Then a formula like "for all x, Rxx" is "just a string of symbols to
> which no meaning is attached." Depending on the domain for x and on the
> interpretation of R as a particular relation on the domain, that string
> of symbols will mean different things. Without the denotational mappings
> and satisfaction relations, what good will it do to know certain axioms?
> 
> So, should the specification of an ontology say how the model theory
> used by the ontology is made available to interested users?
> 
> ------------------------------------------------------------------------------
> SUMMARY
> 
> * I see some difficult practical issues of how a human or a machine
> can determine whether (certain subdomains of particular) ontologies
> stand a chance of being able to interoperate with (certain subdomains of)
> other ontologies.
> 
> * The main difficulties seem to be: (1) how to identify the relevant
> ontological primitives (e.g., terms, definitions, axioms) used by an ontology;
> and (2) how to use these primitives to extract a particular domain-related
> portion of the ontology.
> 
> * Being able to identify, extract, and manipulate these primitives _seems_
> key to being able to assess whether two ontologies can interoperate
> (in virtually any plausible sense of that word). But it may indeed not be
> appropriate to suppose that one could determine whether two ontologies
> can interoperate based on accessing the ontological primitives and
> assertions used by the ontologies to talk about a particular domain.
> To those who hold this view, can you please elaborate?
> 
> * Currently, as shown by the examples of SUMO, Cyc, and ISO15926, different
> ontologies have different ways of representing these primitives, and in many
> cases it is difficult to determine from the ontology itself, how these
> primitives are identified and can be made available. Currently, these
> determinations must be done manually (I think) and in an ad hoc way.
> Perhaps the SUO standard can address how to make such
> determinations programmatically.
> 
> * It is possible that a single, agreed-upon ontology language will go a long
> way in helping to identify and make available these ontological
> primitives and the subsequent extraction of domain-specific subontologies.
> But it seems that some specific mechanism that is different from a
> common language, and which belongs perhaps to the ontology's specification,
> needs to be provided. Such a mechanism needs to be, well, mechanical -- capable
> of being done by a machine with limited sets of seeded input.
> 
> * What that mechanism is, what the ontological primitives are, and how
> this mechanism can be used in practice to identify and make available these
> primitives and to allow certain subontologies to be extracted -- I suggest
> that all this should perhaps be investigated in the context of our
> standards work.
> 
> * If these issues have already been addressed and solutions for them
> have been found, please let me know.
> 
> -----------------------------------------------------------------------------
> 
> Any and all comments appreciated.
> 
> Thanks,
> 
> Jim

-- 
_____________________________________________
Dr. Leo Obrst		The MITRE Corporation
mailto:lobrst@mitre.org Intelligent Information Management/Exploitation
Voice: 703-883-6770	7515 Colshire Drive, M/S W640
Fax: 703-883-1379       McLean, VA 22102-7508, USA