SUO: RE: Re: An article on the pitfalls of metadata




Richard:

I know John Sowa recommends an automatic ontology category generation
mechanism as well. But I don't understand how it would work. If you can send
me any references, I'd appreciate it.

Here's an example of what puzzles me about how we could automatically
generate higher level ontological categories.

One heuristic that comes to mind is that whenever two or more entities share
identical attributes (assume everything is well-defined), we can create a
supertype and move the common attributes up to it. In this way, an algorithm
could generate, for any set of tables, a next-level-up set of supertype
tables. Iterating the process, we would end up with a multi-level ontology
with a single entity as the highest entry (the ousia/"thing"/substance
entity). In an OO environment, the existence of one or more common methods
would probably suffice, even if there were no common attributes.
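The heuristic above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not an algorithm proposed by anyone in this thread. It assumes each entity is nothing more than a set of attribute names (in an OO setting, method names could go into the same set). At each level it compares entities pairwise and creates a supertype for every non-empty set of shared attributes that is a proper subset of both entities. It then iterates on the new supertypes and finally hangs whatever is left under a single root. The generated names (Super1, Super2, ...) and the root name "Thing" are placeholders.

```python
from itertools import combinations


def ancestors(name, parents):
    """Every supertype reachable from `name` by following parent links."""
    seen, stack = set(), list(parents[name])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen


def build_supertypes(entities):
    """entities: dict of entity name -> set of attribute (or method) names.

    Returns (nodes, parents): nodes maps every entity, original or generated,
    to its full attribute set; parents maps each name to its direct supertypes.
    """
    nodes = {name: frozenset(attrs) for name, attrs in entities.items()}
    parents = {name: set() for name in nodes}
    by_attrs = {attrs: name for name, attrs in nodes.items()}
    level, counter = list(nodes), 0
    while len(level) > 1:
        next_level = []
        for a, b in combinations(level, 2):
            common = nodes[a] & nodes[b]
            if not common or nodes[a] == nodes[b]:
                continue  # nothing shared, or duplicate entities
            if common == nodes[a]:
                parents[b].add(a)  # a already serves as a supertype of b
                continue
            if common == nodes[b]:
                parents[a].add(b)
                continue
            # Shared attributes are a proper subset of both: create or reuse a supertype.
            if common not in by_attrs:
                counter += 1
                by_attrs[common] = "Super%d" % counter
                nodes[by_attrs[common]] = common
                parents[by_attrs[common]] = set()
            sup = by_attrs[common]
            parents[a].add(sup)
            parents[b].add(sup)
            if sup not in next_level:
                next_level.append(sup)
        # Each generated supertype is strictly smaller than its subtypes, so this terminates.
        level = next_level
    # Drop links implied by another parent (e.g. C -> Super2 when C -> Super1 -> Super2).
    direct = {
        name: {p for p in ps if not any(p in ancestors(q, parents) for q in ps if q != p)}
        for name, ps in parents.items()
    }
    # Hang every remaining top-level entity under a single root.
    nodes["Thing"] = frozenset()
    direct["Thing"] = set()
    for name, ps in direct.items():
        if name != "Thing" and not ps:
            ps.add("Thing")
    return nodes, direct


def local_attributes(name, nodes, parents):
    """Attributes that stay on `name` once inherited ones are moved up."""
    inherited = set().union(*(nodes[p] for p in parents[name]))
    return set(nodes[name]) - inherited


if __name__ == "__main__":
    entities = {
        "Customer": {"id", "name", "address", "credit_limit"},
        "Supplier": {"id", "name", "address", "lead_time"},
        "Employee": {"id", "name", "salary"},
    }
    nodes, parents = build_supertypes(entities)
    for name in nodes:
        print(name, sorted(parents[name]), sorted(local_attributes(name, nodes, parents)))
```

On this toy input the sketch generates one supertype holding id, name and address (shared by Customer and Supplier) beneath another holding id and name, with "Thing" at the top. Customer is left with only credit_limit. Note that the sketch compares entities only within the same level, so it is a literal reading of the "next-level-up" iteration rather than a complete lattice construction.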

This seems to be based on the principle that a supertype (I'm talking in terms
of relational data models, now) of a set of entities exists whenever that set
shares one or more attributes in common. This makes for tidier databases
and, in an OO environment, simpler code.

If I had time (or if you request it), I would (or will) develop an example
in which this heuristic generates a plethora of supertype entities, with
many starting entities defined as subtypes of many of those supertype
entities -- a "spaghetti" structure, in other words. OK, here's one quick
example. Consider a Parts table in a manufacturing database, and the many
tables which contain a foreign key back to Parts (the Inventory and Bill of
Materials tables, for example). Should we define a supertype for Inventory
and Bill of Materials on the basis of this one common attribute? If so, its
name would be something like "Things Related to Parts". Every transaction
table, in any database, will have a date-time-entered attribute. Should all
transaction tables then be subtypes of a "Transaction Date-Time" supertype
table? If not, how would an algorithm using this heuristic weed out such
fluff? Or are there other algorithms that won't generate fluff?
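To make the fluff problem concrete, here is a hypothetical run of a bare shared-attribute test over a few manufacturing tables. The table and column names (part_id, entered_at, and so on) are invented for illustration. The point is only that the test groups tables by column overlap alone. So it proposes a "Things Related to Parts" supertype and a "Transaction Date-Time" supertype, and nothing in it can tell that neither is a natural kind.

```python
from itertools import combinations


def candidate_supertypes(tables):
    """Group tables by every non-empty set of columns that a pair of them shares.

    tables: dict of table name -> set of column names.
    Returns dict of frozenset(shared columns) -> set of table names.
    """
    candidates = {}
    for a, b in combinations(sorted(tables), 2):
        shared = frozenset(tables[a] & tables[b])
        if shared:
            candidates.setdefault(shared, set()).update({a, b})
    return candidates


# Hypothetical schema: two tables with a foreign key back to Parts,
# and two transaction tables that each record when a row was entered.
tables = {
    "Inventory": {"part_id", "warehouse_id", "qty_on_hand"},
    "BillOfMaterials": {"assembly_id", "part_id", "qty_per"},
    "SalesOrder": {"order_id", "customer_id", "entered_at"},
    "Shipment": {"shipment_id", "carrier_id", "entered_at"},
}

for shared, members in candidate_supertypes(tables).items():
    # Nothing here distinguishes a natural kind from an accidental overlap.
    print(sorted(shared), "->", sorted(members))
```

The test proposes one supertype keyed on part_id (Inventory, BillOfMaterials) and one keyed on entered_at (SalesOrder, Shipment). Both candidates are exactly the kind the paragraph above questions.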

In this structure, many of the generated entities would seem, intuitively,
to be wrong, to not correspond to a "natural kind" in the real world. Since
the human user is part of the information system, along with the codebase
and the database (a point I have emphasized in several articles of mine, and
which I think John Sowa might approvingly interpret as a bit of the semiotic
perspective on my part), it is important for our entities to be intuitive,
to seem to represent "natural kinds"; for otherwise, we will use the
database incorrectly, populating it with category mistakes and often not
being sure how to frame a query to get what we want from it.

So: how can an algorithm generate natural kinds?

Tom



-----Original Message-----
From: Richard Cooper [mailto:rich@valutech.com]
Sent: Wednesday, August 27, 2003 12:24 PM
To: Tom Johnston; John F. Sowa; West, Matthew R SITI-ITPSIE
Cc: SUO; cg@cs.uah.edu
Subject: RE: Re: An article on the pitfalls of metadata


Tom Johnston wrote:

<snip/>

> My first question would be: which one has successfully incorporated the
> largest and most diverse set of lowest level (i.e. working database level)
> ontologies? Which ones can most completely rely on the data model itself to
> fully express the semantics up and down the entire ontology, without
> "patching things up" with ad hoc program code? (Sorry, I don't know how to
> translate this point, expressed in my preferred language, into the language
> of axiomatized formal systems.)
>
> Whichever one it is, that's the one we should go with. Let's work to add
> more lowest level ontologies to it. In the process, we may sometimes make a
> good case for revisions a couple of levels higher up. We may on rare
> occasions make a good case for revisions much higher up. Some of those
> revisions will not force structural changes elsewhere in the web of this
> ontology, e.g. adding a creation-date-timestamp to the top-level
> entry/table/class. Other revisions will force structural changes, and such
> changes can be painfully expensive. But the further up we go, the less
> frequent the revisions will be. Once again, this is just Quine's sphere of
> language, his (or Peirce's?) holism.
>
> Tom
<snip/>

Tom, why not use the process you described above as the initial
statement of an algorithm to automate the merger of lower level
data models?

Observations about the actual databases stored with two data
models might be analyzed to come up with a higher-level model
that incorporates both.  Since the top-level model is empty,
two data models whose merger yields no common elements are
clearly independent nodes on the lattice.  Some data mining
techniques can be applied to this approach.
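One possible reading of the suggestion above is sketched here in Python. Each database is treated as a set of columns with their stored values. Columns are matched across the two databases by how much their observed values overlap, and the matched pairs are taken as the common elements of a higher-level model. The Jaccard overlap measure, the 0.5 threshold, and the sample data are assumptions swapped in for illustration (a simple instance-based matching technique); the thread does not name this method. An empty result corresponds to the case where two models merge to no common elements.

```python
def jaccard(xs, ys):
    """Overlap of two collections of observed values: |X & Y| / |X | Y|."""
    x, y = set(xs), set(ys)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def common_model(db_a, db_b, threshold=0.5):
    """Pair up columns of two databases whose observed values overlap enough.

    db_a, db_b: dict of "table.column" -> list of values actually stored.
    An empty result means the two models share nothing above the empty top model.
    """
    matches = []
    for col_a, vals_a in db_a.items():
        for col_b, vals_b in db_b.items():
            score = jaccard(vals_a, vals_b)
            if score >= threshold:
                matches.append((col_a, col_b, score))
    return matches


# Hypothetical sample databases.
plant = {"parts.part_no": ["P1", "P2", "P3"], "bins.bin_id": ["B1", "B2"]}
sales = {"order_line.item": ["P2", "P3", "P4"], "orders.cust": ["C9"]}
hr = {"staff.emp_id": ["E1", "E2"]}

print(common_model(plant, sales))  # part_no and item overlap: a shared higher-level element
print(common_model(plant, hr) == [])  # nothing in common: independent nodes under the top
```

In this toy data, parts.part_no and order_line.item share two of four distinct values and are matched. The plant and HR databases share nothing, so they would sit as independent nodes directly under the empty top model.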

I don't think it's necessary, or even useful, to develop the
lattice manually, since it will necessarily be a dynamic lattice
that changes with time.  So it's not the initial lattice that
we should spend effort on; it's the method (algorithm, process)
for building the lattice and refining it through observations.

JMHO,
Rich