
SUO: RE: Re: An article on the pitfalls of metadata




This is very helpful. See my embedded comments/questions.

Thanks.

Tom

-----Original Message-----
From: Richard Cooper [mailto:rich@valutech.com]
Sent: Wednesday, August 27, 2003 6:54 PM
To: Tom Johnston; John F. Sowa; West, Matthew R SITI-ITPSIE
Cc: SUO; cg@cs.uah.edu; jawbrey@att.net
Subject: RE: Re: An article on the pitfalls of metadata


Tom Johnston wrote:
> Richard:
>
> I know John Sowa recommends an automatic ontology category generation
> mechanism as well. But I don't understand how it would work. If you
> can send me any references, I'd appreciate it.

Automatic clustering of properties is what leads to designating an
object type.  For example, consider how the property "color" appears
in automatically reading English sentences from a corpus.  Everything
that has a color value must have some higher level commonality.  So
I may find that the sea has color, a bridge has color, a girl's eye
has color, and so on.

Later, I find things that never get any color description in the
corpus.  So electricity, hunches, ideas, and many other objects are
found not to have color descriptions.  That means there is a set of
things that have the property color, and another set of things that
are not known to have color.
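A minimal sketch of that first split, assuming the parser has already reduced the corpus to hypothetical (subject, property) pairs. Note that the second group is "not known to have" the property, not "known not to have" it:

```python
from collections import defaultdict

# Hypothetical (subject, property) pairs, as if extracted from parsed sentences.
observations = [
    ("sea", "color"), ("bridge", "color"), ("eye", "color"),
    ("electricity", "voltage"), ("idea", "novelty"), ("hunch", "strength"),
]

def group_by_property(pairs):
    """Map each property to the set of subjects it was observed on."""
    groups = defaultdict(set)
    for subject, prop in pairs:
        groups[prop].add(subject)
    return groups

def split_on(prop, pairs):
    """Split all subjects into those seen with `prop` and those not (yet) seen with it."""
    groups = group_by_property(pairs)
    everything = {subject for subject, _ in pairs}
    has = groups.get(prop, set())
    return has, everything - has

colored, not_known = split_on("color", observations)
print(sorted(colored))    # ['bridge', 'eye', 'sea']
print(sorted(not_known))  # ['electricity', 'hunch', 'idea']
```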

Next, we want the algorithm to create an activity for itself to
specifically find out more about color by using WordNet to cluster
the set of all objects mentioned in the corpus.  The activity is just
a heuristic hunch which may have little or no benefit, yet it might
be valuable - the algorithm doesn't know yet.  So the activity it
assigned to itself goes into the "things to do when there is nothing
else to do" queue, arranged by the estimated cost and the estimated
benefit of completing each activity.

TJ: "estimated benefit". Now how do we determine that?

Now suppose the algorithm has been cooking for a while, and there are
lots of tasks on the activity queue.  We can choose to limit the number
of tasks (a beam search) and throw away those with lower cost-benefit
estimates.  This keeps us from chasing every possible thing to do
about the corpus under study.
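The idle-time queue with a beam limit might look roughly like this. It is a sketch only: the benefit estimate is simply taken as an input, which sidesteps TJ's question of how to compute it, and the class and method names are hypothetical:

```python
import heapq

class IdleQueue:
    """'Things to do when there is nothing else to do', best benefit/cost first.

    beam_width caps how many activities are kept; lower-ratio ones are discarded.
    """

    def __init__(self, beam_width):
        self.beam_width = beam_width
        self._items = []  # (benefit/cost ratio, activity name), best first

    def push(self, name, est_cost, est_benefit):
        self._items.append((est_benefit / est_cost, name))
        # Beam search: keep only the beam_width highest-ratio activities.
        self._items = heapq.nlargest(self.beam_width, self._items)

    def pop(self):
        """Remove and return the most promising activity."""
        _ratio, name = self._items.pop(0)
        return name

    def __len__(self):
        return len(self._items)

q = IdleQueue(beam_width=2)
q.push("cluster nouns by color via WordNet", est_cost=10, est_benefit=5)
q.push("re-read whole corpus", est_cost=50, est_benefit=5)
q.push("check color adjectives", est_cost=2, est_benefit=4)
print(len(q))   # 2  (the re-read task fell out of the beam)
print(q.pop())  # check color adjectives
```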

As individuals aggregate properties, we can figure out which classes
of individuals could have which properties.  Given the instantaneous
state of knowledge about the set of individuals, those sets P[i] which
have a proper subset of all the properties of a different set C[j] of
individuals are necessarily more abstract (or more generalized) concept
sets than the C[j] sets.
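The proper-subset test can be stated in a few lines. The property sets below are made up for illustration; Python's `<` on sets is exactly the proper-subset check:

```python
# Hypothetical property sets gathered for each group of individuals.
property_sets = {
    "physical object": {"color", "mass"},
    "fruit": {"color", "mass", "ripeness"},
    "apple": {"color", "mass", "ripeness", "cultivar"},
    "idea": {"novelty"},
}

def more_general_pairs(props):
    """Yield (general, specific) when general's properties are a proper subset of specific's."""
    for g, g_props in props.items():
        for s, s_props in props.items():
            if g_props < s_props:  # proper subset; never true when g == s
                yield g, s

for general, specific in sorted(more_general_pairs(property_sets)):
    print(f"{general} is more general than {specific}")
# fruit is more general than apple
# physical object is more general than apple
# physical object is more general than fruit
```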

Of course, WordNet gives us evidence that some nouns are more abstract
than others because one synset is defined using the words from other
synsets.  Finding a noncyclical path from most general to least general
is the way to map out higher from lower concepts.  Same with verbs;
most English verbs with one syllable are used much more often than
others with lots of syllables.  The most common are terminal verbs
which can be modeled with activity graphs, while the less common can be
organized in terms of their WordNet definitions.  Same with adjectives,
adverbs, phrases, sentences, paragraphs and so on.
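One way to read "a noncyclical path from most general to least general" is as a topological ordering of a "defined using" graph. The graph below is a hand-written toy, not real WordNet data; the useful property is that the ordering fails exactly when the definitions are circular:

```python
from graphlib import CycleError, TopologicalSorter

# Toy "defined using" graph: term -> terms its definition relies on. NOT real WordNet data.
defined_using = {
    "entity": set(),
    "object": {"entity"},
    "fruit": {"object"},
    "apple": {"fruit", "object"},
    "orange": {"fruit"},
}

def general_to_specific(graph):
    """Order terms so each appears after every term used to define it.

    Raises graphlib.CycleError when the definitions are circular, i.e. when
    no noncyclical general-to-specific ordering exists.
    """
    return list(TopologicalSorter(graph).static_order())

order = general_to_specific(defined_using)
print(order[0])  # entity

try:
    general_to_specific({"big": {"large"}, "large": {"big"}})
except CycleError:
    print("circular definitions detected")
```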

TJ: WordNet, I gather, is a formalization of a good dictionary. So when the
dictionary is eventually revised, we get a semantic earthquake, right? Or is
WordNet just a way to translate the formal ontology into something more like
natural language statements?

Using a widely agreed upon set of lexical terminals like those in
WordNet helps make the algorithm's results understandable to human
users, whether teachers, students or other consumers of the lattice.

For references, I suggest Tom Mitchell's book "Machine Learning", which
is a classic CS text.  There is a master's thesis by David Chapman
(MIT, mid 80's) which describes an algorithm he called "Tweak".  This
is the first paper to treat planning as an instance of temporal logic
and its primary product - the modal truth criterion.  And of course,
there's lots of work (google search) on data mining algorithms, rough
sets, etc.

My own work was strongly influenced by all those sources, along with
years of studying the software development process (back in cold
war military days), automated design systems, lisp programming, and
the modern Delphi IDE.  I wrote a commercial
product that you can visit at:
http://www.efficacyfx.com
where there's a very practical implementation of process improvement
concepts using a factory floor environment.  I was amazed at how much
real data can help when building a work flow concept set.


> Here's an example of what puzzles me about how we could automatically
> generate higher level ontological categories.
>
> One heuristic that comes to mind is that whenever two or more entities
> share identical attributes (assume everything is well-defined), we can
> create a supertype and move the common attributes up to it.

Yes, with the caveat that "identical" is sometimes elusive.  That's
why we should start with very basic stuff, like integers and strings,
and then use range-domain analytic techniques (a la IFF) to identify
what attributes have the same values.
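As a rough stand-in for the range-domain analysis (this is a plain Jaccard overlap of observed values, not the IFF technique itself), columns whose value sets overlap heavily can be flagged as candidates for "the same" attribute. Table and column names are hypothetical:

```python
from itertools import combinations

# Hypothetical observed values per column (table.column -> set of values).
columns = {
    "Inventory.part_no": {"P-100", "P-200", "P-300"},
    "BillOfMaterials.component": {"P-100", "P-300", "P-400"},
    "Inventory.qty_on_hand": {"12", "40", "7"},
    "Employee.emp_id": {"E-1", "E-2"},
}

def jaccard(a, b):
    """Fraction of distinct values the two columns share (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def candidate_same_domain(cols, threshold=0.5):
    """Column pairs whose value sets overlap at least `threshold`."""
    return [
        (x, y, round(jaccard(cols[x], cols[y]), 2))
        for x, y in combinations(sorted(cols), 2)
        if jaccard(cols[x], cols[y]) >= threshold
    ]

print(candidate_same_domain(columns))
# [('BillOfMaterials.component', 'Inventory.part_no', 0.5)]
```

Differently named columns are matched on their values, which is the point: the names "part_no" and "component" alone would not reveal the shared domain.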


> In this way, an algorithm could generate, for any set of tables, a
> next-level-up set of supertype tables.

Yes, with the WordNet structure serving as a suggestive source of
names to give the supertype tables.  Also, joined tables have attribute
names that are "roles" which suggest some functionality of the attributes.
Again, WordNet could possibly help in giving familiar names to the
common roles of the supertype attributes.


> Iterating the process, we would end up with a multi-level ontology
> with a single entity as the highest entry (the ousia/"thing"/substance
> entity).

Yes, though that's somewhat artificial; I haven't come up with a
better way.  Also, remember that when the same (or similar) supertypes
are made from distinct subtypes, they may be instances of the same
supertype, but with some distinction among the subtype groups.  So
the various types of oranges and the various types of apples have
supertypes that eventually should merge into "fruit", along with others.
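One possible reading of "iterating the process" is a greedy bottom-up merge: repeatedly join the two current top-level types that share the most attributes, hoisting the shared attributes into a new supertype, until one root remains. This is a sketch under that assumption, not a known published algorithm; the generated names are hypothetical placeholders that something like WordNet would have to improve on:

```python
from itertools import combinations

def build_hierarchy(types):
    """Greedy bottom-up supertype generation.

    types: dict mapping a type name to a frozenset of its attribute names.
    Returns (parent_of, attrs): the parent of every merged type, and the
    attribute set of every type, including generated supertypes.
    """
    attrs = dict(types)
    parent_of = {}
    roots = set(types)
    counter = 0
    while len(roots) > 1:
        # Merge the pair of top-level types sharing the most attributes;
        # max() keeps the first such pair in sorted order (deterministic).
        a, b = max(combinations(sorted(roots), 2),
                   key=lambda pair: len(attrs[pair[0]] & attrs[pair[1]]))
        counter += 1
        # Hypothetical naming: the final merge becomes the single root "Thing".
        name = "Thing" if len(roots) == 2 else f"Super{counter}"
        attrs[name] = attrs[a] & attrs[b]  # hoist the shared attributes
        parent_of[a] = name
        parent_of[b] = name
        roots -= {a, b}
        roots.add(name)
    return parent_of, attrs

fruit_and_tools = {
    "Apple": frozenset({"color", "mass", "ripeness", "core"}),
    "Orange": frozenset({"color", "mass", "ripeness", "segments"}),
    "Hammer": frozenset({"mass", "handle"}),
}
parent_of, attrs = build_hierarchy(fruit_and_tools)
print(parent_of["Apple"], sorted(attrs[parent_of["Apple"]]))  # Super1 ['color', 'mass', 'ripeness']
print(parent_of["Hammer"], sorted(attrs["Thing"]))            # Thing ['mass']
```

"Super1" is what a person would call "fruit". The loop also always ends in a single root, even when the last two types share nothing, which is the artificial part conceded above.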


> In an OO environment, the existence of one or more common methods
> would probably suffice, even if there were no common attributes.

The reason I like IDEF0 so much for this modeling purpose is that
people without OO backgrounds can still think in terms of objects
and activities, controls, resources, consumables, products, and
so on.  OO methods are activities, OO properties are object roles,
and relations among objects are properties of the next higher
objects.

OO programs start top down, but build to a library of previously
defined objects.  The ultimate terminals in Delphi are TObject,
TClass, and Program.  Everything else is built up from them by
deriving new programs from previously defined objects.  That is
basically how the algorithm should function: as a programmer in
a very highly automated IDE that asks questions, understands the
answers, and uses the knowledge it gains to pose more questions
and answers.  Sort of like us.

TJ: yes, and Smalltalk has an object model which all Smalltalk programmers
use. The fact that countless programs have been written in Smalltalk, using
that model, I take as evidence for my contention that an upper level
ontology can be quite stable because we will learn to use it in a way that
MAKES it stable. A stable upper ontology makes automated ontology generators
a less urgent concern, since we would then not be revising higher
ontological categories very often at all.

My doubts about how far automated ontology generators can take us are NOT
based on a detailed knowledge of them. They are based on more general
considerations, viz: if we could formulate a "logic of discovery", as
philosophers of science have called it, express it as a software engine, and
give it the right starting points, then the machines would be about to become
truly intelligent. One of their first tasks might be to crank out the TOE
(theory of everything). The next big breaks in solid state physics,
neuropharmacology, structural engineering, supply chain logistics management
and insurance claims processing would follow.

I don't think we're close to that. Now the notion of an automatic ontology
generator, it seems to me, is the notion of a software engine which
implements a "logic of discovery", i.e. true intellectual creativity, in one
specific area, that of categorizing things. I'm not saying that such
software could not produce anything. As indicated above, I see clearly how
it could produce an overwhelming number of higher level categories from any
robust set of base level entities. Only some of those higher level
categories are "right", i.e. are intuitively natural, i.e. seem natural
extensions or clarifications of the ontologies we all started acquiring
before we were a year old. What I do not see is how an automated process can
separate the wheat from the chaff.

So I need to get that example for you, which I'll work on the next couple of
days.


> This seems based on the principle that a supertype (I'm talking in
> terms of relational data models, now) of a set of entities exists
> whenever that set shares one or more attributes in common. This makes
> for tidier databases and, in an OO environment, simpler code.

Yes.  I very much agree in concept and even in interpretation.

TJ: "whenever that set shares one or more attributes in common". This rule
for generating supertypes will generate database models (ontologies) so ugly
that no one would ever implement them. (Demonstration to follow in a couple
of days.)

> If I had time (or if you request it), I would (or will) develop an
> example in which this heuristic generates a plethora of supertype
> entities, with many starting entities defined as subtypes of many of
> those supertype entities: a "spaghetti" structure, in other words.
> OK, here's one quick example.

Good!

> Consider a Parts table in a manufacturing database, and the many
> tables which contain a foreign key back to Parts (the Inventory and
> Bill of Materials tables, for example). Should we define a supertype
> for Inventory and Bill of Materials on the basis of this one common
> attribute?

No, we should define a domain called PartType.  That domain would be
the foreign key values.  So there could be one table (or view) of
all part type values.  Normally, that's just one attribute of many
tables, but collecting it into one view is useful for later distinguishing
among the different subtypes of PartTypes.  Groups of PartType values
can then be partitioned into those that are painted, carved, sanded,
cut, upholstered, shipped, purchased, and so on.
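A small sketch of the PartType view and its grouping by operation. The rows and the part-to-operation routing data are hypothetical, and because one part type can go through several operations, the groups overlap rather than forming a strict partition:

```python
# Hypothetical rows: each table refers to parts through a part_type foreign key.
inventory = [{"part_type": "leg", "qty": 40}, {"part_type": "seat", "qty": 10}]
bill_of_materials = [
    {"assembly": "chair", "part_type": "leg"},
    {"assembly": "chair", "part_type": "cushion"},
]
# Hypothetical routing data: operations each part type goes through.
operations = {
    "leg": {"cut", "sanded", "painted"},
    "seat": {"cut", "carved"},
    "cushion": {"upholstered", "purchased"},
}

def part_type_domain(*tables):
    """One 'view' of every PartType value referenced in any of the tables."""
    return {row["part_type"] for table in tables for row in table}

def group_by_operation(domain, ops):
    """Group PartType values by each operation applied to them (groups may overlap)."""
    groups = {}
    for part in sorted(domain):
        for op in ops.get(part, ()):
            groups.setdefault(op, set()).add(part)
    return groups

domain = part_type_domain(inventory, bill_of_materials)
groups = group_by_operation(domain, operations)
print(sorted(domain))         # ['cushion', 'leg', 'seat']
print(sorted(groups["cut"]))  # ['leg', 'seat']
```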

TJ: that domain already exists. It is the dynamic (rather than static)
domain consisting of all the primary key values of the referenced Parts
table. What I should have said is that if you accept my supertype generation
principle above, then on what grounds would you NOT "define a supertype
..... of this one common attribute"?


> If so, its name would be something like "Things Related to Parts".
> Every transaction table, in any database, will have a
> date-time-entered attribute. Should all transaction tables then be
> subtypes of a "Transaction Date-Time" supertype table? If not, how
> would an algorithm using this heuristic weed out such fluff? Or are
> there other algorithms that won't generate fluff?
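Applied literally, the "any shared attribute implies a supertype" rule does produce exactly these two fluff supertypes. The schema below is hypothetical; since every table carries a date-time-entered column, every group of two or more tables qualifies, giving 2^n - n - 1 candidates for n tables:

```python
from itertools import combinations

# Hypothetical schema: table -> attribute names (keys and foreign keys included).
schema = {
    "Inventory": {"part_id", "warehouse_id", "qty", "entered_at"},
    "BillOfMaterials": {"part_id", "assembly_id", "qty", "entered_at"},
    "Shipment": {"warehouse_id", "carrier", "entered_at"},
    "Payment": {"invoice_id", "amount", "entered_at"},
}

def naive_supertypes(tables):
    """Apply 'any shared attribute implies a supertype' to every group of 2+ tables.

    Returns {frozenset(member tables): frozenset(shared attributes)}.
    """
    result = {}
    names = sorted(tables)
    for size in range(2, len(names) + 1):
        for group in combinations(names, size):
            shared = set.intersection(*(tables[t] for t in group))
            if shared:
                result[frozenset(group)] = frozenset(shared)
    return result

candidates = naive_supertypes(schema)
print(len(candidates), "candidate supertypes from", len(schema), "tables")
# 11 candidate supertypes from 4 tables
print(sorted(candidates[frozenset({"Inventory", "BillOfMaterials"})]))
# ['entered_at', 'part_id', 'qty']   ("Things Related to Parts")
print(sorted(candidates[frozenset(schema)]))
# ['entered_at']                     ("Transaction Date-Time")
```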


I prefer the concept of an "Event", as it's used in formal logic texts.
For example, "Knowledge in Action" by Raymond Reiter is a good source
on thoughts and ideas on how to treat events.

TJ: yes. I have always thought of a transaction as the record of an event
that affects the state of one or more temporally-enduring things recorded in
a database.

Also, I don't like time stamps on every transaction, but only on those
that are formally considered events of some type.  A good example
of that is in my web site at
http://www.efficacyfx.com/manufact.htm
where there is a study of the parts and their activities in a factory
floor environment.

TJ: I'll check it out. I hope you enjoyed the manufacturing example I
developed a couple of weeks ago.


In brief, if a bunch of actions are related, let each one be part
of a single "event".  Then that event has a start time-date stamp
and a stop time-date stamp in the database.  Whenever an object (employee,
assembly, raw material, tool, activity) is related to another object
of the event, that is marked in the event table and kept as history
for further analysis.
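A minimal event-table sketch along those lines. The field names and participant labels are hypothetical, not the design on the efficacyfx site:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Event:
    """One event grouping related actions (hypothetical schema)."""
    kind: str
    start: datetime
    stop: datetime
    participants: set = field(default_factory=set)  # employees, assemblies, tools, ...

history = []  # the event table, kept as history for later analysis

def record(kind, start, stop, *participants):
    """Append one event with its start/stop stamps and participating objects."""
    event = Event(kind, start, stop, set(participants))
    history.append(event)
    return event

def involving(obj):
    """All recorded events that a given object took part in."""
    return [e for e in history if obj in e.participants]

sanding = record("sand leg", datetime(2003, 8, 27, 9, 0), datetime(2003, 8, 27, 9, 25),
                 "employee:ann", "part:leg", "tool:sander")
record("paint leg", datetime(2003, 8, 27, 9, 30), datetime(2003, 8, 27, 9, 50),
       "employee:bob", "part:leg", "tool:sprayer")
print(sanding.stop - sanding.start)             # 0:25:00
print([e.kind for e in involving("part:leg")])  # ['sand leg', 'paint leg']
```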

Eventually, history starts acting like the past.  At that point, the
history table can be analyzed for repeating terms.  When a repeating
term is found, it can be hypothesized to repeat again the next time
the same situation is identified.
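A simple stand-in for finding "repeating terms" is to count consecutive pairs of event kinds (bigram counting) and treat frequent pairs as hypotheses about what comes next. The history below is made up:

```python
from collections import Counter

def repeating_sequences(kinds, length=2, min_count=2):
    """Count consecutive runs of `length` event kinds; keep those seen at least min_count times."""
    runs = Counter(tuple(kinds[i:i + length]) for i in range(len(kinds) - length + 1))
    return {run: n for run, n in runs.items() if n >= min_count}

def predict_next(last_kind, pair_patterns):
    """Hypothesize the next event from repeating pairs (expects length-2 patterns)."""
    options = {run[1]: n for run, n in pair_patterns.items() if run[0] == last_kind}
    return max(options, key=options.get) if options else None

# Hypothetical history of event kinds, in time order.
kinds = ["cut", "sand", "paint", "cut", "sand", "inspect", "cut", "sand", "paint"]
patterns = repeating_sequences(kinds)
print(patterns)                       # {('cut', 'sand'): 3, ('sand', 'paint'): 2}
print(predict_next("cut", patterns))  # sand
```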

But to distinguish one situation from another, you have to group the
occurrences by their various constituents and distinguish the various
subevents.  So that analysis leads to a lattice of concepts and
subconcepts itself.  The recursive application of this approach is
what eventually leads to the complete lattice.
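Grouping occurrences by their shared constituents is what Formal Concept Analysis does, so it is one plausible way to build that lattice (it is named here as a swapped-in technique, not as the method described above). This brute-force version closes every subset of constituents, so it is exponential and only suitable for toy data; one concept is a subconcept of another when its set of occurrences is contained in the other's:

```python
from itertools import combinations

# Hypothetical occurrences (events) and their constituents.
context = {
    "e1": {"cut", "leg", "saw"},
    "e2": {"cut", "seat", "saw"},
    "e3": {"sand", "leg", "sander"},
    "e4": {"cut", "leg", "saw"},
}

def extent(attrs, ctx):
    """Occurrences that have every constituent in attrs."""
    return frozenset(o for o, a in ctx.items() if attrs <= a)

def intent(objs, ctx):
    """Constituents shared by all occurrences in objs (all constituents if objs is empty)."""
    all_attrs = set().union(*ctx.values())
    return frozenset(all_attrs.intersection(*(ctx[o] for o in objs)))

def formal_concepts(ctx):
    """All (extent, intent) pairs, found by closing every subset of constituents."""
    all_attrs = sorted(set().union(*ctx.values()))
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, r):
            ext = extent(set(subset), ctx)
            concepts.add((ext, intent(ext, ctx)))
    return concepts

def is_subconcept(c1, c2):
    """c1 lies below c2 in the lattice when its occurrences are a subset of c2's."""
    return c1[0] <= c2[0]

concepts = formal_concepts(context)
print(len(concepts))  # 7
```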


> In this structure, many of the generated entities would seem,
> intuitively, to be wrong, to not correspond to a "natural kind" in the
> real world. Since the human user is part of the information system,
> along with the codebase and the database (a point I have emphasized in
> several articles of mine, and which I think John Sowa might approvingly
> interpret as a bit of the semiotic perspective on my part), it is
> important for our entities to be intuitive, to seem to represent
> "natural kinds"; for otherwise, we will use the database incorrectly,
> populating it with category mistakes and often not being sure how to
> frame a query to get what we want from it.

Yes again.  That's why a dictionary like WordNet should be integrated
into the development tool so that meaningful terms can be suggested.  When
new terms are used, people have different reactions.  These "events" can
be kept and analyzed also.  By trying the various words in a synset,
the algorithm can eventually select which ones to use in which situation.
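A very simple version of "trying the various words in a synset" is to record a score for each user reaction and prefer the synonym with the best average, after every synonym has been tried once. The synonym list and the +1/-1 scoring are hypothetical; a multi-armed bandit method would be the more principled way to balance trying words against using the best one:

```python
from collections import defaultdict

class TermChooser:
    """Pick which synonym to show users, based on recorded reactions.

    Reactions are hypothetical numeric scores, e.g. +1 = understood, -1 = confused.
    """

    def __init__(self, synonyms):
        self.synonyms = list(synonyms)
        self.reactions = defaultdict(list)

    def record(self, term, score):
        """Store one reaction 'event' for a term."""
        self.reactions[term].append(score)

    def best(self):
        """Try each untried term first; afterwards prefer the best average reaction."""
        for term in self.synonyms:
            if not self.reactions[term]:
                return term
        return max(self.synonyms,
                   key=lambda t: sum(self.reactions[t]) / len(self.reactions[t]))

chooser = TermChooser(["car", "auto", "automobile"])
print(chooser.best())  # car  (nothing tried yet)
for term, score in [("car", 1), ("auto", -1), ("automobile", 1), ("automobile", -1)]:
    chooser.record(term, score)
print(chooser.best())  # car  (mean 1.0 beats 0.0 and -1.0)
```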

TJ: sounds like you think an automated process can generate natural kinds,
and then only needs something like WordNet to find a nice name for them.
We're back to the main point I'm most skeptical about.


> So: how can an algorithm generate natural kinds?
>
> Tom

I hope the (nonsuccinct) description above is a good starting point
for further discussion.  It's a tough nut, but I think we can crack it.

The history and results of the Cyc effort, as seen through OpenCyc, indicate
to me that it isn't the database of facts and axioms that makes up an
intelligent system.  The lessons from expert systems techniques make me
believe that it also isn't the algorithm (deduction, neural nets, fuzzy
logic, proof trees, grammars, languages, take your pick) that makes a
system intelligent.

The genetics projects indicate that we only have about 21,000 genes.
Social science indicates that nurture is formative.  Twin studies indicate
that the two go hand in hand.  Neither hardware (algorithms, genes, ...)
nor software (database, axioms, ...) is enough.

The only believable answer is the long one; we have to develop a transparent
learning algorithm and a database of facts that are situated in a real
environment along with a bunch of humans interacting with it.  That's
the way people become intelligent starting from a fertilized egg.  Every
step of the process is needed, and any one defective part ruins the product.

TJ: I agree. But I doubt we can generate useful ontologies via an algorithm
anytime soon, because the ability to do so seems to me part and parcel of
the ability of a machine to be intelligent and conceptually creative.
Nonetheless (another caveat emptor), my doubts are based on more general
philosophical considerations, and are based on very little detailed
knowledge of the hard work that computer scientists have done and are doing
in this field.

More to come.

Tom

JMHO,
Rich



> Tom Johnston wrote:
>
> <snip/>
>
> > My first question would be: which one has successfully incorporated
> > the largest and most diverse set of lowest level (i.e. working
> > database level) ontologies? Which ones can most completely rely on
> > the data model itself to fully express the semantics up and down the
> > entire ontology, without "patching things up" with ad hoc program
> > code? (Sorry, I don't know how to translate this point, expressed in
> > my preferred language, into the language of axiomatized formal
> > systems.)
> >
> > Whichever one it is, that's the one we should go with. Let's work to
> > add more lowest level ontologies to it. In the process, we may
> > sometimes make a good case for revisions a couple of levels higher
> > up. We may on rare occasions make a good case for revisions much
> > higher up. Some of those revisions will not force structural changes
> > elsewhere in the web of this ontology, e.g. adding a
> > creation-date-timestamp to the top-level entry/table/class. Other
> > revisions will force structural changes, and such changes can be
> > painfully expensive. But the further up we go, the less frequent the
> > revisions will be. Once again, this is just Quine's sphere of
> > language, his (or Peirce's?) holism.
> >
> > Tom
> <snip/>
>
> Tom, why not use the process you described above as the initial
> statement of an algorithm to automate the merger of lower level
> data models?
>
> Observations about the actual databases stored with two data
> models might be analyzed to come up with a higher level model
> that incorporates both.  Since the top level model is empty,
> when two data models merge to no common elements, the two are
> clearly independent nodes on the lattice.  Some of the data
> mining techniques can be applied to this approach.
>
> I don't think it's necessary, or even useful, to develop the
> lattice manually since it will be necessarily a dynamic lattice
> that changes with time.  So it's not the initial lattice that
> we should spend effort on, it's the method (algorithm, process)
> for building the lattice and refining it through observations.
>
> JMHO,
> Rich
>
>
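The merge step proposed in the quoted message could start as small as this: pair up entities from two data models, keep the attributes each pair shares as a candidate higher-level model, and treat an empty result as "independent nodes on the lattice". The dict representation is hypothetical, and matching attributes by name alone is a simplification; the message suggests analyzing the stored data itself:

```python
def merge_models(a, b):
    """Merge two data models, each {entity: frozenset(attributes)} (hypothetical form).

    The higher-level model keeps, for each entity pair (one from each model),
    the attributes they share. An empty result means the two models are
    independent nodes on the lattice.
    """
    merged = {}
    for ea, attrs_a in a.items():
        for eb, attrs_b in b.items():
            shared = attrs_a & attrs_b
            if shared:
                merged[f"{ea}|{eb}"] = shared
    return merged

plant = {"Part": frozenset({"part_no", "weight"}), "Order": frozenset({"order_no", "date"})}
sales = {"Product": frozenset({"part_no", "price"}), "Invoice": frozenset({"invoice_no", "date"})}
hr = {"Employee": frozenset({"emp_id", "hire_date"})}

print(merge_models(plant, sales))  # shared parent model: Part|Product and Order|Invoice
print(merge_models(plant, hr) or "independent nodes on the lattice")
```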