SUO: Re: An article on the pitfalls of metadata
Dear Matthew,
Your summaries of successful projects are always welcome.
They provide important insights into the kinds of applications
that we in the ontology business should consider as the
"low-hanging fruit" that is most likely to give us some quick
sustenance.
But I would also like to relate your experiences to the
points that Cory Doctorow was making, to the further
discussions by Jon Awbrey and me, and to the broader
issues of developing a "standard upper ontology".
> Our context is the handover of design information
> between engineering contractors and owner operators
> for large capital systems like oil rigs, typically
> worth $2 billion or more.
These are commercially important applications, which
can provide a lot of funding to support low-level --
i.e., application-oriented -- ontology projects. But
as we have all noticed, engineering contractors and
owner operators are not likely to support projects for
developing anything that is not immediately relevant
to their business (such as upper-level ontologies).
Although I believe that work on upper-level ontologies
can be valuable, I have serious disagreements with the
people who are trying to do that work in isolation from
applications. That isolation makes it impossible to
evaluate any proposals in terms of anything that people
actually do or want to do.
Some further comments on your comments:
> 1. People Lie.
>
> We are in a contractual situation, also a relatively
> small world where establishing trust is essential to
> do business at all. There is therefore not much
> long-term benefit in lying.
Yes. A business that has real work to do can provide
immediate feedback on what works and what doesn't.
But people who work on upper-level ontologies are
usually funded (if at all) by research grants for
projects that are not required to address any
applications that anybody really needs to do.
In that atmosphere, a lot of people can survive
on hype and on publications in "peer-reviewed"
journals for which the "peers" are also supported
by research grants rather than hard applications.
> 2. People are lazy
>
> We are banking on this. The objective is that using
> the reference data will be the easiest (and cheapest)
> way to get the job done.
All the best mathematicians are lazy in the sense that
they look for the simplest way of proving a theorem
or carrying out some calculation. But they succeed
because mathematics has strong criteria for evaluating
success. Commercial applications are not mathematics,
but they also have hard criteria for success.
Upper-level ontologies, however, are so far removed from
reality that there are no good ways of distinguishing
a good one from a bad one. (And I don't consider the
OntoClean rules of Dolce as good evaluation criteria
because those rules themselves have never been tested
against reality.)
> 3. People are stupid.
>
> Unfortunately "You can't stop dumb people doing dumb
> things" and "We're all dumb at least some of the time."
> So you need to check and recheck what you have been
> given. It is just a necessary part of the process. However,
> at least this can now be largely automated, and it's
> surprising how much the quality goes up when people
> know what they are doing is going to be checked.
I completely agree with this point. And unfortunately
none of the upper-level ontologies that anyone has been
proposing have undergone any such stringent testing.
> 4. Mission Impossible - know thyself
>
> Yes people are unreliable, but our experience has been
> that even using the reference data badly is better than
> not using it at all.
Cory Doctorow's point was that people are not good at evaluating
themselves. They need some hard external criteria to
test themselves and determine whether they are on the
right track. The applications you cite provide such
tests. But there are no good tests for upper-level
ontologies. (See my comment to point #2 about the
OntoClean criteria, for example.)
> 5. Schemas aren't neutral
>
> Certainly any single hierarchy is not neutral, but you
> don't have to have just a single hierarchy. Our data
> model is not neutral either. It is explicitly a 4D
> paradigm, and you have to see the world through those
> glasses. However, it is neutral between, say, washing
> machines and pumps. It also allows the addition of
> new things that you did not think of in the first place.
As long as you accept the need for multiple hierarchies,
I have no complaints.
> 6. Metrics Influence Results
>
> I'm not sure that this is really relevant in our context.
It is relevant, and your context does have a very good
metric: the method must succeed in producing what the
customer needs and is willing to pay for.
Many granting agencies do stipulate some sort of metrics,
and the research is duly tailored to the metrics. But
more often than not, the people who design the metrics
already have some group in mind that they want to fund,
and they write the rules so that the target group is
the one that gets the grant.
This point is related to my criticism of artificial
metrics, such as the OntoClean rules, which were intended
to influence how people design ontologies, but the rules
themselves were never tested against the real world.
> 7. There's more than one way to describe something
>
> The world of engineering is relatively objective; we do
> not have much of a problem with people describing lengths
> in metres or yards, because we know how to relate one to
> the other. On the other hand, at a higher level this is an
> issue. So we have made a choice for a 4D paradigm.
But your applications have many similarities to one another. If you
broaden the scope to a larger range of tasks, the number
of ways of describing something increases rapidly.
As an example, just consider the difference between
civil engineers, chemical engineers, and electrical
engineers, who have totally different traditions,
formalisms, rules of thumb, and ways of looking at
things. All of them have to deal with the world,
but each of them emphasizes some aspects and
ignores others.
In order to solve a hard problem (and nearly all
real-world problems are hard), engineers always make
approximations that throw away "irrelevant detail".
But the details that a civil engineer considers
irrelevant may be crucial for a chemical or
electrical engineer -- and vice-versa.
> Well it looks like most of these issues are real,
> but not necessarily insurmountable.
I agree that they are not insurmountable -- but only
if you have a clear application in mind and hard
criteria for determining success. And nothing is
harder than meeting the requirements of a customer
who will not pay for an unfinished job or shoddy work.
John Sowa