
Re: SUO: An Integrated View of Everything in SIX WEEKS!!




Mike,

I wasn't making any wild claims or promises.  I was merely
reporting on one application that was very successful for
a particular problem.

>It would seem that in 6 weeks, you could hardly define a suite of test cases
>to see if the techniques work, never mind run them, never mind create the
>solution etc.

The VivoMind system for finding analogies was not developed
in 6 weeks.  It was the result of over 10 years of development,
during which many iterations of partial solutions had been
tried, tested, rejected, revised, reworked, and redone.

What Majumdar did in 3 weeks was merely to "customize" the
program to adapt it to the given task.  That customization
involved defining the canonical graphs that specified just
those patterns that the program was intended to extract from
that mountain of data.  It also involved writing the grammar
rules (in Prolog) for the COBOL Data Division and Environment
Division and for the JCL DD cards.

Most of the information in English, COBOL, and JCL was
irrelevant to the needs of the task; the parser looked at it,
and threw it away.  The only information it kept was what fit
into the patterns of the three canonical graphs that specified
what data and types of data each program used, generated, or
modified.  It was a very well defined task applied to a very
large amount of data (since it took 3 x 7 x 24 hours on a
750 MHz Pentium III).
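
To make the "look at it and throw it away" idea concrete, here is a
minimal sketch in Python.  It is not VivoMind's code: the real system
used Prolog grammar rules and canonical graphs, and a single regular
expression stands in for both here.  The pattern DD_CARD, the function
extract_dd_facts, and the sample JCL are all hypothetical, made up only
for illustration.

```python
import re

# Hypothetical stand-in for one extraction pattern: a JCL DD statement
# that names a dataset, e.g.  //INFILE  DD DSN=PAYROLL.MASTER,DISP=SHR
# (An assumption for illustration; the real system used Prolog grammars
# and canonical graphs, not a regular expression.)
DD_CARD = re.compile(
    r"^//(?P<ddname>[A-Z0-9#@$]+)\s+DD\s+.*?\bDSN=(?P<dsn>[A-Z0-9.#@$]+)"
)


def extract_dd_facts(lines):
    """Return (ddname, dataset) pairs for lines that fit the pattern.

    Every line that does not fit is read once and then discarded.
    """
    facts = []
    for line in lines:
        match = DD_CARD.match(line)
        if match:
            facts.append((match.group("ddname"), match.group("dsn")))
    return facts


# Made-up sample input: only the two DD cards carry information of interest.
sample = [
    "//PAYJOB   JOB (ACCT),'NIGHTLY RUN'",
    "//STEP1    EXEC PGM=PAYCALC",
    "//INFILE   DD DSN=PAYROLL.MASTER,DISP=SHR",
    "//* a comment card that the filter ignores",
    "//OUTFILE  DD DSN=PAYROLL.REPORT,DISP=(NEW,CATLG)",
]

if __name__ == "__main__":
    for ddname, dsn in extract_dd_facts(sample):
        print(ddname, dsn)
```

Of the five sample lines, only the two DD cards survive; the JOB card,
the EXEC card, and the comment are dropped without further analysis.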

>I generally ignore people who claim I can make millions in short spaces of
>time, and so I would normally be inclined to ignore anyone who made the
>claim you did.

I didn't make any claims whatever.  I simply said what Majumdar
and LeClerc accomplished.  Ed Yourdon, who signed the contract,
only promised that they would do a 6-week study to see what
they could accomplish.  What they in fact accomplished was to
complete the project -- an achievement that no one had expected,
not even the two programmers who did it.

But note that VivoMind did not complete the re-engineering
task.  It simply translated the information it extracted into
a form that could be processed by Yourdon's tools to produce
the diagrams and reports.  The company then had to use their
real live human programmers to use those diagrams and reports
to recode the legacy programs in some modern system.

What VivoMind accomplished was to predigest the mountain of
data in order for the humans to find exactly what they needed
to do their work.  It accomplished in 3 weeks of computer time
what the other consultants claimed would require 80 person
years of human time (40 people for 2 years).

> However, in this case, I expect that there are indeed some
>useful nuggets here.  Unfortunately, your presentation gives insufficient
>insight as to where they might be.

I have only one hour for the talk (which will be presented
at the Westin Hotel in Seattle on March 13th).  The main points
I made are in the beginning and the conclusions.  The example
of the project only illustrates the fact that such techniques
can be applied to large amounts of real-world data.

The main issues I was trying to get across do not depend
on the example.  The only use for the example is to impress
upon the audience that it is possible to apply such techniques
today, and that we don't have to wait another 10 years for
Cyc or SUMO or some other effort to mature.  Following are
the main points of my talk:

 1. The problems of knowledge soup (which I have been talking
    about and writing about for nearly 20 years) aren't going
    to be solved by legislating an ontology a la Cyc or SUMO.

 2. They can be tackled today, even without a 10 year project
    to finish Cyc or SUMO or whatever.

 3. A better approach is to use superficial analysis with a
    rather shallow semantics, such as WordNet, rather than a
    deep semantics that is fully axiomatized.  That is a point
    that I have also been talking about for a long time, and
    it is a position that is widely recognized among people
    working in the information extraction field.  (See my
    template.htm paper, which I presented at a conference on
    info extraction in July 1999.)

 4. Majumdar's parser (which I only recently had a chance
    to observe) is indeed a shallow parser based on WordNet
    and similar lexical resources.  But its main claim to fame
    is that it supports the option of deepening the analysis
    for those items that are of interest.  (It belongs to a
    type of parser that I was recommending in my 1999 paper.)

>There are no doubt countless assumptions and limitations of the approach you
>suggest, so that should I attempt to do the same thing at Boeing, I might
>expect to search for weeks or months to find just the right set of
>circumstances when your particular solution fits well. Even then, it would
>be worth the search, if your claim was indeed legitimate.

As I said, I didn't make any claims; I merely reported a fact.
You can ask the people at the company in question about what
they thought of the work M & L did.  (I didn't mention the
company's name, only because I didn't get their permission to
talk about the details of their work.)  But if you like, you
can send an offline note to Arun Majumdar, who would be happy
to tell you who to contact at the company.   His email address
is arun@genumerix.com.

>Can you enlighten us as to what the [possibly hidden] assumptions are? What
>are the limitations of this technology? When will it work well? When will it
>work not so well, Why?  How can I decide where to look in my company for a
>situation where I can apply this approach and get these apparently
>miraculous results?

VivoMind did not do a deep logical analysis of what it parsed.
It merely did a shallow syntactic and semantic analysis of an
enormous amount of data, from which it extracted whatever
information fit the three canonical graphs I showed in one
of the slides near the end of the talk.  If you have any
similar problem, which involves searching large amounts of
data (in English, COBOL, or any other language, natural or
artificial) in order to find all occurrences that fit some
clearly defined set of patterns, then the technique is likely
to be useful.

John