
SUO: RE: Monosemy, Semantics, and Natural Language




Hi John,

As always, your thoughts brim with erudition and
your conclusions are penetrating.  The subject of
polysemy is a serious puzzle.  

I think there is wisdom in choosing a legislated
meaning for each term in an ontology or CLCE that 
is intended to travel among many contexts.  However,
I think there is a serious need to provide a human
interface to it in a polysemous form that matches
the way people actually use language.  

So an ontology with fifty-letter words that have
very specific meanings should be referenceable by
the five-letter words that people actually use, and
use polysemously.  (Hard to pronounce?)  

There are adequate mechanisms in programming
languages to disambiguate each symbol before the
code is generated or interpreted.  The same kinds
of mechanisms are needed for disambiguating both
ontologies and CLCEs.  
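
As a rough illustration only, here is a minimal sketch
of one such mechanism, modeled on how a compiler binds
a symbol by walking nested scopes.  The context names,
the legislated terms, and the resolve function are all
hypothetical, made up for this example; they are not
taken from any actual ontology or CLCE.

```python
from typing import Dict, List, Optional

# Hypothetical legislated terms: long, unambiguous identifiers.
# Within one context, a short everyday word maps to exactly one term.
CONTEXTS: Dict[str, Dict[str, str]] = {
    "finance": {"bank": "FinancialDepositInstitution"},
    "geography": {"bank": "RiverEdgeLandform"},
}


def resolve(word: str, context_stack: List[str]) -> Optional[str]:
    """Bind a short word to a legislated term, innermost context first.

    The last entry of context_stack is the innermost context, so it
    shadows the outer ones, as a local scope shadows a global one.
    """
    for context in reversed(context_stack):
        term = CONTEXTS.get(context, {}).get(word)
        if term is not None:
            return term
    return None  # unresolved: the interface must ask the user


# "bank" binds differently depending on the innermost context.
print(resolve("bank", ["geography", "finance"]))
print(resolve("bank", ["geography"]))
```

The point of the sketch is that the short word stays
polysemous for the person typing it, while every
stored reference is to a single legislated term.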

Of course, natural language is far more polysemous
than programming languages, but there are published
accounts of corpus analyses where investigators have
managed to organize sentential forms into classes
that can indeed be disambiguated.  CoreLex, EVCA,
and the synsets of WordNet are excellent examples.

So the question IMHO is whether to carry out the
disambiguation process with the original ontology
or CLCE publication, or to provide HMIs that
handle the disambiguation later.  There are pros
and cons to both sides.  

Disambiguate later:  The main advantage here is
purity; the original ontology or CLCE is very
precisely defined.  It's just not very easy to use,
so the debates among early adopters (and there are
fewer of them) are precise and surgical.  

Disambiguate the original publication:  There are 
two main benefits.  First, this approach still
allows subsequent HMIs to be developed for contexts
that weren't considered in the original publication,
but it provides a more generally readable initial
release that could help sell the ontology or
CLCE to a wider audience, spreading it faster.
Second, it reduces the learning curve for people
who would like to become familiar with the new
ontology or CLCE, thus assuring a larger initial 
audience.  

The problem with delaying disambiguation is that
there may be much misinterpretation of the original
symbology if the later disambiguators are not among
the original developers.  

Rich



John F. Sowa wrote:
> In an earlier note, I mentioned that I was
> assuming a single word sense for each CLCE term.
> Jon Awbrey correctly pointed out that such an
> approach does not come to grips with the reality
> of how concepts are used.  I certainly agree.
> 
> Then today, I received a copy of a Seybold report
> that made some unfounded claims about the potential
> impact of the semantic web "on data, computing,
> people, publishing, government and manufacturing."
> I wrote the following response to the person who
> sent it to me.
> 
> Since Seybold charges money for their reports, I
> cannot distribute or point to the full text, but
> my comments below should give some idea of what
> the authors claim.
> 
> John Sowa
> ___________________________________________________
> 
> Thanks for sending the report.  I hate to say nasty
> things about people who say nice things about my work,
> but I can't endorse that report.  Following are some
> comments.
> 
> The first point is that the authors fail to recognize
> that the ambiguities of natural language are intimately
> related to semantics and that changing the notation
> does nothing to solve them.  The following point is
> not only wrong, but totally wrongheaded:
> 
>     A key theme is that language-based approaches suffer
>     from ambiguities inherent in natural language.
>     We contrast language-based knowledge representation
>     with semantic-form declarative knowledge and conclude
>     that semantics trumps linguistics.
> 
> As I have said repeatedly, SYNTAX IS NOT THE PROBLEM.
> Therefore, no change to syntax can ever solve the problem.
> 
> I certainly agree that there are ambiguities in NLs.
> The simple, trivial ambiguities are syntactic.  They
> are easy to deal with.  The difficult problems are all
> in the semantics.  And nothing that the authors say
> in that report comes to grips with the real semantic
> problems or even acknowledges their existence.
> 
> The following sentence indicates the authors' starry-eyed
> innocence:
> 
>     As the semantic model becomes richer, it more
>     completely specifies not only the formal class-subclass
>     relationships, but also relationships between concepts,
>     and the descriptive logic and conditional assertions
>     that are used to perform inference.
> 
> That is the view that Wittgenstein adopted from Bertrand
> Russell in 1913, and which he elaborated in his first book,
> the _Tractatus Logico-Philosophicus_.  After writing that
> book, W. thought that he had solved all the problems of
> philosophy, and he retired to an Austrian village to teach
> elementary school.  That's where he discovered that kids
> (and grown-ups) don't think that way.  As Shakespeare said,
> 
>     There are more things in heaven and earth, Horatio,
>     Than are dreamt of in your philosophy.
> 
> For the rest of his life, Wittgenstein analyzed, recanted,
> and clarified the hopelessness of the superficial approach
> that he and Russell had developed in their early work.
> W's analysis had nothing to do with the expression of
> concepts in natural language or in logic.  Every problem
> and pitfall that W. analyzed in his later philosophy
> applies just as much to the declarative languages of
> logic or the semantic web as it does to natural language.
> 
> Hope reigns eternal within the programmer's breast:
> 
>     Also, it is possible to reuse ontologies, in whole or
>     in part, that have already been developed.
> 
> I heard that same statement made about programs back
> in the 1960s:  once a program has been written, it
> could be used again by anybody else who had the same
> problem, and the collection of solved problems would
> grow indefinitely.
> 
> The key goal mentioned below has nothing to do with
> the kind of language that is being used:
> 
>     A key goal of language-based knowledge representation
>     is to eliminate the ambiguity of describing things
>     with labels and natural language, leading to improved
>     search and easier integration of content and processes.
>     But, as we have seen, this goal is difficult to achieve
>     when we use language to describe what we mean. Natural
>     language use is inherently ambiguous. Many words have
>     multiple meanings. There is no way to guarantee that
>     two occurrences of the same word have the same meaning.
> 
> That makes it sound as if the word "cat" in English is
> "inherently ambiguous", but if we write a unique identifier
> "cat" in a pure, unsullied declarative language, it will
> magically acquire a unique meaning that nobody (or no
> computer) could possibly confuse with anything else.
> 
> The solution that the authors propose is the same one
> that Frege, Russell, and the logical positivists were
> hoping to achieve with logic -- and they failed miserably:
> 
>     Ultimately, the only way to ensure precise meanings
>     is to move away from natural language toward
>     pure semantic codes and relationships; that is, use
>     unique identifiers to identify concepts. (We may draw
>     an analogy here with the UPC [universal product code]
>     identifier that has no significance other than that
>     it is unique.) Do not use labels or names of things.
>     Rather, determine meaning by the sum of all the
>     relationships the concept has.
> 
> A UPC has a unique meaning because it has been legislated
> to have a unique meaning.  Once you have legislated the
> meaning, its meaning is unique in any context, linguistic
> or nonlinguistic (such as swiping a light pen over it).
> 
> People have been legislating meanings for words and concepts
> in natural languages since the time of Socrates.  It is done
> all the time for concepts in mathematics and science --
> and those concepts are expressed by "labels" or "names" in
> natural language sentences.  That has been done repeatedly
> and successfully since the time of Euclid.
> 
> But even in mathematics, where the most precise NL usage
> can be found, it is very rare for any two mathematicians
> (or even any single mathematician) to use the same term
> in exactly the same way in two different publications.
> Therefore, it is common for mathematical publications
> to have an opening section (or an appendix) that states
> the definitions and axioms that are assumed.
> 
> Note the lessons to be learned:
> 
>   1. The only concepts that are ever precisely defined
>      are legislated concepts -- ones whose meanings are
>      stipulated or agreed by convention.
> 
>   2. Those agreements can be formalized and used in
>      natural languages just as well as in any artificial
>      language.  That fact has been demonstrated
>      repeatedly since the time of Socrates and Euclid.
> 
>   3. But establishing agreements that hold for more
>      than one context, such as one Platonic dialog,
>      a single publication in mathematics, a single
>      computer program, or a single database system,
>      is extremely difficult and extremely rare.
> 
>   4. In computer systems, the only legislated concepts
>      that are repeatedly used in a fixed sense are ones
>      that are embodied in programming code that is very
>      hard to change, such as the kernel of an operating
>      system, the compiler of a programming language, or
>      a library of programs that are fundamental to the OS
>      or the compiler.  Even then, those meanings change
>      with every release or patch to the OS or the compiler.
> 
>   5. Outside of mathematics and computers, the most common
>      attempts to legislate meaning are in the legal system.
>      The US Constitution is the world's first and most
>      successful attempt to legislate a complete system
>      of government and the concepts used to describe it.
>      But that success depends completely on the ongoing
>      efforts to interpret, extend, and clarify those
>      concepts by the three branches of government
>      (judicial, legislative, and executive).  With such
>      a mechanism of constant reinterpretation, the system
>      has survived and flourished for over two centuries.
>      Without it, the Constitution would have been a
>      useless piece of paper.
> 
> Note that in every one of these examples -- in mathematics,
> computer systems, and government -- the mechanism of
> enforcement, interpretation, and reinterpretation of the
> semantics is the *ONLY* guarantee that the concepts are
> used with a common meaning.  The use of a natural language
> or an artificial language does not make the slightest
> difference in determining whether a concept's meaning
> shifts or stays constant.
> 
> Summary:  The Seybold article is a restatement of a hope
> that Frege and Russell proposed a century ago.  Wittgenstein
> and the logical positivists tried to achieve it with logic,
> and they failed miserably.  The semantic webbers have zero
> chance of achieving it by replacing natural language with
> any other kind of language or notation.
> 
> Bottom line:  You can't solve a problem by ignoring it.
> 
> For more about semantics and the failure of logical
> positivism, I recommend the following:
> 
>     http://www.jfsowa.com/pubs/signproc.htm
>     Signs, Processes, and Language Games
> 
> John Sowa