
SUO: RE: linguistic concepts




Hi Scott,

	See my comments below.

-Ian

> -----Original Message-----
> From: scott farrar [mailto:farrar@email.arizona.edu]
> Sent: Friday, February 22, 2002 1:41 PM
> To: standard-upper-ontology@ieee.org
> Subject: SUO: linguistic concepts
> 
> 
> 
> 
> Here are some further suggestions for linguistic concepts in SUMO.
> 
>  
>  SF: Scott Farrar
>  DW: David Whitten
>  IN: Ian Niles
>  JS: John Sowa
>  JA: Jon Awbrey
> 
> 
> The first issue concerns the multiple senses of 'Language.' 
> Currently, SUMO has &%Language as a subclass of &%LinguisticExpression,
> and as a sister of &%Word, &%Morpheme, &%Sentence, etc. After SF's
> suggestion that &%Language include a set-theoretic notion, DW suggested
> that 'language' really refers to at least these two concepts.
> 
> Sense 1 "a mode or method of expressing a concept in a 
> linguistic fashion"
> Sense 2 a "set of utterances"
> 
> IN added that the difference was "something like the 
> intensional[sense 1]/extensional[sense 2] distinction...
> The intensional notion would be the set of grammatical 
> rules which generate the well-formed expressions of the language, and 
> the extensional notion would be the set of expressions which are 
> actually generated by these rules." IN suggests that we focus on the
> extensional sense, sense 2. 
> 
> To formalize this Noam Chomsky argues for a distinction between what 
> he calls I-Language and  E-Language.  I-Language is language viewed 
> as a mental object. On the other hand, E-Language is the product of a 
> mental grammar. Viewed as a mental object, then,
> language (I-Language) IS the grammar itself, which is an abstract
> concept; call it 'Grammar'. The product of the grammar is the
> E-Language, which is really a set of &%LinguisticExpressions.
> So, I propose:
> 
> (subclass &%MentalGrammar [not sure about the superclass, but 
> it's &%Abstract])
> (generates &%MentalGrammar &%LinguisticExpression)
> 
> This implies removing &%Language from &%LinguisticExpression.
> We could re-use the concept to denote the set of
> &%LinguisticExpressions, which is language viewed as a corpus,
> so we really have 3 senses of language.

I'm not sure I follow this last part about there being three senses of
"language".  However, I agree that we need to change the definition of
&%Language in the SUMO.  How about we replace '(subclass Language
LinguisticExpression)' with the following:

(subclass Language (PowerSetFn LinguisticExpression))

In other words, the class of 'Languages' is a subclass of the class of all
subclasses of LinguisticExpression.
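
Spelled out, and assuming PowerSetFn denotes the class of all subclasses of
its argument (as described above), the proposed axiom would license
inferences along the following lines.  The constants EnglishLanguage and
Sentence1 are hypothetical names used only for illustration; this is a
sketch of the intended reading, not a finished set of axioms.

```kif
;; Proposed replacement for (subclass Language LinguisticExpression).
(subclass Language (PowerSetFn LinguisticExpression))

;; Consequence, assuming the instances of (PowerSetFn ?C) are exactly
;; the subclasses of ?C: every individual language is itself a class
;; of linguistic expressions.
(=>
  (instance ?LANG Language)
  (subclass ?LANG LinguisticExpression))

;; Illustration with hypothetical constants.
(instance EnglishLanguage Language)
(instance Sentence1 EnglishLanguage)

;; From the statements above it would follow that:
;;   (subclass EnglishLanguage LinguisticExpression)
;;   (instance Sentence1 LinguisticExpression)
```

Under this reading a particular language is a class whose instances are
expressions, so Language itself becomes a class of classes.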

> 
> (subclass &%Language &%Set)
> (member-of &%LinguisticExpression &%Language)
> As JA said,  (disjoint &%Language &%FiniteSet) might not be valid. 
> 
> DW and JA pointed out that the notion of a formal language should be 
> incorporated because of the commonalities with natural language. 
> Adding the concept of a FormalGrammar to account for non-natural
> languages, we have:
> 
> (subclass &%FormalGrammar [not sure about the superclass, but 
> it's &%Abstract])
> (generates &%FormalGrammar &%LinguisticExpression)
> 
> And from formal language theory, we could add:
> 
> (subclass ContextSensitiveGrammar FormalGrammar)
> (subclass ContextFreeGrammar FormalGrammar)

I agree that the concepts of mental grammar, formal grammar, etc. are
important ones, but I would argue for including them in the
linguistics-specific extension of the SUMO rather than in the SUMO itself.
As a general rule, I think domain-specific theoretical posits don't belong
in an upper-level ontology.  

> ...
> 
> 
> 
> To summarize, we could have three concepts referring to what was 
> initially one:
> 
> 
> 1) &%MentalGrammar
> 2) &%FormalGrammar
> 3) &%Language
> 
> 
> 
> 
> 
> 
> 
> The second issue that was brought up is the status of &%Language as
> spoken, written, and signed. IN proposed the following:
> 
> (subclass WrittenLinguisticExpression LinguisticExpression)
> (subclass SpokenLinguisticExpression LinguisticExpression)
> (subclass GesturalLinguisticExpression LinguisticExpression)
> 
> 
> I think this distinction is necessary, because the spoken &%Word
> is definitely a separate concept from the written or signed &%Word. 
> (I might change Gestural to Signed, because Gestural implies
> early views of sign language when it wasn't considered 'language.')

I agree with Douglas McDavid's point that we need to justify the inclusion
of these various types of 'LinguisticExpression'.  More specifically, we
need to be able to show that these subclasses will likely promote
interoperability between information systems from different domains, that
they will allow needed inferences that wouldn't otherwise be possible, that
they will allow needed sense-disambiguation in a domain-independent IR
system, and so on.

> 
> 
> Scott Farrar
> 
> 
> farrar@u.arizona.edu
> 1049 N. Jacobus Ave
> Tucson AZ  85705
> 
> 
> 
>