Re: Fw: min / max and empty intervals - Entire and Missing Data
Ian,
Yes. We seem to be using the word "unknown" in two semantically
different ways. An interval analysis theorist may view an interval
either as a "partially but not exactly known point value," or as
an entire set of values that is actually taken on, two different
semantic meanings attached to an interval. In the case of
an interval as an imprecisely known point, the idea of "unknown"
translates to the entire real line. In the case of the set of
values, the real line corresponds to "everywhere," without the
concept of "unknown." Your concept of unknown as meaning
missing data is yet a third idea, for which it is clear that
the representation by the real line is not useful.
On the other hand, there is a core of basic interval operations that
is useful in all three situations. Is our work with the exceptional
situations and decorations an attempt to accommodate these different
higher-level semantics over these common basic operations? If so,
we may end up either needing to add additional specialized operations
(or decorations), or else produce a much smaller standard. In your
case, my first thought is that an additional decoration (e.g. one
applying to the "empty" set) might work.
Best regards,
Baker
On 6/9/2011 2:00 PM, Ian McIntosh wrote:
From: Ralph Baker Kearfott <rbk@xxxxxxxxxxxx>
To: Ian McIntosh/Toronto/IBM@IBMCA
Date: 06/09/2011 02:11 AM
Subject: Re: min / max and empty intervals
---------------------------------------------------------------
On 6/9/2011, Ralph Baker wrote:
>On 6/7/2011 7:07 AM, Arnold Neumaier wrote:
>> Dan Zuras Intervals wrote:
>>>> Date: Tue, 07 Jun 2011 08:43:03 +0200
>>>> From: Arnold Neumaier <Arnold.Neumaier@xxxxxxxxxxxx>
>>>> To: Dan Zuras Intervals <intervals08@xxxxxxxxxxxxxx>
>>>> CC: John Pryce <j.d.pryce@xxxxxxxxxxxx>, stds-1788@xxxxxxxxxxxxxxxxx
>>>> Subject: Re: min / max and empty intervals
>>>>
>>>> Dan Zuras Intervals wrote:
>
>>> Actually, John touched on the reasonable application for
>>> NaNs in a matrix. That of as yet unknown data.
>>
>> Unknown values are represented in interval analysis by Entire, not by Empty!
>
>I personally agree with Arnold on this point. It is tied to the basic
>philosophy underlying interval analysis.
>
>Baker
I understand and agree with the reasons that unknown should normally be Entire, not Empty (and BTW, one of the multiple meanings of NaN is equivalent to Entire).
At the same time, a standard is better if it can apply to diverse situations, not just the usage that led to it, so we should think about other potential applications and their implications.
Suppose I had a large set of data and wanted to do some analysis on it, but many values were unknown. If unknown is represented as Entire and I use the obvious approach, then most of my answers will be Entire and I will know nothing except that I know nothing. If I skip unknown values then I can produce answers tighter than Entire, and I will know something, with the caveat that the answers are not guaranteed to be correct. I may consider that useful.
Let's take a concrete example. For the set of all adults in the USA on July 1st 2011, measure their height. Since there's some measurement error and height can vary throughout the day, use intervals with reasonable bounds.
Now find out the minimum and maximum heights. No problem.
But what if you only have data for 1% of the people? If you treat the unknowns as Entire, the minimum height is -oo and the maximum +oo. Ruling out negatives still doesn't give a useful answer. If you treat the unknowns as "Ignore this unknown value", then you get minimum and maximum heights for the people you have data for. You can't claim that the answers are exactly what you were asked, but you can say they are correct for the 1% subset of the cases you have data for, and if you know statistics you may say that the true answers should not be a much larger range than the answers for this subset.
There are _many_ real applications where intervals could be useful if missing data can be ignored and the limitations are understood as part of the results.
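The two treatments of missing heights can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the discussion above: intervals are plain (lower, upper) tuples, a missing measurement is represented by None, and the names min_max_entire and min_max_known are made up for this sketch. None of it is proposed standard text.

```python
import math
from typing import Optional, Sequence, Tuple

Interval = Tuple[float, float]            # (lower bound, upper bound)
ENTIRE: Interval = (-math.inf, math.inf)  # the whole real line


def interval_min(xs: Sequence[Interval]) -> Interval:
    """Interval minimum: take the min of lower bounds and of upper bounds."""
    return (min(a for a, _ in xs), min(b for _, b in xs))


def interval_max(xs: Sequence[Interval]) -> Interval:
    """Interval maximum: take the max of lower bounds and of upper bounds."""
    return (max(a for a, _ in xs), max(b for _, b in xs))


def min_max_entire(data: Sequence[Optional[Interval]]):
    """Treat every missing entry (None) as Entire; the result is guaranteed."""
    filled = [ENTIRE if x is None else x for x in data]
    return interval_min(filled), interval_max(filled)


def min_max_known(data: Sequence[Optional[Interval]]):
    """Skip missing entries; also report how many were skipped.

    The result is only correct for the subset that has data.
    """
    known = [x for x in data if x is not None]
    if not known:
        raise ValueError("no known values to compare")
    return interval_min(known), interval_max(known), len(data) - len(known)


# Hypothetical heights in metres; None marks a person with no measurement.
heights = [(1.70, 1.72), None, (1.55, 1.57), None, (1.90, 1.93)]

print(min_max_entire(heights))
print(min_max_known(heights))
```

With this sample, the Entire treatment yields a minimum of (-inf, 1.57) and a maximum of (1.90, inf), while skipping the unknowns yields (1.55, 1.57) and (1.90, 1.93) together with a count of 2 missing values. Returning that count is one way to keep the caveat attached to the answer.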
So here are my questions: Can we define a decoration for "missing data" or "unknown", and decoration operations which when encountering that produce "some data is missing" or "some data is unknown"? Is it better to define specific
operations to be used in such cases (eg, max_known_value)? Can either or both of those be done in a consistent way? Can they be done without damaging other things? Would that increase (or decrease?) the usefulness of the standard?
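One way to picture the decoration idea is the rough Python sketch below. It assumes, purely for illustration, that a missing value is stored as an empty interval carrying a boolean flag, and that the flag propagates through the operation as "some data is missing". The names Decorated, MISSING, is_empty and max_over are hypothetical; max_known_value is the operation name suggested above, and its behaviour here is a guess, not an agreed definition.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Sequence


@dataclass(frozen=True)
class Decorated:
    lo: float
    hi: float
    missing: bool = False  # hypothetical "some data is missing" decoration


# Assumption: a missing value is the empty interval (lo > hi) with the flag set.
MISSING = Decorated(math.inf, -math.inf, True)


def is_empty(x: Decorated) -> bool:
    return x.lo > x.hi


def max_known_value(x: Decorated, y: Decorated) -> Decorated:
    """Interval max that ignores empty operands but remembers that it did."""
    missing = x.missing or y.missing
    if is_empty(x):
        return Decorated(y.lo, y.hi, missing)
    if is_empty(y):
        return Decorated(x.lo, x.hi, missing)
    return Decorated(max(x.lo, y.lo), max(x.hi, y.hi), missing)


def max_over(values: Sequence[Decorated]) -> Decorated:
    """Fold max_known_value over a non-empty sequence."""
    return reduce(max_known_value, values)


a = Decorated(1.70, 1.72)
b = Decorated(1.90, 1.93)
print(max_over([a, MISSING, b]))
print(max_over([a, b]))
```

In this sketch the bounds come only from the known values, and the flag on the result records whether anything was skipped along the way. Whether such a flag can coexist consistently with the other decorations is exactly the open question.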
- Ian McIntosh
  IBM Canada Lab
  Compiler Back End Support and Development
--
---------------------------------------------------------------
Ralph Baker Kearfott, rbk@xxxxxxxxxxxxx (337) 482-5346 (fax)
(337) 482-5270 (work) (337) 993-1827 (home)
URL: http://interval.louisiana.edu/kearfott.html
Department of Mathematics, University of Louisiana at Lafayette
(Room 217 Maxim D. Doucet Hall, 1403 Johnston Street)
Box 4-1010, Lafayette, LA 70504-1010, USA
---------------------------------------------------------------