Arnold Neumaier wrote:
> Ian McIntosh wrote:
>> Let's take a concrete example. For the set of all adults in the USA on
>> July 1st 2011, measure their height. Since there's some measurement error
>> and height can vary throughout the day, use intervals with reasonable
>> bounds.
>>
>> Now find out the minimum and maximum heights. No problem.
>>
>> But what if you only have data for 1% of the people? If you treat the
>> unknowns as Entire, the minimum height is -oo and the maximum +oo.
>
> And that's the only thing you can say with certainty. Interval analysis
> is about certainty, not about the estimation of probabilities. The
> latter is the domain of statistics, and will never be certain.
I disagree. You can say more than one thing with certainty:
1. The minimum height is -oo and the maximum +oo (your only certain result).
2. Using domain knowledge, you can make that 0 to +oo.
3. The minimum height in the known data is X and the maximum is Y, where 0 <= X <= Y << +oo.
(Obviously what you CAN'T say is that the minimum height is X and the maximum is Y.)
4. The decoration would say that some unknown values had been omitted, so the results should not be confused with calculations that included all values.
(The program should detect that and print it in some meaningful application-specific way.)
5. If you bother to count, you can say that the known data is a specific fraction (in this example 1%) of the total data.
If the calculation used operations with names like minknown(a,b), maxknown(a,b) and countknown(a,previouscount), then only users who wanted the extra information would see any difference. Those who only want to know that the height is between 0 and +oo would not be affected.
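To make that concrete, here is a rough Python sketch of how such operations might behave. The function names come from the suggestion above, but their exact semantics, the Decorated class, the omitted flag and the is_unknown helper are all assumptions made for illustration; none of this is taken from any interval standard or library.

```python
import math
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class Decorated:
    """An interval [lo, hi] plus a hypothetical 'unknown data omitted' flag."""
    lo: float
    hi: float
    omitted: bool = False


# Assumption: an unknown value is represented as Entire, i.e. [-oo, +oo].
ENTIRE = Decorated(-math.inf, math.inf)


def is_unknown(x: Decorated) -> bool:
    return x.lo == -math.inf and x.hi == math.inf


def minknown(a: Decorated, b: Decorated) -> Decorated:
    """Interval minimum that skips unknown operands and records that it did."""
    if is_unknown(a):
        return replace(b, omitted=True)
    if is_unknown(b):
        return replace(a, omitted=True)
    return Decorated(min(a.lo, b.lo), min(a.hi, b.hi), a.omitted or b.omitted)


def maxknown(a: Decorated, b: Decorated) -> Decorated:
    """Interval maximum that skips unknown operands and records that it did."""
    if is_unknown(a):
        return replace(b, omitted=True)
    if is_unknown(b):
        return replace(a, omitted=True)
    return Decorated(max(a.lo, b.lo), max(a.hi, b.hi), a.omitted or b.omitted)


def countknown(a: Decorated, previouscount: int) -> int:
    """Add one to the running count only if a is a known value."""
    return previouscount + (0 if is_unknown(a) else 1)


# Made-up heights in metres; two of the four people have no data.
heights = [Decorated(1.60, 1.62), ENTIRE, Decorated(1.80, 1.83), ENTIRE]

shortest = reduce(minknown, heights)
tallest = reduce(maxknown, heights)
known = reduce(lambda count, h: countknown(h, count), heights, 0)

print(shortest, tallest, f"{known} of {len(heights)} known")
```

Here shortest and tallest carry finite bounds from the known data, and the omitted flag tells the program that those bounds are not bounds on the whole population. A user who ignores the flag and uses the ordinary min and max still gets Entire.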
Where's the harm in providing better tools to deal with unknown data? Partial data is almost universal in the real world, and we have to deal with it, just as we have to deal with finite precision in measurements.
Of course any programmer could write the functions I suggested on top of the standard, so they don't have to be part of the standard. The main advantage is the proposed "unknown data omitted" decoration, to distinguish certainty from uncertainty. It's a lot like using ranges instead of single values, to make the uncertainty in the exact value of a result visible and quantifiable instead of hidden and possibly infinite.
- Ian McIntosh
  IBM Canada Lab
  Compiler Back End Support and Development