
Re: small correction to the Vienna Proposal




There's no perfect answer.  Correctness costs.

m = 0.5 * (l + u)  will overflow when the sum l + u does, e.g., when both l and u are in the most positive half of the positive representable range, or both in the most negative half of the negative representable range.
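A minimal C sketch of that premature overflow. It assumes IEEE 754 doubles in the default round-to-nearest mode, and the helper name mid_sum is hypothetical:

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: midpoint via the sum formula m = 0.5 * (l + u). */
double mid_sum(double l, double u)
{
    /* l + u is rounded first; if it exceeds DBL_MAX it becomes +inf
       (or -inf), and halving an infinity does not bring it back. */
    return 0.5 * (l + u);
}

/* Example: with l = u = 0.75 * DBL_MAX the true midpoint is finite,
   yet isinf(mid_sum(l, u)) is true because l + u overflowed. */
```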

m = l + 0.5*(u - l)  will overflow when the difference u - l does, e.g., when l is negative, u is positive, and each lies in the half of the representable range farthest from zero for its sign.
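The opposite failure, again as a C sketch assuming IEEE 754 doubles in round-to-nearest (mid_diff is a hypothetical name):

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: midpoint via m = l + 0.5 * (u - l). */
double mid_diff(double l, double u)
{
    /* Safe when l and u have the same sign, but u - l can exceed
       DBL_MAX when l is very negative and u is very positive. */
    return l + 0.5 * (u - l);
}

/* Example: mid_diff(-0.75 * DBL_MAX, 0.75 * DBL_MAX) is +inf although the
   true midpoint is 0, while mid_diff(0.75 * DBL_MAX, 0.75 * DBL_MAX)
   stays finite where 0.5 * (l + u) would not. */
```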

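Computing u - l with rounding towards zero avoids that overflow: since u - l is nonnegative this is the same as rounding down, and a difference that would overflow becomes the largest finite value instead of infinity.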

m = 0.5*l + 0.5*u  will only overflow if l and u have the same sign, both equal the maximum-magnitude representable value of that sign, and the rounding method used for the calculation rounds the result away from zero.  It can also underflow, so it would unnecessarily turn  0.5*SUBNORMAL_MIN + 0.5*SUBNORMAL_MIN  into zero when the other formulas would give SUBNORMAL_MIN.  (Note SUBNORMAL_MIN is not a standard name; it means the smallest positive subnormal value.)
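The underflow case can be checked in C. C11's <float.h> names the smallest positive subnormal double DBL_TRUE_MIN; the fallback definition below is only for older compilers. This sketch assumes the default round-to-nearest-even mode, and mid_halves is a hypothetical name.

```c
#include <float.h>

#ifndef DBL_TRUE_MIN
#define DBL_TRUE_MIN 0x1p-1074   /* smallest positive subnormal double */
#endif

/* Hypothetical helper: midpoint via m = 0.5*l + 0.5*u. */
double mid_halves(double l, double u)
{
    /* 0.5 * x is exact for normal x, but halving a subnormal may have
       to round, and round-to-nearest-even can round it to zero. */
    return 0.5 * l + 0.5 * u;
}

/* Example: 0.5 * DBL_TRUE_MIN lies exactly halfway between 0 and
   DBL_TRUE_MIN and rounds to the even neighbour, 0, so
   mid_halves(DBL_TRUE_MIN, DBL_TRUE_MIN) == 0.0, whereas
   0.5 * (DBL_TRUE_MIN + DBL_TRUE_MIN) == DBL_TRUE_MIN. */
```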

It may be possible to avoid these problems by using  m = 0.5*l + 0.5*u  and rounding the two multiplies in opposite directions (somebody please check that).  Since changing rounding modes is expensive on some architectures, a faster alternative on them is  m = 0.5*l - (-0.5*u)  with both multiplies rounded in the same direction, provided you can prevent your compiler from optimizing it into  m = 0.5*l + 0.5*u.

These can all give reasonable but slightly differently rounded answers.  Providing portability and reproducibility means all implementations must use the same formula.

Costs are:
        m = 0.5 * (l + u):                  1 constant load + 1 add/subtract + 1 multiply
        m = l + 0.5*(u - l):                1 constant load + 2 adds/subtracts + 1 multiply
        m = l + 0.5*(u - l) rounded down:   1 constant load + 2 adds/subtracts + 1 multiply,
                                            plus, on some architectures, the cost of saving,
                                            setting and restoring the rounding mode
        m = 0.5*l + 0.5*u:                  1 constant load + 1 add/subtract + 2 multiplies
        m = 0.5*l - (-0.5*u):               either 2 constant loads + 1 add/subtract + 2 multiplies
                                            or     1 constant load + 2 adds/subtracts/negates + 2 multiplies

If the cost doesn't matter, avoiding overflow and underflow errors in all cases is easy: just check the signs and values first, then choose a formula that is safe for them.
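A minimal C sketch of that idea, assuming finite l <= u. It checks only the signs, which is one possible reading of "check the signs and values first". Opposite signs mean l + u cannot overflow, and equal signs mean u - l cannot. The name mid_checked is hypothetical.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: pick the formula by the signs of l and u
   (assumes finite l <= u).
   - Opposite signs: |l + u| <= max(|l|, |u|), so 0.5 * (l + u) is safe.
   - Same sign: u - l <= max(|l|, |u|), so l + 0.5 * (u - l) is safe, and
     it returns l exactly when l == u, so tiny equal bounds are kept. */
double mid_checked(double l, double u)
{
    if ((l < 0.0) != (u < 0.0))
        return 0.5 * (l + u);
    return l + 0.5 * (u - l);
}

/* Example: mid_checked(-0.75 * DBL_MAX, 0.75 * DBL_MAX) returns 0.0, and
   isfinite(mid_checked(0.75 * DBL_MAX, DBL_MAX)) holds. */
```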


Integer index calculations have fewer problems because in C an index is always nonnegative, so l and u must have the same sign, and integer division always rounds towards zero, so INT_MAX/2 + INT_MAX/2 can't overflow.
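In C, for nonnegative int indices with l <= u, a sketch of both the halve-first form and the common l + (u - l)/2 alternative follows. The helper names are hypothetical. Halving first cannot overflow, but it loses 1 when both indices are odd.

```c
#include <limits.h>

/* Hypothetical helpers for nonnegative int indices with l <= u. */

/* Halve first: INT_MAX/2 + INT_MAX/2 == INT_MAX - 1, so this cannot
   overflow, but it loses 1 when both l and u are odd
   (e.g. mid_halves_int(3, 3) == 2). */
int mid_halves_int(int l, int u)
{
    return l / 2 + u / 2;
}

/* Common alternative: u - l cannot overflow when both are nonnegative,
   and since u - l >= 0 the truncating division gives floor((l + u) / 2). */
int mid_diff_int(int l, int u)
{
    return l + (u - l) / 2;
}
```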

- Ian          Toronto IBM Lab   8200 Warden   D2-445   905-413-3411

----- Forwarded by Ian McIntosh/Toronto/IBM on 23/02/2009 02:18 PM -----
From: Guillaume Melquiond <guillaume.melquiond@xxxxxxxx>
Date: 23/02/2009 01:35 PM
Please respond to: Guillaume Melquiond <guillaume.melquiond@xxxxxxxx>
To: Ian McIntosh/Toronto/IBM@IBMCA
cc:
Subject: Re: small correction to the Vienna Proposal





On Monday, 23 February 2009 at 11:14 -0700, Nelson H. F. Beebe wrote:
> Chenyi Hu writes today:
>
> >> I'd prefer m = 0.5 * (l + u) rather than m = l + 0.5*(u - l) because of
> >>
> >> 1. Avoiding the loss of significance in performing u-l for narrow intervals
> >> 2. One less arithmetic operation.
>
> Chenyi's alternative suffers from premature overflow when l and u are
> greater than half the maximum representable number.  The proposed form
> l + 0.5*(u - l) is safe.

It depends what you mean by "safe". Getting a midpoint that is not a
finite number (l big negative, u big positive) could break a lot of
things. So it would be better to ensure that the midpoint is always
finite:

set round down
m = l + 0.5 * (u - l)
r = -(m - u)

(Disclaimer, I have no idea if the formula above is always correct.)

Best regards,

Guillaume