
Re: small correction to the Vienna Proposal

To: STDS-1788@xxxxxxxxxxxxxxxxx
Subject: Re: small correction to the Vienna Proposal
From: Ian McIntosh <ianm@xxxxxxxxxx>
Date: Mon, 23 Feb 2009 14:54:55 -0500

There's no perfect answer. Correctness costs.

m = 0.5 * (l + u) will overflow when the sum l + u does, e.g., when both l and u are in the upper half of the positive representable range, or both are in the most negative half of the negative representable range.
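A minimal sketch of that failure, assuming IEEE 754 doubles and the default round-to-nearest mode (which is what Python floats use on common platforms). The names BIG and midpoint_sum_first are made up for this illustration.

```python
import math
import sys

BIG = sys.float_info.max  # largest finite double

def midpoint_sum_first(l, u):
    """Midpoint computed as 0.5 * (l + u)."""
    return 0.5 * (l + u)

# Both endpoints are in the upper half of the positive range,
# so l + u is too large for a double and becomes +inf.
m_pos = midpoint_sum_first(0.75 * BIG, BIG)

# The mirror image in the negative range overflows to -inf.
m_neg = midpoint_sum_first(-BIG, -0.75 * BIG)

print(m_pos, m_neg)
print(math.isinf(m_pos), math.isinf(m_neg))
```

In both cases the true midpoint is finite and representable to within rounding, but the intermediate sum is not.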

m = l + 0.5*(u - l) will overflow when the difference u - l does, e.g., when l is negative and u is positive and each lies in the larger-magnitude half of the representable range for its sign.
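The same kind of check for this formula, again only a sketch that assumes IEEE 754 doubles in round-to-nearest mode; BIG and midpoint_offset are illustrative names.

```python
import math
import sys

BIG = sys.float_info.max  # largest finite double

def midpoint_offset(l, u):
    """Midpoint computed as l + 0.5 * (u - l)."""
    return l + 0.5 * (u - l)

# l is large negative and u is large positive, so u - l overflows to +inf
# and the infinity propagates through the rest of the expression.
l, u = -0.75 * BIG, 0.75 * BIG
m_offset = midpoint_offset(l, u)

# For comparison, the sum-first formula is harmless on this interval.
m_sum = 0.5 * (l + u)

print(m_offset, m_sum)
```

So each of the two formulas fails on inputs where the other is fine.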

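Evaluating that formula with rounding towards zero avoids the overflow, because a difference that is too large then rounds to the largest finite value instead of infinity.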

m = 0.5*l + 0.5*u will overflow only if l and u have the same sign, both are equal to the maximum-magnitude representable value of that sign, and the rounding method used for the calculation rounds it away from zero. It can also underflow, so it would unnecessarily turn 0.5*SUBNORMAL_MIN + 0.5*SUBNORMAL_MIN into zero when the other ways would give SUBNORMAL_MIN. (Note that SUBNORMAL_MIN is not a standard name.)
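A small sketch of the underflow case, assuming IEEE 754 doubles with round-to-nearest-even, where the smallest positive subnormal is 2**-1074. SUBNORMAL_MIN and BIG are just local names here, not standard ones.

```python
import math
import sys

SUBNORMAL_MIN = math.ldexp(1.0, -1074)  # smallest positive subnormal double
BIG = sys.float_info.max                # largest finite double

l = u = SUBNORMAL_MIN

# Halving the smallest subnormal is a tie between 0 and SUBNORMAL_MIN,
# and round-to-nearest-even resolves it to 0, so the result is lost.
m_halves = 0.5 * l + 0.5 * u

# The other two formulas keep the value on this input.
m_sum = 0.5 * (l + u)
m_offset = l + 0.5 * (u - l)

# At the other end of the range, halving first stays finite
# in round-to-nearest mode.
m_big = 0.5 * BIG + 0.5 * BIG

print(m_halves, m_sum, m_offset, m_big)
```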

It may be possible to avoid problems by using m = 0.5*l + 0.5*u and rounding the two multiplies in opposite directions (somebody please check that). Since changing rounding modes is expensive on some architectures, a faster alternative on those architectures is m = 0.5*l - (-0.5*u), rounding both multiplies in the same direction, if you can prevent your compiler from optimizing it into m = 0.5*l + 0.5*u.

These can all give reasonable but slightly differently rounded answers. Providing portability and reproducibility means all implementations must use the same formula.

Costs are:
m = 0.5 * (l + u): 1 constant load + 1 add/subtract + 1 multiply
m = l + 0.5*(u - l): 1 constant load + 2 adds/subtracts + 1 multiply
m = l + 0.5*(u - l) rounded down: 1 constant load + 2 adds/subtracts + 1 multiply + on some architectures cost of saving, setting and restoring the rounding mode
m = 0.5*l + 0.5*u: 1 constant load + 1 add/subtract + 2 multiplies
m = 0.5*l - (-0.5*u): either 2 constant loads + 1 add/subtract + 2 multiplies, or 1 constant load + 2 adds/subtracts/negates + 2 multiplies

If the cost doesn't matter, avoiding overflow and underflow errors in all cases is easy - just check the signs and values first, then choose a formula that is safe for those.
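One possible way to do that check, as a rough sketch rather than a vetted algorithm. It assumes finite IEEE 754 doubles with l <= u and round-to-nearest; safe_midpoint is a hypothetical name, and the choice of formula per case is my reading of the overflow conditions above.

```python
import math
import sys

BIG = sys.float_info.max  # largest finite double

def safe_midpoint(l, u):
    """Pick a midpoint formula by sign. Assumes finite l <= u."""
    if l < 0.0 < u:
        # Opposite signs: |l + u| <= max(|l|, |u|), so the sum cannot overflow.
        return 0.5 * (l + u)
    # Same sign (or one endpoint is zero): |u - l| <= max(|l|, |u|),
    # so the difference cannot overflow.
    return l + 0.5 * (u - l)

# Inputs that break one or the other of the single-formula versions.
wide = safe_midpoint(-BIG, BIG)
high = safe_midpoint(0.75 * BIG, BIG)
low = safe_midpoint(-BIG, -0.75 * BIG)

print(wide, high, low)
print(all(math.isfinite(m) for m in (wide, high, low)))
```

The extra cost is one or two comparisons and a branch per midpoint, on top of the arithmetic of whichever formula is chosen.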

Integer index calculations have fewer problems because in C an index is always nonnegative, so l and u must have the same sign, and integer arithmetic rounding is always towards zero, so INT_MAX/2 + INT_MAX/2 can't overflow.

- Ian Toronto IBM Lab 8200 Warden D2-445 905-413-3411

----- Forwarded by Ian McIntosh/Toronto/IBM on 23/02/2009 02:18 PM -----

From: Guillaume Melquiond <guillaume.melquiond@xxxxxxxx>
Date: 23/02/2009 01:35 PM
Please respond to: Guillaume Melquiond <guillaume.melquiond@xxxxxxxx>
To: Ian McIntosh/Toronto/IBM@IBMCA
Subject: Re: small correction to the Vienna Proposal

On Monday 23 February 2009 at 11:14 -0700, Nelson H. F. Beebe wrote:
> Chenyi Hu writes today:
>
> >> I'd prefer m = 0.5 * (l + u) rather than m = l + 0.5*(u - l) because of
> >>
> >> 1. Avoiding the loss of significance in performing u-l for narrow intervals
> >> 2. One less arithmetic operation.
>
> Chenyi's alternative suffers from premature overflow when l and u are
> greater than half the maximum representable number. The proposed form
> l + 0.5*(u - l) is safe.

It depends what you mean by "safe". Getting a midpoint that is not a finite number (l big negative, u big positive) could break a lot of things. So it would be better to ensure that the midpoint is always finite:

  set round down
  m = l + 0.5 * (u - l)
  r = -(m - u)

(Disclaimer, I have no idea if the formula above is always correct.)

Best regards,

Guillaume
