
Re: question about underflow detection

Dan and Steve,

thank you so much for the answers. Let me think through these again
and if I still cannot figure this out, I will send out another email
with my questions.


2011/8/22 Dan Zuras IEEE <forieee@xxxxxxxxxxxxxx>:
From: Liang-Kai Wang <liangkai.wang@xxxxxxxxx>
Date: Mon, 22 Aug 2011 14:34:02 -0500
Subject: question about underflow detection
To: STDS-754@xxxxxxxxxxxxxxxxx


I have a question about how the standard raises underflow and would
like to see if someone can help with this.

Here are the two cases (single precision multiply) I am trying to
resolve (assuming both use round ties to even)
1. 80ffffff x bf000000
Assuming we have unlimited precision and an unbounded exponent, we actually have
1.111_1111_1111_1111_1111_1111 * 2 ^ {-126} x 1.0 * 2 ^ {-1} =
1.111_1111_1111_1111_1111_1111 * 2 ^{-127}
Because the unbiased exponent is -127, it cannot fit in the IEEE
format, and we need to right shift the mantissa by one bit to get
0.111_1111_1111_1111_1111_1111_1 * 2 ^{-126}

After rounding, we have 1.0 * 2 ^ {-126}

2. 807fffff x bf800001
0.111_1111_1111_1111_1111_1111 * 2 ^{-126} x
1.000_0000_0000_0000_0000_0001 * 2^{0} =
0.111_1111_1111_1111_1111_1111_1111_..._1 * 2 ^{-126}
After rounding, we have 1.0 * 2 ^ {-126} as well.

In both cases, should the underflow flag be raised based on the standard?
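
The arithmetic in the two cases above can be checked exactly. The following is a minimal sketch using Python's exact rationals; the helper name `binary32_value` is made up for illustration, and it only handles finite bit patterns (no infinities or NaNs).

```python
from fractions import Fraction

def binary32_value(bits):
    """Exact value of a finite single-precision bit pattern (hypothetical helper)."""
    sign = -1 if bits >> 31 else 1
    exp = (bits >> 23) & 0xFF          # biased exponent field
    frac = bits & 0x7FFFFF             # 23-bit fraction field
    if exp == 0:                       # zero or denormalized: no hidden bit
        return sign * Fraction(frac, 2**23) * Fraction(2) ** -126
    return sign * (1 + Fraction(frac, 2**23)) * Fraction(2) ** (exp - 127)

TWO_EMIN = Fraction(2) ** -126         # smallest normal magnitude

# Infinitely precise products for cases 1 and 2
p1 = binary32_value(0x80FFFFFF) * binary32_value(0xBF000000)
p2 = binary32_value(0x807FFFFF) * binary32_value(0xBF800001)

# Express each product as a multiple of 2^-126
print(p1 / TWO_EMIN)                   # 1 - 2^-24 as an exact fraction
print(p2 / TWO_EMIN)                   # 1 - 2^-46 as an exact fraction
```

Both exact products are positive and lie strictly below 2^-126, which is why the choice of tininess detection matters for them.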



       You have found the exact place where this question comes
       up as well as the one case (your example (1)) that lies on
       the boundary.

       To answer your question as asked:  While 1.0*2^-126 must
       be returned in both cases, an implementation is free to
       choose whether or not to signal underflow with this result.

       The relevant clause in the 2008 standard is 7.5.

       But I find that the text in the 1985 standard clause 7.4
       illustrates the problem more clearly. I quote it verbatim
       at the end of this note.

       Underflow is defined as an extraordinary loss of precision
       due to tininess.  So the question becomes whether your
       implementation chooses to detect tininess before or after
       rounding.

       If it is before rounding, that is, if it is the infinitely
       precise result that is found to be tiny, then both your
       examples (1) & (2) should signal underflow.

       If it is after rounding, that is, if it is the rounded
       final result that is found to be tiny, then both your
       examples (1) & (2) need not signal underflow.

       But your example (1) is a boundary case.

       The question arises as to whether a result is considered
       tiny if it is rounded as if the exponent range were
       unbounded or if both the exponent range & the precision
       were considered unbounded.  The latter is equivalent to
       the before rounding case above.  But the former approach
       distinguishes between your examples (1) & (2) in that (1)
       is rounded in such a manner as to require 'extraordinary'
       (i.e. greater than 1/2 ULP) loss of precision & (2) loses
       no more precision than any similar result at any other
       binade boundary.  Thus, it is considered acceptable to
       signal underflow for (1) but not for (2).
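
The distinction drawn above between examples (1) and (2) can be reproduced with a small sketch. It assumes single precision (24-bit precision, Emin = -126), round ties to even, and positive inputs well away from overflow; the function names are illustrative and not taken from any library. The two inputs are the exact products from the original question.

```python
from fractions import Fraction

P, EMIN = 24, -126                       # single precision: precision, min exponent
TWO_EMIN = Fraction(2) ** EMIN

def floor_log2(a):
    """floor(log2(a)) for a positive Fraction, using integer bit lengths."""
    e = a.numerator.bit_length() - a.denominator.bit_length()
    return e if Fraction(2) ** e <= a else e - 1

def round_rne(x, bounded):
    """Round x > 0 to P bits, ties to even.
    bounded=True clamps the exponent at EMIN (the delivered result);
    bounded=False rounds as though the exponent range were unbounded."""
    e = floor_log2(x)
    if bounded:
        e = max(e, EMIN)
    quantum = Fraction(2) ** (e - (P - 1))
    return round(x / quantum) * quantum  # round() on a Fraction is half-to-even

def tiny_before_rounding(x):
    return x < TWO_EMIN                  # judged on the infinitely precise result

def tiny_after_rounding(x):
    return round_rne(x, bounded=False) < TWO_EMIN

x1 = (1 - Fraction(1, 2**24)) * TWO_EMIN  # example (1), exact product
x2 = (1 - Fraction(1, 2**46)) * TWO_EMIN  # example (2), exact product

for name, x in (("(1)", x1), ("(2)", x2)):
    print(name,
          "delivered == 2^-126:", round_rne(x, bounded=True) == TWO_EMIN,
          "tiny before:", tiny_before_rounding(x),
          "tiny after:", tiny_after_rounding(x))
```

Under these assumptions both examples deliver 2^-126 and both are tiny before rounding, but only example (1) is still tiny when rounded with an unbounded exponent.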

       This is generally done only on implementations that claim
       to signal underflow after rounding.  But there have been
       exceptions as I recall.

       There is a 4th case but I have never heard of anyone
       implementing underflow in this manner.

       One person pointed out that underflow need only be
       signaled in the case of 'extraordinary' loss of precision.
       That COULD be interpreted to mean that if an otherwise
       tiny result comes close enough to ANY denormalized number
       as to require no more than 1/2 ULP change in its value,
       then one need not signal underflow.  This would mean that
       every denormalized number is surrounded by a small interval
       of results for which no extraordinary loss of precision is
       required to return that result.  It would also mean that
       underflow is not so much something that is bounded away
       from normal results as it is an ever denser foam of results
       that get signaled more often as they get smaller.

       Like I said, only one person ever came up with this contorted
       interpretation & I think it illustrates the difficulty in
       describing standards using unambiguous language more than
       something to be considered a viable interpretation of the
       concept of underflow.

       In any event, decide whether you want to detect tininess
       before or after rounding & you're pretty much there.

       If you are designing a circuit to perform this function,
       tininess after rounding becomes a worst case path in that
       circuit whereas tininess before rounding is easier to do.

       But you might also want to find out how various existing
       machines (e.g. Intel x86, Motorola, IBM, et al) make this
       decision.  There might be marketing reasons why you want
       to make this choice one way or another as valid as any
       technical considerations.

       It is a small decision often made for big reasons which
       is never noticed by anyone but your fellow floating-point
       wizards.  It will define you in their company but no one
       else will care.

       There's a lot of that sort of thing in floating-point. :-)



Two correlated events contribute to underflow. One is the creation of a tiny
nonzero result between ±2^Emin which, because it is so tiny, may cause some
other exception later such as overflow upon division. The other is
extraordinary loss of accuracy during the approximation of such tiny numbers
by denormalized numbers. The implementor may choose how these events are
detected, but shall detect these events in the same way for all operations.
Tininess may be detected either
1. After rounding - when a nonzero result computed as though the exponent
range were unbounded would lie strictly between ±2^Emin
2. Before rounding - when a nonzero result computed as though both the
exponent range and the precision were unbounded would lie strictly between
±2^Emin.
Loss of accuracy may be detected as either
1. A denormalization loss - when the delivered result differs from what would
have been computed were exponent range unbounded
2. An inexact result - when the delivered result differs from what would have
been computed were both exponent range and precision unbounded. (This is the
condition called inexact in 7.5).

When an underflow trap is not implemented, or is not enabled (the default
case), underflow shall be signaled (by way of the underflow flag) only when
both tininess and loss of accuracy have been detected. The method for
detecting tininess and loss of accuracy does not affect the delivered result
which might be zero, denormalized, or ±2^Emin.

When an underflow trap has been implemented and is enabled, underflow shall be
signaled when tininess is detected regardless of loss of accuracy. Trapped
underflows on all operations except conversion shall deliver to the trap
handler the result obtained by multiplying the infinitely precise result by
2^α and then rounding. The bias adjust α is 192 in the single, 1536 in the
double, and 3 × 2^(n-2) in the extended format, where n is the number of bits
in the exponent field. [FOOTNOTE 8: Note that a system whose underlying
hardware always traps on underflow, producing a rounded, bias-adjusted result,
shall indicate whether such a result is rounded up in magnitude in order that
the correctly denormalized result may be produced in system software when the
user underflow trap is disabled.] Trapped underflows on conversion shall be
handled analogously to the handling of overflows on conversion.
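
As a rough illustration of the untrapped rule quoted above (signal only when both tininess and loss of accuracy are detected), the sketch below evaluates all four combinations of detection methods for the two products from the original question. It assumes single precision, round ties to even, and positive results far from overflow; `underflow_flag` and its string arguments are invented for this example.

```python
from fractions import Fraction

P, EMIN = 24, -126                       # single precision parameters
TWO_EMIN = Fraction(2) ** EMIN

def round_rne(x, bounded):
    """Round x > 0 to P bits, ties to even; clamp the exponent at EMIN if bounded."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1                           # e is now floor(log2(x))
    if bounded:
        e = max(e, EMIN)
    quantum = Fraction(2) ** (e - (P - 1))
    return round(x / quantum) * quantum  # Fraction rounding is half-to-even

def underflow_flag(x, tininess, loss):
    """Untrapped underflow per the quoted clause: tininess AND loss of accuracy."""
    delivered = round_rne(x, bounded=True)
    unbounded = round_rne(x, bounded=False)
    if tininess == "before":
        tiny = x < TWO_EMIN              # infinitely precise result is tiny
    else:
        tiny = unbounded < TWO_EMIN      # rounded with unbounded exponent is tiny
    if loss == "denormalization":
        lossy = delivered != unbounded   # denormalization loss
    else:
        lossy = delivered != x           # inexact result
    return tiny and lossy

x1 = (1 - Fraction(1, 2**24)) * TWO_EMIN  # 80ffffff x bf000000, exact
x2 = (1 - Fraction(1, 2**46)) * TWO_EMIN  # 807fffff x bf800001, exact

for tininess in ("before", "after"):
    for loss in ("denormalization", "inexact"):
        print(tininess, loss,
              underflow_flag(x1, tininess, loss),
              underflow_flag(x2, tininess, loss))
```

With these inputs the first product raises the flag under every combination, while the second raises it only when tininess is detected before rounding and loss of accuracy is detected as an inexact result.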
