Thread Links			Date Links
Thread Prev	Thread Next	Thread Index	Date Prev	Date Next	Date Index
An exploration of Inexact FP Arithmetic

To: stds-1788 <stds-1788@xxxxxxxxxxxxxxxxx>
Subject: An exploration of Inexact FP Arithmetic
From: Michel Hack <mhack@xxxxxxx>
Date: Thu, 22 Mar 2012 14:52:02 -0400
Delivered-to: mhonarc@xxxxxxxxxxxxxxxx
List-help: <https://listserv.ieee.org/cgi-bin/wa?LIST=STDS-1788>, <mailto:LISTSERV@LISTSERV.IEEE.ORG?body=INFO%20STDS-1788>
List-owner: <mailto:STDS-1788-request@LISTSERV.IEEE.ORG>
List-subscribe: <mailto:STDS-1788-subscribe-request@LISTSERV.IEEE.ORG>
List-unsubscribe: <mailto:STDS-1788-unsubscribe-request@LISTSERV.IEEE.ORG>
Sender: stds-1788@xxxxxxxx
Lee Winter's comments on Inexact Infinity reminded me of my comments about
inexact zeros last January (UTC, for me it was still December 2011).  I was
commenting on Siegfried Rump's Ru11c.pdf, and wrote:
| On the substance, this is excellent work, and prods me to revisit some of
| my earlier attempts to distinguish between exact and inexact zeros and
| infinities (see posts of 2008 Oct29 &ff, 2010 Apr22, Apr28 and Sep20 &ff).
| The crucial idea is indeed to define a set of atomic intervals (possibly
| singletons).  I'll report if and when I come up with something interesting.

Well, I now enclose the "something interesting" (496 lines).  I'm sending
a .pdf version to Jürgen so he can post it on the website.

Michel.

Enclosed:  IFP MEMO (496 lines, with ANSI carriage control)

1 IFP MEMO -- An exploration of Inexact FP Arithmetic (MH, Jan 2012)   Page   1


  An exploration of Inexact FP Arithmetic
  ---------------------------------------

  The intent is to carry IN THE FP DATUM an Inexactnes indication,
  which (i. a.) would permit distinguishing overflow from infinity,
  and underflow from zero.  Conceivably a mode bit would provide a
  re-interpretation of the four basic rounding modes as well as
  details of the arithmetic and interpretation of the formats.

  The basic idea is to dedicate the low-order significand bit to the
  I-bit, or inexactness indicator, thus reducing the effective precision
  by one bit.  Special considerations are needed for DFP, where
  unnormalised (but normal magnitude) representations are by definition
  exact: these are representations with a zero high-order digit and a
  nonzero biased exponent (i.e. not subnormal).  The I-bit is a regular
  low-order bit for these "unnormal" DFP numbers, so that in particular
  exact integers and exact fixed-point decimals can be represented with
  their preferred quantum (i.e. expected exponent).  For normalised or
  subnormal DFP representations the low-order bit is however an I-bit,
  so that the effective low-order digit is even.  (This means we have to
  be especially careful in defining the rounding rules in the to-nearest
  family, i.e. not the directed rounding modes.)  Luckily the low-order
  bit is independent of BID vs DPD encoding, so the rules to be defined
  here do not depend on the encoding.

  The above only applies to nonzero finite numbers.  The largest magnitude
  is one ulp less than for regular FP, and the smallest magnitude is twice
  that of regular FP for a given format.  The smallest normal magnitude is
  the same as for regular FP (because its low-order bits are all zero).

  Inexact Infinity (the result of overflow rounded away from zero) can use
  the normal I-bit in the case of DFP, but in the case of BFP it takes one
  bit away from the NaNcode.  Given the bit-reversed interpretation used by
  IBM (e.g. when converting NaNcodes between BFP and DFP), this is a loss
  of the high-order bit, so NaNcodes interpreted as small integers (i.e.
  the kind that is expected to survive format conversions) are not affected.

  Exact and Inexact zero present the biggest conceptual problems, because
  of the interpretation of the sign bit.  Regular 754 arithmetic has precise
  rules for signed zero.  These are of course not affected, as IFP is to be
  a distinct mode of FP arithmetic.  The question is how these rules are to
  be transferred to the IFP mode.

  The problem is that there are two kinds of inexact zero:  with or without
  a definite sign.  The square of an inexact nonzero number is definitely
  positive, and multiplying it by a definitely negative number (not inexact
  zero) should yield a definitely negative result, even in the presence of
  underflow.  On the other hand, the difference between two inexact numbers
  with identical representations is an inexact zero with indeterminate sign.

  One possible solution to this conundrum is to model IFP arithmetic on the
  projective (one-point) compactification of the Reals, with an unsigned
  exact Infinity, namely the inverse of an unsigned exact zero.  We can then
  re-interpret the sign bit of Infinity and Zero to denote an indeterminate
  sign when the I-bit is zero.  (When the I-bit is one, Infinity denotes a
  signed overflow, and Zero denotes a signed underflow.)

1 IFP MEMO -- An exploration of Inexact FP Arithmetic (MH, Jan 2012)   Page   2


  IFP arithmetic with finite nonzero numbers
  ------------------------------------------

  The rounding is described in terms of the Parity (for round-to-nearest-even
  only), Rounding and Sticky bits (PRS), following conceptually exact arithmetic
  with unbounded precision and range.  Given that the effective precision is
  one bit less than for regular FP arithmetic with a given format, the position
  of the PRS bits is shifted one bit to the left, i.e. the R bit occupies the
  position of the eventual I-bit (lowest-order significand bit).  For DFP the
  P-bit needs to be described separately as 10 is not divisible by 4, but the
  main focus is on the RS bits anyway.

  As a first step, the S bit is initialised to the OR of the operands' I-bits.
  Note that for "unnormal" DFP, there is an implied I-bit of 0 because the
  complete significand (with a zero high-order digit) is presumed exact.

  Except for "unnormal" DFP, the low-order significand bit of each operand
  is set to zero, and the conceptually exact intermediate result is computed.
  In the case of division or square root, infinitely many low-order bits may
  occur, but they will collectively simply contribute to the S-bit.

  At that point, all bits or digits to the right of the R-bit are ORed together
  into the already-initialised S-bit.  This works for DFP because the R-bit is
  the low-order bit of a low-order result digit, so that only complete digits
  are to its right.  (In this context, significance decreases going right.)

  For DFP, if the S bit is now zero (typically operands were "unnormal") and
  the result is such that the high-order digit would be zero ("unnormal"
  result), the R-bit is ignored, i.e. the complete low-order digit survives
  even if it is odd (there is no I-bit in "unnormal" DFP numbers).  The result
  is indeed exact and no rounding applies.  If however the result has more
  than precision-1 digits (or if it was an inexact square root or quotient),
  it is subject to normalization.

  For BFP, and for DFP when the S-bit is one, the intermediate result is
  subject to normalisation, which determines the actual position of the R-bit
  and hence the set of bits to its right that will contribute to the S-bit.
  This conceptual shift is accompanied by an exponent adjustment, provided
  the biased exponent remains positive.  If this is not possible, the result
  will be subnormal, and sufficiently many zero high-order bits (or DFP
  digits) will be tacked on at the left end to lead to a biased exponent of
  zero.  If this leads to all zero digits or bits to the left of the R-bit,
  we have total underflow, and the result will be an inexact zero or inexact
  2*Dmin (smallest subnormal nonzero magnitude), depending on the rounding
  direction.  There is one exception: subtraction of two equal magnitudes:
  this tricky case will be discussed later.  (If overflow or underflow is
  trapped, the exponent range is effectively unbounded and does not affect
  rounding or inexactness; the trap handler will be provided with a wrapped
  exponent or a separate scale factor, as appropriate, and normalization is
  always feasible if indicated.)

  If exponent overflow occurs, the S-bit is forced to 1, and the result will
  be inexact Infinity or Nmax, depending on the rounding direction.

1 IFP MEMO -- An exploration of Inexact FP Arithmetic (MH, Jan 2012)   Page   3


  Normal rounding is then applied, and the R-bit is replaced with the S-bit,
  which becomes the result's I-bit.  Rounding overflow could lead to delayed
  exponent overflow, which is treated as described above.  Note that rounding
  overflow cannot turn an "unnormal" DFP into a normal number, because only
  exact results can be "unnormal", in which case there is no rounding.

  Round-to-nearest-even for DFP needs a separate description.  That's because
  only odd low-order digits are present (the low-order bit is occupied by the
  I-bit, which is one), though they actually represent even digits (the I-bit
  overlays a conceptually zero bit).  Again, "unnormal" numbers (which don't
  have an I-bit) are not affected since they are always exact, and rounding
  simply does not apply.

  Tie breaking is needed when R=1 and S=0.  The idea of ties-to-even is to
  break ties in a balanced manner.  This is not possible when there are only
  five digits to choose from in the low-order digit.  One possibility would
  be to use the parity of the next-to-lowest order digit to decide whether
  to round up or down, in a balanced manner.  Not much more complicated, and
  closer to the BFP model, would be to break ties towards a multiple of four,
  where the P bit is the XOR of the low-order bit of the next-to-last digit
  and the next-to-last bit of the last digit.  Note that DFP often uses the
  financial ties-away-from-zero rule, which rounds away from zero when R=1,
  disregarding both P and S bits, so the choice of rule for ties-to-even may
  not be critical (but it has to be decided).

  Round-to-prepare-for-shorter-precision is actually not needed anymore
  since Inexactness is not lost.  It can therefore be implemented as any
  other rounding mode, perhaps simply truncating (i.e. ignoring the R bit).
  This works for BFP as well as for DFP.


  IFP subtraction of equal magnitudes
  -----------------------------------

  When both operands are exact (S=0), the result is an exact zero.  For
  regular FP arithmetic this would be +0 except for Floor rounding.  IFP
  uses an unsigned exact zero, so it would be +0 in all cases.  For DFP
  the preferred quantum rule would be followed.

  When at least one operand is inexact (S=1), rounding applies.  Directed
  rounding is mostly easy: round towards zero would yield an *exact* zero
  (which is one of the exceptions to inexactness-preservation), whereas
  Floor and Ceiling yield an appropriately-signed inexact operand ulp (if
  only one operand was inexact or two ulps (if both were inexact).  There
  is however a problem with round-away-from-zero: the result should be of
  the same magnitude as for Floor or Ceiling, but with an indeterminate
  sign: this rounding mode (which exists for IBM DFP) should perhaps not
  be supported, or this case should lead to Invalid Operation with a NaN
  result as a last resort.  (In other words, underflow away from zero has
  a scale that reflects the source magnitude, like DFP zeros.)

  Note: by defining signed inexact tiny rounded away from zero to be at
  least 4*Dmin in all cases, including when the cancelling operands are both
  inexact subnormals, we avoid Dmin trouble when we use tow low-oder bits to
  encode inexact zeros.  (See discussion in the "More to do..." section.)

1 IFP MEMO -- An exploration of Inexact FP Arithmetic (MH, Jan 2012)   Page   4


  Round-to-nearest (note that there can be no tie, as R=0, so the various
  flavours of tie breaking are all equivalent) naturally leads to an inexact
  zero with indeterminate sign, a datum we do wish to support.  Since exact
  zero is unsigned in IFP, we could pick the encoding of -0.  For DFP the
  quantum is probably best computed the same way as for exact zero, since
  this preserves an indication of the absolute scale of inexactness.  See
  below for the rules of arithmetic with indeterminate-sign inexact zero.

  (An inexact signed zero can only arise from multiplication or division
  underflow, as an inexact signed difference is at least one ulp.)


  Addition/Subtraction of zeros and finite numbers
  ------------------------------------------------

  If, of two zero operands, at least one is of indeterminate sign, the
  result is an inexact zero of indeterminate sign, except in the case
  of round-towards-zero, where the result is an exact zero.  The same
  applies to the addition of inexact zeros of opposite signs, or the
  subtraction of inexact zeros of the same sign.  The sum or difference
  of two exact zeros is of course an exact zero; the sum or difference
  of an exact zero and a signed inexact zero depends on the rounding
  mode, and is either an exact zero (if rounding towards zero) or a
  signed inexact zero of the appropriate sign (if rounding away from
  zero, or to nearest); there is no sign ambiguity.

  If one operand is non-zero, Inexactness is propagated.  An exact zero
  causes the result to be the other operand (with its sign reversed if
  it was the subtrahend), and the rounding mode has no effect.  For an
  inexact zero (signed or not) and round-to-nearest (any flavour), the
  result is the other operand (with sign reversed if subtrahend), and
  forced Inexact.  For a signed inexact zero and directed rounding,
  the result is either of the same magnitude, or rounded up or down,
  depending on whether the sign of zero and the rounding direction
  agree or disagree.  For an inexact zero of indeterminate sign and
  directed rounding, the result is always rounded up or down based on
  the rounding direction alone.

  Multiplication of finite numbers by zeros
  -----------------------------------------

  Multiplication by exact zero yields exact zero, regardless of rounding
  mode.  Multiplication by inexact zero leads to inexact zero of the same
  kind (signed or indeterminate-sign); signed zero has the exclusive-OR
  of the operand signs.  Division of zero by nonzero follows the same rule
  as multiplication.

1 IFP MEMO -- An exploration of Inexact FP Arithmetic (MH, Jan 2012)   Page   5


  Division of finite numbers by zeros
  -----------------------------------

  Division of exact zero by any inexact zero yields exact zero, regardless
  of rounding rule.  Division of any nonzero by exact zero yields exact
  unsigned infinity (and may signal divide-by-zero).  Division of nonzero by
  inexact signed zero behaves like overflow and rounds to inexact infinity
  or to inexact Nmax, depending on the rounding direction like overflow; the
  sign is the exclusive-OR of the operand signs.  Division by inexact zero of
  indeterminate sign leads to infinity of indeterminate sign, regardless of
  rounding mode.

  This is the usual source of infinities (other than literals).


  Square Root of zero
  -------------------

  Square root of negative signed zero, or of zero of indeterminate sign,
  signals Invalid Operation and results in a NaN.  Square root of exact
  zero is exact zero; square root of positive inexact zero is positive
  inexact zero except when rounding towards zero or Floor, in which case
  it is exact zero.


  Infinities
  ----------

  As seen under "division by zero" above, there are four infinities, which
  are the inverses of the four zeros: exact unsigned, inexact  signed, and
  inexact of indeterminate sign.  Since exact +Inf is considered to be
  unsigned we could use -Inf to encode infinity of indeterminate sign.
  The signed inexact infinities have the I-bit set, and for BFP these
  would look like NaN in regular FP arithmetic, which is a shame.

  It might be better to invert the meaning of the I-bit for BFP infinities,
  so that Inexact signed infinities behave like regular infinities when
  used in regular FP arithmetic, and the exact unsigned or inexact Inf
  of indeterminate sign would indeed be treated as NaNs, which seems
  appropriate; in fact, they should be made signalling NaNs.  But this
  only applies to BFP; for DFP the I-bit is part of a payload that will
  be ignored by regular DFP arithmetic; it just so happens that the sign
  would be misleading for exact or indeterminate-sign infinities.  (This
  issue is discussed further under "NaNs", below.)


  Addition/Subtraction of infinities
  ----------------------------------

  If at least one operand is exact infinity, the result is exact infinity
  regardless of rounding direction.  Even Inf-Inf = Inf (not NaN) for two
  exact infinities, because -Inf == Inf (unsigned) and Inf-Inf = Inf+(-Inf).

1 IFP MEMO -- An exploration of Inexact FP Arithmetic (MH, Jan 2012)   Page   6


  Otherwise, if at least one operand is an infinity of indeterminate sign,
  the result is an inexact infinity of indeterminate sign -- except for
  Floor and Ceiling rounding, where it is a negative or positive inexact
  infinity, respectively.  The same applies to the addition of inexact
  infinities of opposite signs, or the subtraction of inexact infinities
  of the same sign.  An exception is round-away-from-zero which, if it is
  supported at all, would yield exact infinity (just as round-towards-zero
  of underflow yields exact zero).


  Multiplication by infinities
  ----------------------------

  Multiplication of exact zero by inexact infinity is exact zero.
  All other multiplicative combinations of zeros and infinities
  are an Invalid Operation and result in a NaN.

  Multiplication of a finite number by inexact infinity leads to
  inexact infinity of the same kind (signed or indeterminate-sign);
  signed infinity has the exclusive-OR of the operand signs, in
  round-to-nearest (any flavour).  Round-towards-zero of a signed
  result, whether explicit or due to Floor or Ceiling, leads to Nmax
  of the appropriate sign.  Round-away-from-zero of a signed result,
  whether explicit or due to Floor or Ceiling, leads to inexact
  infinity of the appropriate sign.  Implicit round-away-from-zero
  (Floor or Ceiling) of infinity of indeterminate sign leads to
  signed inexact infinity of the appropriate sign.  When supported,
  explicit round-away-from-zero leads to exact infinity.

  Division of infinity by a nonzero finite number follows the same
  rule as multiplication.


  Division by infinities
  ----------------------

  Division of exact zero by any infinity yields exact zero, regardless
  of rounding rule.  Division of inexact zero by any infinity leads to
  inexact zero, of indeterminate sign if either the zero or the infinity
  had indeterminate sign, otherwise the sign of inexact zero is the XOR
  of the operand signs.


  Square Root of infinity
  -----------------------

  Square root of negative signed infinity, or of infinity of indeterminate
  sign, is an Invalid Operation and results in a NaN.  Square root of exact
  infinity is exact infinity; square root of positive inexact infinity is
  positive inexact infinity except when rounding towards zero or Floor,
  in which case it is positive Nmax.

1 IFP MEMO -- An exploration of Inexact FP Arithmetic (MH, Jan 2012)   Page   7


  Format conversions of zeros and infinities
  ------------------------------------------

  Format conversions of unsigned zeros or infinities, whether exact
  or of indeterminate sign, propagate unchanged (i.e. same kind
  in the new format), except that exactness is forced for explicit
  round-towards-zero of zero, or explicit round-away-from-zero of
  infinity.

  Format conversions of inexact signed zeros or infinities present
  a problem however, for directed rounding of zeros away from zero,
  or of infinities towards zero.  The problem is that the size of
  Nmax and Dmin changes.  So when converting to a larger exponent
  range, the source Nmax or 2*Dmin should be converted (with the
  appropriate rounding) to the target format, and when converting
  to a smaller exponent range, rounding should be to the target
  Nmax or 2*Dmin.


  Text representation of Inexact FP entities
  ------------------------------------------

  Suggestion:  place '?' between the sign and the number.  For ?0 or ?Inf,
  the sign is relevant; for 0 or Inf without '?' the sign is irrelevant
  and meaningless (and would not be generated), and indeterminate sign
  (valid only for 0 and Inf) would be indicated with '??'  and no sign
  (on input a sign in front of ??  would be ignored).


  NaNs, and encoding of Inexactness and Unsignedness
  --------------------------------------------------

  For DFP the situation is easy:  NaNs don't have an I-bit, so the
  payload is not affected, and the distinction between Inf and Nan
  is independent of the payload.  DFP infinities are capable of
  having a payload, which is simply considered non-canonical if
  nonzero for regular FP.  DFP infinities could also encode the
  indeterminate-sign case by using two payload bits, so that the
  actual sign bit could simply be ignored.

  For BFP we have to steal at least one bit from the NaNcode,
  because we need an I-bit for Infinity.  In fact, we might as
  well take TWO low-order significand bits (which correspond
  to high-order NaNcode bits in the IBM BFP interpretation of
  NaNcodes, so small integers -- the preferred expression of
  NaNcodes -- would not be affected).  This would avoid the
  issues mentioned earlier for encoding indeterminate-sign
  infinity as -Inf.

  In fact, it might be nice to do the same thing for zero, but
  that would double again the smallest magnitude to 4*Dmin,
  and more seriously, break the (a==b) vs (a-b==0) equivalence.
  It would however greatly simplify the definition of the sign
  operations, which are supposed to be blind -- yet should not
  as a side effect change exact zeros to indeterminate-sign
  zeros, or vice versa.

1 IFP MEMO -- An exploration of Inexact FP Arithmetic (MH, Jan 2012)   Page   8


  Comparisons
  -----------

  (TBD -- there are some tricky issues here:  are two inexact numbers
  of the same nominal value comparable?  They could be treated either
  as EQUAL or UNORDERED, depending on the intent.  IEEE 754 does have
  two flavours of comparison, quiet and noisy, so perhaps that can be
  exploited here.)


  Application
  -----------

  This exploration was prompted by a paper by Prof. Siegfried Rump:
  "Interval Arithmetic Over Finitely Many Endpoints", Ru11c.pdf on
  www.ti3.tu-harburg.de (under Publications for Prof. Rump), as well as
  recent discussions on stds-1788@xxxxxxxxxxxxxxxxx (P1788 workgroup).

  The idea of distinguishing overflow/underflow from true infinity/zero
  popped up a number of times in stds-1788 -- notably in late Oct 2008,
  and in April and Sept 2010, prior to resurrection in Dec 2011.

  The arithmetic proposed here is intended for Interval Arithmetic, and
  is well-defined only for directed rounding modes Floor and Ceiling.
  It must of course be defined for the other rounding modes as well, but
  inconsistencies can occur if different rounding modes are applied in
  incompatible sequences, for example by a sequence of additions with
  rounding alternating between Floor and Ceiling.

  Prof. Rump's proposed arithmetic differs from what is being discussed
  here primarily by supporting only signed inexact infinities and zeros
  (the former called HUGE and the latter called TINY, each representing
  an interval from Nmax to Inf, or from Zero to Dmin, respectively.  The
  finitely-many endpoints are themselves sets of reals, in fact either
  intervals or singletons, satisfying certain constraints.  The paper
  is mathematical and addresses coding issues only indirectly.


  More to do...
  -------------

  The encoding of indeterminate-sign zero and infinity needs to be revisited,
  most likely by stealing two low-order significand bits for Zero and Inf, and
  checking out the consequences, the most damaging of which is what happens
  to Dmin, the smallest non-zero (subnormal) magnitude.  Since this is done
  only for zero, there is no problem with DFP, as coefficient values from 0
  to 9 (we would use 0, 1, 3) are encoded identically in BFP, DPD and BID.
  (The term "coefficient" denotes the significand interpreted as an integer,
  not as a fraction.)  The great benefit of this approach is that the sign
  bit of unsigned or indeterminate-sign zeros and infinities simply becomes
  irrelevant, so that the blind sign bit operations are not affected at all.

  Another to-do is complete operation tables for all rounding modes and all
  combinations of zeros, positive and negative finites, and infinities.

1IFP MEMO -- An exploration of Inexact FP Arithmetic (MH, Jan 2012)   Page   9


  Tentative analysis of iDmin = 4*Dmin (two low-order bits for Inexact Zero)
  --------------------------------------------------------------------------

  Interpretation of the 3 low-order bits, all higher-order bits being zero,
  i.e. we are looking at the bottom range of subnormals, which in fact look
  identical in BFP, DPD and BID:

     000   Exact (unsigned) zero; sign bit is irrelevant
     001   Inexact signed zero (corresponds to Rump's TINY)
     010   Exact 2*Dmin difference of two exact subnormals (could be supported)
     011   Inexact zero of indeterminate sign; sign bit is irrelevant
     100   Exact 4*Dmin difference of two exact small numbers (could be normal)
     101   Total underflow rounded away from zero (signed nonzero inexact)
     110   Exact 6*Dmin difference of two exact small numbers (could be normal)
     111   Inexact 6*Dmin difference of two small numbers (partial underflow)

  What's a bit unusual here is that the smallest exact nonzero magnitude is
  half that of the smallest inexact nonzero magnitude -- so it's not clear
  which one should be called iDmin (i.e. "Dmin for IFP").  But in fact the
  magnitude of an inexact cancellation can be arbitrarily large, so the fact
  that it cannot quite be as small as in the case of the subtraction of two
  equal-magnitude inexact subnormals should not matter much.

  This scheme is consistent with the equivalence of comparison of two numbers
  and the comparison of their difference with exact zero -- assuming that two
  inexact numbers of equal magnitude compare Unordered (as their difference
  would be of indeterminate sign).

  We also need to consider the effects of multiplication and division.  The
  worst effect is probably that the inverse of the smallest positive inexact
  is only half the size of the smallest positive exact value.  In general the
  difference between inverses of very small numbers is wildly inaccurate, and
  numeric programs must already deal with such ill-conditioning.


  **************************************************************************

  Michel Hack, IBM Research, 7 Jan 2012
---Sent: 2012-03-22 19:00:07 UTC
Prev by Date: Re: Table 4 proposal version 0.2...
Next by Date: Re: Table 4 proposal version 0.2...
Previous by thread: John's asinh split versus convert-to-sortable...
Next by thread: An exploration of Inexact FP Arithmetic
Index(es):
- Date
- Thread