[stds-754] Re: GAMM-IMACS Proposal for Accurate Floating-Point Vector Arithmetic

In mail to this list on Thu, 20 Oct 2005 10:30:12 +0200, Ulrich
Kulisch <ae35@xxxxxxxxxxxxxxxxxxxx> kindly forwarded the GAMM-IMACS
Proposal for Accurate Floating-Point Vector Arithmetic.  

One implementation of this proposal would use internal accumulators
wide enough to hold numbers over the entire exponent range, requiring
widths from a few hundred bits for the 32-bit format, through a few
thousand for the 64-bit format, to tens of thousands for the 80-bit
and 128-bit formats.
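Those widths follow directly from the format parameters.  Here is a
minimal sketch (the function names are mine, purely illustrative) that
counts the bits needed to span every finite value, and every exact
product of two values, as a fixed-point number.  It omits the sign bit
and the extra carry bits a real accumulator needs to absorb many
additions without overflow, so practical designs come out somewhat
wider.  Treating the 80-bit format's emin as 1 - emax and ignoring its
pseudo-denormals is my simplifying assumption.

```python
# Illustrative parameter table: (name, precision p including the leading
# bit, emax).  For these formats emin = 1 - emax.
FORMATS = [
    ("binary32", 24, 127),
    ("binary64", 53, 1023),
    ("x87 extended (80-bit)", 64, 16383),
    ("binary128", 113, 16383),
]


def value_width(p, emax):
    """Bits needed to hold every finite value (subnormals included) exactly
    as a fixed-point number: bit positions emin-p+1 (the smallest subnormal)
    up to emax (the leading bit of the largest finite value)."""
    emin = 1 - emax
    return emax - (emin - p + 1) + 1


def product_width(p, emax):
    """Bits needed to hold the exact product of any two finite values:
    bit positions 2*(emin-p+1) up to 2*emax+1."""
    emin = 1 - emax
    return (2 * emax + 1) - 2 * (emin - p + 1) + 1


if __name__ == "__main__":
    for name, p, emax in FORMATS:
        print(f"{name:24s} values: {value_width(p, emax):6d} bits, "
              f"products: {product_width(p, emax):6d} bits")
```

Run as a script, it reports 277 and 554 bits for binary32, 2098 and
4196 for binary64, 32829 and 65658 for the 80-bit format, and 32878
and 65756 for binary128.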

For several decades, researchers in floating-point arithmetic have
investigated ways to improve the accuracy of floating-point summation
and dot products in purely software implementations, and the two new
articles cited below add to this knowledge.  The fparith.bib
file at


includes them, and now marks all such articles with the special string
"accurate floating-point summation" in their keywords values, making
them easy to find.

The first article below, in particular, presents several summation
algorithms that avoid sorting, branches, and higher intermediate
precision and that, it appears to me, largely solve the problem that
the GAMM-IMACS proposal addresses.
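The branch-free, sort-free flavor of such algorithms is easy to see in
a minimal Python sketch of Knuth's error-free transformation TwoSum
and a cascaded summation built on it, modeled on the article's Sum2.
The Python rendering and names are mine, an illustration rather than
the authors' code.  It assumes IEEE 754 binary64 arithmetic with
round-to-nearest (what Python floats give on common platforms) and no
overflow.

```python
# Illustrative names; not taken from the article.

def two_sum(a, b):
    """Knuth's TwoSum: returns (x, y) with x = fl(a + b) and x + y == a + b
    exactly (barring overflow).  Six flops, no branches, no comparisons."""
    x = a + b
    z = x - a
    y = (a - (x - z)) + (b - z)
    return x, y


def sum2(p):
    """Cascaded (compensated) summation in the style of Sum2: the exact
    rounding error of every addition is captured by two_sum, accumulated
    separately in ordinary floating point, and added back at the end."""
    if not p:
        return 0.0
    s = p[0]
    sigma = 0.0
    for v in p[1:]:
        s, q = two_sum(s, v)
        sigma += q
    return s + sigma
```

For example, sum([1e16, 1.0, -1e16]) returns 0.0 because the 1.0 is
lost in the first addition, while sum2 recovers it from the error term
and returns 1.0.  Likewise sum2([0.1] * 10) returns 1.0, the correctly
rounded value of the exact sum.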

The article's authors also note the great utility of hardware fused
multiply-add (FMA) instructions in the core of their algorithms, which
is further evidence that such instructions are desirable.
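The reason FMA helps is that the dot-product algorithms need
TwoProduct, which returns a rounded product together with its exact
rounding error.  Without FMA, Dekker's method splits each factor into
halves (Veltkamp splitting) and costs 17 floating-point operations;
with FMA, the error is a single fma(a, b, -x).  Below is a minimal
sketch modeled on the article's Dot2; the names are mine, and since
math.fma exists only in Python 3.13 and later, the sketch falls back
to Dekker's method on older versions.  It assumes binary64,
round-to-nearest, and no overflow or underflow.

```python
import math

# Illustrative names; not taken from the article.

def two_sum(a, b):
    """Knuth's TwoSum: x = fl(a + b), and x + y == a + b exactly."""
    x = a + b
    z = x - a
    return x, (a - (x - z)) + (b - z)


def split(a):
    """Veltkamp splitting of a binary64 value: a == hi + lo exactly, each
    half fitting in 26 bits.  134217729 = 2**27 + 1."""
    c = 134217729.0 * a
    hi = c - (c - a)
    return hi, a - hi


def two_product_dekker(a, b):
    """Dekker's TwoProduct without FMA: x = fl(a * b), x + y == a * b."""
    x = a * b
    a1, a2 = split(a)
    b1, b2 = split(b)
    y = a2 * b2 - (((x - a1 * b1) - a2 * b1) - a1 * b2)
    return x, y


def two_product_fma(a, b):
    """With FMA, a*b - x is computed with one rounding, and that difference
    is exactly representable, so the error term is one instruction."""
    x = a * b
    return x, math.fma(a, b, -x)


# Use the FMA version only when this Python provides math.fma (3.13+).
two_product = two_product_fma if hasattr(math, "fma") else two_product_dekker


def dot2(xs, ys):
    """Compensated dot product in the style of Dot2: product errors and
    summation errors are collected in s and added back at the end."""
    if not xs:
        return 0.0
    p, s = two_product(xs[0], ys[0])
    for a, b in zip(xs[1:], ys[1:]):
        h, r = two_product(a, b)
        p, q = two_sum(p, h)
        s += q + r
    return p + s
```

For instance, the exact product of 1 + 2**-30 and 1 - 2**-30 is
1 - 2**-60; two_product returns 1.0 together with the error -2**-60,
which plain multiplication discards.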

Even if this article does indeed solve the problem, further
investigation is merited to compare the efficacy of a wide-accumulator
solution (in software or hardware) with the techniques of this
article, possibly augmented with hardware assists.  Such a comparison
can in turn guide the committee in deciding whether to support the
proposal in the emerging revision of the IEEE 754 standard.
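For such a comparison, the wide-accumulator side can be prototyped
cheaply in software.  A minimal sketch, assuming binary64 inputs: it
uses Python's unbounded integers in place of a fixed register of about
2100 bits plus carry bits, so every addition is exact and the only
rounding happens once, at the end.  The function name is mine, and the
sketch handles finite inputs only; a sum whose rounded value exceeds
the binary64 range raises OverflowError instead of returning infinity.

```python
# 2**1074 is the reciprocal of the smallest positive subnormal binary64
# value, so every finite double multiplied by SCALE is an exact integer.
SCALE = 2 ** 1074


def long_accumulator_sum(values):
    """Exact sum of finite binary64 values (illustrative name).

    Each value is converted exactly to an integer multiple of 2**-1074,
    the integers are added with no rounding at all, and the total is
    rounded once when converted back to float."""
    acc = 0
    for v in values:
        num, den = v.as_integer_ratio()  # den is a power of two <= 2**1074
        acc += num * (SCALE // den)
    # int / int true division in CPython is correctly rounded.
    return acc / SCALE
```

Because nothing is rounded until the end, intermediate overflow cannot
occur either: long_accumulator_sum([1e308, 1e308, -1e308]) returns
1e308, whereas sum() of the same list returns infinity.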

@String{j-SIAM-J-SCI-COMP       = "SIAM Journal on Scientific Computing"}

  author =       "Takeshi Ogita and Siegfried M. Rump and Shin'ichi Oishi",
  title =        "Accurate Sum and Dot Product",
  journal =      j-SIAM-J-SCI-COMP,
  volume =       "26",
  number =       "6",
  pages =        "1955--1988",
  month =        nov,
  year =         "2005",
  CODEN =        "SJOCE3",
  DOI =          "10.1137/030601818",
  ISSN =         "1064-8275 (print), 1095-7197 (electronic)",
  MRclass =      "5-04, 65G99, 65-04",
  bibdate =      "Mon Nov 21 14:52:48 MST 2005",
  bibsource =    "http://epubs.siam.org/sam-bin/dbq/toc/SISC/26/6",
  URL =          "http://epubs.siam.org/sam-bin/dbq/article/60181",
  abstract =     "Algorithms for summation and dot product of
                 floating-point numbers are presented which are fast in
                 terms of measured computing time. We show that the
                 computed results are as accurate as if computed in
                 twice or K-fold working precision, $K\ge 3$. For twice
                 the working precision our algorithms for summation and
                 dot product are some 40\% faster than the corresponding
                 XBLAS routines while sharing similar error estimates.
                 Our algorithms are widely applicable because they
                 require only addition, subtraction, and multiplication
                 of floating-point numbers in the same working precision
                 as the given data. Higher precision is unnecessary,
                 algorithms are straight loops without branch, and no
                 access to mantissa or exponent is necessary.",
  acknowledgement = ack-nhfb,
  keywords =     "accurate dot product; accurate floating-point
                 summation; fast algorithms; high precision; verified
                 error bounds",

  author =       "Yong-Kang Zhu and Jun-Hai Yong and Guo-Qin Zheng",
  title =        "A New Distillation Algorithm for Floating-Point Summation",
  journal =      j-SIAM-J-SCI-COMP,
  volume =       "26",
  number =       "6",
  pages =        "2066--2078",
  month =        nov,
  year =         "2005",
  CODEN =        "SJOCE3",
  DOI =          "10.1137/030602009",
  ISSN =         "1064-8275 (print), 1095-7197 (electronic)",
  MRclass =      "65G05, 65B10",
  bibdate =      "Mon Nov 21 14:52:48 MST 2005",
  bibsource =    "http://epubs.siam.org/sam-bin/dbq/toc/SISC/26/6",
  URL =          "http://epubs.siam.org/sam-bin/dbq/article/60200",
  abstract =     "The summation of $n$ floating-point numbers is
                 ubiquitous in numerical computations. We present a new
                 distillation algorithm for floating-point summation
                 which is stable, efficient, and accurate. The algorithm
                 iteratively ``distills'' the summands without
                 discarding any significant digit until the partial sums
                 cannot change the whole sum. It uses standard
                 floating-point arithmetic and does not rely on the
                 choice of radix or any other specific assumption.
                 Furthermore, the error bound of our algorithm is
                 independent of $n$ and less than 1 ulp.",
  acknowledgement = ack-nhfb,
  keywords =     "accurate floating-point summation; distillation; rounding 
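
The first abstract's "K-fold working precision" comes from repeating
the error-free pass.  A minimal sketch modeled on that article's
VecSum and SumK (the Python rendering and names are mine, not the
authors' code): each vec_sum pass replaces the vector with one having
the same exact sum, leaving the running floating-point sum in the last
slot and the rounding errors in the earlier slots.  After k-1 passes,
ordinary summation of the errors plus the last element gives a result
about as accurate as if computed in k-fold precision.  It assumes
binary64, round-to-nearest, and no overflow.

```python
# Illustrative names; not taken from the article.

def two_sum(a, b):
    """Knuth's TwoSum: x = fl(a + b), and x + y == a + b exactly."""
    x = a + b
    z = x - a
    return x, (a - (x - z)) + (b - z)


def vec_sum(p):
    """One error-free pass: the result has the same exact sum as p, with
    the floating-point running sum in the last slot and the rounding
    errors in the earlier slots."""
    q = list(p)
    for i in range(1, len(q)):
        q[i], q[i - 1] = two_sum(q[i], q[i - 1])
    return q


def sum_k(p, k):
    """Apply vec_sum k-1 times, then add the error terms in ordinary
    floating point and finish with the last element.  k = 1 is plain
    recursive summation; k = 2 gives the same result as Sum2."""
    q = list(p)
    if not q:
        return 0.0
    for _ in range(k - 1):
        q = vec_sum(q)
    s = 0.0
    for v in q[:-1]:
        s += v
    return s + q[-1]
```

Each extra pass costs one more sweep of branch-free TwoSum operations,
so the precision/speed trade-off is chosen simply by picking k.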

- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe@xxxxxxxxxxxxx  -
- 155 S 1400 E RM 233                       beebe@xxxxxxx  beebe@xxxxxxxxxxxx -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe  -