[stds-754] Re: GAMM-IMACS Proposal for Accurate Floating-Point Vector Arithmetic
In mail to this list on Thu, 20 Oct 2005 10:30:12 +0200, Ulrich
Kulisch <ae35@xxxxxxxxxxxxxxxxxxxx> kindly forwarded the GAMM-IMACS
Proposal for Accurate Floating-Point Vector Arithmetic.
One way to implement this proposal is with internal accumulators wide
enough to hold numbers over the entire exponent range. For the 32-bit,
64-bit, 80-bit, and 128-bit formats, such accumulators need widths
ranging from hundreds to tens of thousands of bits.
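As a rough check on those figures, the Python sketch below counts the bit positions such an accumulator needs. It assumes the usual IEEE 754 parameters: precision p including the leading bit, and emin and emax for the normal exponent range. It ignores the sign bit and the extra guard bits a real design adds to absorb carries. The names Fmt, sum_bits, and product_bits are my own hypothetical illustration, not part of the proposal.

```python
# Illustrative sketch; Fmt, sum_bits, product_bits are hypothetical names.
# Counts bit positions a fixed-point accumulator needs to hold, without
# rounding, any finite value (sums) or any exact product of two values.
# Sign and carry-guard bits are deliberately ignored.
from typing import NamedTuple

class Fmt(NamedTuple):
    name: str
    p: int      # precision in bits, including the leading bit
    emin: int   # exponent of the smallest normal number
    emax: int   # exponent of the largest finite number

FORMATS = [
    Fmt("binary32", 24, -126, 127),
    Fmt("binary64", 53, -1022, 1023),
    Fmt("80-bit extended", 64, -16382, 16383),
    Fmt("binary128", 113, -16382, 16383),
]

def sum_bits(f: Fmt) -> int:
    # From the smallest subnormal, 2**(emin - p + 1), up to the leading
    # bit of the largest finite value, 2**emax.
    return f.emax - (f.emin - f.p + 1) + 1

def product_bits(f: Fmt) -> int:
    # Exact products range from 2**(2*(emin - p + 1)) up to just below
    # 2**(2*emax + 2), so the top bit position is 2*emax + 1.
    return (2 * f.emax + 1) - 2 * (f.emin - f.p + 1) + 1

for f in FORMATS:
    print(f"{f.name:16s} sums: {sum_bits(f):6d} bits  products: {product_bits(f):6d} bits")
```

For binary64 this gives 2098 bits for plain sums and 4196 bits for exact products. For the 80-bit and 128-bit formats, the product figures exceed 65000 bits.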
For several decades, researchers in floating-point arithmetic have
investigated ways to improve the accuracy of floating-point summation
and dot products in purely software implementations, and the two new
articles cited below add to this knowledge. The fparith.bib file at
http://www.math.utah.edu/pub/tex/bib/index-table-f.html#fparith
includes them. It now marks all such articles with the string
"accurate floating-point summation" in their keywords values, making
them easy to find.
The first article, in particular, presents several summation and
dot-product algorithms that avoid sorting, branches, and higher
intermediate precision. It appears to me that they largely solve the
problem the GAMM-IMACS proposal addresses.
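To make the "no sorting, no branches, no wider precision" point concrete, here is a minimal Python sketch of cascaded summation built on Knuth's TwoSum error-free transformation. It is modeled on what I take to be the paper's Sum2, but it is my own illustration, not the authors' code. It assumes binary64 arithmetic with round-to-nearest and no overflow. The names two_sum, sum2, and naive_sum are hypothetical.

```python
# Illustrative sketch, not the authors' code; function names are hypothetical.
# Assumes binary64, round-to-nearest, no overflow.

def two_sum(a: float, b: float) -> tuple[float, float]:
    # Knuth's TwoSum: s = fl(a + b), and s + e == a + b exactly.
    # Six floating-point operations, no branches.
    s = a + b
    z = s - a
    e = (a - (s - z)) + (b - z)
    return s, e

def sum2(p: list[float]) -> float:
    # Cascaded summation: s carries the ordinary running sum, and c collects
    # the exact rounding error of every addition, all in working precision.
    if not p:
        return 0.0
    s, c = p[0], 0.0
    for v in p[1:]:
        s, e = two_sum(s, v)
        c += e
    return s + c

def naive_sum(p: list[float]) -> float:
    # Plain left-to-right recursive summation, for comparison.
    s = 0.0
    for v in p:
        s += v
    return s

data = [1.0, 1e100, 1.0, -1e100]
print(naive_sum(data))  # 0.0
print(sum2(data))       # 2.0
```

This scheme only buys roughly doubled precision. Sums that are ill-conditioned even relative to twice the working precision can still lose accuracy, which is what the paper's K-fold variants address.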
The article's authors also note the great utility of hardware
fused multiply-add instructions in the core of their algorithms,
which is further evidence that such instructions are desirable.
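The value of FMA shows up in the companion error-free transformation for products. With a fused multiply-add, a single extra operation recovers the exact rounding error of a*b. The sketch below is my own illustration, modeled on the paper's TwoProduct and Dot2 as I understand them. It assumes binary64 round-to-nearest with no overflow or underflow. It uses math.fma where Python provides it (3.13 and later). Otherwise it falls back to Dekker's product with Veltkamp splitting, which costs several more multiplications and additions. All function names are hypothetical.

```python
# Illustrative sketch, not the authors' code; function names are hypothetical.
# Assumes binary64, round-to-nearest, no overflow or underflow.
import math

# math.fma exists only in Python 3.13+; older versions use the fallback path.
_fma = getattr(math, "fma", None)

def two_sum(a: float, b: float) -> tuple[float, float]:
    # Knuth's TwoSum: s + e == a + b exactly.
    s = a + b
    z = s - a
    return s, (a - (s - z)) + (b - z)

def split(a: float) -> tuple[float, float]:
    # Veltkamp splitting: a == hi + lo, each with at most 26 significant bits.
    c = 134217729.0 * a  # 2**27 + 1 for binary64
    hi = c - (c - a)
    return hi, a - hi

def two_product(a: float, b: float) -> tuple[float, float]:
    # x = fl(a * b), and x + y == a * b exactly.
    x = a * b
    if _fma is not None:
        # a*b - x is representable and the FMA rounds once, so y is exact.
        return x, _fma(a, b, -x)
    a1, a2 = split(a)  # Dekker's product, used when no FMA is available
    b1, b2 = split(b)
    y = a2 * b2 - (((x - a1 * b1) - a2 * b1) - a1 * b2)
    return x, y

def dot2(xs: list[float], ys: list[float]) -> float:
    # Dot product about as accurate as one computed in twice the precision.
    if len(xs) != len(ys):
        raise ValueError("vectors must have equal length")
    if not xs:
        return 0.0
    p, s = two_product(xs[0], ys[0])
    for a, b in zip(xs[1:], ys[1:]):
        h, r = two_product(a, b)
        p, q = two_sum(p, h)
        s += q + r
    return p + s

x = [1 + 2**-30, -1.0]
y = [1 - 2**-30, 1.0]
print(x[0] * y[0] + x[1] * y[1])  # 0.0
print(dot2(x, y) == -2.0**-60)    # True
```

The exact dot product here is (1 - 2**-60) - 1 = -2**-60. Ordinary evaluation loses it entirely, because fl((1 + 2**-30)(1 - 2**-30)) rounds to 1.0.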
Even if this article does indeed solve the problem, further
investigation is merited to compare the efficacy of a wide-accumulator
solution (in software or hardware) with the techniques of this
article, possibly augmented with hardware assists. Such a comparison
can in turn guide the committee in deciding whether to support the
proposal in the emerging revised IEEE 754 standard.
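For such a comparison, a wide accumulator is easy to model in software. The sketch below is my own illustration, not the GAMM-IMACS specification; exact_dot and to_fixed are hypothetical names. It relies on the fact that every finite binary64 value is an integer multiple of 2**-1074, so every exact product is an integer multiple of 2**-2148. Python's arbitrary-precision integers then stand in for a fixed-point register of roughly 4200 bits. The only rounding is the final conversion back to binary64, which relies on CPython rounding int/int true division correctly.

```python
# Illustrative software model of a long accumulator for binary64 dot
# products; exact_dot and to_fixed are hypothetical names.
SCALE = 1074  # every finite binary64 value is an integer multiple of 2**-1074

def to_fixed(x: float) -> int:
    # Exact conversion to an integer count of 2**-1074 units.
    # as_integer_ratio() raises on infinities and NaNs.
    n, d = x.as_integer_ratio()  # d is a power of two, at most 2**1074
    return n * ((1 << SCALE) // d)

def exact_dot(xs: list[float], ys: list[float]) -> float:
    if len(xs) != len(ys):
        raise ValueError("vectors must have equal length")
    acc = 0  # the "long accumulator": every product is added without rounding
    for a, b in zip(xs, ys):
        acc += to_fixed(a) * to_fixed(b)
    # One rounding at the end; int / int is correctly rounded in CPython and
    # raises OverflowError if the result lies beyond the binary64 range.
    return acc / (1 << (2 * SCALE))

x = [1 + 2**-30, -1.0]
y = [1 - 2**-30, 1.0]
print(exact_dot(x, y) == -2.0**-60)  # True
```

This model always returns the correctly rounded result. Its cost is an update of a multi-thousand-bit integer per term, which is the trade-off to weigh against the article's cheaper techniques.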
@String{j-SIAM-J-SCI-COMP = "SIAM Journal on Scientific Computing"}
@Article{Ogita:2005:ASD,
author = "Takeshi Ogita and Siegfried M. Rump and Shin'ichi
Oishi",
title = "Accurate Sum and Dot Product",
journal = j-SIAM-J-SCI-COMP,
volume = "26",
number = "6",
pages = "1955--1988",
month = nov,
year = "2005",
CODEN = "SJOCE3",
DOI = "10.1137/030601818",
ISSN = "1064-8275 (print), 1095-7197 (electronic)",
MRclass = "5-04, 65G99, 65-04",
bibdate = "Mon Nov 21 14:52:48 MST 2005",
bibsource = "http://epubs.siam.org/sam-bin/dbq/toc/SISC/26/6",
URL = "http://epubs.siam.org/sam-bin/dbq/article/60181",
abstract = "Algorithms for summation and dot product of
floating-point numbers are presented which are fast in
terms of measured computing time. We show that the
computed results are as accurate as if computed in
twice or K-fold working precision, $K\ge 3$. For twice
the working precision our algorithms for summation and
dot product are some 40\% faster than the corresponding
XBLAS routines while sharing similar error estimates.
Our algorithms are widely applicable because they
require only addition, subtraction, and multiplication
of floating-point numbers in the same working precision
as the given data. Higher precision is unnecessary,
algorithms are straight loops without branch, and no
access to mantissa or exponent is necessary.",
acknowledgement = ack-nhfb,
keywords = "accurate dot product; accurate floating-point
summation; fast algorithms; high precision; verified
error bounds",
}
@Article{Zhu:2005:NDA,
author = "Yong-Kang Zhu and Jun-Hai Yong and Guo-Qin Zheng",
title = "A New Distillation Algorithm for Floating-Point
Summation",
journal = j-SIAM-J-SCI-COMP,
volume = "26",
number = "6",
pages = "2066--2078",
month = nov,
year = "2005",
CODEN = "SJOCE3",
DOI = "10.1137/030602009",
ISSN = "1064-8275 (print), 1095-7197 (electronic)",
MRclass = "65G05, 65B10",
bibdate = "Mon Nov 21 14:52:48 MST 2005",
bibsource = "http://epubs.siam.org/sam-bin/dbq/toc/SISC/26/6",
URL = "http://epubs.siam.org/sam-bin/dbq/article/60200",
abstract = "The summation of $n$ floating-point numbers is
ubiquitous in numerical computations. We present a new
distillation algorithm for floating-point summation
which is stable, efficient, and accurate. The algorithm
iteratively ``distills'' the summands without
discarding any significant digit until the partial sums
cannot change the whole sum. It uses standard
floating-point arithmetic and does not rely on the
choice of radix or any other specific assumption.
Furthermore, the error bound of our algorithm is
independent of $n$ and less than 1 ulp.",
acknowledgement = ack-nhfb,
keywords = "accurate floating-point summation; distillation; rounding
error",
}
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: beebe@xxxxxxxxxxxxx -
- 155 S 1400 E RM 233 beebe@xxxxxxx beebe@xxxxxxxxxxxx -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe -
-------------------------------------------------------------------------------