Exact vs correctly-rounded dot product
(I assume voting is over...)
I am by now well aware of the clever techniques that permit exact sums
to be evaluated without a Kulisch accumulator (real or emulated). They
all use an intermediate structure that keeps track of the error relative
to the principal sum being accumulated.
For faithful rounding it is actually sufficient to maintain a bound on
this error -- but only after a first pass has seen all arguments, right?
For correct rounding it is however necessary to know whether that error
is exactly zero.
Summation methods for streams (i.e. with no means to look again at operands
that have already been processed) must however keep track of the exact error,
whether in a Kulisch accumulator or in a more space-efficient (in the common
cases) list of decreasing error terms. I just see no way around that.
If we only look at complete dot products, i.e. where the entire set of
operands is known ahead of time, there may be shortcuts that produce the
correctly-rounded result without the machinery that would enable an exact
sum (or dot product) to be returned -- but I sm not fully convinced.
Note that implementations for large sums would probably like to be able
to partition the job, possibly parallelizing it. As I mentioned earlier,
the CompleteMultiplyAccumulate() primitive that was proposed in Motion 9
is just what is needed for this. The only thing that is not needed here
is a complete description of the representation -- i.e. the question of
whether there should be an interchange format for a Kulisch accumulator.
The PRIMARY difference between an exact sum and a correctly-rounded sum
is in the specification of the result. An exact sum (or dot product)
has to define a new datum type for (effectively) a Kulisch accumulator,
whereas the internal types can remain opaque for a correctly-rounded sum
implementation. I still believe however that the internal machinery
needed is essentially the same in both cases, so why not take the extra
little step of making the exact sum available as well?
Michel.
P.S. There is another argument with regard to implementability: the
ability to combine implementations from different sources, e.g.
computational-primitive vendors, library vendors, package vendors
etc. Smooth interoperability requires defined intermediate types,
and Kulisch accumulators (at least in abstract form) seem a useful
defined type to me for the range of issues that P1788 is addressing.
(So it Arnold's triplex format, btw.)
---Sent: 2009-11-18 15:46:13 UTC