
Re: Clause 10, Expression Evaluation



On Thu, 02 Aug 2007 08:59:32 +0900, David Hough 754R work <754r@xxxxxxxxxxx> wrote:
There are actually two possible
meanings for non-deterministic that are relevant here:
reproducible on one platform but not reproducible across platforms
(call this non-reproducible),
and not reproducible on the same platform with the same executable
(call this non-deterministic - usually occurs in conjunction with
parallelism).  Both forms offer performance
advantages and debugging challenges.

I for one can't imagine disputing these statements as they are consistent
with my experience.  The argument I propose is one analogous to
the following:

1. For programs whose performance is dominated by floating-point arithmetic,
binary floating-point arithmetic is always faster and more accurate
than decimal floating-point arithmetic, other things being equal.

No, that is not entirely true.

(a) At this particular point in time,
    it's true that binary FP is always faster.  MUCH faster.

    It is unclear that this situation will always hold, given other
    constraints on program speed like the "memory wall" et cetera.

    Finally on this point, even if binary FP is faster than decimal FP,
    when decimal hardware is being used the difference might well be
    tiny (viz disappear into the noise).  Maybe, maybe not, but since
    decimal hardware is not available much now, I don't think we can
    reliably predict its future.

(b) It is true that binary FP is more accurate for computations with
    binary FP numbers.  It is certainly untrue that it is more accurate
    for computations with decimal FP numbers.  Decimal FP can compute
    1/10 with perfect accuracy; binary FP cannot.

    Where binary FP has any real advantage it is for computing with
    inexact numbers - i.e. rounding behaviour.  As far as I can see,
    that is pretty much it (other accuracy effects look small).

    So I'd have to reject "always ... more accurate" as being much too
    misleading.  Binary FP might be *usually* more accurate, but it
    depends on the data (and the program) - for some programs binary
    FP is never more accurate!

    But the accuracy improvement is really tiny.  The number of programs
    where the rounding improvements between decimal and binary would tip
    them over is, I think, small.  Furthermore, most (all?) such programs
    must already be living dangerously close to the edge of numerical
    failure and will probably need to go to the next-bigger FP type size
    the next time the data gets bigger anyway.
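
    To illustrate the 1/10 point in (b), here is a minimal sketch.  It is
    an illustration only: Python's software "decimal" module stands in for
    decimal FP hardware, and the variable names are made up for the example.

```python
from decimal import Decimal
from fractions import Fraction

# Binary FP: the literal 0.1 is rounded to the nearest double, so the
# stored value is not exactly one tenth.
binary_tenth = 0.1
binary_is_exact = Fraction(binary_tenth) == Fraction(1, 10)

# Decimal FP: one tenth is representable, so the quotient is exact.
decimal_tenth = Decimal(1) / Decimal(10)
decimal_is_exact = Fraction(decimal_tenth) == Fraction(1, 10)

# Accumulate one tenth ten times with an explicit loop (a library
# summation routine might compensate for the rounding errors).
binary_total = 0.0
decimal_total = Decimal(0)
for _ in range(10):
    binary_total += binary_tenth
    decimal_total += decimal_tenth

print("1/10 exact:  binary", binary_is_exact, " decimal", decimal_is_exact)
print("ten tenths == 1:  binary", binary_total == 1.0,
      " decimal", decimal_total == Decimal(1))
```

    This only shows exactness for decimal data; it says nothing about the
    rounding behaviour on inexact results, which is where binary FP has
    its advantage.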

Conclusion: "always faster and more accurate" is a bit misleading.
  I think a better characterisation is that
    "Binary FP has some speed (or cost) advantage - how big this advantage
     will be in the near future is hard to predict - and some precision
     advantage (better rounding of imprecise results)."

BTW, I also think that if decimal FP is more than 10% slower in the eventual
hardware it will be very hard to get it accepted in a large part of the
"high performance" community.  If it's more than 50% slower (as it will be
for software implementations), it's going to be hard for it to be accepted
even in the "moderate performance" community.  (The next revision of Fortran
will have facilities for the program to specify decimal FP [or indeed binary
FP] instead of the default, but no-one is going to use them until hardware
is available or someone performs a software miracle.)

A program that evolves toward high performance

This rarely happens.  People usually (not always) know whether they are
writing a throwaway program that needs only minimal performance, or
whether they are embarking on computational fluid dynamics.

might start out being coded in reproducible deterministic decimal
sequential fashion, and when the basic logic is correct in that scheme,
the elements of parallelism, binary arithmetic, non-reproducible
arithmetic, and non-deterministic arithmetic get added back in, if and
when the economic benefits justify the increasing coding and debugging
efforts.

This doesn't seem to match what I've seen.

Many useful optimisations change the evaluation order.  Tuning a blocked
MATMUL (matrix multiplication) changes the results (using a blocked one
instead of an unblocked one changes the results!).

Everyone running a program that takes nontrivial time turns on optimisation
at some level, thus immediately entering the "non-reproducible" world
(if they were not there already).
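
The effect of evaluation order can be seen even in a plain summation.  The
following is a rough sketch only, not a real blocked MATMUL: the function
names, the data and the block size are all invented for the illustration,
and it assumes IEEE double precision with round-to-nearest-even.

```python
def sequential_sum(values):
    """Add left to right, as a straightforward unoptimised loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def blocked_sum(values, block_size):
    """Sum each block separately, then add up the block totals."""
    total = 0.0
    for start in range(0, len(values), block_size):
        total += sequential_sum(values[start:start + block_size])
    return total


# Hypothetical data chosen so that rounding is visible; the exact sum is 2.
data = [1e16, 1.0, -1e16, 1.0]

print("sequential:", sequential_sum(data))
print("blocked:   ", blocked_sum(data, 2))
```

With these inputs the two orderings give different answers (and neither
is the exact sum), although both functions are "correct" and use only
correctly rounded additions.  Changing the block size is the summation
analogue of retuning a blocked MATMUL.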

At the higher levels one is often debugging the hardware,
operating system, compilers, and libraries,

Well yes, but almost no-one does that. (Those who think they spot hardware,
operating system, compiler or library bugs are usually wrong.)

Unfortunately there are interactions between algorithm design and
decisions about parallelism and non-determinism so these steps aren't as
orthogonal as one would like.

In fact I think they are backwards.  You don't get a high-performance
program by throwing together a shell script with "eval" and then spotting
you should be using a blocked cyclic distribution of seven matrices over
your 200-cpu cluster.  (That's exaggeration BTW, for the sake of humour.)

Performance is like security - you can't add it to a program, you either
design it in in the first place or it's not there.

OK, that's a significant exaggeration, and (especially with smaller
programs) one *can* "evolve" them towards higher performance - but I think
it's generally true; in my experience evolving something towards higher
performance usually requires throwing it away and designing a higher
performance version from scratch!  (Others' experience might differ.)

"Better programming languages" and "better hardware architectures" to
match them are part of the long-term solution,

Ah, the myth of "the language must match the hardware".  Admittedly I
usually see this from the other side (someone wants Fortran to match
*THIS* generation of hardware!) where it is more obviously a suboptimal
strategy in the long term, but I still think that designing hardware
to match existing language *and* compiler technology is not necessarily
the best way forward (IA-64 didn't take that route, but just because
that didn't work so well doesn't mean taking that route is good - IIRC
the iAPX 432 did take that route and was an unmitigated disaster).

Yes, we need better languages ... but "better" is not a single variable,
nor are the aspects of betterness independent.  That's one of the
reasons we have so many different languages.  Matching the hardware
better is a net loss for many measures of betterness.

Similarly, the hardware that optimally processes a vectorisable array
language (Fortran, APL?, doubtless many others) isn't necessarily the
same as the hardware that optimises pointer-chasing C programs.

(And the whole RISC movement was away from the "hardware matching the
software" movement that preceded it.)

but in the meantime, 754R can do its little bit to encourage languages
to encourage compilers to make things easier on a shorter time scale.

If you don't mind me saying so, that's completely out of order.
It's none of 754's business what languages choose to make important.

It's particularly galling since 754R even now does not specify
reproducible single operations - the obvious example being the options
for underflow.

So 754R doesn't have to specify reproducibility but it thinks it can
beat up the languages people on that topic?  That's just breathtaking.

754R's *primary* task must be to specify the basic FP operations without
options.  If 754R does not mandate reproducibility for its own ops, it
has no moral standing to demand that languages do so for whole expressions.
And should languages comply with the demands, it still doesn't work (i.e.
it's not actually reproducible in fact) while the options remain.

BTW, I agreed with most of the posting...

(And given recent experience maybe I should apologise in advance for any
imprecision or overstatements I might have inadvertently made in my reply!)

Cheers,
--
Malcolm Cohen, Nihon Numerical Algorithms Group KK, Tokyo, Japan.
