
Re: Clause 10, Expression Evaluation



David Hough 754R work <754r@xxxxxxxxxxx> wrote:

There are actually two possible 
meanings for non-deterministic that are relevant here:
reproducible on one platform but not reproducible across platforms 
(call this non-reproducible), 
and not reproducible on the same platform with the same executable
(call this non-deterministic - usually occurs in conjunction with
parallelism).     Both forms offer performance
advantages and debugging challenges.

I for one can't imagine disputing these statements as they are consistent
with my experience.    The argument I propose is one analogous to
the following:

Well, I dispute them, based on my experience.  But perhaps I have a
wider range of it than you do.  Let's ignore the many and varied other
possible meanings of non-deterministic, as they aren't critical to
this debate, and simply address the flaws in what you have said.

It is COMMON for the results produced by a 'deterministic' program
to vary with the system configuration and environment.  For example,
running the same executable on the same data in the same system
(configuration and all) can give different results according to
whether you have created a new, unreferenced file in your home
directory.  Been there - seen that.

What that does is to affect the environment, which changes things
like addresses, which can then change the results of library calls
(e.g. whether getenv returns the same address twice or not), which
are then visible to a LEGAL program.  Nobody has ever succeeded in
producing a clean boundary between 'reproducible' facilities and
others, while still leaving enough function in the former to
enable major applications to be written in it.

This is the checkpoint-restart problem.  In 40+ years of being in
IT, I have HEARD OF precisely ONE system that delivered it for ONE
release (it broke it in the next) for arbitrary programs.  I have
had the cynical pleasure of telling senior people (up to VP) of
Tier 1 companies that they weren't going to deliver on their
promises, because they couldn't do it, and being proved right!

Why on earth do you think that IEEE 754R is going to change that?

Unfortunately there are interactions between algorithm design and 
decisions about parallelism and non-determinism so these steps aren't as
orthogonal as one would like.

And those interactions are imposed by the laws of mathematics, so
you can't eliminate them, however hard you try.  Live with it.

"Better programming languages" and "better hardware architectures" to match
them are part of the long-term solution, but in the meantime, 754R can do
its little bit to encourage languages to encourage compilers to make things
easier on a shorter time scale.

Yes, it could.  I can see very little evidence that it is likely to
do so.  From where it has gone so far, and even more where it is
going, the changes are going to make things harder.

Please, for the Nth time, move TOWARDS the languages, not AWAY from
them.  Their position is constrained by real requirements, and they
CAN'T move far towards where you want them to go.

Or, if you claim that is false, provide some evidence.  Produce a
strawman specification of how to add reproducibility to Fortran, C or
C++ so that we can shoot it down in flames.  And, make no mistake,
that IS what will happen to it.

You can't specify reproducibility for an arbitrary program and I
don't believe that you can specify a usable, describable subset that
is reproducible.  Please prove me wrong.



"Malcolm Cohen" <malcolm@xxxxxxxxxxx> wrote:

(a) At this particular point in time,
     it's true that binary FP is always faster.  MUCH faster.

     It is unclear that this situation will always hold, given other
     constraints on program speed like the "memory wall" et cetera.

     Finally on this point, even if binary FP is faster than decimal FP,
     when decimal hardware is being used the difference might well be
     tiny (viz disappear into the noise).  Maybe, maybe not, but since
     decimal hardware is not available much now, I don't think we can
     reliably predict its future.

Actually, no.  It is pretty clear that it can be implemented to run
at much the same speed, today, and the overheads of doing so in a
heavyweight design (like IBM's POWER) are small.  The killer is that
it needs a LOT more gates (and hence power) for a minimal design,
which is Bad News for the embedded people and the theoretical true
massively parallel designs (1,000,000 way and up).

(b) It is true that binary FP is more accurate for computations with
     binary FP numbers.  It is certainly untrue that it is accurate
     for computations with decimal FP numbers.  Decimal FP can compute
     1/10 with perfect accuracy, binary FP cannot.
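
The 1/10 claim can be checked directly.  What follows is only a minimal
sketch, assuming Python's standard decimal and fractions modules as
stand-ins for decimal and binary floating-point hardware; the variable
names are illustrative.

```python
from decimal import Decimal
from fractions import Fraction

# 1/10 computed in binary64 and in decimal floating-point.
binary_tenth = 1 / 10
decimal_tenth = Decimal(1) / Decimal(10)

# Fraction() recovers the exact rational value that each result holds.
binary_is_exact = Fraction(binary_tenth) == Fraction(1, 10)
decimal_is_exact = Fraction(decimal_tenth) == Fraction(1, 10)

# Add one tenth ten times, in each arithmetic.
binary_sum = 0.0
decimal_sum = Decimal(0)
for _ in range(10):
    binary_sum += binary_tenth
    decimal_sum += decimal_tenth

print(binary_is_exact, decimal_is_exact)   # False True
print(binary_sum == 1.0, decimal_sum == 1)  # False True
```

The explicit loop is deliberate: some recent Python versions use
compensated summation inside the built-in sum() for floats, which would
hide the binary rounding error.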

Agreed.  It is unclear, however, if there is ANY real requirement for
decimal floating-point.  Yes, there IS for decimal fixed-point but, as
we have discussed ad tedium, it is unclear how much decimal floating-
point will help over integers when emulating that.
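
For comparison, decimal fixed-point emulated with plain integers might
look like the following.  This is a hedged sketch only: the scale of two
decimal places, the round-half-up rule and the helper names fixed_mul
and fixed_str are all assumptions made for illustration, not anything
taken from Cobol or from 754R.

```python
SCALE = 100  # assumption: two decimal places, e.g. amounts held in cents


def fixed_mul(a, b, scale=SCALE):
    """Multiply two non-negative scaled integers, rounding half up."""
    return (a * b + scale // 2) // scale


def fixed_str(a):
    """Format a non-negative amount held at SCALE = 100 as a string."""
    return f"{a // SCALE}.{a % SCALE:02d}"


price = 1999                  # represents 19.99
rate = 8                      # represents 0.08
tax = fixed_mul(price, rate)  # exact product 1.5992 rounds to 1.60
total = price + tax           # integer addition is exact

print(fixed_str(tax), fixed_str(total))  # 1.60 21.59
```

Addition and subtraction are exact by construction, and the rounding
after a multiplication is under the program's explicit control.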

A program that evolves toward high performance

This rarely happens.  People usually (not always) know whether they are
writing a throwaway program that needs only minimal performance, or
whether they are embarking on computational fluid dynamics.

Heaven help me, it does :-(  And, like a feudal legal system evolving
towards a technological democracy, the result is an ungodly mess that
fails to deliver.

might start out being coded in reproducible deterministic decimal  
sequential fashion,
and when the basic logic is correct in that scheme, the elements
of parallelism, binary arithmetic, non-reproducible arithmetic,
and non-deterministic arithmetic get added back in, if and when
the economic benefits justify the increasing coding and debugging  
efforts.

This doesn't seem to match what I've seen.

Many useful optimisations change the evaluation order.  Tuning a blocked
MATMUL (matrix multiplication) changes the results (using a blocked one
instead of an unblocked one changes the results!).
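
A small illustration of the blocking point, as a sketch under stated
assumptions: IEEE binary64 arithmetic with round-to-nearest-even, and a
dot product standing in for the inner loop of a MATMUL.  The function
names and the data are hypothetical, chosen to make the effect visible.

```python
def dot_sequential(x, y):
    """Accumulate the products strictly left to right."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def dot_blocked(x, y, block):
    """Accumulate a partial sum per block, then add the partial sums."""
    total = 0.0
    for start in range(0, len(x), block):
        partial = 0.0
        for a, b in zip(x[start:start + block], y[start:start + block]):
            partial += a * b
        total += partial
    return total


x = [1e16, 1.0, -1e16, 1.0]
y = [1.0, 1.0, 1.0, 1.0]

print(dot_sequential(x, y))  # 1.0
print(dot_blocked(x, y, 2))  # 0.0
```

The mathematically exact answer is 2.0, so both orderings are wrong, and
they are wrong differently.  Neither ordering violates the rules for a
single operation; only the order of the additions has changed.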

Everyone running a program that takes nontrivial time turns on optimisation
at some level.  Thus immediately entering the "non-reproducible" world
(if they were not there already).

Precisely.  And, as I have pointed out, the problems are fundamental
(i.e. in the basic mathematics).  As I tell every user who asks, you
just CAN'T convert a serial program to more than trivial parallelism
by incremental changes.  You HAVE to go back and redesign.

At the higher levels one is often debugging the hardware,
operating system, compilers, and libraries,

Well yes, but almost no-one does that.  (Those who think they spot  
hardware, operating system, compiler or library bugs are usually
wrong.)

I wish :-(  But perhaps I count as "almost no-one"?

Yes, we need better languages ... but "better" is not a single variable,
nor are the aspects of betterness independent.  That's one of the
reasons we have so many different languages.  Matching the hardware
better is a net loss for many measures of betterness.

Absolutely!

but in the meantime, 754R can do its little bit to encourage languages
to encourage compilers to make things easier on a shorter time scale.

If you don't mind me saying so, that's completely out of order.
It's none of 754's business what languages choose to make important.

It's particularly galling since 754R even now does not specify
reproducible single operations - the obvious example being the options
for underflow.

So 754R doesn't have to specify reproducibility but it thinks it can
beat up the languages people on that topic?  That's just breathtaking.

I don't think that's a fair example - options are permissible, but one
has to consider whether the standard is well-specified for all possible
combinations of them.

More seriously, I assume that you agree with my demand that proponents
of reproducibility should produce a draft specification of what should
be put into the Fortran, C or C++ standards, so that we can use it for
target practice?  I can't do it, without causing ten times more harm
than good, and I don't believe that they can.



keith bierman <khbkhb@xxxxxxxxx> wrote:

The large body of COBOL with its longstanding usage suggests to me
that you are overly pessimistic about decimal's acceptance in the
wider community of people who compute things ;> It all depends on the
things being computed.

Cobol programmers are primarily interested in decimal fixed-point.
It has very different semantics from decimal floating-point, in ways
that are very visible to a Cobol program.

...Everyone running a program that takes nontrivial time turns on
optimisation at some level.  Thus immediately entering the
"non-reproducible" world (if they were not there already).

No doubt a switch or levels above and below the nondeterministic
"barrier" are plausible.

Really?  Some evidence of that statement would be interesting.  My
assertion is that most such levels are indescribable (taking
"describable" in the strict mathematical sense of not requiring a
solution to the Halting Problem to categorise them).  The levels exist,
right enough, but they are beyond the ability of mere mortals to describe.



David Hough 754R work <754r@xxxxxxxxxxx> wrote:

But it's almost never economic to track down the blame, so a typical
course of action is to reduce the optimization level and blame the hardware
for not living up to its advertised performance.

The inherent non-reproducibility of floating-point arithmetic as it has 
been done historically has enabled all sorts of sins to be concealed,
and that's one of my motivations for pressing the issue as far as
it can be pressed in this revision.     At least if one requests 
reproducible results in appropriate circumstances and still gets varying
results, one knows that somebody else in the hardware-software chain is not
living up to his end of the bargain.

Sigh.  Until and unless you can specify what "appropriate circumstances"
are, that is PURELY a trap for the unwary.  You clearly have NO IDEA
how many users TODAY claim that different results imply that there
MUST be a bug in the hardware/system/compiler/library.  We need to
encourage that delusion like we need a hole in the head.

Your objective is INTRINSICALLY undesirable unless you can deliver on
such a specification, and I am asserting that you cannot.  That is one
of the main reasons that I am wading into this unproductive debate.

Some of the opposition to a reproducible-results option might be attributable
to a fear of what bugs might be turned up that must be dealt with by vendors
rather than users, and perhaps also to fear that
(like decimal) it might become a popular paradigm over time, to which vendors
might be forced to submit by weight of benchmarks (while still being required
to provide peak performance as well).

Heck.  I thought that you were less of a tyro than THAT.  Yes, the
demand for reproducibility in such things is there, today, and it
is ALREADY a popular paradigm :-(  Have you had NO experience with
vendors refusing to fix even the clearest bugs because it might change
the results produced by a benchmark?

The first high-profile case I remember was IBM's ghastly Fortran
library for the System/360, but I have fairly recently beaten my head
against many Tier 1 vendors on this issue.

The desire for reproducibility ALREADY causes MAJOR harm; I am not
a vendor opposing it because of fear of users reporting bugs, but
a user opposing it because it will encourage vendors to not fix
even the worst bugs.  And for other reasons ....

It's none of 754's business what languages choose to make important.

It is 754R's business to specify what's needed for numerical computing.
It is up to each language to decide how much they are going to support
that goal.

Then it should damn well try to learn what the requirements and
constraints are before doing so.  From this debate, and several other
ones, I have my doubts that most of its proponents have much of a
clue about numerical computing, either in theory or practice.  I know
for certain that SC22WG5 does.

Please prove me wrong.  I almost always publicly admit that I am, when
it is demonstrated unequivocally.  Let's see that strawman of changes
to one of the Fortran, C or C++ standards.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1@xxxxxxxxx
Tel.:  +44 1223 334761    Fax:  +44 1223 334679
