Re: Clause 10, Expression Evaluation
Thorsten Siebenborn <7_born@xxxxxx> wrote:
Malcolm Cohen wrote:
Thorsten Siebenborn <7_born@xxxxxx> wrote:
<snip>
To compare speed, it is necessary to benchmark programs written
in the languages to compare. And there is a saying: Lies,
damn lies, benchmarks.
Which processor is used? Is it a single processor or multiple
processors? Which compiler is used and how good are the implementors
of the compiler? How proficient is the programmer in the language
and how good at knowing the details of optimization? How much memory
is needed for the instructions? (Can the program be held in cache
memory and therefore be executed much faster?)
All very true and then ignored in the following.
The easiest claim would be: All "Volvo" languages are inferior in
terms of speed against "Porsche" languages. But:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=g95&lang2=ocaml
Fortran G95 is in this test slower than OCaml, the deterministic
language. *Much* slower. OCaml is in the top ranking together with
gcc.
Does it prove that OCaml is faster than FORTRAN in general ? No.
And I am not interested in exchanging the typical bickering of
how-fast-my-favored-language-is.
But it also proves that the claim that deterministic languages
are always slower is not true.
Strawman. That is NOT the claim of interest to Malcolm, or I hope,
this standards committee. The claim is that an optimal implementation
of non-deterministic FLOATING POINT semantics will provide
significant performance gains on APPROPRIATE code, compared with an
optimal implementation of deterministic floating point. Several
people, including Malcolm, have provided detailed arguments as to why
this should be true. They have also argued that experience has not
only validated this argument, but indicated that the gains are often
significant. In particular I suspect the simple test of turning
determinism on and off for a given compiler processing a given code
would confirm this, for appropriate code (any citations for this?).
The relative size of this gain would depend on the detailed
characteristics of the code and the processor it is run on. The
perceived importance of this gain would depend on the user.
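
The reordering freedom under discussion can be illustrated with a minimal sketch. It is written in Python only for brevity, the function names are hypothetical, and it says nothing about what any particular Fortran compiler actually does. It shows only that two groupings of the same floating point sum, either of which a non-deterministic evaluation rule would permit, can give different results:

```python
def left_to_right(a, b, c):
    # Strict left-to-right evaluation: (a + b) + c
    return (a + b) + c


def regrouped(a, b, c):
    # A mathematically equivalent regrouping: a + (b + c)
    return a + (b + c)


# Small values: the two groupings differ in the last bit.
small_strict = left_to_right(0.1, 0.2, 0.3)
small_regrouped = regrouped(0.1, 0.2, 0.3)

# Values of very different magnitude: the difference is no longer tiny.
# Left to right, the large terms cancel first and the 1.0 survives.
# Regrouped, the 1.0 is absorbed into -1e17 before the cancellation.
big_strict = left_to_right(1e17, -1e17, 1.0)
big_regrouped = regrouped(1e17, -1e17, 1.0)

print(small_strict, small_regrouped)
print(big_strict, big_regrouped)
```

Which grouping is "right" depends on the data, which is one reason the perceived importance of a fixed order varies from user to user.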
Absent this compiler switch, evaluating the gain requires the sort of
detailed analysis you suggest earlier, to determine how optimal the
implementation is, how useful the application is for evaluating that
question, and how optimal the code is. My expectation is that g95
code was significantly further from optimal than ocaml's for several
reasons: its implementation is much younger; it is largely the work of
one person, while ocaml was developed by a small team; that one person
worked part time on this task for much of the development while he was
primarily employed as a numericist; and Fortran as a language has more
historical baggage to implement before optimization becomes a
priority. A better test would be a more mature compiler such as is
available from NAG, Intel, Sun, Absoft, etc. As Fortran's floating
point semantics are similar to that of C and C++, a look at the
performance of codes written in those languages would also be useful.
As to the applicability of the code, in the benchmark set you cite,
only six codes use significant floating point: fasta, mandelbrot,
n-body, partial-sums, recursive, and spectral-norm. Looking at the
Fortran code:
fasta to me looks like an inefficient code for the task and, partly
as a result, offers little opportunity for non-deterministic
optimization. I think the current code run-time is dominated by
procedure call overhead. I would have generated all the random
numbers in one procedure call, and I think looking at the C code
should yield other optimizations.
mandelbrot to me looks like its performance is dominated by bit and
byte manipulation.
n-body is an example of the type of floating point calculation,
complicated one line expressions, where non-deterministic semantics
offers potential for significant performance gains. Guess which code
g95 significantly outperforms ocaml on. I would expect optimal code
and implementations to meet or exceed C's performance here.
partial-sums I suspect is dominated by transcendental function calls,
and benefits little by non-determinism.
recursive is primarily intended to test the efficiency of
implementation of recursive procedure calls, and to do this runs
several different functions, only one of which uses significant
floating point, and that one is a simple function where
non-deterministic behavior is likely to provide little gain.
spectral-norm is dominated by the function call A. Its relative
performance depends almost entirely on the extent of inlining by the
optimizer, and benefits little if at all from non-deterministic
semantics.
Looking at the above, the one benchmark in the set that tests the
aspect of performance that I believe this thread is concerned with,
n-body, agrees with Malcolm's conclusions. Note also that C and C++ show
more than a factor of two performance gain over ocaml on this
benchmark. Perhaps others know the floating point semantics of the
implementations of some of the other high performing languages on this
code, e.g., Oberon-2, Eiffel, JAVA 6-server etc. (Note it can be
difficult to impose deterministic semantics on floating point code,
particularly if C/C++ is used as an intermediate language, or an X86
is the machine processor, even if the language nominally requires it.)
Ok, you may rant that you are talking about number crunching with
parallel processors and supercomputers where only FORTRAN is
an option. Well, I will answer to that later.
Ad hominem. I would not characterize Malcolm's response as ranting,
and, even if it were, it is not germane to the issue.
Error of fact. Malcolm knows that more than Fortran is available for
floating point processing as an option for those systems. Typically
C, C++, and Ada are available, if only as part of the gcc system.
Error of fact: that performance gains arise only on those
systems. Memory hierarchies in today's desktops have effects on
performance similar to those of the hierarchies of the supercomputers
of ten to twenty years ago.
Strawman. I read this (and the following) as implying that only new
code is of interest to this committee. The committee should be
concerned about the effects of running old code on processors with
the 754r functionality.
And I added that these specifics Volvos are able
to use tools which are extraordinarily useful.
Let's not stretch analogies too far. Volvos don't "use tools".
I didn't come up with this particular image.
Malcolm started his claim with ">>Furthermore<<,
it is not useful".
Obviously I didn't start anything with "Furthermore".
Reread your statement.
When someone challenges a quote chances are he has it right. Your
response was actually posted as a response to a message by Nick
MacLaren who did use the word "furthermore" followed by something
that (roughly) implies "it is not useful". Malcolm Cohen did not.
Arguing with two people in one message can be confusing, even for the
writer. Note that eliding Nick's arguments as to why it is not useful
is an oversimplification.
So the inherent inability of FORTRAN
You what?
??
to come up with a strict evaluation order is apparently not the
reason he came up with this image.
The reference to "inability" is an error of fact. Fortran could decide
on an evaluation order, without changing the "defined" semantics of
existing code, but has chosen not to for performance
reasons. Similarly a Fortran implementation could impose defined
semantics, and some provide that as an option. Further, some
implementations of languages with deterministic floating point
semantics have the option of compiling with limited forms of
non-determinism.
For high performance, requiring a strict evaluation order is
counter-productive! I don't see anything difficult to understand
there.
Fine that I have understood it correctly.
but stating it as an "inability" rather than a "choice" was incorrect.
Your statement is effectively:
"I am a compiler writer. If I am enforced to slavishly imitate
the deterministic behaviour of my source code during run time,
I cannot write *really* fast programs. All other languages who
show this deterministic behaviour must be slow; there is no way to
bypass it. So I can safely compare deterministic languages with
Volvos and my language with a Porsche."
Beware of putting words in other people's mouths. You are not a mind
reader. I don't think Malcolm has argued from the authority of being
a compiler writer, though he is one. Instead he has argued on
technical merits using knowledge he has gained partly from being a
compiler writer. Certainly "I cannot write *really* fast programs"
would not be Malcolm's thought here. Closer might be "Many of my
users' programs will not be as fast as they would like given that
they do not care about the detailed evaluation order". He could also
note that parenthesization allows knowledgeable users to specify a
deterministic order. He might wish that Fortran provided an
alternative bracketization for expressions that did not impose an
evaluation order, e.g. so that compilers could expand the equivalent
of a*(b+c) to a*b+a*c. I sometimes think that the detailed results of
floating point arithmetic are so difficult to predict, that if you
are highly concerned about a deterministic result, you should
probably use a different data type, e.g., scaled integers, or, if you
are concerned about over/underflow and integers do not provide enough
dynamic range, scale the floating point quantities so that typical
values are naturally close to unity.
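
Two of the points above can be made concrete with a small sketch, again in Python for brevity and with hypothetical function names. It is an illustration under stated assumptions (IEEE double precision), not a description of any compiler's behaviour. The first part shows that expanding a*(b+c) to a*b+a*c can change the result, here by avoiding an intermediate overflow. The second part shows scaled integers giving a result that does not depend on rounding, where the equivalent floating point accumulation does.

```python
import math


def factored(a, b, c):
    # Evaluates the parenthesized sum first, as written: a * (b + c)
    return a * (b + c)


def expanded(a, b, c):
    # The distributed form a compiler might prefer: a*b + a*c
    return a * b + a * c


def accumulate(values, start):
    # Explicit left-to-right accumulation, one addition per step.
    total = start
    for v in values:
        total += v
    return total


# Part 1: b + c overflows to infinity, but a*b + a*c stays finite.
a, b, c = 0.5, 1.7e308, 1.7e308
print(factored(a, b, c), expanded(a, b, c))
print(math.isinf(factored(a, b, c)), math.isinf(expanded(a, b, c)))

# Part 2: ten steps of one tenth each.
# In floating point, 0.1 is not exactly representable and errors accumulate.
float_total = accumulate([0.1] * 10, 0.0)
# With integers scaled by ten (units of one tenth), every step is exact.
scaled_total = accumulate([1] * 10, 0)
print(float_total == 1.0, scaled_total / 10 == 1.0)
```

Integer addition is associative, so the scaled result is the same whatever order the additions are performed in, provided the values stay within the integer range available.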
Well, it may be true for *imperative* languages like FORTRAN, but
it is perhaps not true for *functional* languages like OCaml
or Sisal. Sisal is a deterministic language which is
capable of running as fast as optimized FORTRAN on supercomputers.
James R. McGraw has written an interesting paper named
"Parallel Functional Programming in Sisal: Fictions, Facts
and Future", Advanced Workshop: Programming Tools for Parallel
Machines, July 1993.
In this he ran Sisal solutions against FORTRAN ones on a Cray
YMP/864. While in single processor mode FORTRAN was faster, it
loses dramatically against Sisal in multi-processor mode.
He also explains what he is doing in the first-place so that
Sisal can achieve its speed.
Note there was a better known, slightly later article in
Communications of the ACM. That article was discussed several times
when Malcolm followed comp.lang.fortran in detail. Are you certain
that the code generated by Sisal consistently imposed deterministic
floating point semantics? FWIW during most of Sisal's active period
it relied on C code compiled using gcc, gcc was largely a K&R
compiler, and K&R C had looser floating point semantics than Fortran
and C89. Note also that there is a reason funding for Sisal stopped
shortly after that paper was written. For whatever reason, people
(even at LLNL) were not adopting Sisal instead of Fortran, although
Sisal was available for several years before funding stopped. At that
time, those who left Fortran largely went to C and C++, both of which
largely have the same non-deterministic semantics for floating point
code. FWIW when there are many languages claiming to be better, most
people look only at the better known ones.
I can send the paper if there is interest.
Are you the copyright holder, or are you certain of the distribution
restrictions imposed by the copyright holder?
Regards,
Thorsten
--
William B. Clodius Space & Remote Sensing Sciences
Tech. Staff Member, ISR-2 Phone: (505) 665-9370
Mail Stop B244 FAX: (505) 665-4414
Email: wclodius@xxxxxxxx Group office: (505) 667-5776
LANL email review:
[ ] - Non-Technical Correspondence
[x] - Technical Correspondence
[ ] - DUSA:
[x] - ADC Review by: Clodius