Re: Question on performance

To: Arnold Neumaier <Arnold.Neumaier@xxxxxxxxxxxx>
Subject: Re: Question on performance
From: Lee Winter <lee.j.i.winter@xxxxxxxxx>
Date: Mon, 11 Oct 2010 19:33:52 +0400
Cc: Paul Zimmermann <Paul.Zimmermann@xxxxxxxx>, Marco Nehmeier <nehmeier@xxxxxxxxxxxxxxxxxxxxxxxxxxx>, stds-1788@xxxxxxxx, "Corliss, George" <george.corliss@xxxxxxxxxxxxx>, "Kearfott, R. Baker" <rbk@xxxxxxxxxxxxx>
Delivered-to: mhonarc@xxxxxxxxxxxxxxxx
In-reply-to: <4CB315FE.3060106@xxxxxxxxxxxx>
List-help: <http://listserv.ieee.org/cgi-bin/wa?LIST=STDS-1788>, <mailto:LISTSERV@LISTSERV.IEEE.ORG?body=INFO%20STDS-1788>
List-owner: <mailto:STDS-1788-request@LISTSERV.IEEE.ORG>
List-subscribe: <mailto:STDS-1788-subscribe-request@LISTSERV.IEEE.ORG>
List-unsubscribe: <mailto:STDS-1788-unsubscribe-request@LISTSERV.IEEE.ORG>
References: <E1P5GJ8-0003oR-Gp@xxxxxxxxxxxxxx> <4CB315FE.3060106@xxxxxxxxxxxx>
Sender: stds-1788@xxxxxxxx

On Mon, Oct 11, 2010 at 5:49 PM, Arnold Neumaier
<Arnold.Neumaier@xxxxxxxxxxxx> wrote:
> Paul Zimmermann wrote:
>>
>>> The problem in your example is not the inlining of the function
>>> nan2zero(). The problem is the function call isnan().
>>> If you replace this function by a "x != x" then you will get a better
>>> performance which is close to the runtime without a call of nan2zero().
>>
>> right, but Arnold's explicitly mentioned a call to isnan(x).
>
> Only for lack of knowledge that x!=x executes much faster.
> Can someone explain _why_ this is so?

Because the comparison x!=x is a primitive on '754 hardware such as x87
(the architecture of the floating-point co-processor for the x86
instruction set), whereas isnan() may cost a function call.
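
As an illustration, here is a minimal sketch of the two variants under
discussion.  The name nan2zero comes from the quoted text; the two
suffixed function names are made up for this example, and whether
isnan() really becomes a call depends on the compiler and library.

```c
#include <math.h>

/* Variant using the isnan() classification macro from <math.h>.
 * Depending on the vendor this may expand to a library call. */
static inline double nan2zero_isnan(double x)
{
    return isnan(x) ? 0.0 : x;
}

/* Variant using the comparison primitive: under '754 semantics
 * x != x is true only when x is a NaN. */
static inline double nan2zero_cmp(double x)
{
    return (x != x) ? 0.0 : x;
}
```

Both variants should return the same values as long as the compiler
keeps '754 comparison semantics.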

However, many compilers do not implement this properly.  The issue is
that the results of the primitive comparison instruction are reflected
in the condition codes (a set of bits on which primitive conditional
instructions operate).  Some compilers, notably those from Micro~1,
fail to emit the conditional branch instruction that tests the
condition codes for the case you desire.  This omission makes their
code incrementally smaller and faster at the expense of correctness.

Users also have to understand that aggressive optimization may replace
the expression (x!=x) with (false) because the optimization process
uses a more primitive FP algebra than '754 requires, one in which no
value differs from itself.  Such optimizers also tend to ignore the
sign of an underflow because they treat all underflows as zeros and the
sign of zero is not useful in the primitive FP algebra they use.
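
A rough way to probe a given build for the two effects just described
is sketched below.  The function names are hypothetical, and volatile
is used only to keep the values from being folded at compile time.
With default options both functions would be expected to return true;
under value-changing options (GCC's -ffast-math is one known example)
either may return false.

```c
#include <math.h>
#include <stdbool.h>

/* Returns true if x != x still detects a NaN in this build. */
static bool nan_compare_preserved(void)
{
    volatile double zero = 0.0;
    double x = zero / zero;          /* NaN produced at run time */
    return x != x;
}

/* Returns true if a negative underflow keeps its sign as -0.0. */
static bool signed_zero_preserved(void)
{
    volatile double tiny = -1e-300;
    volatile double z = tiny * 1e-300;   /* far below the subnormal range */
    double r = z;
    return r == 0.0 && signbit(r) != 0;
}
```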

> Are the predicates isfinite(x) and/or isinf(x) also much slower than
> an arithmetic operation? Can they also be replaced by something
> costing next to nothing?

Yes, but this issue applies to the other end of the compiler.  Whereas
the generation of conditional branch instructions occurs in the back
end, the functions you mentioned may not be part of the relevant
language standard, and in those languages whose standard does cover
them they are not part of the mainstream of the language, so vendors
tend not to map them efficiently onto primitive operations.

For example, on x87 hardware there is a classify instruction that
evaluates the operand as one of { underflow (AKA signed zero),
sub/denormal, normal, overflow (AKA signed infinity), or NaN }.  The
single instruction is far more efficient than any alternative.  But
vendors often offer software classification based on analyzing the
S/E/M (sign/exponent/mantissa) bitfields within an FP number.  That
analysis is not particularly complex, but it is essentially an if tree
containing bitwise AND operations, and that chain is much larger and
very much slower than the classify primitive.  In many cases the
functions you mentioned are software wrappers for the software classify
function, so they tend to be relatively inefficient.

A good compiler can eliminate some of the spurious if/else tests based
on the final if in the wrapper function.  However, the improvement is
sensitive to the sequence of the tests within the classify function,
so if finitude is the last test in the chain it may be that none of
the preceding tests can be eliminated.

That is a worst-case scenario, though: the '754 finitude test is
simple, so it tends to be early/high in the if tree.  The isnan()
situation is a bit harder to optimize down than the isinf() situation.
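
For comparison, here is one possible way to express the two predicates
with a single arithmetic operation or comparison each.  This is only a
sketch with made-up function names; it assumes '754 semantics are
preserved by the compiler, and note that the subtraction raises the
invalid flag when x is an infinity.

```c
#include <math.h>

/* x - x is exactly 0 for every finite x; for an infinity or a NaN
 * the result is a NaN, which compares unequal to 0. */
static inline int finite_by_subtraction(double x)
{
    return (x - x) == 0.0;
}

/* Only the two infinities have a magnitude equal to INFINITY;
 * a NaN compares unequal to everything. */
static inline int inf_by_compare(double x)
{
    return fabs(x) == INFINITY;
}
```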

Thus the performance you asked about may vary a great deal across
vendors and even more across levels of optimization aggressiveness.

-- Lee

P.S.  This message may bounce off of the list server because, despite
my efforts to update my email address, it thinks I am not registered
as a member of the committee.
