
Re: Question on performance



On 09.10.2010 at 11:36, Paul Zimmermann wrote:
        Dear Arnold,

I have a question about performance on current 754-conforming hardware:

Suppose I write code consisting only of 754 floating-point operations
and calls to simple customized additional functions such as
      nan2zero(x), which returns 0 if isnan(x), and x otherwise.


Will the code generated by a standard, good compiler run --

(i) essentially as efficiently as without these function calls?

(ii) essentially as efficiently as if it contained explicit case
distinctions?

(iii) intermediate but still efficient?

(iv) intermediate but still inefficient?

In case of (ii) or (iv), could a special purpose compiler do
significantly better?

Arnold Neumaier

Here is a test case. On a Core-2 under Fedora Core 12 with gcc 4.4.4 and
glibc 2.11.2, I get a slowdown by a factor of 3:

tarte% gcc -O3 -g neumaier.c -lm ; time ./a.out
s=2.7182818284590455e+00
2.129u 0.000s 0:02.15 98.6%     0+0k 0+0io 0pf+0w

tarte% gcc -DTEST -O3 -g neumaier.c -lm ; time ./a.out
s=2.7182818284590455e+00
6.368u 0.000s 0:06.39 99.5%     0+0k 0+0io 0pf+0w

I'll let you decide whether it corresponds to (i), (ii), (iii) or (iv).

Do other compilers give better results?

The problem in your example is not whether the function nan2zero() is inlined. The problem is the call to isnan(). If you replace isnan(x) with the comparison "x != x", the runtime comes close to that of the version without any call to nan2zero().


Modified code:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>


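/* Compile with -DINLINE to mark nan2zero() as inline. */
#if defined(INLINE)
inline
#endif
double nan2zero (double x)
{
#if defined (ISNAN)
  /* -DISNAN: test for NaN with isnan() from <math.h>. */
  return isnan (x) ? 0.0 : x;
#else
  /* Default: a NaN is the only value that compares unequal to itself. */
  return x != x ? 0.0 : x;
#endif
}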

int
main()
{
  double t, s, i, N = 1000000000.0;

  /* compute s = sum(1/k!, k=0..N) */
  for (t = 1.0, s = t, i = 1; i <= N; i++)
    {
      t /= i;
#if defined (TEST)
      s += nan2zero (t);
#else
      s += t;
#endif
    }
  printf ("s=%.16e\n", s);
}
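
The modified code relies on "x != x" and isnan(x) selecting exactly the same inputs. A minimal sketch of a check for that, assuming IEEE 754 doubles and a build without -ffast-math (with GCC's -ffinite-math-only the compiler is allowed to assume that NaNs never occur, so the self-comparison may be folded away). The names nan2zero_cmp, nan2zero_lib and count_mismatches are made up for this illustration and are not part of the test program:

```c
#include <math.h>
#include <string.h>

/* Self-comparison variant: NaN is the only double for which x != x holds. */
static double nan2zero_cmp(double x)
{
    return x != x ? 0.0 : x;
}

/* Library variant: relies on the isnan() macro from <math.h>. */
static double nan2zero_lib(double x)
{
    return isnan(x) ? 0.0 : x;
}

/* Hypothetical helper: counts the sample inputs on which the two variants
   return different bit patterns (memcmp also tells 0.0 and -0.0 apart). */
static int count_mismatches(void)
{
    const double samples[] = { 0.0, -0.0, 1.0, -2.5, 1e300, 1e-300,
                               INFINITY, -INFINITY, NAN };
    const size_t n = sizeof samples / sizeof samples[0];
    int mismatches = 0;
    size_t k;

    for (k = 0; k < n; k++) {
        double a = nan2zero_cmp(samples[k]);
        double b = nan2zero_lib(samples[k]);
        if (memcmp(&a, &b, sizeof a) != 0)
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches() from main() should yield 0 when both variants agree on every sample. The check says nothing about speed; it only confirms that swapping isnan() for the comparison does not change the results.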



Runtime on an Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
under Ubuntu 10.04 LTS with g++ 4.4.3:


no nan2zero:
$ g++ -O2 inlineTest.cpp
$ time ./a.out
s=2.7182818284590455e+00

real	0m2.131s
user	0m2.120s
sys	0m0.020s


nan2zero with "x != x":
$ g++ -O2 -DTEST inlineTest.cpp
$ time ./a.out
s=2.7182818284590455e+00

real	0m2.235s
user	0m2.240s
sys	0m0.000s


nan2zero with "x != x" and inlining:
$ g++ -O2 -DTEST -DINLINE inlineTest.cpp
$ time ./a.out
s=2.7182818284590455e+00

real	0m2.231s
user	0m2.230s
sys	0m0.000s

nan2zero with isnan:
$ g++ -O2 -DTEST -DISNAN inlineTest.cpp
$ time ./a.out
s=2.7182818284590455e+00

real	0m7.829s
user	0m7.830s
sys	0m0.000s

nan2zero with isnan and inlining:
$ g++ -O2 -DTEST -DINLINE -DISNAN inlineTest.cpp
$ time ./a.out
s=2.7182818284590455e+00

real	0m6.347s
user	0m6.350s
sys	0m0.000s
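
The measurements above come from separate builds timed with "time". As a rough alternative, the two loop variants could be timed inside one program with clock() from <time.h>. This is only a sketch under assumptions: the helper names sum_plain, sum_nan2zero and cpu_seconds are made up here, the compiler is free to inline nan2zero(), and the numbers will depend on the machine and the optimization flags.

```c
#include <time.h>

/* s = sum(1/k!, k=0..n), without any NaN handling. */
static double sum_plain(double n)
{
    double t = 1.0, s = 1.0, i;

    for (i = 1; i <= n; i++) {
        t /= i;
        s += t;
    }
    return s;
}

/* Same filter as in the test program, using the self-comparison. */
static double nan2zero(double x)
{
    return x != x ? 0.0 : x;
}

/* Same sum, but every term passes through nan2zero(). */
static double sum_nan2zero(double n)
{
    double t = 1.0, s = 1.0, i;

    for (i = 1; i <= n; i++) {
        t /= i;
        s += nan2zero(t);
    }
    return s;
}

/* Hypothetical helper: runs fn(n), stores its value in *result and
   returns the CPU time consumed, in seconds, as reported by clock(). */
static double cpu_seconds(double (*fn)(double), double n, double *result)
{
    clock_t start = clock();

    *result = fn(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

From main(), calling cpu_seconds(sum_plain, N, &s) and cpu_seconds(sum_nan2zero, N, &s) with the same N and printing both return values gives a direct comparison within one build.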


Best regards

Marco


--
     o           Marco Nehmeier, Lehrstuhl fuer Informatik II
    / \          Universitaet Wuerzburg, Am Hubland, D-97074 Wuerzburg
InfoII o         Tel.: +49 931 / 31 88684
  / \  Uni       E-Mail: nehmeier@xxxxxxxxxxxxxxxxxxxxxxxxxxx
 o   o Wuerzburg