Re: Question on performance
On 09.10.2010 11:36, Paul Zimmermann wrote:
Dear Arnold,
I have a question about performance on current 754-conforming hardware:
Suppose I write code consisting only of 754 floating-point operations
and calls to simple customized additional functions such as
nan2zero(x), which returns 0 if isnan(x), and x otherwise.
Will the code generated by a standard, good compiler run --
(i) essentially as efficiently as without these function calls?
(ii) essentially as efficiently as if it contained explicit case
distinctions?
(iii) intermediate but still efficient?
(iv) intermediate but still inefficient?
In case of (ii) or (iv), could a special purpose compiler do
significantly better?
Arnold Neumaier
Here is a test case. On a Core-2 under Fedora Core 12 with gcc 4.4.4 and
glibc 2.11.2, I get a slowdown by a factor of 3:
tarte% gcc -O3 -g neumaier.c -lm ; time ./a.out
s=2.7182818284590455e+00
2.129u 0.000s 0:02.15 98.6% 0+0k 0+0io 0pf+0w
tarte% gcc -DTEST -O3 -g neumaier.c -lm ; time ./a.out
s=2.7182818284590455e+00
6.368u 0.000s 0:06.39 99.5% 0+0k 0+0io 0pf+0w
I'll let you decide whether this corresponds to (i), (ii), (iii) or (iv).
Do other compilers give better results?
The problem in your example is not whether nan2zero() gets inlined.
The problem is the call to isnan().
If you replace that call with the comparison "x != x", the runtime
comes close to that of the version without any call to nan2zero().
Modified code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#if defined(INLINE)
inline
#endif
double nan2zero (double x)
{
#if defined (ISNAN)
  return isnan (x) ? 0.0 : x;
#else
  return x != x ? 0.0 : x;
#endif
}

int
main()
{
  double t, s, i, N = 1000000000.0;

  /* compute s = sum(1/k!, k=0..N) */
  for (t = 1.0, s = t, i = 1; i <= N; i++)
    {
      t /= i;
#if defined (TEST)
      s += nan2zero (t);
#else
      s += t;
#endif
    }
  printf ("s=%.16e\n", s);
}
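
As a quick sanity check that the two branches of nan2zero() above agree, here is a minimal sketch that places the self-comparison test next to the isnan() version. The names isnan_cmp, nan2zero_cmp and nan2zero_isnan are made up for this illustration and are not part of the test program; the sketch assumes IEEE 754 semantics for double.

```c
#include <math.h>

/* NaN test by self-comparison: under IEEE 754, NaN is the only value
   that compares unequal to itself. (Hypothetical helper name.) */
static int isnan_cmp(double x)
{
    return x != x;
}

/* nan2zero using the self-comparison, as in the non-ISNAN branch. */
static double nan2zero_cmp(double x)
{
    return x != x ? 0.0 : x;
}

/* nan2zero using the standard isnan() macro from <math.h>. */
static double nan2zero_isnan(double x)
{
    return isnan(x) ? 0.0 : x;
}
```

Both variants should return 0.0 for NaN and pass every other value through unchanged, including infinities. One caveat: with GCC's -ffast-math (which implies -ffinite-math-only) the compiler may assume that no NaNs occur, so the "x != x" test can be optimized away. The timings in this mail use plain -O2, where that does not apply.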
Runtimes on an
Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
under Ubuntu 10.04 LTS with g++ 4.4.3:
no nan2zero:
$ g++ -O2 inlineTest.cpp
$ time ./a.out
s=2.7182818284590455e+00
real 0m2.131s
user 0m2.120s
sys 0m0.020s
nan2zero with "x != x":
$ g++ -O2 -DTEST inlineTest.cpp
$ time ./a.out
s=2.7182818284590455e+00
real 0m2.235s
user 0m2.240s
sys 0m0.000s
nan2zero with "x != x" and inlining:
$ g++ -O2 -DTEST -DINLINE inlineTest.cpp
$ time ./a.out
s=2.7182818284590455e+00
real 0m2.231s
user 0m2.230s
sys 0m0.000s
nan2zero with isnan:
$ g++ -O2 -DTEST -DISNAN inlineTest.cpp
$ time ./a.out
s=2.7182818284590455e+00
real 0m7.829s
user 0m7.830s
sys 0m0.000s
nan2zero with isnan and inlining:
$ g++ -O2 -DTEST -DINLINE -DISNAN inlineTest.cpp
$ time ./a.out
s=2.7182818284590455e+00
real 0m6.347s
user 0m6.350s
sys 0m0.000s
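
To put the five measurements above on one scale, the following sketch divides each "real" time by the baseline without nan2zero. The helper names slowdown and print_slowdowns are hypothetical; the numbers are copied from the runs listed above and nothing is re-measured.

```c
#include <stdio.h>

/* Slowdown factor of one run relative to the baseline run. */
static double slowdown(double seconds, double baseline_seconds)
{
    return seconds / baseline_seconds;
}

/* Print the factor for each build, using the "real" times listed above. */
static void print_slowdowns(void)
{
    static const struct {
        const char *label;
        double real_seconds;
    } runs[] = {
        { "x != x",         2.235 },
        { "x != x, inline", 2.231 },
        { "isnan",          7.829 },
        { "isnan, inline",  6.347 },
    };
    const double baseline = 2.131; /* build without nan2zero */
    size_t k;

    for (k = 0; k < sizeof runs / sizeof runs[0]; k++)
        printf("%-16s %.2fx\n", runs[k].label,
               slowdown(runs[k].real_seconds, baseline));
}
```

With these figures the "x != x" builds cost about 5% over the baseline, while the isnan builds are roughly 3 to 3.7 times slower.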
Best regards
Marco
--
Marco Nehmeier, Lehrstuhl fuer Informatik II
Universitaet Wuerzburg, Am Hubland, D-97074 Wuerzburg
Tel.: +49 931 / 31 88684
E-Mail: nehmeier@xxxxxxxxxxxxxxxxxxxxxxxxxxx