Re: Revised Motion 26 decoration scheme



Vincent Lefevre wrote:
On 2011-07-20 00:16:44 -0500, Nate Hayes wrote:
Vincent, the scalability of high performance parallel computing on
modern computer architectures is almost entirely determined by
memory latencies and efficient usage of the cache hierarchy.
[...]

As long as you don't need a format for data interchange between two
different applications, this is not a problem. The compiler could
optimize to store data in some more optimized way than taking a
full 32-bit or 64-bit word for the decoration (e.g. by packing
decorations in a word).

It doesn't matter. In fact, now the problem might be worse, since you are
breaking the interval data into different pieces with different memory
alignments: there are more memory locations to be accessed, and some of
those memory locations are now more likely to be unaligned.

So memory traffic on the bus increases, the cache is more heavily taxed, and
many of the moves may be unaligned (which, as Ian points out, are even more
detrimental to performance).
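
To make the alignment point concrete, here is a minimal C sketch. It is not
from the thread: the struct names, the one-byte decoration field and the
64-byte cache line are assumptions chosen for illustration. It counts how
many elements of a contiguous array cross a cache-line boundary for a given
element size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bare interval: two binary64 endpoints (16 bytes on common platforms). */
typedef struct { double lo, hi; } bare_interval;

/* Hypothetical decorated interval: one extra byte for the decoration.
   The compiler pads the struct up to a multiple of its alignment, so the
   size is no longer a power of two (typically 24 bytes on x86-64). */
typedef struct { double lo, hi; uint8_t dec; } decorated_interval;

/* Count the elements, out of n stored contiguously from a line-aligned
   base address, whose bytes span two cache lines. */
static size_t straddling_elements(size_t n, size_t elem, size_t line)
{
    assert(elem > 0 && line > 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t first = i * elem;          /* offset of first byte */
        size_t last  = first + elem - 1;  /* offset of last byte  */
        if (first / line != last / line)
            count++;
    }
    return count;
}
```

With 16-byte elements and 64-byte lines no element crosses a line; with
24-byte elements, straddling_elements(8, 24, 64) reports that two out of
every eight do, and each of those costs a second cache-line access.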


This would still be limited though, and the
language might allow hints given by the programmer (perhaps there
could be something in this direction in the P1788 standard), but
there aren't many solutions if you don't want to drop decorations
or make them "global" flags.

There's really no way around the detrimental effects of this problem
except:

   a) for modern hardware architectures to radically change
   b) to make sure our Level 4 datatypes fit nicely into a power-of-two
      number of bits with no wasted storage

I'm skeptical a) will happen anytime soon.

I suppose that concerning (b), you would want the size of a FP datum
to be reduced, so that it is possible to store the decoration in the
holes?

No.

Bare intervals and bare decorations can fit in those holes using standard
IEEE 754 bit patterns.
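
One possible reading of this, sketched in C below, is that a bare decoration
is carried in the payload of a quiet NaN stored in both endpoints, so a bare
interval and a bare decoration share the same 16-byte layout. This is an
assumption for illustration only: the names interval16, bare_decoration and
is_bare_decoration are hypothetical, and the thread does not spell out the
encoding. The sketch also assumes double is IEEE 754 binary64.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Two binary64 endpoints: 16 bytes, no extra decoration field. */
typedef struct { double lo, hi; } interval16;

#define QNAN_BITS    0x7FF8000000000000ULL  /* exponent all ones + quiet bit */
#define PAYLOAD_MASK 0x0007FFFFFFFFFFFFULL  /* low 51 fraction bits */

/* Build a quiet NaN whose fraction field carries the given payload. */
static double nan_with_payload(uint64_t payload)
{
    assert(payload <= PAYLOAD_MASK);
    uint64_t bits = QNAN_BITS | payload;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Extract the payload bits from a double (meaningful only for NaNs). */
static uint64_t payload_of(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits & PAYLOAD_MASK;
}

/* Hypothetical encoding: a bare decoration is a pair of NaN endpoints. */
static interval16 bare_decoration(uint64_t dec)
{
    interval16 x = { nan_with_payload(dec), nan_with_payload(dec) };
    return x;
}

/* An ordinary bare interval has numeric endpoints, so NaNs mark a decoration. */
static int is_bare_decoration(interval16 x)
{
    return isnan(x.lo) && isnan(x.hi);
}
```

Under these assumptions, bare_decoration(3) yields a 16-byte object for which
is_bare_decoration is true and payload_of(x.lo) returns 3, while an ordinary
interval such as {1.0, 2.0} is left untouched.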

That would require hardware changes and new formats that are
no longer IEEE-754 basic formats... This won't happen anytime soon
either, IMHO.

There's c) compiler optimizations and user compilation directives.

As noted above, this doesn't escape the problem: at run-time the system
performance will still be constrained by the memory bus and cache
hierarchies.

Nate