
Re: Revised Motion 26 decoration scheme



Vincent Lefevre wrote:
On 2011-07-19 06:25:01 +0100, John Pryce wrote:
On 6 Jul 2011, at 01:09, Ian McIntosh wrote:
> An analogy may help. Picture a 16 car long subway station platform...

It persuaded me we should get used to the idea that decorating a
16-byte interval should make it 32 bytes wide, not 17, until
architectures change drastically. I recall years ago we got used to
the fact that Fortran typically stored a Boolean in a 32-bit word,
which really offended my cheeseparing mathematical mind...
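
As a rough illustration of the 17-versus-32 arithmetic above, here is a minimal sketch. It assumes a one-byte decoration and a 16-byte alignment requirement for the decorated interval; both figures, and the helper name `padded_size`, are assumptions made for the example, not something the motion specifies.

```python
def padded_size(payload_bytes: int, alignment: int) -> int:
    """Round payload_bytes up to the next multiple of alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (payload_bytes + alignment - 1) & ~(alignment - 1)


INTERVAL_BYTES = 16    # two 8-byte binary64 endpoints
DECORATION_BYTES = 1   # assumption: the decoration fits in one byte

# Storage footprint when every object must start on a 16-byte boundary.
bare = padded_size(INTERVAL_BYTES, 16)
decorated = padded_size(INTERVAL_BYTES + DECORATION_BYTES, 16)

# With only 8-byte alignment (natural for binary64 fields) the padding is smaller.
decorated_align8 = padded_size(INTERVAL_BYTES + DECORATION_BYTES, 8)

print(bare, decorated, decorated_align8)
```

Under these assumptions the 17 bytes of payload occupy 32 bytes in memory; relaxing the alignment to 8 bytes would give 24, which is smaller but no longer a power of two.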

32 bytes only when *explicitly* stored into memory. Otherwise
optimizing compilers or interpreters could keep intermediate
results in registers or use a more efficient way for temporary
storage.

In a language without pointers or other ways to retrieve the
encoding in the memory, storage could also be done efficiently.
Otherwise the encoding matters mostly for I/O and data shared
between applications. But if most of the time is taken by raw
computations, decorations probably won't have much influence
on performance due to this storage problem.

Vincent, the scalability of high performance parallel computing on modern
computer architectures is almost entirely determined by memory latencies and
efficient usage of the cache hierarchy.

This is because it often takes an order of magnitude longer to move data
over a memory bus (and through the various levels of cache) to the processor
than it actually takes to perform a single instruction on the processor (for
data already in a register).

Since multi-core processors often share a memory bus, this is a sequential
operation... meaning that *all* such processing cores may be stalled during
such memory moves. The result can be a lot of idle processors sitting around
doing absolutely no work.

Even newer 64-bit processors have a relatively small number of registers. So
unless the data for your entire computation can fit into those registers,
you're automatically going to be at the mercy of these performance barriers
to some extent.

There's really no way around the detrimental effects of this problem except:

   a) for modern hardware architectures to change radically
   b) to make sure our Level 4 datatypes fit nicely into a power-of-two
      number of bits with no wasted storage

I'm skeptical a) will happen anytime soon.
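
To make point b) concrete, here is a small sketch that counts how many elements of a contiguous array span two cache lines. The 64-byte line size, the array starting on a line boundary, and the function name `straddling_elements` are all assumptions made for the illustration; real hardware varies.

```python
CACHE_LINE = 64  # assumption: cache-line size in bytes


def straddling_elements(element_size: int, count: int, line: int = CACHE_LINE) -> int:
    """Count elements of a contiguous array that span more than one cache line.

    The array is assumed to start exactly on a cache-line boundary.
    """
    straddlers = 0
    for i in range(count):
        first = i * element_size          # offset of the element's first byte
        last = first + element_size - 1   # offset of the element's last byte
        if first // line != last // line:
            straddlers += 1
    return straddlers


for size in (16, 17, 24, 32):
    print(size, straddling_elements(size, 64))
```

With these assumptions, the 16- and 32-byte layouts never straddle a line, while in both the packed 17-byte and the padded 24-byte layouts 16 of 64 elements need two line fetches.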

Nate