Vincent Lefevre wrote:
> On 2011-07-19 06:25:01 +0100, John Pryce wrote:
>> On 6 Jul 2011, at 01:09, Ian McIntosh wrote:
>>> An analogy may help. Picture a 16 car long subway station platform...
>>
>> It persuaded me we should get used to the idea that decorating a 16-byte interval should make it 32 bytes wide, not 17, until architectures change drastically. I recall years ago we got used to the fact that Fortran typically stored a Boolean in a 32-bit word, which really offended my cheeseparing mathematical mind...
>
> 32 bytes only when *explicitly* stored into memory. Otherwise optimizing compilers or interpreters could keep intermediate results in registers or use a more efficient way for temporary storage. In a language without pointers or other ways to retrieve the encoding in the memory, storage could also be done efficiently. Otherwise the encoding matters mostly for I/O and data shared between applications. But if most of the time is taken by raw computations, decorations won't probably have much influence on the performances due to this storage problem.
Vincent, the scalability of high performance parallel computing on modern computer architectures is almost entirely determined by memory latencies and efficient usage of the cache hierarchy.

This is because it often takes an order of magnitude longer to move data over a memory bus (and through the various levels of cache) to the processor than it actually takes to perform a single instruction on the processor (for data already in a register). Since multi-core processors often share a memory bus, this is a sequential operation... meaning that *all* such processing cores may be stalled during such memory moves. The result can be a lot of idle processors sitting around doing absolutely no work.

Even newer 64-bit processors have a relatively small number of registers. So unless the data for your entire computation can fit into those registers, you're automatically going to be at the mercy of these performance barriers to some extent.

There's really no way around the detrimental effects of this problem except:

a) for modern hardware architectures to radically change
b) to make sure our Level 4 datatypes fit nicely into a power-of-two number of bits with no wasted storage

I'm skeptical a) will happen anytime soon.

Nate
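
To put rough numbers on the 17-byte versus 32-byte question, here is a minimal sketch of the arithmetic. It is not a benchmark and makes several assumptions that are not from the thread: a 64-byte cache line, a densely packed array that starts on a line boundary, and the hypothetical helper names `straddling_elements` and `lines_touched`. It only counts how many elements span two cache lines and how many lines the whole array covers.

```python
CACHE_LINE = 64  # bytes; an assumed, typical cache-line size


def straddling_elements(elem_size, count, line=CACHE_LINE):
    """Count elements of a densely packed array that span two cache lines.

    Assumes the array starts exactly on a cache-line boundary.
    """
    straddles = 0
    for i in range(count):
        first_line = (i * elem_size) // line
        last_line = (i * elem_size + elem_size - 1) // line
        if first_line != last_line:
            straddles += 1
    return straddles


def lines_touched(elem_size, count, line=CACHE_LINE):
    """Number of cache lines covered by the whole array (rounded up)."""
    total_bytes = elem_size * count
    return (total_bytes + line - 1) // line


if __name__ == "__main__":
    # 16 = bare interval, 17 = interval + 1-byte decoration, 32 = padded
    for size in (16, 17, 32):
        print(size, straddling_elements(size, 64), lines_touched(size, 64))
```

Under these assumptions, with 64 elements the 17-byte layout covers 17 lines but a quarter of its elements (16 of 64) straddle a line boundary, while the 32-byte layout covers 32 lines and none of its elements straddle. Which cost dominates depends on the access pattern, so this is only the storage side of the trade-off.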