Thread Links Date Links
Thread Prev Thread Next Thread Index Date Prev Date Next Date Index

Re: Revised Motion 26 decoration scheme



Vincent Lefevre wrote:
On 2011-07-20 09:42:40 -0500, Nate Hayes wrote:
Vincent Lefevre wrote:
>As long as you don't need a format for data interchange between two
>different applications, this is not a problem. The compiler could
>optimize to store data in some more optimized way than taking a
>full 32-bit or 64-bit word for the decoration (e.g. by packing
>decorations in a word).

It doesn't matter. In fact, now the problem might be worse, since you are
breaking the interval data into different pieces with different memory
alignments: there is more memory locations to be accessed and some of
those
memory locations are now more likely not to be aligned.

So memory traffic on the bus increases, the cache is more heavily taxed,
and
many of the moves may be unaligned (which as Ian points out are even more
detrimental to performance).

That's the goal of the compiler to ensure that data are aligned. For
instance, if a 64-bit alignment is OK, then 8 decorated intervals
(I1,D1), ..., (I8,D8) could be stored in the following way:

I1 I2 ... I8 [D1 D2 ... D8]

where the 8 decorations are packed in a 64-bit word (4 would be
sufficient for a 32-bit alignment).

I understand.

But think about the problem now of "putting back together", say (I1,D1) and
(I7,D7) at the processor. You have to do the following:

   -- Load I1
   -- Load I7
   -- Load D1
   -- Load D7 (likely an unaligned memory access)

Thats a total of 4 memory moves, and one of them likely will be unaligned.



Of course, packing in such a way should be done with data locality
in mind. There may be compromises between the efficiency of packing
and the locality.

If the data is accessed in the same sequential pattern as the packing order,
this will be the best case. But it still requires extra memory moves for the
decorations.

If the data is accessed in any other pattern, then actual performance can be
worse, of course.



>I suppose that concerning (b), you would want the size of a FP datum
>to be reduced, so that it is possible to store the decoration in the
>holes?

No.

Bare intervals and bare decorations can fit in those holes using standard
IEEE 754 bit patterns.

Do you mean dropping decorations when an interval needs to be returned?

Yes, if it is safe to do so.

There is an algebraic structure for operations involving bare objects (see
the appropriate section in my Nov. 14 DRAFT paper, e.g.). Very briefly:
operations on bare intervals give the usual bare interval result, unless an
exception occurs and then a bare decoration is given as result instead. Once
a bare decoration appears in a computation, no subsequent operation in the
DET gives a bare interval result: the (worst) decoration always propagates
to the end (similar to how NaNs propagate in IEEE 754 computations).
Algebraically, it is closed.

I am NOT saying that users can't compute with decorated intervals, if they
wish. I have no objection if they do.

I'm only saying that in many cases always computing with decorated intervals
is a waste. In B&B, for example, the information provided by the decoration
portion of a decorated interval is not used if no exception occurs while
evaluating the DET; and likewise the information provided by the interval
portion of a decorated interval is not used if an exception *does* occur
while evaluating the DET.

So why should such an application be forced to suffer the performance
penalties and overheads incurred by computing with decorated intervals? In
such cases, the bare intervals and bare decorations can fit nicely into the
power-of-two memory holes using standard IEEE 754 bit patterns so optimal
performance can be achieved.

Nate

P.S. I note again that Motion 27 doesn't address arithmetic on bare objects.
This was planned for a future motion. I've only spent some time addressing
it now because it showed up in Motion 26.