Re: Revised Motion 26 decoration scheme
On 2011-07-20 16:45:06 -0500, Nate Hayes wrote:
> Vincent Lefevre wrote:
> >That's the goal of the compiler to ensure that data are aligned. For
> >instance, if a 64-bit alignment is OK, then 8 decorated intervals
> >(I1,D1), ..., (I8,D8) could be stored in the following way:
> >
> >I1 I2 ... I8 [D1 D2 ... D8]
> >
> >where the 8 decorations are packed in a 64-bit word (4 would be
> >sufficient for a 32-bit alignment).
>
> I understand.
>
> But think about the problem now of "putting back together", say (I1,D1) and
> (I7,D7) at the processor. You have to do the following:
>
> -- Load I1
> -- Load I7
> -- Load D1
> -- Load D7 (likely an unaligned memory access)
>
> That's a total of 4 memory moves, and one of them likely will be unaligned.
Actually only 3 possibly slow ones, without unaligned memory accesses.
The compiler makes sure that I1 and I7 are aligned (since it decides
where to store data). The same holds for the word [D1 D2 ... D8]
(recall that I assumed a full 64-bit machine; similar reasoning
applies to other machines with similar choices). Then, depending on
the processor:
* [D1 D2 ... D8] will be loaded with a load-word instruction. Then
  individual bytes can be extracted with logical instructions.
  AFAIK, this is how the Alpha worked, and it was quite efficient
  at doing that. If there are many variables and control structures
  (if/then, loops), this may be difficult to arrange. However, if the
  program has regularity in its data and operations (e.g. linear
  algebra), this can be a great solution.
* Decorations can be loaded individually with load-byte instructions
  (by definition, bytes are always aligned, but how they are grouped
  can be important for performance). Internally, this is probably
  decomposed by the processor into a load-word and a byte extraction.
  One would seek to group decorations in a single L1 cache line, so
  that a second decoration load would generally be very fast.
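
The first case (one load-word, then logical extraction) can be sketched
in C as follows. This is only an illustration, assuming an interval is
two binary64 endpoints and a decoration fits in one byte; the names
(interval, packed8, dec_from_word, dec_into_word, load_two) and the
position of each decoration inside the word are hypothetical choices.

```c
#include <stdint.h>

/* Hypothetical types: an interval is two binary64 endpoints,
   a decoration is one byte. */
typedef struct { double lo, hi; } interval;
typedef struct { interval x; uint8_t dec; } decorated;

/* Eight intervals followed by their eight decorations packed in
   one 64-bit word: I1 I2 ... I8 [D1 D2 ... D8]. */
typedef struct {
    interval i[8];
    uint64_t d;   /* decoration k (0-based) in bits 8k .. 8k+7 (assumed) */
} packed8;

/* Extract decoration k from an already loaded word: shift and mask. */
static inline uint8_t dec_from_word(uint64_t word, unsigned k)
{
    return (uint8_t)((word >> (8 * k)) & 0xFF);
}

/* Replace decoration k inside the word, again with logical operations. */
static inline uint64_t dec_into_word(uint64_t word, unsigned k, uint8_t dec)
{
    uint64_t mask = (uint64_t)0xFF << (8 * k);
    return (word & ~mask) | ((uint64_t)dec << (8 * k));
}

/* Reassemble (Ia,Da) and (Ib,Db): two interval loads plus a single
   load of the decoration word, which serves both decorations. */
static inline void load_two(const packed8 *p, unsigned a, unsigned b,
                            decorated *out_a, decorated *out_b)
{
    uint64_t w = p->d;                 /* the only decoration load */
    out_a->x = p->i[a];
    out_a->dec = dec_from_word(w, a);
    out_b->x = p->i[b];
    out_b->dec = dec_from_word(w, b);
}
```

With this layout, load_two(&p, 0, 6, &a, &b) reassembles (I1,D1) and
(I7,D7) with three aligned loads, as long as the compiler aligns the
structure on a 64-bit boundary.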
> If the data is accessed in the same sequential pattern as the
> packing order, this will be the best case. But it still requires
> extra memory moves for the decorations.
The optimal ratio (which can probably be reached if there is a lot
of data to move with good regularity) is 17/16, i.e. 17 words moved
instead of 16 for 8 intervals of two words each, thus close to 1. IMHO,
that's why an implementation would be important: it would help to
decide what to specify (knowing that things can still be improved
in the future).
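
The 17/16 figure can be checked with a small C sketch. It assumes, as
in the layout quoted earlier, that an interval occupies two 64-bit
words and that eight one-byte decorations share one 64-bit word; the
function names are hypothetical.

```c
#include <stddef.h>

/* 64-bit words moved for n bare intervals (two endpoints each). */
static inline size_t words_bare(size_t n)
{
    return 2 * n;
}

/* 64-bit words moved for n decorated intervals when decorations are
   packed eight to a word; a partly filled word still costs one move. */
static inline size_t words_packed(size_t n)
{
    return 2 * n + (n + 7) / 8;
}
```

For n = 8 this gives 17 words against 16. For a single decorated
interval it gives 3 words against 2, which is why the ratio is only
reached when many data are moved with good regularity.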
> If the data is accessed in any other pattern, then actual
> performance can be worse, of course.
Yes, but I still think that packing would improve cache-related
performance. Note that the access pattern is never completely
random: the decoration and the associated interval will still
be quite close to each other. And the compiler may seek to make
sure that an interval and its decoration are in the same cache
line (this would degrade the packing ratio, but not much for
caches with large lines).
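
One possible layout that keeps each interval and its decoration in the
same cache line is sketched below in C11. The 64-byte line size, the
16-byte interval, the one-byte decoration and the type name line_block
are all assumptions made for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { double lo, hi; } interval;

/* One 64-byte block: three intervals (48 bytes), their three
   decorations, and padding up to the assumed line size. */
typedef struct {
    _Alignas(64) interval i[3];
    uint8_t d[3];
    uint8_t pad[13];
} line_block;

_Static_assert(sizeof(line_block) == 64,
               "block must fill exactly one 64-byte line");
_Static_assert(offsetof(line_block, d) == 48,
               "decorations follow the three intervals");
```

Here 64 bytes are moved for 48 bytes of intervals, a ratio of 4/3. With
128-byte lines, seven intervals and their decorations fit (119 bytes),
giving 8/7; with 256-byte lines, fifteen fit (255 bytes), giving 16/15.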
> >Do you mean dropping decorations when an interval needs to be returned?
>
> Yes, if it is safe to do so.
If it is safe (e.g. the compiler can detect that decorations are
not used), I agree. There should also be an interval datatype so
that the user explicitly says that decorations are not used.
Still, the standard should also specify decorated intervals, with
performance in mind even in this case.
> There is an algebraic structure for operations involving bare objects (see
> the appropriate section in my Nov. 14 DRAFT paper, e.g.). Very briefly:
> operations on bare intervals give the usual bare interval result, unless an
> exception occurs and then a bare decoration is given as result instead.
[...]
I partly disagree on that. There is a second possibility: if an
exception occurs, one may still want a bare interval (because the
exception may occur only because the computed ranges are generally
larger than the real ones, due to rounding and variable
duplication). However, the standard could support both. The second
possibility shouldn't be a problem, as it can semantically be
decomposed into a normal decorated operation followed by a decorated
interval -> bare interval conversion.
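
This decomposition can be sketched in C. The decoration values and all
names are hypothetical and do not come from any motion or draft. The
operation shown only restricts its argument to [0, +inf), i.e. the
domain-handling step of an operation such as sqrt, chosen because it
needs no rounding. NaN endpoints are not handled.

```c
typedef struct { double lo, hi; } interval;

/* Hypothetical decoration values, for illustration only. */
typedef enum { DEC_OK, DEC_PARTLY_DEFINED, DEC_NOWHERE_DEFINED } decoration;

typedef struct { interval x; decoration dec; } decorated;

/* Normal decorated operation: clip x to [0, +inf) and record in the
   decoration whether part or all of x was outside the domain. */
static inline decorated dec_restrict_nonneg(interval x)
{
    decorated r = { x, DEC_OK };
    if (x.hi < 0.0) {            /* entirely outside the domain */
        r.x.lo = 1.0;            /* empty set encoded as lo > hi */
        r.x.hi = 0.0;            /* (an assumed convention)      */
        r.dec = DEC_NOWHERE_DEFINED;
    } else if (x.lo < 0.0) {     /* partly outside: clip and flag */
        r.x.lo = 0.0;
        r.dec = DEC_PARTLY_DEFINED;
    }
    return r;
}

/* Conversion decorated interval -> bare interval: drop the decoration. */
static inline interval to_bare(decorated d)
{
    return d.x;
}

/* The second possibility: a bare operation that always returns a bare
   interval, defined as the composition of the two steps above. */
static inline interval bare_restrict_nonneg(interval x)
{
    return to_bare(dec_restrict_nonneg(x));
}
```

For the input [-1, 4], the decorated operation returns [0, 4] with a
"partly defined" decoration, while the bare variant returns just
[0, 4].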
--
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)