Thread Links Date Links
Thread Prev Thread Next Thread Index Date Prev Date Next Date Index

[P1619.2] Re: Bit and Byte Order Mapping in XTS



Brian,

Did I understand right, that the difference between the little and big
endian SW implementations is in loading and storing 4..8 bytes of data as
integers? If you handle individual bytes, it does not matter, right?

There is no significant difference in HW, so we ought to differentiate
based on what processor architecture is the most common. It is little
endian (PC, MAC), is not it? (8-bit processors don't care.)

Another issue is with what penalty can the Intel processor family read and
write big endian data. A single instruction can rearrange the bytes, so at
most one instruction is wasted. Can the latest Intel compatible processors
do it on-the-fly (without the single instruction penalty)? Multimedia
extensions might allow it, so there will be no slow down with big endian
coding. Is it also true with PowerPC and the most common other big endian
processors?

Laszlo



                                                                           
             Brian Gladman                                                 
             <brg@gladman.plus                                             
             .com>                                                      To 
             Sent by:                  Matt Ball <matt.ball@QUANTUM.COM>   
             stds-p1619@ieee.o                                          cc 
             rg                        David McGrew <mcgrew@cisco.com>,    
             No Phone Info             SISWG <stds-p1619@IEEE.ORG>, Colin  
             Available                 Sinclair <colin@heliontech.com>,    
                                       John Viega <viega@viega.org>        
                                                                   Subject 
             01/05/2007 06:18          [P1619.2]  Bit and Byte Order       
             AM                        Mapping in XTS                      
                                                                           
                                                                           
             Please respond to                                             
             brg@gladman.plus.                                             
                    com                                                    
                                                                           
                                                                           




Matt Ball wrote:

> Thank you David,
>
> I think the following list looks reasonable for the P1619.2 ABL4 mode.
>
> A couple comments (relating to text below):
>
> 3. (non-multiple of 512 bytes) I believe we decided to make it a
> requirement to support non-multiples of 16 bytes.
>
> 4. (Bit-order of Galois multiplier) We have developed two strong camps
> within the SISWG: Those who prefer multipliers that are optimal for
> either Big or Little Endian systems.  The GCM spec is better for Big
> Endian, and the current XTS (P1619) spec is better for Little Endian
> systems.  We will need to discuss this issue during the January 15th
> meeting.  I'll try to represent fairly the little-endian case for Doug
> Whiting and Brian Gladman, who I don't believe will be there.

Hi to All,

Now that Christmas is over, I thought that I should make a further input
to this debate about the bit and byte order used by XTS. I am doing so
because I fear that my position may be misinterpreted as being one of
preference for 'little endian' systems when this is ***NOT*** my
position at all.

My concern is to avoid a specification that is potentially bad when
implemented in software on ***BOTH*** big and little endian systems.

My Argument
===========

As far as I can see there are four possible mappings that we have to
consider (GF bit number = power of alpha):

(1) AES  0 1 2 3 4 5 6 7  ... 120 121 122 123 124 125 126 127     GCM
    GF   0 1 2 3 4 5 6 7  ... 120 121 122 123 124 125 126 127
    bit  7 6 5 4 3 2 1 0  (i.e. power of 2 in integer in first byte)

(2) AES  0     1   2   3   4   5   6   7  ... 124 125 126 127 Rogaway
    GF   127 126 124 124 123 122 121 120  ...   3   2   1   0     XEX
    bit  7     6   5   4   3   2   1   0

(3) AES  0 1 2 3 4 5 6 7  ... 120 121 122 123 124 125 126 127 Current
    GF   7 6 5 4 3 2 1 0  ... 127 126 125 124 123 122 121 120     XTS
    bit  7 6 5 4 3 2 1 0

(4) AES  0     1   2   3   4   5   6   7  ... 124 125 126 127
    GF   120 121 122 123 124 125 126 127  ...   4   5   6   7
    bit  7     6   5   4   3   2   1   0

If the input to AES is in a 128-bit register with the bits in AES order
then the mappings (1) and (2) are nice because the field 'multiply by x'
operation is a simple left or right shift of this complete register.
This is the typical hardware situation (where shifts are also cheap) so
its perhaps not surprising that (1) and (2) seem to be favoured by folk
who are involved in hardware implementation.

For software it is different since, in order to take advantage of fast
arithmetic on many machines, it is important that the fundamental
'multiply by x' field operation maps to a left shift, ***NOT*** a right
shift.

The reason for this is that a left shift can be done on machines by an
'add reg,reg' operation rather than a shift. Since many processor
designer's give high priority to fast and efficient addition operations
but low priority to shifts, the ability to use an addition instruction
as the fundamental field operation is often very important in speed
terms. Moreover some processors don't even provide shift instructions
and this can be a 'show stopper' if a right shift is needed.  In
contrast a left shift can be easily emulated with addition operations.

Hence those involved in software implementation will favour mappings in
which x^n for the field maps to 2^n for integers. For little endian
machines this is (3) and for big endian machines it is (2). I am happy
with ***EITHER*** of these.

So we have the pros and cons as:

(1) Pros: GCM compatible;
          Logically neat (GF_bit_no = AES_bit_no);
    Cons: Poor for software because bit order inversion is needed
          to map 'multiply by x' onto an addition operation;

(2) Pros: Matches order in the original XEX specification
          Good for software on big-endian systems
          Logically neat (GF_bit_no = 127 - AES_bit_no);
    Cons: less good for little endian systems since it requires byte
          order inversion - but not nearly as bad as (1) and (4);

(3) Pros: Good for software on little endian machines;
    Cons: Less good for big endian systems since it requires byte
          order inversion - but not nearly as bad as (1) and (4)
          Logically messy (AES <==> GF bit no function is messy)
          Not good for some hardware implementations because
          simple shifts can't be used on long registers mapped in
          AES bit order (see Colin Sinclair's input).

(4) Good for Nothing (that is nobody is arguing for this AFAIK).

My Position
===========

First I cannot overstress that my concern is NOT about gaining
implementation benefits on big or little endian systems. My concern is
to use a mapping that allows efficient software implementation on as
wider range of processors as possible by ensuring that the fundamental
'multiply by x' field operation maps directly to an integer addition
operation (equivalently the representation of unity in integers and
Galois fields is identical on either big or little endian processors).

Second, we should not pursue compatibility with GCM as an end in itself.

We need to evaluate the _real_ practical advantages that will be derived
from GCM/XTS compatibility and then see if these are sufficient to
outweigh the real advantages that derive from what we have now (mapping
3) or from mapping (2).

I have yet to see any convincing practical arguments in favour of
mapping (1) for XTS (or even for GCM but that's another matter).

 with best regards,

     Brian Gladman