Re: [P1619-2] another P1619.2 question: the EME2 mix function
Thanks, Hal, for the ideas, although I hope to find something still better.
1. Cutting the wide block to smaller sub blocks to be processed in
parallel could speed things up, but the main advantage of
as-wide-as-possible block encryption is the implicit data
authentication based on its inherent redundancy. This way less
plaintext is randomized at an illicit ciphertext change, so we trade
speed for security.
2. Computing L*x^64, L*x^128, L*x^196 and then shift-XOR for other
powers does not seem to help much. L*x^128 takes just as much time to
compute directly than to compute it from L*x^64. Unless, we use
something like Karatsuba multiplication, which gives only a little
speedup, but it is much more complex.
On Wed, Mar 24, 2010 at 11:40 AM, Hal Finney <hal.finney@xxxxxxxxx> wrote:
> On Tue, Mar 23, 2010 at 10:07 AM, Laszlo Hars <laszlo.hars@xxxxxxxxxxx> wrote:
>> 5. EME2 is too slow for sequential implementations in modern disk drives,
>> where data is streamed at 12 Gb/s, or more. However, table B.2.1 of the
>> Standard is not applicable for parallel implementations; in Galois fields
>> multiplication with a power of x is not a shift-XOR operation, if products
>> with smaller powers are not available. The user would need some guidance how
>> complex is a 128x128 bit Galois multiplication in this special setting. Keep
>> in mind that typical applications work on 4KB sectors, so one has to
>> multiply with up to x^255, which is not of any simple structure when taken
>> modulo of the fixed 128 degree polynomial. In HW implementations large
>> lookup tables are expensive.
> One thing to look into is that you could still use 512 bytes as the
> EME2 block size even though the hardware sectors are 4096 bytes. Then
> you could have more parallelism and perhaps could cycle a LFSR fewer
> times to generate the multiplications by x powers.
> Another possibility along these lines would be to do a full Galois
> multiplication to multiply by x^128, then to do the LFSR 128 times, so
> it takes half as long. Or extend this idea to multiply by x^64, x^128
> and x^192, then shift 64 times, and so on.
> No doubt as Matt says, Doug Whiting would be a good person to ask.