[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Prolegomena to any future decimal discussion



Many people on this mailing list are interested in the relative merits
of the BID and DPD encodings for decimal numbers. In an attempt to
focus the discussion, I tentatively suggest that we try to separate
the BID - DPD difference into two more or less independent aspects:

  1. Encoding of digits

  2. Choice of base / digit size

I believe these really can be considered separately. I further claim
that most of the controversy and uncertainty lies (or should lie) in
the second issue. Choice of digit size is a perennial question when
considering binary-based emulation of decimal, and it's not clear that
there is a definitive "best" solution. It is exactly because of such
uncertainties that some of us are unhappy about enshrining just one
encoding in the standard at this stage, despite the superficial
appeal of doing so.

In order to isolate the two differences, consider also a third
encoding, which I'll call ZBD (Zuras Binary Declets). Some such
encoding was proposed by our chairman at one stage as a possible
compromise. But I hasten to add that I am not necessarily advocating
it, nor attempting to overturn the arrangement in the existing draft
with both BID and DPD, which I'm fairly happy with. I'm using ZBD as
an intermediate point of comparison only.

For this general discussion, I'll neglect a number of details such as
the position of the exponent and combination fields and indeed how the
combination field is encoded. Characterize the three encodings crudely
as:

 BID: coefficient is encoded as a single binary integer.

 ZBD: coefficient consists of base-1000 ("millennial") digits represented
      as 10-bit fields using the usual binary numeric encoding.

 DPD: coefficient consists of base-1000 digits represented as 10-bit
      "declets" using Mike Cowlishaw's modified Chen-Ho encoding
      (see "http://www2.hursley.ibm.com/decimal/DPDecimal.html";).

ZBD versus DPD
--------------

The key property of DPD is that there is a simple and quick logic
circuit to produce the three individual decimal digits for a declet,
and vice versa. This makes it attractive if the underlying arithmetic
(i) is being done in hardware, and (ii) is based on individual decimal
digits. Since, as I understand it, this is exactly how IBM's hardware
works or will work, DPD is obviously a perfect fit there. Conversion
between ZBD's millennial digits and individual decimal digits would
require more logic, though the net performance difference may be
small, especially since the conversions need only be borne once each
on loads and stores.

For a software implementation, DPD seems clearly inferior to ZBD. To
perform almost any arithmetic operation or inequality comparison on
DPD, one needs to decode the declets into either individual decimal
digits or millennial digits, and to re-encode any result in the case
of arithmetic operations. The nature of the encoding, which splices
bits in quite complex ways, means that table lookups, one per declet,
appear the most practical solution. This introduces a notable
bottleneck even on very simple operations. With ZBD, on the other
hand, several important operations such as aligned adds and
comparisons can be done efficiently on the millennial digit sequence
directly; cf. Michel Hack's earlier message about arithmetic on BCD.

Similar remarks apply to hardware based on millennial digits or pure
binary, but once again the difference between DPD and ZBD is not
dramatic because the conversion cost in hardware is not very large
and need apply only to loads and stores.

What about issues besides performance? DPD is significantly more
complex intellectually. (Can you remember all the details of the DPD
encoding, or sight-read DPD numbers?) And DPD retains no vestige of
lexicographic ordering, while ZBD with the appropriate field layout
has some potentially useful ordering properties.

In conclusion, DPD is a well-optimized choice for the performance of
one particular kind of hardware implementation, though ZBD is probably
not much worse. For performance of software and other kinds of
hardware implementation, DPD is worse than ZBD, quite significantly so
in the case of software. DPD has some other undesirable properties
too.

BID versus ZBD
--------------

In software implementations of decimal arithmetic, such as Mike
Cowlishaw's excellent "decNumber" library, numbers are typically
represented positionally using some power of 10 as a base, with the
individual digits encoded in binary. BCD is one extreme (base 10), a
BID-like coefficient with a single "digit" is the other extreme (say
base 10^7, base 10^16 or whatever), and ZBD is an intermediate choice
(base 10^3). Last time I looked at decNumber, the digit size was a
settable parameter with the "out of the box" value 10^4 on my
platform. (Perhaps this has changed recently to fit better with the
DPD encoding?)

A little thought about the performance implications of the choice of
base makes it clear that the decision isn't obvious or clear-cut. What
turns out fastest depends to a considerable extent on the expected
workload and the detailed microarchitectural characteristics of the
implementation machine. For example, BCD makes string conversion very
fast, while BID makes nonoverflowing multiplies very fast. There's a
slew of similar tradeoffs across a wide range of operations. So we
can't expect a definitive answer to the question "is BID or ZBD
faster?" The answer will always be: "well, it depends". All we can do
is look at current machines, current workloads, and the trends in both
that we perceive.

Of course, our choice of BID was based on just such an analysis, after
extensive experimentation and profiling. There is a lot to say about
this issue, but since by my own admission we can't expect a definitive
answer, I'll confine myself to a few relevant observations/claims
about workloads, machines and algorithms:

 o Many important operations in business apps, e.g. aligned adds and
   nonoverflowing multiplies, can be performed on BID using little
   more than simple binary arithmetic, giving very good performance
   (in hardware or in software).

 o If forced to optimize one class of operations at the expense of
   another, there is a good case for optimizing arithmetic at the
   expense of conversion to and from decimal digits, since arithmetic
   is likelier to be part of a performance-critical kernel. (See the
   earlier remarks by Vincent Lefe`vre.)

 o On most mainstream machines, binary arithmetic, including at least
   32x32 and increasingly 64x64 multiplication, is available, quite
   fast and well pipelined.

 o Although the fashionable pipeline depth may have receded a little
   since its apotheosis in the later Pentium 4 family, mispredicted
   branches are still rather expensive, so intricate flow control can
   be surprisingly costly.

 o Employing careful algorithm design (e.g. refined variants of
   reciprocal multiplication), many "inherently decimal" operations,
   such as decimal rounding or digit extraction, can be done quite
   well using binary multiplication.

Considerations such as these led us towards BID, and our experience
so far with software implementation of BID has been generally very
positive.

John.

754 | revision | FAQ | references | list archive