Denormalized number handling
At the last meeting I was asked to summarize how denormals (now called
subnormals) are handled in hardware. For reference, I talked with Marty
Schmookler and wrote this short description. Let me know if you have any
comments. Sorry if it assumes detailed knowledge of hardware design;
maybe I'll write up a longer description sometime.
There are several ways to handle denormalized numbers in hardware, but
most of the literature on this subject comes from U.S. patents, which also
makes it difficult to implement a solution without violating some other
company's claims.
First, there are partial software or millicode solutions, which simply
trap to millicode or software when a denormalized input or result is
detected. Other solutions represent denormalized numbers in the register
file in a normalized form using an extended exponent range. This solves
most of the problems with denormalized inputs, but still requires some
mechanism for outputting late-detected denormalized results. This solution
is similar to Intel's 80-bit format, which uses an extended exponent range.
These techniques are used in load/store architectures where loads
transform denorms into a normalized internal format and stores are forced
to denormalize results. Even in these designs, however, intermediate
results that underflow are a problem: they need to be denormalized and
rounded (unless you're in an 80-bit format), and the rounded result then
has to be normalized again to be placed back into the register file.
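As an illustration of the load side only, here is a small Python sketch, assuming IEEE double precision and a hypothetical load_normalized() helper that is not taken from any of the designs mentioned here. It unpacks a double into a sign, a 53-bit significand with an explicit leading one, and an unbiased exponent that is allowed to go below -1022, which is what the extended exponent range buys. Zero, infinity and NaN are deliberately left out.

```python
import math
import struct

BIAS = 1023
FRAC_BITS = 52

def load_normalized(x: float):
    """Hypothetical 'load' step: return (sign, sig, exp) with bit 52 of sig
    set, so that x == (-1)**sign * sig * 2**(exp - 52). Denormals get an
    exponent below -1022, i.e. they use the extended exponent range."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    exp_field = (bits >> FRAC_BITS) & 0x7FF
    frac = bits & ((1 << FRAC_BITS) - 1)
    if exp_field == 0x7FF or (exp_field == 0 and frac == 0):
        raise ValueError("zero, infinity and NaN are not handled in this sketch")
    if exp_field == 0:
        # Denormal: no implied one, and the exponent acts as 1 - BIAS.
        # Shift the leading one up to bit 52 and lower the exponent to match.
        shift = FRAC_BITS + 1 - frac.bit_length()
        return sign, frac << shift, (1 - BIAS) - shift
    # Normal number: just make the implied one explicit.
    return sign, frac | (1 << FRAC_BITS), exp_field - BIAS

def to_float(sign: int, sig: int, exp: int) -> float:
    """Rebuild the value from the internal format (exact for doubles)."""
    return (-1.0) ** sign * math.ldexp(sig, exp - FRAC_BITS)
```

For example, the smallest denormal, 5e-324 (2**-1074), loads as sig = 1 << 52 with exp = -1074, an exponent that an ordinary double cannot hold.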
There are also techniques for handling denormalized inputs within hardware
without prenormalization. U.S. Patent 5,764,549 by Ted Williams and a
recently filed application of mine show techniques for modifying a
multiplier or a fused multiply/adder to correct for denormal inputs. In a
fused multiply/adder, any of the three operands can be denormal, and each
needs to be corrected. Ted's patent assumes that the input registers hold
no implied one; instead, two partial products are added into the partial
product tree when the numbers are normalized. These two partial products
are added late into the CSA tree so as not to add delay. It is
questionable whether the counter tree can be constructed so that two terms
arrive late. My application has a couple of variations. Variation 1
assumes that the multiplicand can be determined to be denormal before
selecting between two Booth decodes (one with an implied one and the other
with a zero). The multiplier is assumed to be a normalized number, but one
additional partial product is added late to the counter tree if it is
denormal. The other variation simply corrects for both the multiplier and
the multiplicand with additional partial products.
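To make the product correction concrete, here is a Python sketch in which plain integer multiplication stands in for the Booth-encoded CSA tree. The function names are hypothetical, and the way the corrections are split into terms is my own illustration, not the exact partial products of either the patent or the application. Here fa and fb are 52-bit fractions and ha and hb are the implied (hidden) bits, 0 for a denormal. The second function shows one possible way to realize Variation 1's single extra partial product, as a negative correction applied when the multiplier turns out to be denormal; that form is an assumption for illustration.

```python
P = 52  # fraction bits of an IEEE double significand

def true_product(fa: int, ha: int, fb: int, hb: int) -> int:
    """Reference: full significands with the hidden bits attached."""
    return ((ha << P) | fa) * ((hb << P) | fb)

def product_two_extra_pps(fa: int, ha: int, fb: int, hb: int) -> int:
    """Registers hold only the fractions (no implied one). The main tree
    forms fa * fb; two extra terms, present only for normalized operands,
    account for the implied ones and could be added late to the tree."""
    pp_a = (((hb << P) | fb) << P) if ha else 0  # A's implied one times all of B
    pp_b = (fa << P) if hb else 0                # B's implied one times A's fraction
    return fa * fb + pp_a + pp_b

def product_negative_correction(fa: int, ha: int, fb: int, hb: int) -> int:
    """Multiplicand: its implied bit (1 or 0) is known early and is simply
    selected. Multiplier: always treated as normalized (implied one of 1);
    if it is really denormal, one extra negative term removes a << P."""
    a = (ha << P) | fa
    prod = a * ((1 << P) | fb)
    if hb == 0:
        prod -= a << P  # the single late correction term
    return prod
```

Both functions agree with true_product() for every combination of normal and denormal operands, which is the property the hardware corrections have to preserve.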
In addition to correcting the product, the addend and the exponents must
be corrected. The critical timing path is the exponent difference, so
several exponent differences must be created as possible shift amounts to
correct for a denormal operand. A denormal's exponent field is zero, but
it must act as though it were one, so there must be a plus-one correction.
With three operands, the correction to the exponent difference is usually
-1, 0, or +1; the case where both the multiplier and the multiplicand are
denormal is a simple one where alignment doesn't matter. While the
exponent difference is being calculated, the addend exponent can be
checked to see if it is denormal, and, prior to alignment, the addend's
most significant bit can be corrected if necessary. This completes the
corrections necessary for denormal inputs.
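The exponent-difference selection can be sketched the same way in Python, assuming double-precision biased exponent fields (bias 1023) and taking the alignment shift as the product exponent minus the addend exponent. The function names are hypothetical. The raw difference and its -1 and +1 neighbours are formed up front, and zero detects on the three exponent fields pick one of them; shift_amount_reference() is the slow definition that treats each zero field as one, used only to check the selection.

```python
BIAS = 1023  # double-precision exponent bias

def effective(e: int) -> int:
    """A denormal's exponent field is 0, but it must act as 1."""
    return 1 if e == 0 else e

def shift_amount_reference(ea: int, eb: int, ec: int) -> int:
    """Product exponent minus addend exponent, using effective exponents."""
    return effective(ea) + effective(eb) - BIAS - effective(ec)

def shift_amount_selected(ea: int, eb: int, ec: int):
    """Form the raw difference and its -1/+1 neighbours (hardware would
    compute these in parallel), then select one with zero detects."""
    d0 = ea + eb - BIAS - ec
    candidates = {-1: d0 - 1, 0: d0, 1: d0 + 1}
    if ea == 0 and eb == 0:
        # Both multiply operands denormal: the simple case where alignment
        # doesn't matter, so this sketch just flags it.
        return None
    product_fix = 1 if (ea == 0 or eb == 0) else 0  # at most one is zero here
    addend_fix = 1 if ec == 0 else 0
    return candidates[product_fix - addend_fix]
```

Because the three candidates do not depend on the zero detects, the detects only drive the final select and stay off the exponent-difference path.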
How do we correct a denormal result? There are a couple of slow
techniques, such as trapping to millicode or software. Another, which I
used on a mainframe, is to flush the pipeline and reissue the instruction
in a slow, non-pipelined mode so that the data can be sent back to the
top of the pipeline to unnormalize the normalized intermediate result.
Other designs have a separate denormalizer unit, which shifts the
normalized intermediate result right and then rounds it.
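Here is a minimal Python sketch of what such a denormalizer unit computes, assuming an IEEE double result, round-to-nearest-even only, and a hypothetical denormalize() helper. The input is a normalized intermediate significand of `width` bits with an unbiased exponent below -1022; it is shifted right so that the least significant kept bit has weight 2**-1074, and then rounded. A hardware unit would keep only guard and sticky bits, whereas the sketch simply compares the whole shifted-out remainder with one half.

```python
import struct

FRAC_BITS = 52
EMIN = -1022  # smallest unbiased exponent of a normal double

def denormalize(sign: int, sig: int, exp: int, width: int) -> float:
    """sig has bit width-1 set; the value is sig * 2**(exp - (width - 1)),
    with exp < EMIN. Shift right to the denormal point and round to
    nearest, ties to even."""
    assert sig >> (width - 1) == 1 and exp < EMIN
    # After the shift, the least significant kept bit has weight 2**(EMIN - FRAC_BITS).
    shift = (width - 1) - exp + (EMIN - FRAC_BITS)
    assert shift >= 1, "use width >= 53"
    kept = sig >> shift
    rem = sig - (kept << shift)  # bits shifted out
    half = 1 << (shift - 1)
    if rem > half or (rem == half and kept & 1):
        kept += 1
    # kept <= 2**52; if rounding carries into bit 52, the exponent field
    # becomes 1 and the result is the smallest normal number.
    bits = (sign << 63) | kept
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

The carry case is worth noting: a denormalized result can round up into the smallest normal number, and with the fraction sitting directly in the low bits of the encoding that happens without any extra logic.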
There are also fast techniques that avoid shifting too far. There are
several patents on this subject, but the basic idea is not to normalize
beyond the denormal point. Some designs employ a leading zero anticipator
(LZA), which can be corrected in one of two ways: either determine the
maximum shift amount for a denormal, compare it with the adder LZA's shift
amount, and select the smaller, or alter the adder LZA itself. The adder
LZA is typically computed by creating a vector and then performing a
leading zero detection (LZD) on it to determine the shift amount. In this
LZA vector design, a denorm maximum vector can be created and logically
ORed into the LZA vector before the LZD is performed. This prevents the
shift amount from exceeding the denormal shift maximum.
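The OR-vector trick can be modeled in a few lines of Python. This is a sketch under simplifying assumptions: the exact adder result stands in for the LZA vector (a real LZA vector is only an anticipation and may be off by one position), exp is the exponent of the unnormalized result, emin is the smallest normal exponent, and the function names are hypothetical. Setting a single bit max_shift places from the top makes the LZD stop there, so the result equals min(leading zeros, max_shift), the same answer the compare-and-select approach gives.

```python
def leading_zeros(vec: int, width: int) -> int:
    """Leading zero detector (LZD) on a width-bit vector."""
    return width - vec.bit_length()

def bounded_norm_shift(lza_vec: int, width: int, exp: int, emin: int) -> int:
    """Normalization shift that never takes the exponent below emin."""
    max_shift = exp - emin
    assert max_shift >= 0, "result is already below the denormal point"
    # Denorm maximum vector: a single one max_shift places from the top.
    # If max_shift >= width, the shift can never reach the bound.
    denorm_vec = 1 << (width - 1 - max_shift) if max_shift < width else 0
    return leading_zeros(lza_vec | denorm_vec, width)
```

For example, bounded_norm_shift(0x0001, 16, 3, 0) returns 3 rather than 15: the result is left partially unnormalized, with its exponent at emin, which is exactly the denormal form.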
So, there are several techniques for handling denormals in hardware. Some
are slow and some are fast and complex, but almost all are patented, as is
evident from the references listed below.
U.S. patents on denorm handling:
Patent application filed 1/18/2001 (me) - shows the correction scheme used
  in FMAC with negative correction
5,903,479 - IBM POK (me) - shows denorm result creation by flushing the
  pipeline and reissuing the denorm instruction in non-pipelined mode
5,646,875 - IBM Austin - shows a denormalizer unit
5,267,186 - AMD - using a denormalizer unit to punt denorms to
5,058,048 - AMD
5,943,249 - IBM Rochester - denorm shift amount restrictor for
  multiplication
5,757,687 - HP - bounding alignment shifts in an FMAC
5,347,481 - Hal - leading ones correction for multiplication
5,764,549 - IBM Austin (Schmookler) - FMAC LZA OR vector to bound shift
  amount
5,513,362 - Matsushita - bounds normalizer by denorm shift amount; example
  is for after subtraction
5,966,085 - Lockheed - extends IEEE format to 36 bits to avoid denorms;
  reminds me of Intel double extended
5,963,461 - Sun Moscow - creates denorm exp and norm exp for
  multiplication in parallel
Regards,
Eric
eServer Processor Development
IBM Corp.
eschwarz@xxxxxxxxxx