Subject: Denormalized number handling
To: stds-754@ieee.org
From: "Eric Schwarz"
Date: Tue, 13 Aug 2002 15:28:56 -0400

At the last meeting I was asked to summarize how denormals (now called subnormals) are handled in hardware. For reference, I talked with Marty Schmookler and wrote this short description. Let me know if you have any comments. Sorry if it assumes detailed knowledge of hardware design; maybe I'll write up a longer description sometime.

There are several solutions for handling denormalized numbers in hardware. But most of the literature on this subject comes from U.S. patents, which also makes it difficult to implement them without violating some other company's claims.

First, there are partial software or millicode solutions. These simply trap to millicode or software when a denormalized input or result is detected.

Other solutions assume denormalized numbers are represented in the register file in a normalized form using an extended exponent range. This solves most problems with inputting denormalized numbers but still requires some solution for outputting denormalized results that are detected late. It is similar to Intel's 80-bit format, which uses an extended exponent range. These techniques are used in load/store architectures, where loads transform denorms into a normalized internal format and stores are forced to denormalize results. Even in these designs there is still a problem with intermediate results that underflow. The results need to be denormalized and rounded (unless you're in an 80-bit format), and there is the additional problem that the result must then be normalized again before it is placed back into the register file.

There are techniques for handling denormalized inputs within hardware without prenormalization. U.S. Patent 5,764,549 by Ted Williams and a recently filed application of mine show techniques for modifying a multiplier or a fused multiply/adder to correct for denormal inputs. In a fused multiply/adder there are three operands that can be denormal, and each needs to be corrected. Ted's patent assumes that the input registers have no implied one; instead, two partial products are added into the partial product tree in case the numbers are normalized. These two partial products are added late into the CSA tree so as not to cause added delay. It is questionable whether the counter tree is constructed such that two terms can be late. My application has a couple of variations.

Variation 1 assumes that the multiplicand can be determined to be denormal prior to selecting between two Booth decodes (one with an implied one, the other with a zero). The multiplier is assumed to be a normalized number, but one additional partial product is added late into the counter tree if it is denormal. The other variation simply corrects for both multiplier and multiplicand with additional partial products.

In addition to the product, the addend and the exponents must be corrected. The critical timing path is the exponent difference, and several exponent differences must be created as possible shift amounts to correct for a denormal operand. A denormal's exponent field is zero, but it must act as though it were equal to one, so there must be a plus-one correction. Given that there are three operands, the correction to the exponent difference is usually -1, 0, or +1; the case where both multiplier and multiplicand are denormal is a simple one in which alignment doesn't matter. While the exponent difference is being calculated, the addend exponent can be checked to see whether the addend is denormal, and the addend's most significant bit can be corrected, if necessary, prior to alignment. This completes the corrections necessary for denormal inputs.

How do we correct a denormal result? There are a couple of slow techniques, such as trapping to millicode or software. Another, which I used on a mainframe, is to flush the pipeline and re-issue the instruction in a slow non-pipelined mode so that the data can be sent back to the top of the pipeline to unnormalize the normalized intermediate result. Other designs have a separate denormalizer unit which shifts the normalized intermediate result right and then rounds it.

There are also fast techniques that avoid shifting too far. There are several patents on this subject, but the basic idea is: don't normalize beyond the denormal point.

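To make "don't normalize beyond the denormal point" concrete, here is a minimal software sketch of a bounded normalization shift. It is only an illustration, not how any of the patented designs are built: the function name bounded_normalize, the register width, and the convention that exp is the biased exponent the result would carry if its leading one sat in the top bit are all assumptions made for the example.

```python
def bounded_normalize(sig, exp, width):
    """Left-shift sig to normalize it, but never past the denormal point.

    sig   : unnormalized significand held in `width` bits (assumed layout)
    exp   : biased exponent for the current alignment, assumed >= 1
    width : register width in bits (hypothetical parameter)
    Returns (shifted significand, exponent field to encode).
    """
    # Shift a full normalization would need: the leading-zero count.
    lzc = width - sig.bit_length()
    # The smallest biased exponent of a normal number is 1, so at most
    # exp - 1 positions can be shifted before reaching the denormal point.
    max_shift = exp - 1
    shift = min(lzc, max_shift)
    sig <<= shift
    exp -= shift
    # If the leading bit is still clear, the result is subnormal (or zero)
    # and is encoded with an exponent field of zero.
    if (sig >> (width - 1)) & 1 == 0:
        exp = 0
    return sig, exp


# Plenty of exponent headroom: a full normalization is allowed.
print(bounded_normalize(0b00000101, 10, 8))
# Little headroom: the shift stops early and the result stays subnormal.
print(bounded_normalize(0b00000101, 3, 8))
```

In hardware the min() is the expensive part, since the leading-zero count arrives late.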
Some designs employ an LZA (leading zero anticipator), which can be corrected in one of two ways: either by determining the maximum shift amount for a denormal, comparing it with the adder LZA's shift amount, and selecting between them, or by altering the adder LZA itself. The adder LZA is typically computed by creating a vector on which an LZD (leading zero detect) is then performed to determine the shift amount. In this LZA vector design, a denorm maximum vector can be created and logically ORed with the LZA vector prior to performing the LZD. This prevents the shift amount from being greater than the denormal shift maximum.

So, there are several techniques for handling denormals in hardware. Some are slow and some are fast and complex, but almost all are patented, as is evident from the references listed below.

U.S. patents on denorm handling:

patent application filed 1/18/2001 by me shows the correction scheme used in FMAC with negative correction
5,903,479 - IBM POK (me), shows denorm result creation by flushing pipeline and reissuing denorm instruction in non-pipelined mode
5,646,875 - IBM Austin, shows a denormalizer unit
5,267,186 - AMD using a denormalizer unit to punt denorms to
5,058,048 - AMD
5,943,249 - IBM Rochester - denorm shift amount restrictor for multiplication
5,757,687 - HP - Bounding alignment shifts in a FMAC
5,347,481 - Hal - leading ones correction for multiplication
5,764,549 - IBM Austin (Schmookler) - FMAC LZA OR vector to bound shift amount
5,513,362 - Matsushita - Bound normalizer by denorm shift amount, example is for after subtraction
5,966,085 - Lockheed - extends IEEE format to 36 bits to avoid denorms, reminds me of Intel double extended
5,963,461 - Sun Moscow - creating denorm exp and norm exp for multiplication in parallel

Regards,
Eric

eServer Processor Development
IBM Corp.
eschwarz@us.ibm.com