IEEE 754R minutes from July 18, 2002

Attendance and minutes review

The IEEE 754R committee met July 18, 2002 at Network Appliance. In attendance were:

The previous meeting minutes were summarily approved.

Coq and formal specifications for 754R

Marc Daumas presented on "past and future formalizations of the IEEE 754, 854, and 754R standards." The presentation closely followed a slide handout; since the guest rule was in effect, there was no grilling until after the presentation was complete.

Kahan began the questioning with the following statement: "I approve of what you're doing. I regard it as necessary. My hope is that continuing in that direction will ultimately lead to specifications and implementations which are not merely verifiable, but also maintainable." He then stated that language support is a greater problem than hardware correctness, and that formal verification can supplement, but not supplant, testing. Kahan supported his statement with several examples:

Ariane 5:

On June 4, 1996, the Ariane 5 rocket went out of control shortly after takeoff, turning cartwheels over French Guiana until the onboard safety systems caused it to self-destruct. In their presentation of the inquiry report, the inquiry board concluded that the development program "did not include adequate analysis and testing."

The Ariane 5 control software was written in Ada, where overflow is treated as a fatal exception which causes a computation to abort to a handler. Because the Ariane 5 rockets were more powerful than those of the Ariane 4, the software to recalibrate the gyros, which was copied from the Ariane 4 system, suffered an overflow. Since there was no defined exception handler, the system wrote debugging information to memory, overwriting information needed by other parts of the system, and transferred control to the second computer. The backup computer suffered the same exception and transferred control back to the first computer.

Had Ada supported the IEEE standard, the overflow would have resulted in an exceptional value and a raised flag, but no disastrous change in control flow.
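
A minimal C99 sketch (my illustration, not the Ariane code or Ada) of what Kahan is describing: under IEEE 754 default handling, an overflowing operation delivers infinity and raises a flag, and control flow simply continues. Whether the flag is observable this way depends on the compiler honoring Annex F and the FENV_ACCESS pragma.

  /* IEEE 754 default overflow handling: the result goes to infinity,
   * the overflow flag is raised, and execution continues undisturbed. */
  #include <fenv.h>
  #include <stdio.h>

  #pragma STDC FENV_ACCESS ON

  int main(void)
  {
      feclearexcept(FE_ALL_EXCEPT);

      volatile double big = 1.0e308;
      volatile double r   = big * 10.0;        /* overflows binary64 */

      if (fetestexcept(FE_OVERFLOW))
          printf("overflow flag raised; result = %g\n", r);

      /* No abort, no transfer of control: later code can inspect the
       * flag or the infinity at its leisure. */
      return 0;
  }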

Software is never fully tested if it is at all interesting. It's unfair to say we didn't know what the software should do in the exceptional circumstance; in any interesting piece of software, there are cases like that. We simply don't want the software to die messily in such circumstances.

T800 verification:

In his paper, "Formal Methods Applied to a Floating Point Number System," Geoff Barrett verified that an Occam implementation intended to mimic the Inmos T800 Transputer's floating point hardware did indeed meet a Z specification intended to capture the requirements of IEEE 754. There were two mistakes. The first mistake was a transcription error of the type Daumas mentioned in his presentation. "The most negative binary integer" was written instead of "-0" somewhere in the specification, so that the routing in add-subtract went the wrong way when signed zeros were added. The second mistake was more vicious. Since the Transputer was designed to do very rapid context switching, the engineers wanted to minimize the amount of system state. Since they regarded overflow, divide by zero, and invalid exceptions as "mistakes," they replaced three distinct flag bits with a single bit. So the T800 did not conform to the standard.

Stack overflow on the 8087:

In Palmer and Morris's book on the 8087, they acknowledged that software could hardly handle stack overflow without contortions. Worse, software that would have worked on the 8087 would not work on the 80287, and software for the 80287 would not work on the 80387. Microsoft decided to simplify their lives by enabling the invalid operation trap. On stack over/underflow, the system will jump to a handler to tell you to rewrite your program. The same trap occurs on invalid operations like 0/0; the handler deals with this situation by re-executing the instruction and then clearing the flag so that the invalid trap will not occur again immediately when control is returned to the program. So even though the IEEE flags are supported in the 80x87 hardware, they are not accessible from software under Windows.

Pentium FDIV bug:

The algorithm on which the infamously buggy Pentium FDIV instruction was based had been proved correct.

The proof was incorrect, and that incorrectness had already been published in the literature. The author of the original paper on the algorithm introduced a cryptic footnote which, to the knowledgeable and observant reader, meant "you shouldn't take this too literally." The FDIV implementors followed the proof as presented in the paper and claimed their hardware design was now proven correct. In fact, the design was correct; but the proof wasn't.

The trouble was not in the design, but in its transcription to hardware. The table specified in the design was not correctly transferred to the PLA; some of the entries in the table were incorrectly identified as don't-cares. Unfortunately, the mistake affected table entries which were infrequently exercised, and the test battery designed for an older implementation failed to expose the problem. The designers, having "proved" the correctness of the design, believed they did not need a new test battery.

Moral: Even after an algorithm is proved formally correct, it must be tested. Testability is a design feature; for instance, the algorithms for the 8087 were designed to be tested. Some 14,000 test arguments were needed to check the design for correctness. On the 8087, all the transcendentals were tested with these test vectors, and nobody ever found a bug not induced by some erroneous optimization introduced later.

Kahan expressed additional concerns about the feasibility of verifying some algorithms. There is a tradeoff between cost and ease of proof; algorithms simple enough to be proven using pessimistic inequalities are often overkill. For example, Kahan noted that the algorithms in fdlibm use tricks to modify the coefficients of a Chebyshev approximation in order to compensate for error due to the rounding of the stored coefficients. If you use such a trick, the error due to coefficient rounding becomes negligible compared to other error sources, but proving that the algorithm gets good results becomes trickier.

Daumas mentioned that for small arguments, the problem of checking correctness is largely solved. Kahan replied that the only time such techniques work is when you can afford large enough tables. For quad precision, for example, it is unlikely that large tables of low-order approximants will suffice.

Daumas noted that HOL and Coq are systems that check proofs; a human must generate the proof. He compared such a system to a colleague you can call at any time, one who will be picky about your proofs and who will never tell you to shut up because he is tired of listening. Kahan agreed that, if they are used to verify a human-derived proof, automatic tools would enhance confidence in the proof. However, he warned, such verification should be considered not as a guarantee, but rather as corroborating evidence of correctness. Kahan also noted that proofs involving integrals, integral inequalities, and limits seemed likely to be difficult to express; Daumas responded that Coq does have some support for limits. Kahan gave, as an example, a non-transcendental problem that he thought might be difficult to verify; Daumas subsequently provided a solution, which he sent to the mailing list.

Kahan and Daumas also briefly debated the applicability of automated verification techniques to simulated extra precision. Kahan thought Demmel and Hida's paper on accurate summations might provide difficult examples. Daumas noted that he has done some work on applications to online arithmetic; Kahan responded that he cannot afford to spend that long.

Zuras also asked several questions. His first question was whether automated tools could be used to determine whether a set of test vectors exercised all the hypotheses of a proof. The question as posed was never answered, but it did provoke further discussion of testability and the design of the 8087. Zuras noted that Daumas had mentioned a growing set of proved facts about 754, and asked if any of those facts involved FMA operations. Daumas responded that there were no such verified facts yet, but that it should be simple to do the verifications based on current work. Finally, Zuras noted that Daumas works with Muller, and asked whether they had made any progress on proving and verifying correct rounding. Daumas thought that it would likely be a difficult task.

Flagging cancellation

Matula asked Kahan: would it be worthwhile to have a flag to signal when massive cancellation occurs? He noted that something like this could be used to monitor inner product accumulation to tell when more precision is needed, for instance.

Kahan replied that the IBM 360 and the Sigma 7 both had variants of such flags. Nobody could figure out how to use them. Besides, he said, cancellation is not the bad news, but only the bearer of bad news. As to extra precision in inner products, that takes us to the XBLAS. When we are using extended precision tricks, we don't want to avoid cancellation; we want to court it!
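
The extended-precision tricks alluded to here (as in the XBLAS) rest on error-free transformations such as Knuth's TwoSum, in which deliberately cancelling subtractions recover the exact rounding error of an addition. A sketch, assuming round-to-nearest binary arithmetic with no extended intermediate precision:

  /* Knuth's TwoSum: s + e == a + b exactly, with s = fl(a + b).
   * The subtractions cancel on purpose; that is the whole trick. */
  #include <stdio.h>

  static void two_sum(double a, double b, double *s, double *e)
  {
      *s = a + b;
      double bb = *s - a;
      *e = (a - (*s - bb)) + (b - bb);
  }

  int main(void)
  {
      double s, e;
      two_sum(1.0e16, 3.14159, &s, &e);
      printf("s = %.17g, e = %.17g\n", s, e);   /* e holds the lost digits */
      return 0;
  }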

The sensible thing to do, said Kahan, is to re-do uncertain computations in higher precision and see how many figures agree. The technique is not perfect, but the probability of success is high, and it is cheap compared to error analysis. Programming languages that made such recomputations easy, he said, would contribute far more to the safety of typical programs than any attempts to promulgate proof techniques (which will be used by specialists).
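
A sketch of that recipe, assuming long double actually carries more precision than double (true of x87 80-bit extended, not of every platform): recompute a cancellation-prone expression in the wider format and see how many figures agree.

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      double      x  = 1.2e-6;
      /* (1 - cos x)/x^2 loses about a dozen digits to cancellation. */
      double      fd = (1.0 - cos(x)) / (x * x);
      long double fl = (1.0L - cosl((long double)x))
                       / ((long double)x * (long double)x);

      double rel = fabs(fd - (double)fl) / fabs((double)fl);
      printf("double:      %.17g\n", fd);
      printf("long double: %.17Lg\n", fl);
      printf("roughly %.0f figures agree\n", rel > 0.0 ? -log10(rel) : 17.0);
      return 0;
  }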

Draft review

We began with our monthly discussion of table 4. At this point, we specify that the first six predicates shall be provided, the next five should be provided, and the remainder are included in the document for informational purposes, though they can be implemented by standard logical combinations of the first eleven. Our guiding rationale in discussing the predicates is to say what they are, not what they aren't. Jason suggested separating out the "not" cases, possibly to the extent of putting them in a separate table, with all the verbiage about predicate negation there. We discussed Jim Thomas's two-part proposal from a couple of months ago; the first part was approved last time, and the second part was withdrawn, so the discussion was brief.
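
In C99 terms, a hedged illustration of the distinctions the table is drawing: the relational operators are signaling predicates that raise the invalid flag on NaN operands (per Annex F), the <math.h> comparison macros are quiet, and the negated predicates are just logical combinations of the others.

  #include <fenv.h>
  #include <math.h>
  #include <stdio.h>

  #pragma STDC FENV_ACCESS ON

  int main(void)
  {
      double x = 1.0, y = nan("");

      feclearexcept(FE_ALL_EXCEPT);
      int quiet = isless(x, y);            /* quiet predicate: no flag */
      printf("isless: %d  invalid: %d\n", quiet, !!fetestexcept(FE_INVALID));

      feclearexcept(FE_ALL_EXCEPT);
      int loud = (x < y);                  /* signaling predicate: invalid */
      printf("<:      %d  invalid: %d\n", loud, !!fetestexcept(FE_INVALID));

      /* "Not less" is a logical combination, not a separate predicate. */
      printf("!(x < y) == (x >= y or unordered): %d\n",
             !isless(x, y) == (isgreaterequal(x, y) || isunordered(x, y)));
      return 0;
  }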

Alex Liu pointed out that there is no reference to signaling NaN in this section, even in the original text. For library implementors who want things to be quiet even in the face of signaling NaNs, this is a problem. Hough observed that there is also no way to make adds and subtracts quiet.

At this point, we again turned to the troubles of signaling NaNs. Hough asked whether we still thought we might use signaling NaNs to indicate unknowns; if so, he said, then we ought to revisit some of our discussions on negating signaling NaNs, for instance. Kahan pointed out that the origin of signaling NaNs was a political compromise. There were members on the original committee who thought infinity was too drastic for overflow and zero too drastic for underflow, and so the signaling NaN was proposed as a kludge to allow them to implement whatever they wanted. Other people found different uses, but not many. The biggest use is to represent uninitialized data.

Zuras asked whether we could just use quiet NaNs for uninitialized data. The only operations which are not NaN preserving are those which would ignore the operand anyhow. Hough argued that uninitialized data is still a compelling reason to maintain signaling NaNs. Kahan thought the right idea, even if the implementation is difficult, is that when signaling NaNs are used as arithmetic operands, they are first replaced either with a quiet NaN or with some quantity determined by the programmer. Kahan then repeated Hough's point: if they are to be used for uninitialized data, signaling NaNs should generate a signal other than "invalid operation."
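
For concreteness, a hedged sketch of the uninitialized-data use: fill storage with a hand-built signaling NaN (assuming the common binary64 convention where a clear top fraction bit marks a signaling NaN, and an unoptimized build that preserves the signaling behavior); any arithmetic use then raises a flag, though under the current standard the only flag available is "invalid operation."

  #include <fenv.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  #pragma STDC FENV_ACCESS ON

  int main(void)
  {
      /* Exponent all ones, nonzero fraction, top fraction bit clear. */
      uint64_t snan_bits = 0x7FF0000000000001ULL;
      double slots[4];
      for (int i = 0; i < 4; i++)
          memcpy(&slots[i], &snan_bits, sizeof(double));

      feclearexcept(FE_ALL_EXCEPT);
      volatile double used = slots[2] + 0.0;     /* arithmetic use */

      printf("invalid flag: %d, result: %g\n",
             !!fetestexcept(FE_INVALID), used);
      return 0;
  }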

Thomas observed that the situation sounds like an extreme case of the tail wagging the dog, and pointed out that we do not want to make changes without a compelling reason.

Underflow

Hough began with the statement that he does not particularly believe in counting mode. If most of the computations are in double, and extended is available, why would counting mode warrant hardware support? Riedy mentioned determinants, and Zuras objected that the LU factorization is more expensive than the product in that case; Riedy said "Sparse," and Zuras said "Oh." Kahan confirmed that scaling time could be dominant for sparse determinants, and mentioned that when computing Clebsch-Gordan coefficients, scaling also takes up much of the time.
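
A sketch of what is at stake: the product of many pivots over- or underflows long before the determinant is meaningless, so in software one keeps the fraction in range with frexp and counts the exponent in a separate integer, which is the bookkeeping a hardware counting mode would do for free. (The helper name below is mine.)

  #include <math.h>
  #include <stdio.h>

  /* Determinant as a product of pivots, returned as fraction * 2^exp. */
  static double scaled_product(const double *pivots, int n, long *exp_out)
  {
      double frac = 1.0;
      long   exp  = 0;
      for (int i = 0; i < n; i++) {
          int e;
          frac *= pivots[i];
          frac  = frexp(frac, &e);    /* renormalize into [0.5, 1) */
          exp  += e;
      }
      *exp_out = exp;
      return frac;
  }

  int main(void)
  {
      double pivots[2000];
      for (int i = 0; i < 2000; i++)
          pivots[i] = 1.0e-3;         /* naive product underflows to zero */

      long e;
      double f = scaled_product(pivots, 2000, &e);
      printf("det = %.6f * 2^%ld\n", f, e);
      return 0;
  }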

Hough asserted that the original point of a split definition of underflow (underflow when any subnormal is generated vs when accuracy is lost due to subnormalization) is to support counting mode. Kahan disagreed. The reason for trapping on any subnormal, he replied, is that there are people who never want to see a subnormal number in their programs. They want to flush to zero or to some threshold. Speed is one reason, as owners of older Alphas knew; another reason is computational convenience. For example, a speaker at LBL recently talked about computing fast spherical harmonics. As part of his computation, he generates Legendre functions by a backward recurrence. The numbers generated become very tiny; when they do, their exact value matters little, so long as they are nonzero. This gentleman had a legitimate reason for avoiding gradual underflow.

Hough thought that it might be worthwhile to allow two user exceptions: underflow and exact subnormal. He thought it would be a mistake to combine the two. Scott agreed with the general opinion that the current multiple definitions of underflow cause a lot of confusion for a lot of people. And now, continued Hough, since we are proposing a menu, the logical solution seems to be to provide two separate exceptions. We have already accepted the suggestion that exceptions will be defined from the perspective of the user program. Kahan asked: if people who want flush-to-zero set both traps, and people who want the default disable both traps, what would someone who wanted counting mode, wrapping, etc. want? Would they turn on the first exception and not the second? Hough replied that he thought it depended on who it is that wants counting mode, and whether they want subnormals or not. Several noted that flush-to-zero is generally for performance anyhow.
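
A C99 illustration of the two notions in play, assuming default, non-trapping handling: the 754 underflow flag is raised only when a tiny result is also inexact, so an exact subnormal result goes unflagged; the separate "exact subnormal" exception Hough describes has no visible counterpart today.

  #include <fenv.h>
  #include <float.h>
  #include <stdio.h>

  #pragma STDC FENV_ACCESS ON

  int main(void)
  {
      volatile double tiny;

      feclearexcept(FE_ALL_EXCEPT);
      tiny = DBL_MIN / 2.0;    /* subnormal but exactly representable */
      printf("DBL_MIN/2: underflow flag = %d\n",
             !!fetestexcept(FE_UNDERFLOW));

      feclearexcept(FE_ALL_EXCEPT);
      tiny = DBL_MIN / 3.0;    /* subnormal and inexact */
      printf("DBL_MIN/3: underflow flag = %d\n",
             !!fetestexcept(FE_UNDERFLOW));

      (void)tiny;
      return 0;
  }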

Schwarz stated that the fewer alternatives there were, the happier he would be. He only wants one behavior for non-trapped mode. Designers at IBM are used to subnormals, he said, and can implement them at high speed. Hough asked how they cope with gradual underflow, and whether it would be possible to post something to the mailing list describing the alternatives. Schwarz thought it would be worthy of a paper which has not yet been written, but agreed that he would look through his library and provide a bibliography.

Schwarz's summary to the mailing list is far more comprehensible than my notes on the brief discussion of FMA implementation that followed, and I refer the interested reader there. As I understand it, the IBM machines take advantage of FMA data flow and tags in the register file. The Athlon machines internally represent all numbers in a normalized format, and invoke microarchitectural traps when producing subnormal results and when writing subnormals to memory. The penalty for the microarchitectural traps on the Athlon is a modest number of cycles (14 instead of 4 cycles to produce subnormal outputs; 8-10 instead of 2 cycles to store a subnormal); but the real penalty is that the pipeline is flushed.

Kahan stated that he would like to see implementations where the pipeline is not flushed, but merely stalled. Das Sarma replied that it is possible, but difficult, and that denormals are so extremely rare in benchmarks that it is not worth penalizing the common case. Riedy asked if they would run a benchmark that penalized slow gradual underflow, if such a benchmark existed. Das Sarma replied that they would.

Static exception handling

Hough's reason for bringing up underflow in the first place was his work on a proposed facility to statically modify an operation's behavior at compile time. He suggested that we might write, for instance

  x * <-round-to-zero-> y
        invalid: substitution(expression)
                 substitutionmag()
                 goto label

The point is that this special, static way of dealing with exceptions sits right there at the operation. That's the essence of the proposal, and it leads to other things -- like what the names of the exceptions are, why signaling NaN should not be among them, and why we should have a different name for an exact subnormal result. More details are available from postings to the mailing list.
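
For contrast, here is roughly the effect the static notation is after, written with today's dynamic fenv machinery; the helper below is mine, not part of the proposal. The static form would bind the substitution to the one operation at compile time instead of wrapping it in flag tests.

  #include <fenv.h>
  #include <math.h>
  #include <stdio.h>

  #pragma STDC FENV_ACCESS ON

  /* Multiply, substituting `subst` if the operation is invalid. */
  static double mul_or_substitute(double x, double y, double subst)
  {
      feclearexcept(FE_INVALID);
      double r = x * y;
      if (fetestexcept(FE_INVALID))
          return subst;                /* programmer-specified substitution */
      return r;
  }

  int main(void)
  {
      printf("%g\n", mul_or_substitute(3.0, 4.0, -1.0));       /* 12 */
      printf("%g\n", mul_or_substitute(INFINITY, 0.0, -1.0));  /* -1 */
      return 0;
  }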

End notes and scheduling

Upcoming meetings are

Riedy and Bindel will speak on their alternate exception handling proposal in August. In September, Kahan will discuss implementation of gradual underflow.
