[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

*To*: r754@xxxxxxxxxxxxxx*Subject*: Re: This month's agenda & my rant on reductions...*From*: Ivan Godard <igodard@xxxxxxxxxxx>*Date*: Fri, 12 Aug 2005 19:38:05 -0700*Cc*: stds-754@xxxxxxxx*Domainkey-signature*: a=rsa-sha1; s=sbc01; d=pacbell.net; c=nofws; q=dns; h=message-id:date:from:user-agent:x-accept-language: mime-version:to:cc:subject:references:in-reply-to:content-type:content-transfer-encoding; b=gNS2zWpguyD+jxvbnohkYYCKBPPff3EYWS+ROpJOG2/5/vcJ61vO2UQJWQd75UhMh NgnZRUc4UaU40b3+UF4zA==*In-reply-to*: <20050812222745.2BB38491B84@xxxxxxxxx>*References*: <20050812222745.2BB38491B84@xxxxxxxxx>*Sender*: stds-754@xxxxxxxx*User-agent*: Mozilla Thunderbird 1.0.5 (Windows/20050711)

From your description of the bio problem, it sounds like nearly all the
primitive operations would trap nearly all the time. Is that so? If so,
I don't see why the reductions proposals would improve matters. Our
Mill hardware supports reductions directly, but that wouldn't help on
this one. We too would promptly get a NaN/inf and that's all she wrote,
despite the special performance hardware. Of course, if you are presuming that a reduction delivers its result with
only one rounding for the whole thing then the bio problem is
solved. But the rejected proposal did not call for that, and you would
be burnt at the stake if you proposed it :-)Ivan r754@xxxxxxxxxxxxxx wrote: The IEEE-754 meetings this month will be at Sun in Menlo Park from 1:00 to 5:00. Please come to the lobby of building 14 on that campus. Dave has already given directions & phone numbers. The Tuesday 8/16 subcommittee meeting will be held in the San Gregorio Room in building 16. We will be discussing the pre- ballot study drafts as well as issues arising from the failure of many proposals to pass last month. (More on that below.) The draft review will be held Wednesday 8/17 in the Pulgas Water Temple Room in building 16. The topic will be the incorporation of the text of those issues that passed last month. The general meeting will be held Thursday 8/18 in the Sequoia Room in building 14. We will spend most of the meeting on active ballot issues as well as an hour set aside for John Crawford to present some decimal performance data for discussion. Now, I would like to mention the difficulty that arises from the failure of many of last month's _expression_ proposals to pass. These are not the whims of the subcommittee or a laundry list of things we think are cool. These are part of a larger whole that is our attempt to deal with many of the more difficult problems we have been unable to resolve in the last 5 years. There is a valid criticism that many are vague or under specified but that is intentional. We are attempting to leave room for implementers to innovate these functions for performance & accuracy in ways we have not yet conceived & cannot legislate. Let me illustrate with a single example that I ran across this week. I was involved with the BioInformatics conference held at Stanford this week. It turns out that BioInformaticians are either biologists that use computers in their research or computer scientists that are interested in developing software for those biologists to use. Anyway, in the 3 days I was at the conference, just by chance, I talked to a graduate student at USF & a new professor at Saint Andrews in Scotland, both of whom had similar problems. They were modelling the _expression_ of certain classes of genes using a variant of a Markov model to create a search model that could search large databases & recognise similar genes in new contexts. A cursory discussion suggested that they were having problems with overflow & underflow generating spurious infinities & NaNs. The graduate student was kind enough to sit down with me & we debugged her problem over the course of a day & a half or so. While it turned out the be a little more complex than I initially suspected, limited dynamic range was the problem. You see, in Markov modelling, one creates a finite state machine that operates on a graph that is generally a DAG. (There are some slight provisions for looping but only from special nodes to themselves.) The arcs of the graph are labeled with the probability of taking that path & the nodes are labeled with the probability of arriving there. There are also output characters (in this case, amino acids) that are annotated with the probability of using them. The figure of merit one tries to optimize is the probability of strings you would like to accept (in this case, from a training set of proteins that are all similar in some meaningful way) over the probability of those same strings arising just by chance (the null hypothesis). The algorithm is to run what amounts to a dynamic programming matrix over the graph from beginning to end & then from end to beginning, adjusting the probabilities until some convergence criterion is met on the training set of proteins. Then you run it on some huge database expecting that similar genes will be labeled with high probabilities. (For reference, there are 20 amino acids. In the absence of any knowledge of its use, the probablity of any given chain of 250 of them in a protein must be assumed to be (1/20)^250 or about 10^-325. Ratios of such probabilities, and longer ones that arise, get into the 10^(+/-20000) range.) In the course of this training, the calculation that wants to be done is ratios of sums of products of probabilities. (Sound familiar to anyone?) The added complication is that the biologists recognise that the dynamic range of the ratios of probabilities is so huge as to blow out the range of double precision numbers. Their solution to this problem, to store some of the probabilities as logarithms, only partially attenuates the problem. They are forced to convert back to probability ratios in the course of updating their figure of merit & that's when over/underflows bite them in the ass with the delivery of NaNs & infinities. I was able to suggest a slight modification of the calculation that stays in logs of probabilities (what they call log-odds) but that too is only a partial solution. They are still left with the problem that computing with the logs in this way is inherently much slower & more innacurate than just multiplying the probabilities would be. The REAL solution is to provide them with the ability to take long products & sums & not worry about overflow or underflow. And THIS is exactly what was shot down last month in the form of the reductions proposal as well as others. For us, part of the justification for including these functions was that, by including them, we could eliminate the need for counting mode. This is the only known application for it & we felt it was much better to provide the functions themselves than to mandate a mode that is only a hack when it comes to solving the problem. Without the reductions we are forced to consider counting mode again. The application that Prof Kahan mentioned, namely Klebsh-Gordan or Wigner Coefficients, did not strike many members of the committee as important enough to justify such reductions. Perhaps that is so. But I have now discovered that there are people out there who need these functions in their 'daily lives'. These people are NOT numerical analysists & I think we do them a disservice if we force them to learn our profession. Beware: Soon the biologists will have the power to take their revenge on us in nasty & disgusting ways. :-) Well, I guess its time to get off my soapbox & let someone point out that our laundry list is arbitrary or vague. Well I urge you to attend the subcommittee meeting Tuesday where I would like to amend the reductions proposal to meet your objections & propose a modified version again next month. Because I'm certain you can't make this problem go away by ignoring it. Dying might help. But ignoring won't work. Dan |

**Follow-Ups**:

**References**:

- Prev by Date:
**Re: minutes** - Next by Date:
**Re: [...] my rant on reductions...** - Previous by thread:
**Re: [...] my rant on reductions...** - Next by thread:
**Re: This month's agenda & my rant on reductions...** - Index(es):