Re: motion elementary functions
> From: John Pryce <j.d.pryce@xxxxxxxxxxxx>
> Subject: Re: motion elementary functions
> Date: Wed, 14 Oct 2009 17:44:32 +0100
> To: stds-1788 <stds-1788@xxxxxxxxxxxxxxxxx>
Folks,
I started out intending to just answer John's
questions & make a few comments.
I ended up writing a rather long argument for a
particular form of an interval xx^yy.
If this seems premature, I apologise.
But it seemed to be called for to answer John's
questions & well as acknowledge the concerns of
Hossam & Michel.
Since I intended this to be short, the long
version that it turned into seems more like the
stream of consciousness of a mad mathematician
than something I would allow to appear in print.
(I generally try to hide that from others. :-)
>
> Dan, Hossam, P1788 members
>
> On 13 Oct 2009, at 14:02, Dan Zuras Intervals wrote:
> > We had great difficulty deciding on the correct definition
> > of x^y for 754. In the end, including all of pown, powr,
> > & pow was the compromise I LEAST desired. And I, like
> > Hossam as I remember, argued for pow, the most general
> > of the three.
>
> I am VERY sympathetic to the idea of having one pow, which is to be
> the most general, in some sense. In principle... But I have doubts.
> Three different definitions of power are being put forward:
As you note below, there are 3 different definitions
but they all define the same function. At least, all
3 functions are the same when their domains of definition
overlap.
The question for us then is to extend these definitions
to intervals in a way that is meaningful for us.
More on this below.
>
> (a) Naive: x^n for integer n, defined for n > 0 as x*x*...*x (n
> times), and extended to n <= 0 for nonzero x, in the unique way that
> makes the index law x^(m+n) = x^m x^n hold.
> (b) Exp-based: x^y defined as exp(y log(x)).
> (c) Rational: the more subtle definition x^r for rational r, based on
> (a) and on the n-th root function.
>
> Let Q_o (o for odd) be the set of rationals that have odd denominator
> when reduced to lowest terms. The set Z of integers (zero included)
> is a subset of Q_o because n=n/1 in lowest terms.
While this is the correct domain for the denominator of a
rational exponent when x < 0, we must include all Z when
x >= 0. After all, we need to do 4^(1/2) = 2.
>
> Domains of definition:
>
> . . .
>
> I don't think this quite matches what Dan said. But for x<0
> > It can even be evaluated efficiently as -exp(y*log(|x|)
> seems true.
You are quite correct: It does not match what I said. :-)
Never take a mathematician's words out of context or
eliminate any qualifying conditions. :-)
I believe I said that for x < 0 it can be efficiently
evaluated as -exp(y*log(|x|) BUT FOR A SIGN AMBIGUITY.
Let me further clarify THAT below.
>
> It is easy to prove
> Theorem.
> These three functions agree. That is, at any (x,y) for which more
> than one of them is defined, they take the same value.
Some care must be taken here.
While it is true that these function agree on their
domain of overlap, it is also true that in the domain
of x < 0, for every result that is negative, there is
another result that is arbitrarily nearby that is
positive & arbitrarily close to its absolute value.
This is the sign ambiguity I mentioned in qualification.
>
> Corollary.
> We can define a function Pow(x,y) on the union of the domains of
> these functions, to equal whichever of them is defined at that point.
With some care, yes I believe we can. And even a
little farther perhaps.
>
> Hossam, Dan:
> Is this Pow(x,y) the same as the pow(x,y) that you want P1788 to use?
It is quite close to the one I have in mind.
(I changed my mind in the end. :-)
Rather than foist it on you all, though, I
would like to have a chance to justify it.
In the domain of interval arguments & results
there is another definition that you might
feel is better justified. (I chose this one. :-)
I'm not sure. (I decided much later on. :-)
>
> (***) Or, horrors, do you actually want to use the definition -exp
> (y*log(|x|) for ALL x<0 and all y??
No. But, with some care, it might turn out to be
a practical part of any accurate implementation.
>
> My doubts
> ---------
> Recall Michel's sage words:
> On 3 Sep 2009, at 11:42, Michel Hack wrote:
> > The various examples where domain violations are mishandled if the
> > wrong choice is made with regard to returning or not returning NaI
> > demonstrate that the choice is needed. The problem now is that
> > this requires somebody to make the choice, i.e. understand what is
> > going on. In other words, the programmer has to THINK (oh my, how
> > old-fashioned can you get).
While Michel correctly fears the programmer
overthinking things, I think we can define xx^yy
is such a way as to minimally tax the programmer's
brain. :-)
It won't be zero but it will be smaller than it
would be if the answer he expects is not in the
resulting interval.
>
> This all-singing-all-dancing Pow(x,y) is in the PL/I tradition of
> "give a meaning to more or less everything, and worry later what
> inconsistencies may arise". But there are genuine subtleties. Take an
> example like Hossam's:
Michel's warning is well taken.
Still the traditional fear associated with this
issue assumes that the various definitions can
disagree. Mathematically, they cannot. But
numerically, they can.
Once again, in our context this translates to an
issue of possible extra width in a result interval.
Forgive me if I treat this as an implementation
issue at this point in the standards process.
There is a true standards issue here. Namely just
how much accuracy will we demand (shall) from our
interval xx^yy? But I think we should postpone
that issue until we have settled the more pressing
issue of what is the correct answer. :-)
(BTW, you should know that Jean-Michel's people at
INRIA & others elsewhere have recently published a
few extraordinary papers that suggest that an
efficient CORRECTLY ROUNDED double precision x^y
may soon be practical. I never thought I'd see it
in my lifetime. It is a quite remarkable result.)
>
> . . .
>
> No doubt I have misunderstood Hossam and Dan. So please put me right.
>
> Best wishes
>
> John
As it happens, you understand things quite well.
It is just that we have not fully described the thing. :-)
Now, I must admit that I was not prepared to go into
this discussion at this time, but since the question
has been raised, I'll try to write something off the
top of my head.
Forgive me for any errors or ill written prose. :-)
First, let me describe the rational function x^y in a
way similar to John's. This is our true domain's
domain in the sense that all of our data points are
ultimately rational. Still, this approach will turn
out to be cumbersome in the same way that epicycles
are a cumbersome way to describe the motions of the
planets.
I will consider a rational x even though this is not
strictly necessary.
So, where among the rationals, can x^y be defined
such that the result is also rational? I believe it
is in the following:
For x = a^d/b^d, GCD(a,b) = 1, y = c/d, GCD(c,d) = 1
we have that x^y = a^c/b^c.
Cumbersome, no? :-)
Still, all the integer powers live here: a^c.
Powers of the perfect integer roots: (a^d)^(c/d) = a^c.
And so on. Examples: 2^2 = 4, 4^(1/2) = 2,
(-8)^(1/3) = -2, ...
This last should give us pause. Notice that for x < 0,
the only possible examples require that d be odd. For
only then may x = a^d/b^d be negative.
Since the only reason for this cumbersome approach is
to see how it is continued into the Reals & ultimately
into the Intervals, it is instructive to think about
continuity issues in this rational domain.
Again, it crops up when x < 0.
Lets say I'm computing x^(c/d) for some large c & d
with both c & d odd. Then x^(c/d) will be negative.
But for every such odd c & d, there will be another
c' & d' arbitrarily close in ratio such that c' is
even & d' is odd. For example x^(2c/(2d+1)). In
this case the answer is positive & quite close to the
previous result in absolute value.
(Actually, I should be more careful here: such an
example will only work of x is a negative rational
power of both d & d'. Or if you move to a nearby x'.
But, as it is not important for our purposes that x
be rational, let's gloss over that. OK? :-)
So what does this example teach us? We now know that,
for x < 0 at least, the function x^y is not continuous
EVEN AMONG THE RATIONALS.
So any hope we might have of DEFINING a continuous
function for all x & y is doomed to disagree with
someone somewhere.
Among the REALS.
There is still hope for the intervals. :-)
OK, let's abandon the cumbersome rationals & see how
we would generalize well known Real functions to get
to a consistent x^y.
The natural place to start is e^y. It is a perfectly
well defined & well behaved function among the Reals.
It has a nice inverse function in ln(x) which I will
write as log(x) in this context for reasons that will
soon become clear.
Given the laws of these functions: (e^x)*(e^y) = e^(x+y),
log(x*y) = log(x) + log(y), and of exponentiation in
general: (x^a)^b = x^(a*b), we have a natural
generalization to x^y in the form of x^y = e^(y*log(x)).
So far so good. But notice that since we are taking
log(x) we have introduced the restriction that x > 0.
Let's live with that for the moment & explore the
complex domain.
Among the complex numbers, e^z is still a perfectly well
defined & well behaved function. It has an essential
singularity at the projective infinity. It is also
aperiodic in the real part of its argument & periodic
in the imaginary part due to e^(i*2*Pi) = 1 & the
identity e^z = e^(z + i*2*k*Pi) for all integer k.
But now its inverse function is strange. For all
w = e^z = e^(z + i*2*k*Pi) the inverse function, log(w)
has the property that all of log(w) = z + i*2*k*Pi for
all k are valid inverses.
This sort of function is called a many valued function
& shows up in pretty much every foxhole found in the
complex domain. :-)
In this case 'many' is infinite but sqrt(z) is a 2
valued function.
So how are we to deal with this ambiguity?
Well the traditional way is to define what is known
as the Principal Valued Function of log(w) which is
written Log(w) (only rarely Ln(w)) & is defined such
that the imaginary part of the result is in (-Pi,Pi].
(The Imag(w) = Pi case is an arbitrary convention.)
With all that crap out of the way, how should we
define a complex x^y? In the obvious way, really,
as e^(y*log(x)). Or should it be e^(y*Log(x))?
The former is the formally correct definition among
the complex numbers but suffers from the fact that
is has infinitely many results unless y is an integer.
(Actually, it has infinitely many results everywhere
but when y is an integer, they are all the same. :-)
If you want a bit of fun, figure out the answer to
i^i. You will discover that they are all Real. :-)
So even in the complex domain x^y is typically defined
using the principal value of the log.
OK, this probably looked like a digression & about
now you are asking yourself why we care.
Since x^y is defined everywhere in the complex plane
for both arguments, it is natural to consider defining
x^y among the reals as the real projection of that
complex x^y function.
And this helps us how?
Because for a negative first argument we have that
(-x)^y = e^(y*log(-x)) = e^(y*(log(|x|) + i*(Pi + 2*k*Pi)))
= e^(y*log(|x|))*e^(i*y*(1 + 2*k)*Pi).
Is this an improvement?
Only a bit.
The value of x pretty much doesn't matter here. But for
most values of y we're screwed in that the value of
(-x)^y is still a complex number (i.e. has a non-zero
imaginary part) & therefore an inappropriate answer for
a function intended to return real results.
But if you look carefully you will notice that for those
rare RATIONAL y such that y*(1 + 2k) is an integer for
ANY k, there is a pure real result. It is negative if
that integer is odd, positive if its even.
It requires a bit of thought but you will find that
out of all the infinitely many complex answers to this
problem there are no other possible pure real results.
So a Real valued x^y can be defined everywhere in x
with the proviso that for x < 0 it can only be defined
for only the very rare y. But in that case it has
exactly one pure real answer, either positive or negative.
It is also important to note that for ALL y there exist
infinitely many y' arbitrarily close to y for which
(-x)^y' is defined (either + or -).
This is because the rationals are said to be dense.
So the only sense in which these y's are rare is that
there are a hell of a lot more Reals than rationals.
That is the Reality (of Complexity) of the situation.
(Forgive me, I couldn't resist. :-)
What is the Intervality of xx^yy?
(OK, it doesn't work here. So sue me. :-)
Well there are two ways we could go. For x > 0, we
have that x^y is single valued & continuous.
(There is an essential singularity at 0^0 but let's
pass on that for the moment.)
So for all xx > 0 & all yy, we can consistently define
xx^yy using e^(y*log(|x|)) & get correct, continuous,
& enclosing intervals.
But what if some part of xx peeks into the negative?
In this domain, xx^yy must be considered to be a two
valued function (it is said that the function splits
into two manifolds) both of which can be computed
using (+/-)e^(y*log(|x|)).
Now we COULD arbitrarily decide to define one of these
as the principal value (I would choose the negative)
& return that as the result.
It would be perfectly well defined, continuous, but
only arbitrarily single valued.
You may choose to define it this way.
But if you do that you will come up against Michel's
criticism of the all-singing-all-dancing Pow(x,y) &
fail to give the answer that some people legitimately
expect.
But we have a power that the floating-point guys lack:
In the case of a function that has more than one answer
we have the power to return an interval that contains
BOTH answers.
Thus, for xx > 0 we could define the interval valued
xx^yy using the obvious extension of the real valued
x^y. But for intervals xx with at least some values
of xx < 0 we could define the INTERVAL valued xx^yy
not as the extension of the REAL valued x^y but as
the union of all [-e^(y*log(|x|)),e^(y*log(|x|))] for
all x < 0 \in xx & all y \in yy.
It is mathematically correct. It is easy to implement.
It is continuous (on each manifold). It is defined
everywhere. (The y's need no longer be considered
rare.) And it encloses the answer. All of them.
I don't care who you are or how you define your x^y,
your answer will be in the interval returned by this
function.
So, this is what I would propose.
BTW, you should know that when I started to write this
I was going to argue for the arbitrarily single valued
function. But its been a long argument (sorry about
that :-) & in the course of writing it I changed my
mind.
Its OK if you argue with yourself. But you'd better
win the argument. :-)
At this point its hard to describe this as just off
the top of my head any more. But that's all the
argument I have in me at the moment.
Would anyone else care to blow it to bits? :-)
I'm going to duck now...
Enjoy,
Dan
P.S. - I realized later that this definition suffers
from the inability to narrow a contractive map beyond
some point when xx contains negative values. This is
the moral equivalent of not being able to separate
the roots in a dual valued square root. I don't know
how you guys solve that problem but if it involves
having both positive & negative square root functions
lying around, we might need the same thing for xx^yy.
So in the end we might have 3 functions anyway. Damn. :-)
P.P.S. - We can get away with defining the singleton
result [0,0]^[0,0] = [1,1] & pay attention to the
details of non-zero width intervals with 0 as an
endpoint. But we may have to define xx^yy with both
containing 0 in their interiors as [entire] in order
to preserve the containment principal of contractive
maps. I might be wrong. I'm tired after all this
writing so could someone check this for me? Thanks.