Thread Links Date Links
Thread Prev Thread Next Thread Index Date Prev Date Next Date Index

Re: motion elementary functions



> From: John Pryce <j.d.pryce@xxxxxxxxxxxx>
> Subject: Re: motion elementary functions
> Date: Wed, 14 Oct 2009 17:44:32 +0100
> To: stds-1788 <stds-1788@xxxxxxxxxxxxxxxxx>

	Folks,

	I started out intending to just answer John's
	questions & make a few comments.

	I ended up writing a rather long argument for a
	particular form of an interval xx^yy.

	If this seems premature, I apologise.

	But it seemed to be called for to answer John's
	questions & well as acknowledge the concerns of
	Hossam & Michel.

	Since I intended this to be short, the long
	version that it turned into seems more like the
	stream of consciousness of a mad mathematician
	than something I would allow to appear in print.
	(I generally try to hide that from others. :-)

> 
> Dan, Hossam, P1788 members
> 
> On 13 Oct 2009, at 14:02, Dan Zuras Intervals wrote:
> > 	We had great difficulty deciding on the correct definition
> > 	of x^y for 754.  In the end, including all of pown, powr,
> > 	& pow was the compromise I LEAST desired.  And I, like
> > 	Hossam as I remember, argued for pow, the most general
> > 	of the three.
> 
> I am VERY sympathetic to the idea of having one pow, which is to be  
> the most general, in some sense. In principle... But I have doubts.  
> Three different definitions of power are being put forward:

	As you note below, there are 3 different definitions
	but they all define the same function.  At least, all
	3 functions are the same when their domains of definition
	overlap.

	The question for us then is to extend these definitions
	to intervals in a way that is meaningful for us.

	More on this below.

> 
> (a) Naive: x^n for integer n, defined for n > 0 as x*x*...*x (n  
> times), and extended to n <= 0 for nonzero x, in the unique way that  
> makes the index law x^(m+n) = x^m x^n hold.
> (b) Exp-based: x^y defined as exp(y log(x)).
> (c) Rational: the more subtle definition x^r for rational r, based on  
> (a) and on the n-th root function.
> 
> Let Q_o (o for odd) be the set of rationals that have odd denominator  
> when reduced to lowest terms. The set Z of integers (zero included)  
> is a subset of Q_o because n=n/1 in lowest terms.

	While this is the correct domain for the denominator of a
	rational exponent when x < 0, we must include all Z when
	x >= 0.  After all, we need to do 4^(1/2) = 2.

> 
> Domains of definition:
>
> . . .
>
>      I don't think this quite matches what Dan said. But for x<0
> >    It can even be evaluated efficiently as -exp(y*log(|x|)
>      seems true.

	You are quite correct: It does not match what I said. :-)

	Never take a mathematician's words out of context or
	eliminate any qualifying conditions. :-)

	I believe I said that for x < 0 it can be efficiently
	evaluated as -exp(y*log(|x|) BUT FOR A SIGN AMBIGUITY.

	Let me further clarify THAT below.

> 
> It is easy to prove
> Theorem.
> These three functions agree. That is, at any (x,y) for which more  
> than one of them is defined, they take the same value.

	Some care must be taken here.

	While it is true that these function agree on their
	domain of overlap, it is also true that in the domain
	of x < 0, for every result that is negative, there is
	another result that is arbitrarily nearby that is
	positive & arbitrarily close to its absolute value.

	This is the sign ambiguity I mentioned in qualification.

> 
> Corollary.
> We can define a function Pow(x,y) on the union of the domains of  
> these functions, to equal whichever of them is defined at that point.

	With some care, yes I believe we can.  And even a
	little farther perhaps.

> 
> Hossam, Dan:
> Is this Pow(x,y) the same as the pow(x,y) that you want P1788 to use?

	It is quite close to the one I have in mind.
	(I changed my mind in the end. :-)

	Rather than foist it on you all, though, I
	would like to have a chance to justify it.

	In the domain of interval arguments & results
	there is another definition that you might
	feel is better justified.  (I chose this one. :-)

	I'm not sure.  (I decided much later on. :-)

> 
> (***) Or, horrors, do you actually want to use the definition -exp 
> (y*log(|x|) for ALL x<0 and all y??

	No.  But, with some care, it might turn out to be
	a practical part of any accurate implementation.

> 
> My doubts
> ---------
> Recall Michel's sage words:
> On 3 Sep 2009, at 11:42, Michel Hack wrote:
> > The various examples where domain violations are mishandled if the
> > wrong choice is made with regard to returning or not returning NaI
> > demonstrate that the choice is needed.  The problem now is that
> > this requires somebody to make the choice, i.e. understand what is
> > going on.  In other words, the programmer has to THINK (oh my, how
> > old-fashioned can you get).

	While Michel correctly fears the programmer
	overthinking things, I think we can define xx^yy
	is such a way as to minimally tax the programmer's
	brain.  :-)

	It won't be zero but it will be smaller than it
	would be if the answer he expects is not in the
	resulting interval.

> 
> This all-singing-all-dancing Pow(x,y) is in the PL/I tradition of  
> "give a meaning to more or less everything, and worry later what  
> inconsistencies may arise". But there are genuine subtleties. Take an  
> example like Hossam's:

	Michel's warning is well taken.

	Still the traditional fear associated with this
	issue assumes that the various definitions can
	disagree.  Mathematically, they cannot.  But
	numerically, they can.

	Once again, in our context this translates to an
	issue of possible extra width in a result interval.

	Forgive me if I treat this as an implementation
	issue at this point in the standards process.

	There is a true standards issue here.  Namely just
	how much accuracy will we demand (shall) from our
	interval xx^yy?  But I think we should postpone
	that issue until we have settled the more pressing
	issue of what is the correct answer. :-)

	(BTW, you should know that Jean-Michel's people at
	INRIA & others elsewhere have recently published a
	few extraordinary papers that suggest that an
	efficient CORRECTLY ROUNDED double precision x^y
	may soon be practical.  I never thought I'd see it
	in my lifetime.  It is a quite remarkable result.)

> 
> . . .
>
> No doubt I have misunderstood Hossam and Dan. So please put me right.
> 
> Best wishes
> 
> John

	As it happens, you understand things quite well.

	It is just that we have not fully described the thing. :-)

	Now, I must admit that I was not prepared to go into
	this discussion at this time, but since the question
	has been raised, I'll try to write something off the
	top of my head.

	Forgive me for any errors or ill written prose. :-)

	First, let me describe the rational function x^y in a
	way similar to John's.  This is our true domain's
	domain in the sense that all of our data points are
	ultimately rational.  Still, this approach will turn
	out to be cumbersome in the same way that epicycles
	are a cumbersome way to describe the motions of the
	planets.

	I will consider a rational x even though this is not
	strictly necessary.

	So, where among the rationals, can x^y be defined
	such that the result is also rational?  I believe it
	is in the following:

	For x = a^d/b^d, GCD(a,b) = 1, y = c/d, GCD(c,d) = 1
	we have that x^y = a^c/b^c.

	Cumbersome, no? :-)

	Still, all the integer powers live here: a^c.
	Powers of the perfect integer roots: (a^d)^(c/d) = a^c.
	And so on.  Examples: 2^2 = 4, 4^(1/2) = 2,
	(-8)^(1/3) = -2, ...

	This last should give us pause.  Notice that for x < 0,
	the only possible examples require that d be odd.  For
	only then may x = a^d/b^d be negative.

	Since the only reason for this cumbersome approach is
	to see how it is continued into the Reals & ultimately
	into the Intervals, it is instructive to think about
	continuity issues in this rational domain.

	Again, it crops up when x < 0.

	Lets say I'm computing x^(c/d) for some large c & d
	with both c & d odd.  Then x^(c/d) will be negative.
	But for every such odd c & d, there will be another
	c' & d' arbitrarily close in ratio such that c' is
	even & d' is odd.  For example x^(2c/(2d+1)).  In
	this case the answer is positive & quite close to the
	previous result in absolute value.

	(Actually, I should be more careful here: such an
	example will only work of x is a negative rational
	power of both d & d'.  Or if you move to a nearby x'.
	But, as it is not important for our purposes that x
	be rational, let's gloss over that.  OK? :-)

	So what does this example teach us?  We now know that,
	for x < 0 at least, the function x^y is not continuous
	EVEN AMONG THE RATIONALS.

	So any hope we might have of DEFINING a continuous
	function for all x & y is doomed to disagree with
	someone somewhere.

	Among the REALS.

	There is still hope for the intervals. :-)

	OK, let's abandon the cumbersome rationals & see how
	we would generalize well known Real functions to get
	to a consistent x^y.

	The natural place to start is e^y.  It is a perfectly
	well defined & well behaved function among the Reals.
	It has a nice inverse function in ln(x) which I will
	write as log(x) in this context for reasons that will
	soon become clear.

	Given the laws of these functions: (e^x)*(e^y) = e^(x+y),
	log(x*y) = log(x) + log(y), and of exponentiation in
	general: (x^a)^b = x^(a*b), we have a natural
	generalization to x^y in the form of x^y = e^(y*log(x)).

	So far so good.  But notice that since we are taking
	log(x) we have introduced the restriction that x > 0.

	Let's live with that for the moment & explore the
	complex domain.

	Among the complex numbers, e^z is still a perfectly well
	defined & well behaved function.  It has an essential
	singularity at the projective infinity.  It is also
	aperiodic in the real part of its argument & periodic
	in the imaginary part due to e^(i*2*Pi) = 1 & the
	identity e^z = e^(z + i*2*k*Pi) for all integer k.

	But now its inverse function is strange.  For all
	w = e^z = e^(z + i*2*k*Pi) the inverse function, log(w)
	has the property that all of log(w) = z + i*2*k*Pi for
	all k are valid inverses.

	This sort of function is called a many valued function
	& shows up in pretty much every foxhole found in the
	complex domain. :-)

	In this case 'many' is infinite but sqrt(z) is a 2
	valued function.

	So how are we to deal with this ambiguity?

	Well the traditional way is to define what is known
	as the Principal Valued Function of log(w) which is
	written Log(w) (only rarely Ln(w)) & is defined such
	that the imaginary part of the result is in (-Pi,Pi].
	(The Imag(w) = Pi case is an arbitrary convention.)

	With all that crap out of the way, how should we
	define a complex x^y?  In the obvious way, really,
	as e^(y*log(x)).  Or should it be e^(y*Log(x))?

	The former is the formally correct definition among
	the complex numbers but suffers from the fact that
	is has infinitely many results unless y is an integer.

	(Actually, it has infinitely many results everywhere
	but when y is an integer, they are all the same. :-)

	If you want a bit of fun, figure out the answer to
	i^i.  You will discover that they are all Real. :-)

	So even in the complex domain x^y is typically defined
	using the principal value of the log.

	OK, this probably looked like a digression & about
	now you are asking yourself why we care.

	Since x^y is defined everywhere in the complex plane
	for both arguments, it is natural to consider defining
	x^y among the reals as the real projection of that
	complex x^y function.

	And this helps us how?

	Because for a negative first argument we have that

	(-x)^y = e^(y*log(-x)) = e^(y*(log(|x|) + i*(Pi + 2*k*Pi)))
	       = e^(y*log(|x|))*e^(i*y*(1 + 2*k)*Pi).

	Is this an improvement?

	Only a bit.

	The value of x pretty much doesn't matter here.  But for
	most values of y we're screwed in that the value of
	(-x)^y is still a complex number (i.e. has a non-zero
	imaginary part) & therefore an inappropriate answer for
	a function intended to return real results.

	But if you look carefully you will notice that for those
	rare RATIONAL y such that y*(1 + 2k) is an integer for
	ANY k, there is a pure real result.  It is negative if
	that integer is odd, positive if its even.

	It requires a bit of thought but you will find that
	out of all the infinitely many complex answers to this
	problem there are no other possible pure real results.

	So a Real valued x^y can be defined everywhere in x
	with the proviso that for x < 0 it can only be defined
	for only the very rare y.  But in that case it has
	exactly one pure real answer, either positive or negative.

	It is also important to note that for ALL y there exist
	infinitely many y' arbitrarily close to y for which
	(-x)^y' is defined (either + or -).

	This is because the rationals are said to be dense.
	So the only sense in which these y's are rare is that
	there are a hell of a lot more Reals than rationals.

	That is the Reality (of Complexity) of the situation.
	(Forgive me, I couldn't resist. :-)

	What is the Intervality of xx^yy?
	(OK, it doesn't work here.  So sue me. :-)

	Well there are two ways we could go.  For x > 0, we
	have that x^y is single valued & continuous.

	(There is an essential singularity at 0^0 but let's
	pass on that for the moment.)

	So for all xx > 0 & all yy, we can consistently define
	xx^yy using e^(y*log(|x|)) & get correct, continuous,
	& enclosing intervals.

	But what if some part of xx peeks into the negative?

	In this domain, xx^yy must be considered to be a two
	valued function (it is said that the function splits
	into two manifolds) both of which can be computed
	using (+/-)e^(y*log(|x|)).

	Now we COULD arbitrarily decide to define one of these
	as the principal value (I would choose the negative)
	& return that as the result.

	It would be perfectly well defined, continuous, but
	only arbitrarily single valued.

	You may choose to define it this way.

	But if you do that you will come up against Michel's
	criticism of the all-singing-all-dancing Pow(x,y) &
	fail to give the answer that some people legitimately
	expect.

	But we have a power that the floating-point guys lack:
	In the case of a function that has more than one answer
	we have the power to return an interval that contains
	BOTH answers.

	Thus, for xx > 0 we could define the interval valued
	xx^yy using the obvious extension of the real valued
	x^y.  But for intervals xx with at least some values
	of xx < 0 we could define the INTERVAL valued xx^yy
	not as the extension of the REAL valued x^y but as
	the union of all [-e^(y*log(|x|)),e^(y*log(|x|))] for
	all x < 0 \in xx & all y \in yy.

	It is mathematically correct.  It is easy to implement.
	It is continuous (on each manifold).  It is defined
	everywhere.  (The y's need no longer be considered
	rare.)  And it encloses the answer.  All of them.

	I don't care who you are or how you define your x^y,
	your answer will be in the interval returned by this
	function.

	So, this is what I would propose.

	BTW, you should know that when I started to write this
	I was going to argue for the arbitrarily single valued
	function.  But its been a long argument (sorry about
	that :-) & in the course of writing it I changed my
	mind.

	Its OK if you argue with yourself.  But you'd better
	win the argument. :-)

	At this point its hard to describe this as just off
	the top of my head any more.  But that's all the
	argument I have in me at the moment.

	Would anyone else care to blow it to bits? :-)

	I'm going to duck now...

	Enjoy,

				Dan


	P.S. - I realized later that this definition suffers
	from the inability to narrow a contractive map beyond
	some point when xx contains negative values.  This is
	the moral equivalent of not being able to separate
	the roots in a dual valued square root.  I don't know
	how you guys solve that problem but if it involves
	having both positive & negative square root functions
	lying around, we might need the same thing for xx^yy.

	So in the end we might have 3 functions anyway.  Damn. :-)

	P.P.S. - We can get away with defining the singleton
	result [0,0]^[0,0] = [1,1] & pay attention to the
	details of non-zero width intervals with 0 as an
	endpoint.  But we may have to define xx^yy with both
	containing 0 in their interiors as [entire] in order
	to preserve the containment principal of contractive
	maps.  I might be wrong.  I'm tired after all this
	writing so could someone check this for me?  Thanks.