John's asinh split versus convert-to-sortable...
Folks,
I think if you look at John's split based on asinh,
you will find that it already contains, at level 1,
all the properties you are seeking in a convert-to-
sortable split. And none of the faults associated
with a split defined at level 2 only.
To review, I will formulate it as follows:
split(x,y):
u = asinh(x); v = asinh(y);
t = (u + v)/2;
return s = sinh(t);
where John's fully general split with the scale
parameter L is L*split(a/L,b/L). On can use these
definitions:
asinh(z) = ln(z + sqrt(1 + z^2))
sinh(z) = (exp(z) - exp(-z))/2
To turn that into:
split(x,y):
u = x + sqrt(1 + x^2);
v = y + sqrt(1 + y^2);
t = sqrt(u*v);
return s = (t - 1/t)/2;
Since x -> -x means u -> 1/u, t is the geometric mean,
& t -> 1/t means s -> -s, we have split(-x,-y) =
-split(x,y). Not a bad start.
Now to optimizations. Note that:
(sqrt(1 + x^2) + x)*(sqrt(1 + x^2) - x) = 1.
Or equivalently:
sqrt(1 + x^2) + x = 1/(sqrt(1 + x^2) - x)
The left hand side is good for positive numbers & the
right hand side is good for negatives. Both are good
for 0 & numbers too small for x^2 to overflow. For
|x| < 1 we have:
x + sqrt(1 + x^2) = 1 + x + x^2/(1+sqrt(1+x^2)) for small x > 0
x + sqrt(1 + x^2) = 1/(1-x+x^2/(1+sqrt(1+x^2))) for small x < 0
For large |x| we have:
x + sqrt(1 + x^2) = x*(1 + sqrt(1 + 1/x^2)) for large x > 0
x + sqrt(1 + x^2) = -1/(x*(1 + sqrt(1 + 1/x^2))) for large x< 0
I'll set aside problems with over/underflow in t =
sqrt(u*v) (which happen in u*v, of course) other than
to say one can always go to:
t = sqrt(u)*sqrt(v)
which is not ideal but easier to express than the
usual prescaling crap that also solves the problem.
Then John observed (in communication with me) that:
t - 1/t = (t^2 - 1/t^2)/(t + 1/t)
= (u*v - 1/u*v)/(t + 1/t)
= ((x+sqrt(1+x^2))*(y+sqrt(1+y^2)) -
(sqrt(1+x^2)-x)*(sqrt(1+y^2)-y))/(t + 1/t)
= 2*(x*sqrt(1+y^2) + y*sqrt(1+x^2))/(t + 1/t)
which goes a long way towards solving the cancellation
problem when t ~ 1.
We have:
(1) split(x,y) = (x + y)/2 + O(higher terms)
when x & y are small, near one another, or near
one another in absolute value,
(2) split(x,y) = +/-sqrt(|x|*|y|) + O(higher terms)
when x & y are of wildly different magnitude of
either sign where the sign of the result is the
sign of the larger of the two operands in absolute
value,
(3) split(-x,-y) = -split(x,y) exactly, &
(4) in ANY floating-point system F there are about as many
numbers between x & split(x,y) as there are between
split(x,y) & y. The exception is when one subinterval
contains zero & even THEN the split favors that
direction WITHOUT knowing anything about the exponent
range or radix.
So I claim that John's approach has all the advantages you
seek & none of the disadvantages. And there are 2 more
advantages you might consider.
(1) It is entirely definable at level 1 as either
split(x,y):
u = asinh(x); v = asinh(y);
t = (u + v)/2;
return s = sinh(t);
or
split(x,y):
u = x + sqrt(1 + x^2);
v = y + sqrt(1 + y^2);
t = sqrt(u*v);
return s = (t - 1/t)/2;
so long as one confines oneself to bounded intervals.
(2) Whichever formulation you choose, there are plenty
of optimizations possible for the implementer to choose
from. Or to invent on their own to distinguish their
implementation from others in the market.
It gives them something useful to do. :-)
Dan
P.S. - My personal opinion is that we SHOULD define this
split in 1788 but we should make it optional. We should
define it because it would be useful for everyone to get
the same answers. We should make it optional because it
may be difficult to implement both correctly & fast. If
someone can djinn up a quick & correct algorithm then we
can go ahead & make it mandatory if we include the
algorithm in the standard as an app note.