Re: (long) 'severe' problems, was Re: Decorated intervals ARE intervals...
On Jul 4 2011, Dan Zuras Intervals wrote:
I shall merely pick up on a couple of points that I think are less
clear than you imply.
> Machines today get the performance they do by moving data
> around quickly. From main memory to cache, from cache to
> registers, from registers to ALUs. Each machine architecture
> approaches the problem slightly differently but, as my old
> friend Willy used to say, when you want to increase your
> bandwidth you have to either increase your band or increase
> your width. There are no other choices & you cannot 'clever'
> your way around the problem.
Yes, but .... Some applications are bandwidth-limited, but others
are latency-limited, and that's MUCH harder. Increasing the width
may increase latency, and increasing the band may hamper scalability,
or even prevent it when the speed of light starts to bite.
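To make the distinction concrete, here is a toy C sketch of my own
(not anything from Dan's post): the first loop streams through memory
and is bandwidth-limited, because its loads are independent and can be
overlapped; the second chases pointers, so each load must complete
before the next address is even known, and it is latency-limited.

    #include <stddef.h>

    /* Bandwidth-bound: the addresses are all known in advance, so
       the hardware can keep many memory requests in flight at once. */
    double stream_sum(const double *a, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Latency-bound: each load depends on the previous one, so the
       loop can go no faster than one memory round trip per node. */
    struct node { struct node *next; double v; };

    double chase_sum(const struct node *p)
    {
        double s = 0.0;
        for (; p != NULL; p = p->next)
            s += p->v;
        return s;
    }

The first scales with memory bandwidth; the second is stuck at roughly
one cache-miss latency per element, however wide the machine is.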
> Typical machines today are somewhere in between &, sad to
> say, closer to the former than the latter. But this is
> mostly for reasons of cost & as the cost comes down things
> get more parallel. I think we are about to enter an era
> of very wide & very parallel machines with truly obscene
> performance.
I don't, for the reasons I hint at above. I have been proclaiming
heresy in this area for several decades, and pointing out that the
low-hanging fruit for reducing latency-dependency (vectorisable codes)
was picked in the 1970s and there has been limited success since then.
Indeed, there are some problems that are provably latency-dependent.
That is, of course, closely related to the frequent communication and
synchronisation requirements of many parallel codes, which is the main
reason that they don't give better performance even though they are
highly parallel. And reducing those requirements is HARD.
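A toy example of what I mean (mine, not from the original exchange) is
a nonlinear recurrence: every iterate needs the previous one, so the
critical path is n serial steps no matter how many processors you have.

    #include <stddef.h>

    /* Loop-carried dependence: x cannot be advanced to step i before
       step i-1 has finished.  A *linear* recurrence can sometimes be
       rescued by parallel prefix at the cost of extra work; a
       nonlinear one like this is generally believed to resist that. */
    double logistic(double x, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            x = 4.0 * x * (1.0 - x);
        return x;
    }

No amount of width helps here; only reducing the latency of each
multiply-add does.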
Of course, increasing the transfer requirements by a factor of two
makes a negligible difference to the latency, which is a wrinkle that
many people miss!
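To put rough numbers on it - and these are purely illustrative
assumptions of mine, not measurements - model a transfer of n bytes as
T(n) = alpha + n/beta, with a startup latency alpha of 1 microsecond
and a bandwidth beta of 1 GB/second:

    #include <stdio.h>

    /* Toy transfer-time model T(n) = alpha + n/beta.  The constants
       are illustrative assumptions, not measured figures. */
    int main(void)
    {
        const double alpha = 1e-6;   /* startup latency, seconds   */
        const double beta  = 1e9;    /* bandwidth, bytes per second */
        for (double n = 64; n <= 2048; n *= 2)
            printf("%6.0f bytes: %.3f microseconds\n",
                   n, (alpha + n / beta) * 1e6);
        return 0;
    }

Doubling a 64-byte transfer changes T from about 1.06 to about 1.13
microseconds - some 6%, not 100% - because alpha dominates until
n/beta approaches alpha.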
> OK, about that word 'severe'. In any case the worst hit,
> either in space or time, is bounded on the upside by a
> factor of 2. And our user has already agreed to a loss
> of at least a factor of two because she wants assured
> computing over speed or capacity. Choosing from among
> the 3rd & 4th alternatives avoids that second hit & is
> about as good as it gets.
One can argue about the factor, but I agree that it's not a major
issue. There are many scientific codes where people already accept
larger performance hits by using C++ instead of Fortran. When one
is talking about future parallelism, one is talking about 16-way
and up - and 256-way and up in the performance context. A competent
interval implementation is no more of a hit than a competent complex
(versus real) one - a fixed, small factor of at most 4 in CPU time
and 2 in memory, even in the worst case. Where latency is the limit, the
factor may be only a little above 1. Not a big deal.
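For concreteness, here is a minimal sketch of where those factors come
from, using interval addition with the standard C99 <fenv.h> rounding
controls (my illustration, not anybody's production library):

    #include <fenv.h>

    #pragma STDC FENV_ACCESS ON

    /* An interval is a pair of bounds: exactly twice the storage of
       a double, which is the factor of 2 in memory. */
    typedef struct { double lo, hi; } interval;

    /* Interval addition: two directed-rounded adds instead of one
       add - a fixed small factor in CPU, much as for complex versus
       real arithmetic. */
    interval int_add(interval x, interval y)
    {
        interval r;
        fesetround(FE_DOWNWARD);
        r.lo = x.lo + y.lo;       /* lower bound, rounded down */
        fesetround(FE_UPWARD);
        r.hi = x.hi + y.hi;       /* upper bound, rounded up   */
        fesetround(FE_TONEAREST); /* restore the default mode  */
        return r;
    }

A real library would avoid the per-operation mode switches (for
instance by staying in upward rounding and computing the lower bound
as -((-x.lo) + (-y.lo))), but the operation count, and hence the fixed
small factor, is the point.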
Regards,
Nick Maclaren.