
Re: (long) 'severe' problems, was Re: Decorated intervals ARE intervals...



On Jul 4 2011, Dan Zuras Intervals wrote:

I shall merely pick up on a couple of points that I think are less
clear than you imply.

	Machines today get the performance they do by moving data
	around quickly.  From main memory to cache, from cache to
	registers, from registers to ALUs.  Each machine architecture
	approaches the problem slightly differently but, as my old
	friend Willy used to say, when you want to increase your
	bandwidth you have to either increase your band or increase
	your width.  There are no other choices & you cannot 'clever'
	your way around the problem.

Yes, but ....  Some applications are bandwidth-limited, but others
are latency-limited, and that's MUCH harder.  Increasing the width
may increase latency, and increasing the band may hamper scalability,
or even prevent it when the speed of light starts to bite.
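
To make the distinction concrete, here is a rough C sketch of my own
(illustrative only): a streaming sum is bandwidth-bound, because the
hardware can keep many independent loads in flight, while a pointer
chase is latency-bound, because each load depends on the result of
the previous one and no amount of width helps.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1u << 24)            /* 16M elements, ~128 MB per array */

    int main(void)
    {
        double *a    = malloc((size_t)N * sizeof *a);
        size_t *next = malloc((size_t)N * sizeof *next);
        if (!a || !next) return 1;

        for (size_t i = 0; i < N; i++) a[i] = 1.0;

        /* Sattolo's shuffle: a random single-cycle permutation, so the
           pointer chase below visits every element exactly once. */
        for (size_t i = 0; i < N; i++) next[i] = i;
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i, t = next[i];
            next[i] = next[j]; next[j] = t;
        }

        /* Bandwidth-bound: independent sequential loads; prefetching
           and a wide bus can keep many of them in flight at once. */
        clock_t t0 = clock();
        double s = 0.0;
        for (size_t i = 0; i < N; i++) s += a[i];

        /* Latency-bound: each load address depends on the previous
           load's result, so only one miss is outstanding at a time. */
        clock_t t1 = clock();
        size_t k = 0;
        for (size_t i = 0; i < N; i++) k = next[k];
        clock_t t2 = clock();

        printf("stream %.2fs  chase %.2fs  (%g %zu)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC, s, k);
        free(a); free(next);
        return 0;
    }

On typical current hardware the chase is slower than the stream by an
order of magnitude or more, despite touching the same amount of memory.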

	Typical machines today are somewhere in between &, sad to
	say, closer to the former than the latter.  But this is
	mostly for reasons of cost & as the cost comes down things
	get more parallel.  I think we are about to enter an era
	of very wide & very parallel machines with truly obscene
	performance.

I don't, for the reasons I hint at above.  I have been proclaiming
heresy in this area for several decades, pointing out that the
low-hanging fruit for reducing latency-dependence (the vectorisable
codes) was picked in the 1970s, and that there has been limited
success since then.  Indeed, there are some problems that are
provably latency-dependent.

That is, of course, closely related to the frequent communication and
synchronisation requirements of many parallel codes, which is the main
reason that they don't give better performance even though they are
highly parallel.  And reducing those requirements is HARD.
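
A crude bulk-synchronous model makes the point: if each step costs
W/p of compute plus a latency-dominated synchronisation cost S, the
speedup saturates at about W/S no matter how many processors you add.
The figures in this sketch are purely illustrative:

    #include <stdio.h>

    int main(void)
    {
        /* Crude bulk-synchronous model: one step costs W/p of compute
           plus a barrier cost S that is dominated by latency.  The
           figures are illustrative, not measurements. */
        const double W = 1.0e-3;    /* 1 ms of compute per step, serial */
        const double S = 5.0e-6;    /* 5 us barrier/communication cost  */
        for (int p = 1; p <= 1024; p *= 4) {
            double t = W / p + S;
            printf("p = %4d  step = %8.2f us  speedup = %6.1f\n",
                   p, t * 1e6, (W + S) / t);
        }
        return 0;
    }

With those figures, 1024 processors deliver a speedup of only about
170, and adding more helps hardly at all.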

Of course, when latency is the limit, increasing the transfer
requirements by a factor of two makes a negligible difference to the
total transfer time, which is a wrinkle that many people miss!
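
The arithmetic behind that: model the transfer time as
t = latency + bytes/bandwidth.  For a small, latency-dominated
message, doubling the payload barely moves the total.  A sketch with
purely illustrative figures:

    #include <stdio.h>

    int main(void)
    {
        /* t = latency + bytes/bandwidth, with illustrative figures:
           100 ns latency and 10 GB/s of bandwidth. */
        const double L = 100e-9, B = 10e9;
        for (double n = 64; n <= 128; n *= 2)
            printf("%4.0f bytes: %6.1f ns\n", n, (L + n / B) * 1e9);
        return 0;
    }

With those figures, going from 64 to 128 bytes costs about 6%, not
100%.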

	OK, about that word 'severe'.  In any case the worst hit,
	either in space or time, is bounded on the upside by a
	factor of 2.  And our user has already agreed to a loss
	of at least a factor of two because she wants assured
	computing over speed or capacity.  Choosing from among
	the 3rd & 4th alternatives avoids that second hit & is
	about as good as it gets.

One can argue about the factor, but I agree that it's not a major
issue.  There are many scientific codes where people already accept
larger performance hits by using C++ instead of Fortran.  When one
is talking about future parallelism, one is talking about 16-way
and up - and 256-way and up in the performance context.  A competent
interval implementation is no more of a hit than a competent complex
(versus real) one: a fixed, small factor, not above 4 in the worst
case for CPU time and 2 for memory.  Where latency is the limit, the
factor may be only a little above 1.  Not a big deal.
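
For the record, here is a minimal, naive interval sketch in C (mine,
purely for illustration; a serious implementation avoids the rounding
mode switches and uses sign analysis for multiplication).  Storage is
two doubles, exactly as for complex, which is the factor of 2 in
memory; the operation counts are the small fixed CPU factors referred
to above.

    #include <fenv.h>
    #include <math.h>
    #include <stdio.h>

    /* Strictly required for fesetround to be honoured; some compilers
       ignore it, so treat this as a sketch, not production code. */
    #pragma STDC FENV_ACCESS ON

    /* An interval is two doubles, exactly like a complex number:
       that is the factor of 2 in memory. */
    typedef struct { double lo, hi; } interval;

    /* Addition: two directed-rounded adds instead of one add. */
    static interval iadd(interval a, interval b)
    {
        interval r;
        fesetround(FE_DOWNWARD);  r.lo = a.lo + b.lo;
        fesetround(FE_UPWARD);    r.hi = a.hi + b.hi;
        fesetround(FE_TONEAREST);
        return r;
    }

    /* Multiplication, naive form: four products in each rounding
       direction plus min/max.  Sign analysis cuts this to two
       products in the common case, which is why the CPU factor
       stays small and fixed. */
    static interval imul(interval a, interval b)
    {
        interval r;
        fesetround(FE_DOWNWARD);
        r.lo = fmin(fmin(a.lo * b.lo, a.lo * b.hi),
                    fmin(a.hi * b.lo, a.hi * b.hi));
        fesetround(FE_UPWARD);
        r.hi = fmax(fmax(a.lo * b.lo, a.lo * b.hi),
                    fmax(a.hi * b.lo, a.hi * b.hi));
        fesetround(FE_TONEAREST);
        return r;
    }

    int main(void)
    {
        interval x = {1.0, 2.0}, y = {-3.0, 4.0};
        interval s = iadd(x, y), p = imul(x, y);
        printf("sum [%g, %g]  product [%g, %g]\n",
               s.lo, s.hi, p.lo, p.hi);
        return 0;
    }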


Regards,
Nick Maclaren.