Thread Links			Date Links
Thread Prev	Thread Next	Thread Index	Date Prev	Date Next	Date Index

Re: Motion P1788/M0008.01 SIMD challenge

To: stds-1788 <stds-1788@xxxxxxxxxxxxxxxxx>
Subject: Re: Motion P1788/M0008.01 SIMD challenge
From: John Pryce <j.d.pryce@xxxxxxxxxxxx>
Date: Mon, 12 Oct 2009 08:11:02 +0100
Delivered-to: mhonarc@xxxxxxxxxxxxxxxx
In-reply-to: <LISTSERV%200909261135337825.94D6@xxxxxxxxxxxxxxxxx>
List-help: <http://listserv.ieee.org/cgi-bin/wa?LIST=STDS-1788>, <mailto:LISTSERV@LISTSERV.IEEE.ORG?body=INFO%20STDS-1788>
List-owner: <mailto:STDS-1788-request@LISTSERV.IEEE.ORG>
List-subscribe: <mailto:STDS-1788-subscribe-request@LISTSERV.IEEE.ORG>
List-unsubscribe: <mailto:STDS-1788-unsubscribe-request@LISTSERV.IEEE.ORG>
References: <LISTSERV%200909261135337825.94D6@xxxxxxxxxxxxxxxxx>
Sender: stds-1788@xxxxxxxx

Nate, Juergen, P1788 members

I wish to restate my "SIMD challenge" (which I first posed on 20September - I copy relevant passage below and has been discussed byNate, Juergen and others since). Apologies for slowness.


I am concerned about several issues.

A. That whatever mechanisms P1788 chooses for exception handling,should work well in a parallel environment.B. That these mechanisms have a mathematical foundation that isspecified with enough precision, and simple enough, to enable one toprove the correctness -- or incorrectness, as the case may be -- ofcode that uses these mechanisms.C. That, in particular, it be possible to prove Nate's assertion thata decoration model and a flagspace model are equivalent at a certainlevel of abstraction.

Issue A is just one of many design issues that are being discussed. Ithink it is a high priority but the P1788 group may put its prioritylower than I do. Issue B is absolutely crucial -- what is the pointof a standard, for which one can only prove correctness in the casethat nothing went wrong? Issue C is not fundamental, but we shallmake wrong decisions if we do not have clarity about the relationbetween these two models.



About A.
--------

Nate gave some C++ code that showed how what looks like "decorationson each individual interval" can be implemented to access a singleflag held in a static class variable. It took me a while to get it tocompile and run -- my ignorance of C++ memory allocation rules was toblame. But it works. Ned Nedialkov suggested a simpler scheme. So farso good.

But it doesn't solve my SIMD challenge. Briefly stated, this is thatyou have an interval extension yy=ff(xx) of function y=f(x). It couldhave several input and output arguments, equivalently xx and yy canbe vectors. We want to evaluate at many xx's "simultaneously" usingvectorised elementary functions. Possibly this is to exploit a vectorarchitecture, but even if not, it is expected to run fast. Forinstance this is the central part of the vector interval add inBIAS, namely BiasAddVIVI in Bias1.c:

  for (i = 0; i < P; i++) (pR++)->inf = (pA++)->inf + (pB++)->inf;
  for (i = 0; i < P; i++) (pR++)->sup = (pA++)->sup + (pB++)->sup;

where P is the vector length. (Loop initialisations and rounding modechanges omitted.) Even on a single core, non-vector architecture Ibelieve this could generate very fast code.


SIMD challenge:

Arrange things so that there is a _flag vector_ of length P, torecord exceptions in each individual yy=ff(xx) evaluation.


My rationale for wanting this is that
- A single flag gives grossly inadequate information.

- The decoration solution is a bad design: whether "something wentwrong" in the p'th "thread", p=1 to P is not held in one place but isscattered over the components of the p'th output yy(:,p).

My take on this challenge is that it cannot be solved with themechanisms Nate has described so far, because there is a missingpiece of information that one MUST give the run time system, so itknows what to do. Namely, the value of P, the vector length.

Thus -- ignoring, for now, any issues of how this is bound to alanguage -- at an abstract level we need a structure of the form(1) User program tells run time system "I am starting a SIMDcalculation of length P".(2) User program executes SIMD code for yy=ff(xx) on vectors oflength P.

(3) User program tells run time system "End of SIMD calculation".

In response to (1), run time system must do several things including:
 (1a) Allocate memory for a flag vector F of length P.

(1b) Tell the vector elementary function library (e.g. the Bias1.cfunctions of BIAS) that F is where exceptions are to be recorded.(1c) Check that all input vectors to (2), supplied by the userprogram, are indeed of length P.

In response to (3), (1b) must be negated somehow. Also, there must besome way for the user program to access the information in F after(3) was done. For instance, F may be created on the heap; and, whentold (3), the run time system passes a pointer to F back to the userprogram. There are many other possible mechanisms.


Clearly, (1) and (3) are an abstract description of a _scope_.

- Could such a scheme be implemented by a version of decorations, ina C++ class or a Fortran module? I think I see how to do so. Nate,what think you?

- Does such a scheme fit well with the "flagspace" languageextension? Flagspace (so far) is less well specified thandecorations, but I believe the fit is excellent.

  * Flagspace is a scope.
  * If one allows an (optional) argument of the flagspace to
    give the value P, then the runtime system has what it
    needs to perform (1a,b,c). Possible syntax
      flagspace FS1 { ... body ... }// without argument
      flagspace FS2[1000] { ... body ... }// with argument
      // This causes a flag vector of length 1000 to be created.
  * The flagspace authors propose methods to pass the flag
    values back to the enclosing scope on exit. Just what is
    needed! However one generally expects no connection
    between flag vector lengths (if any) in one scope and in
    the surrounding scope. This needs to be considered.
  * Flagspace scopes can be nested. Yes, I think this is
    potentially an important feature to give to my SIMD
    concept. I can certainly see it being useful in code
    for PDEs, multidimensional FFTs, etc.

Various semantic issues must be decided, such as: Is the vectorinterval library (e.g. Bias1.c) ONLY available for use within a scopedelimited by (1) and (3)? But these are secondary.



About B. (mathematical foundation, leading to provable correctness)
--------

For my favourite example, "if certain exceptions are NOT raised afterevaluating yy=ff(xx) then a fixed point theorem can be applied", thisdoesn't seem too scary. It looks straightforward to prove usingeither a decoration or a flagspace model -- though the devil (or God)will be in the details.


What about other people's favourite examples?


About C. (decoration & flagspace are essentially equivalent)
--------

Juergen and Nate seem to agree this is so, but what is the level ofabstraction at which this theorem -- and its proof -- must be stated?

- Implementing flagspace via decoration seems to rely _essentially_on memory aliasing, which looks like a level 3 or 4 notion. I fear aproof at this level will be language-dependent and _very_ messy.

- Implementing decoration via flagspace? At present I do not see howthis should be done.


Comments please.

Best wishes

John
-----------------------------------------------
John Pryce wrote on 20 Sep 2009

Really, I would like to handle the scenario where "global flag" and"huge SIMD calculation" are combined. Namely, I have
-  a vector function f: D \subseteq R^n -> R^n
-  a box xx in R^n
and you want to check that
-  f is defined and continuous on the whole of xx, and
-  f maps xx into itself,
which proves that a fixed-point theorem can be applied.
With a "FLAG" approach it works very simply:
- Assume f has been coded up as a function, as this is the typicalcase.
- Evaluate the interval version ff of f:
     yy = ff(xx)   where xx, yy are interval vectors of length n.
But I now have P = a million xx vectors at once, and I wish toevaluate the P output vectors yy in SIMD mode, assuming that thecode for ff can indeed be vectorised in that way (which islanguage- and compiler-dependent). And I want the FLAG reportedindividually for each component. A picture (using indexing from 1,not 0)
        1  2     ...        P
xx(1)   x  x  x  ...  x  x  x    }
...       ...    ...    ...      }  inputs, an n by P array
xx(n)   x  x  x  ...  x  x  x    }

v1      x  x  x  ...  x  x  x    }
v2      x  x  x  ...  x  x  x    }  intermediate variables in ff,
...                              }  each is a P-vector

yy(1)   x  x  x  ...  x  x  x    }
...       ...    ...    ...      }  outputs, an n by P array
yy(n)   x  x  x  ...  x  x  x    }

FLAG    x  x  x  ...  x  x  x    }  a P-vector
Column p (for parallel) of this picture represents calculating (inMatlab notation)
   yy(:,p) = ff(xx(:,p)),    for p=1:P.

This is done a row at a time, e.g.
  v1 = xx(1,:) .* xx(3,:);
  v2 = sin(v1);
  ...
This needs a vectorised library of interval arithmetic operations(and elementary functions) like the level-1 functions in BIAS.These operations set FLAG componentwise where appropriate.
I continue to believe that for some applications, this gives acleaner conceptual scheme than does leaving FLAG information in thedecorations of individual output components yy(i,p).
In discussion at PPAM we seem to have agreed this FLAG scheme isefficiently implementable, with a "regional" FLAG vector.(Contributions from Juergen W v Gudenberg, Vladik Kreinovich).
Snag: I don't at present see how, using standard Fortran or C++constructs, one can automatically ensure FLAG is defined, and is avector of the same length as the numeric vectors -- if not, onerisks indexing outside array bounds. However Juergen is thinkingabout a possible new C++ construct he calls "flagspace",syntactically similar to "namespace". Juergen, could it solve thisproblem?

References:
- Re: Motion P1788/M0008.01_Exception_Handling
  - From: Nate Hayes

Prev by Date: Is there a second? Re: motion elementary functions
Next by Date: Re: motion elementary functions
Previous by thread: Re: Motion P1788/M0008.01_Exception_Handling
Next by thread: P1788 Motions on grouper.ieee.org not up-to-date
Index(es):
- Date
- Thread